Copyright	Joshua Simmons 2017
License	BSD3
Maintainer	joshua.simmons@emptypath.com
Stability	unstable
Safe Haskell	Safe
Language	Haskell2010

Data.SuffixArray.Internal

Description

Internal implementation details, unstable and not to be relied upon for any reason.

Synopsis

Documentation

data Alpha a Source #

A character in a string (or set of strings) we're going to compute the suffix array of. Includes Sentinal markers for the end of strings.

Constructors

Sentinal Int	Used to mark the end of a string. The `Int` parameter is used to encode which string this is the end of, in cases where there are multiple.
Alpha a	An actual character in the string.

Instances

Eq a => Eq (Alpha a) Source #
Methods (==) :: Alpha a -> Alpha a -> Bool # (/=) :: Alpha a -> Alpha a -> Bool #
Ord a => Ord (Alpha a) Source #
Methods compare :: Alpha a -> Alpha a -> Ordering # (<) :: Alpha a -> Alpha a -> Bool # (<=) :: Alpha a -> Alpha a -> Bool # (>) :: Alpha a -> Alpha a -> Bool # (>=) :: Alpha a -> Alpha a -> Bool # max :: Alpha a -> Alpha a -> Alpha a # min :: Alpha a -> Alpha a -> Alpha a #
Show a => Show (Alpha a) Source #
Methods showsPrec :: Int -> Alpha a -> ShowS # show :: Alpha a -> String # showList :: [Alpha a] -> ShowS #

naive :: Ord a => [[a]] -> [Int] Source #

A naively implemented suffix array implementation which will be used for correctness checking and possibly to benchmark against. Shouldn't usually be used in production code, as it is quite slow in the worst case. In cases with few identical suffixes, it can actually perform quite well. See benchmarks for some details.

worst case O(n^2 lg n) time (where n is the sum of the string lengths + the number of strings)

naiveOne :: Ord a => [a] -> [Int] Source #

Convenience wrapper around naive for a single string.

worst case O(n^2 lg n) time (where n is the length of the string)

naiveLcp :: Ord a => [[a]] -> [Int] Source #

A naively implemented LCP implementation, used for correctness testing the actual algorithm.

The Longest Common Prefix list gives the longest common prefix of each suffix and the previous suffix, with the suffixes in lexicographic order.

worst case O(n^2 lg n) time (where n is the sum of the string lengths + the number of strings)

The LCP part is an extra O(n^2) in addition to the work of computing the suffix array in the first place.

naiveLcpOne :: Ord a => [a] -> [Int] Source #

Convenience wrapper around naiveLcp for a single string.

worst case O(n^2 lg n) time (where n is the length of the string)

The LCP part is an extra O(n^2) in addition to the work of computing the suffix array in the first place.

prepare :: [[a]] -> [Alpha a] Source #

Prepare a list of strings to compute the suffix array of them. Just wraps every character in Alpha and adds Sentinals to the end of each string, and concatenates it together.

prepareOne :: [a] -> [Alpha a] Source #

Convenience function to prepare a single string.

rank :: Ord a => [a] -> [Int] Source #

Take a sorted list of elements and replace each value with an Int such that any comparisons between elements in the original list would yield exactly the same result in the output list.

i.e.: let rs = rank xs in all [ (xs!!i) compare (xs!!j) == (rs!!i) compare (rs!!j) | let idx = [0 .. length xs - 1], i <- idx, j <- idx ]

suffixes :: [a] -> [[a]] Source #

Yields the non-empty suffixes of a list in order of decreasing length.

This differs from tails in that it does not include the empty list at the end.