Safe Haskell | Safe-Inferred |
---|---|
Language | Haskell2010 |
Data.Suffix
Description
Implementation
Suffix array construction uses the SAIS algorithm described by
- Ge Nong, Sen Zhang, and Wai Hong Chan, "Linear Suffix Array Construction by Almost Pure Induced-Sorting", 2009 Data Compression Conference, https://doi.org/10.1109/DCC.2009.42
LCP array construction uses the \(\varPhi\)-algorithm described by
- Juha Kärkkäinen, Giovanni Manzini, and Simon J. Puglisi "Permuted Longest-Common-Prefix Array", Annual Symposium on Combinatorial Pattern Matching, 2009, https://doi.org/10.1007/978-3-642-02441-2_17
The search algorithms used are described by
- Udi Manber and Gene Myers, "Suffix arrays: a new method for on-line string searches", First annual ACM-SIAM symposium on Discrete algorithms, 1990, pp. 319-327, https://dl.acm.org/doi/10.5555/320176.320218
Synopsis
- buildSuffixArray :: Intn i => Int -> Pull Int -> SuffixArray i
- newtype SuffixArray i = SuffixArray (PrimArray i)
- search :: (Ord a, Intn i) => Pull a -> SuffixArray i -> Pull a -> (Int, Int)
- buildLCPArray :: (Eq a, Intn i) => Pull a -> SuffixArray i -> LCPArray i
- newtype LCPArray i = LCPArray (PrimArray i)
- buildLRLCPArray :: Intn i => LCPArray i -> LRLCPArrays i
- data LRLCPArrays i = LRLCPArrays !(PrimArray i) !(PrimArray i)
- searchLRLCP :: (Ord a, Intn i) => Pull a -> SuffixArray i -> LRLCPArrays i -> Pull a -> (Int, Int)
- foldSuffixTree :: (Intn i, Monad m) => (Int -> m a) -> (Int -> m b) -> (Int -> b -> a -> m b) -> (Int -> b -> m a) -> SuffixArray i -> LCPArray i -> m a
- data Pull a = Pull !Int (Int -> a)
- pullFromByteString :: ByteString -> Pull Word8
- pullFromPrimArray :: Prim a => PrimArray a -> Pull a
- pullFromArray :: Array a -> Pull a
- pullFromArrayLike :: (arr -> Int) -> (arr -> Int -> a) -> arr -> Pull a
- class (Prim i, Integral i) => Intn i
Suffix array
Arguments
:: Intn i | |
=> Int | The alphabet size \(k\). |
-> Pull Int | Input sequence of length \(n\). Indexing is assumed to
be \(O(1)\). Elements must be in |
-> SuffixArray i | Output type |
\(O(n + k)\). Build a suffix array from a sequence.
On 64-bit systems, a SuffixArray Int32
requires half the memory as a
SuffixArray Int
.
newtype SuffixArray i Source #
Suffix array.
Constructors
SuffixArray (PrimArray i) |
Instances
(Show i, Prim i) => Show (SuffixArray i) Source # | |
Defined in Data.Suffix Methods showsPrec :: Int -> SuffixArray i -> ShowS # show :: SuffixArray i -> String # showList :: [SuffixArray i] -> ShowS # | |
NFData (SuffixArray i) Source # | |
Defined in Data.Suffix Methods rnf :: SuffixArray i -> () # | |
(Eq i, Prim i) => Eq (SuffixArray i) Source # | |
Defined in Data.Suffix Methods (==) :: SuffixArray i -> SuffixArray i -> Bool # (/=) :: SuffixArray i -> SuffixArray i -> Bool # | |
(Ord i, Prim i) => Ord (SuffixArray i) Source # | |
Defined in Data.Suffix Methods compare :: SuffixArray i -> SuffixArray i -> Ordering # (<) :: SuffixArray i -> SuffixArray i -> Bool # (<=) :: SuffixArray i -> SuffixArray i -> Bool # (>) :: SuffixArray i -> SuffixArray i -> Bool # (>=) :: SuffixArray i -> SuffixArray i -> Bool # max :: SuffixArray i -> SuffixArray i -> SuffixArray i # min :: SuffixArray i -> SuffixArray i -> SuffixArray i # |
Arguments
:: (Ord a, Intn i) | |
=> Pull a | Sequence of length \(n\). Indexing is assumed to
be \(O(1)\). |
-> SuffixArray i | Suffix array for the above sequence |
-> Pull a | Pattern of length \(m\) |
-> (Int, Int) | An |
\(O(m \log n)\). Search for a pattern in a sequence using its suffix array.
Note: For typical inputs, the worst case is unlikely and the running time is
close to \(O(m + \log n)\). To get guaranteed \(O(m + \log n)\) running time
consider using searchLRLCP
instead.
Longest common prefix array
Arguments
:: (Eq a, Intn i) | |
=> Pull a | Sequence of length |
-> SuffixArray i | Suffix array for the above sequence |
-> LCPArray i |
\(O(n)\). Build a longest common prefix array from a sequence and its suffix array.
The LCP array has the same length as the sequence, \(n\). The \(0\)-th element of the LCP array is \(0\). The \(i\)-th element of the LCP array for \(0 < i < n\) is the longest common prefix of the \(i\)-th and \((i-1)\)-th suffix in the suffix array.
Longest common prefix array.
Instances
(Show i, Prim i) => Show (LCPArray i) Source # | |
NFData (LCPArray i) Source # | |
Defined in Data.Suffix | |
(Eq i, Prim i) => Eq (LCPArray i) Source # | |
(Ord i, Prim i) => Ord (LCPArray i) Source # | |
LLCP and RLCP arrays
buildLRLCPArray :: Intn i => LCPArray i -> LRLCPArrays i Source #
\(O(n)\). Build LLCP and RLCP arrays from an LCP array, for use in
searchLRLCP
.
data LRLCPArrays i Source #
LLCP and RLCP arrays.
Constructors
LRLCPArrays !(PrimArray i) !(PrimArray i) |
Instances
(Show i, Prim i) => Show (LRLCPArrays i) Source # | |
Defined in Data.Suffix Methods showsPrec :: Int -> LRLCPArrays i -> ShowS # show :: LRLCPArrays i -> String # showList :: [LRLCPArrays i] -> ShowS # | |
NFData (LRLCPArrays i) Source # | |
Defined in Data.Suffix Methods rnf :: LRLCPArrays i -> () # | |
(Eq i, Prim i) => Eq (LRLCPArrays i) Source # | |
Defined in Data.Suffix Methods (==) :: LRLCPArrays i -> LRLCPArrays i -> Bool # (/=) :: LRLCPArrays i -> LRLCPArrays i -> Bool # | |
(Ord i, Prim i) => Ord (LRLCPArrays i) Source # | |
Defined in Data.Suffix Methods compare :: LRLCPArrays i -> LRLCPArrays i -> Ordering # (<) :: LRLCPArrays i -> LRLCPArrays i -> Bool # (<=) :: LRLCPArrays i -> LRLCPArrays i -> Bool # (>) :: LRLCPArrays i -> LRLCPArrays i -> Bool # (>=) :: LRLCPArrays i -> LRLCPArrays i -> Bool # max :: LRLCPArrays i -> LRLCPArrays i -> LRLCPArrays i # min :: LRLCPArrays i -> LRLCPArrays i -> LRLCPArrays i # |
Arguments
:: (Ord a, Intn i) | |
=> Pull a | Sequence of length \(n\). Indexing is assumed to
be \(O(1)\). |
-> SuffixArray i | Suffix array for the above sequence |
-> LRLCPArrays i | LLCP and RLCP arrays for the above sequence |
-> Pull a | Pattern sequence of length \(m\) |
-> (Int, Int) | An |
\(O(m + \log n)\). Search for a pattern in a sequence using its suffix array and LLCP and RLCP arrays.
Suffix tree
Arguments
:: (Intn i, Monad m) | |
=> (Int -> m a) | Leaf. The |
-> (Int -> m b) | Internal node, initialize. The |
-> (Int -> b -> a -> m b) | Internal node, combine with the value from a
child. The |
-> (Int -> b -> m a) | Internal node, finalize. The |
-> SuffixArray i | A suffix array derived from a sequence |
-> LCPArray i | The LCP array for the same sequence and suffix array |
-> m a |
\(O(n)\). Bottom-up strict monadic fold of the suffix tree formed by the given suffix array and LCP array.
The tree folded over can be considered to be equivalent to the following definition.
data SuffixTree = Leaf Int -- ^ The suffix index | Branch Int -- ^ The string depth of this node [SuffixTree] -- ^ The children of this node
This tree has exactly \(n+1\) leaves and at most \(n\) internal nodes. The suffix with index \(n\), i.e. the empty suffix, is a leaf in this tree, though it is not present in the suffix array.
The calls to the leaf function and the internal node finalizer taken together form a post-order traversal of the tree. For an internal node, the values from its children are combined left-to-right.
The \(O(n)\) bound assumes that monadic bind and all given functions are \(O(1)\).
Pull
A pull array. Serves as a simple interface for array-like structures.
Note: For good performance, create Pull
s right before supplying them to
functions in this module.
Constructors
Pull | |
pullFromByteString :: ByteString -> Pull Word8 Source #
Pull from a ByteString
(from the bytestring package).
pullFromPrimArray :: Prim a => PrimArray a -> Pull a Source #
Pull from a PrimArray
(from the primitive package).
pullFromArray :: Array a -> Pull a Source #
Pull from an Array
(from the primitive package).
Arguments
:: (arr -> Int) | Size function |
-> (arr -> Int -> a) | Indexing function |
-> arr | The structure |
-> Pull a |
Pull elements from any array-like structure.