suffixtree-0.2: Efficient, lazy suffix tree implementationContentsIndex
Data.SuffixTree
Portabilityportable
Stabilityexperimental
Maintainerbos@serpentine.com
Contents
Types
Construction
Querying
Traversal
Other useful functions
Description

A lazy, efficient suffix tree implementation.

Since many function names (but not the type name) clash with Prelude names, this module is usually imported qualified, e.g.

  import Data.SuffixTree (STree)
  import qualified Data.SuffixTree as T

The implementation is based on the first of those described in the following paper:

This implementation constructs the suffix tree lazily, so subtrees are not created until they are traversed. Two construction functions are provided, constructWith for sequences composed of small alphabets, and construct for larger alphabets.

Estimates are given for performance. The value k is a constant; n is the length of a query string; and t is the number of elements (nodes and leaves) in a suffix tree.

Synopsis
type Alphabet a = [a]
type Edge a = (Prefix a, STree a)
data Prefix a
data STree a
= Node [Edge a]
| Leaf
constructWith :: Eq a => Alphabet a -> [a] -> STree a
construct :: Ord a => [a] -> STree a
elem :: Eq a => [a] -> STree a -> Bool
findEdge :: Eq a => [a] -> STree a -> Maybe (Edge a, Int)
findTree :: Eq a => [a] -> STree a -> Maybe (STree a)
findPath :: Eq a => [a] -> STree a -> [Edge a]
countLeaves :: STree a -> Int
countRepeats :: Eq a => [a] -> STree a -> Int
foldr :: (Prefix a -> b -> b) -> b -> STree a -> b
foldl :: (a -> Prefix b -> a) -> a -> STree b -> a
fold :: (a -> a) -> (a -> a) -> (Prefix b -> a -> a -> a) -> (a -> a) -> a -> STree b -> a
mkPrefix :: [a] -> Prefix a
prefix :: Prefix a -> [a]
suffixes :: [a] -> [[a]]
Types
type Alphabet a = [a]
The list of symbols that constructWith can possibly see in its input.
type Edge a = (Prefix a, STree a)
An edge in the suffix tree.
data Prefix a
The prefix string associated with an Edge. Use mkPrefix to create a value of this type, and prefix to deconstruct one.
show/hide Instances
Functor Prefix
Eq a => Eq (Prefix a)
Ord a => Ord (Prefix a)
Show a => Show (Prefix a)
data STree a
The suffix tree type. The implementation is exposed to ease the development of custom traversal functions. Note that (Prefix a, STree a) pairs are not stored in any order.
Constructors
Node [Edge a]
Leaf
show/hide Instances
Functor STree
??? a => Show (STree a)
Construction
constructWith :: Eq a => Alphabet a -> [a] -> STree a
O(k n log n). Constructs a suffix tree using the given alphabet. The performance of this function is linear in the size k of the alphabet. That makes this function suitable for small alphabets, such as DNA nucleotides. For an alphabet containing more than a few symbols, construct is usually several orders of magnitude faster.
construct :: Ord a => [a] -> STree a
O(n log n). Constructs a suffix tree.
Querying
elem :: Eq a => [a] -> STree a -> Bool
O(n). Indicates whether the suffix tree contains the given sublist. Performance is linear in the length n of the sublist.
findEdge :: Eq a => [a] -> STree a -> Maybe (Edge a, Int)

O(n). Finds the given subsequence in the suffix tree. On failure, returns Nothing. On success, returns the Edge in the suffix tree at which the subsequence ends, along with the number of elements to drop from the prefix of the Edge to get the "real" remaining prefix.

Here is an example:

> find "ssip" (construct "mississippi")
Just ((mkPrefix "ppi",Leaf),1)

This indicates that the edge (mkPrefix "ppi",Leaf) matches, and that we must strip 1 character from the string "ppi" to get the remaining prefix string "pi".

Performance is linear in the length n of the query list.

findTree :: Eq a => [a] -> STree a -> Maybe (STree a)

O(n). Finds the subtree rooted at the end of the given query sequence. On failure, returns Nothing.

Performance is linear in the length n of the query list.

findPath :: Eq a => [a] -> STree a -> [Edge a]

O(n). Returns the path from the Edge in the suffix tree at which the given subsequence ends, up to the root of the tree. If the subsequence is not found, returns the empty list.

Performance is linear in the length of the query list.

countLeaves :: STree a -> Int

O(t). Count the number of leaves in a tree.

Performance is linear in the number t of elements in the tree.

countRepeats :: Eq a => [a] -> STree a -> Int

O(n + r). Count the number of times a sequence is repeated in the input sequence that was used to construct the suffix tree.

Performance is linear in the length n of the input sequence, plus the number of times r the sequence is repeated.

Traversal
foldr :: (Prefix a -> b -> b) -> b -> STree a -> b
O(t). Folds the edges in a tree, using post-order traversal. Suitable for lazy use.
foldl
:: (a -> Prefix b -> a)step function (evaluated strictly)
-> ainitial state
-> STree b
-> a
O(t). Folds the edges in a tree, using pre-order traversal. The step function is evaluated strictly.
fold
:: (a -> a)downwards state transformer
-> (a -> a)upwards state transformer
-> (Prefix b -> a -> a -> a)edge state transformer
-> (a -> a)leaf state transformer
-> ainitial state
-> STree btree
-> a

O(t). Generic fold over a tree.

A few simple examples.

countLeaves == fold id id (const const) (1+) 0
countEdges = fold id id (\_ a _ -> a+1) id 0

This more complicated example generates a tree of the same shape, but new type, with annotated leaves.

data GenTree a b = GenNode [(Prefix a, GenTree a b)]
                 | GenLeaf b
                   deriving (Show)
 
gentree :: STree a -> GenTree a Int
gentree = fold reset id fprefix reset leaf
    where leaf = GenLeaf 1
          reset = const leaf
          fprefix p t (GenLeaf _) = GenNode [(p, t)]
          fprefix p t (GenNode es) = GenNode ((p, t):es)
 
Other useful functions
mkPrefix :: [a] -> Prefix a
O(1). Construct a Prefix value.
prefix :: Prefix a -> [a]
O(n). Obtain the list stored in a Prefix.
suffixes :: [a] -> [[a]]

O(n). Returns all non-empty suffixes of the argument, longest first. Behaves as follows:

suffixes xs == init (tails xs)
Produced by Haddock version 0.8