SuffixStructures-0.0.1.0: Suffix array construction

Safe Haskell: None
Language: Haskell2010

Data.SuffixStructure.ESA

Description

The suffix array data structure. Supports (de)serialization via aeson, cereal, and binary.

Reading from and writing to specialized "bio" formats is still an open task.

TODO compression during serialization? TODO versioning? TODO read sam/bam format? TODO what about mmap for really large indices?

Documentation

data SA

The Suffix Array data type, together with the longest common prefix table.
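
To make the two tables concrete, here is a minimal, deliberately naive sketch of what a suffix array and its longest common prefix table contain. It is not the construction used by this package: the names naiveSA and naiveLCP are hypothetical, plain lists stand in for vectors, and the convention that LCP entry i compares suffix i with its predecessor in sorted order (entry 0 being 0) is an assumption.

```haskell
import Data.List (sortOn, tails)

-- | Hypothetical helper: start positions of all non-empty suffixes,
-- ordered lexicographically by the suffix they point to.
naiveSA :: String -> [Int]
naiveSA s = map fst (sortOn snd (zip [0 ..] (init (tails s))))

-- | Hypothetical helper: entry i is the length of the common prefix of
-- the suffixes at positions i-1 and i of the suffix array (assumed
-- convention); entry 0 has no predecessor and is set to 0.
naiveLCP :: String -> [Int] -> [Int]
naiveLCP s sa = take (length sa) (0 : zipWith common sa (drop 1 sa))
  where
    common i j = length (takeWhile id (zipWith (==) (drop i s) (drop j s)))
```

For the input "banana" the sorted suffixes are a, ana, anana, banana, na, nana, so naiveSA yields [5,3,1,0,4,2] and naiveLCP yields [0,1,3,0,0,2].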

TODO skip table? TODO inverse suffix array?

TODO maybe parametrize on the Int type (Int, Int64, Int32, Words). This will require better specialization of operations in NaiveArray and elsewhere; otherwise performance drops quite noticeably, by a factor of 5 to 10.

Constructors

SA 

Fields

sa :: !(Vector Int)

The actual suffix array, stored as 8-byte Ints.

lcp :: !(Vector Int8)

1-byte longest common prefix vector. A negative entry indicates that the real value must be looked up in lcpLong.

lcpLong :: !(IntMap Int)

LCP values that are unusually long, that is, too large for the Int8 storage of lcp. This map is sparse.
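
The split between lcp and lcpLong can be sketched as follows. This is an illustration only, under several assumptions: the helper packLCP is hypothetical, the vectors are taken to be unboxed vectors from the vector package, the sentinel is assumed to be -1 (the description only says "negative"), and all input LCP values are assumed to be non-negative.

```haskell
import Data.Int (Int8)
import qualified Data.IntMap.Strict as IM
import qualified Data.Vector.Unboxed as VU

-- | Hypothetical helper: split a list of full-width LCP values into a
-- compact Int8 vector plus a sparse map holding the overflowing entries.
packLCP :: [Int] -> (VU.Vector Int8, IM.IntMap Int)
packLCP xs = (VU.fromList (map small xs), IM.fromList overflow)
  where
    -- Largest value that fits into the 1-byte storage (127).
    limit :: Int
    limit = fromIntegral (maxBound :: Int8)
    -- Values that fit are stored directly; larger ones become the
    -- assumed sentinel -1.
    small x
      | x > limit = -1
      | otherwise = fromIntegral x
    -- Only the oversized values are recorded, keyed by their index.
    overflow = [(i, x) | (i, x) <- zip [0 ..] xs, x > limit]
```

With this layout every entry costs one byte, and only the rare long prefixes pay for an IntMap node.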

lcpAt :: SA -> Int -> Int

Checks lcp and, where necessary, lcpLong, and returns the real prefix length as an Int (as opposed to the Int8 storage used by lcp).
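
The lookup described above might look roughly like the following sketch. It is not the package's implementation: the record SA' and the function lcpAtSketch are hypothetical stand-ins that mirror the documented fields, the vectors are assumed to be unboxed, and the Int argument is assumed to be an index into lcp.

```haskell
import Data.Int (Int8)
import qualified Data.IntMap.Strict as IM
import qualified Data.Vector.Unboxed as VU

-- | Hypothetical stand-in for the documented SA record.
data SA' = SA'
  { sa'      :: !(VU.Vector Int)
  , lcp'     :: !(VU.Vector Int8)
  , lcpLong' :: !(IM.IntMap Int)
  }

-- | Hypothetical sketch of the lookup: use the 1-byte entry when it is
-- non-negative, otherwise fetch the full value from the sparse map.
lcpAtSketch :: SA' -> Int -> Int
lcpAtSketch s k
  | short >= 0 = fromIntegral short
  | otherwise =
      case IM.lookup k (lcpLong' s) of
        Just long -> long
        Nothing   -> error "lcpAtSketch: negative lcp entry without lcpLong entry"
  where
    short = lcp' s VU.! k
```

In the common case this is a single vector read; the IntMap is consulted only for the sparse set of positions marked by a negative entry.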