| Copyright | (c) Matthew Mosior 2022 |
|---|---|
| License | BSD-style |
| Maintainer | mattm.github@gmail.com |
| Portability | portable |
| Safe Haskell | Safe-Inferred |
| Language | Haskell2010 |
Data.FMIndex.Internal
Description
WARNING
This module is considered internal.
The Package Versioning Policy does not apply.
The contents of this module may change in any way whatsoever and without any warning between minor versions of this package.
Authors importing this library are expected to track development closely.
All credit goes to the author(s)/maintainer(s) of the containers library for the above warning text.
Description
Various data structures and custom data types to describe the
Full-text Minute-space index (FM-index)
and the Inverse FM-index implementations, namely seqToOccCKB, seqToOccCKT, seqToCcB, seqToCcT, seqFromFMIndexB, and seqFromFMIndexT.
The FM-index implementations rely heavily upon Seq provided by the containers,
STRef and associated functions in the stref library,
and runST in the Control.Monad.ST library.
Example FM-index Output
The below example is taken from this wikipedia page.
Given the following input, "abracadabra":
and
Given the following Burrows-Wheeler matrix (BWM) of the input "abracadabra":
| I | F | L | ||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | $ | a | b | r | a | c | a | d | a | b | r | a |
| 2 | a | $ | a | b | r | a | c | a | d | a | b | r |
| 3 | a | b | r | a | $ | a | b | r | a | c | a | d |
| 4 | a | b | r | a | c | a | d | a | b | r | a | $ |
| 5 | a | c | a | d | a | b | r | a | $ | a | b | r |
| 6 | a | d | a | b | r | a | $ | a | b | r | a | c |
| 7 | b | r | a | $ | a | b | r | a | c | a | d | a |
| 8 | b | r | a | c | a | d | a | b | r | a | $ | a |
| 9 | c | a | d | a | b | r | a | $ | a | b | r | a |
| 10 | d | a | b | r | a | $ | a | b | r | a | c | a |
| 11 | r | a | $ | a | b | r | a | c | a | d | a | b |
| 12 | r | a | c | a | d | a | b | r | a | $ | a | b |
The FM-index output of the Burrows-Wheeler transform of the input is:
C[c] of "ard$rcaaaabb"
| c | $ | a | b | c | d | r |
| C[c] | 0 | 1 | 6 | 8 | 9 | 10 |
and
Occ(c,k) of "ard$rcaaaabb"
| a | r | d | $ | r | c | a | a | a | a | b | b | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | |
| $ | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| a | 1 | 1 | 1 | 1 | 1 | 1 | 2 | 3 | 4 | 5 | 5 | 5 |
| b | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 2 |
| c | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| d | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| r | 0 | 1 | 1 | 1 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 |
Synopsis
- newtype FMIndexB = FMIndexB (CcB, OccCKB)
- newtype FMIndexT = FMIndexT (CcT, OccCKT)
- newtype OccCKB = OccCKB (Seq (Maybe ByteString, Seq (Int, Int, Maybe ByteString)))
- newtype OccCKT = OccCKT (Seq (Maybe Text, Seq (Int, Int, Maybe Text)))
- newtype CcB = CcB (Seq (Int, Maybe ByteString))
- newtype CcT = CcT (Seq (Int, Maybe Text))
- type PBOccCKSeqB = Seq (Maybe ByteString)
- type OccCKSeqB = Seq (Maybe ByteString, Seq (Int, Int, Maybe ByteString))
- type STOccCKSeqB s a = STRef s OccCKSeqB
- updateSTOccCKSeqAB :: STOccCKSeqB s (Seq (Maybe ByteString, Seq (Int, Int, Maybe ByteString))) -> (Int, Int, Maybe ByteString) -> ST s ()
- updateSTOccCKSeqBB :: STOccCKSeqB s (Seq (Maybe ByteString, Seq (Int, Int, Maybe ByteString))) -> Maybe ByteString -> ST s ()
- emptySTOccCKSeqB :: ST s (STOccCKSeqB s a)
- type STOccCKILB s a = STRef s (Seq (Maybe ByteString))
- loadSTOccCKILB :: STOccCKILB s (Maybe ByteString) -> Seq (Maybe ByteString) -> ST s ()
- emptySTOccCKILB :: ST s (STOccCKILB s a)
- type STOccCKCounterB s a = STRef s Int
- updateSTOccCKCounterB :: STOccCKCounterB s Int -> Int -> ST s ()
- emptySTOccCKCounterB :: ST s (STOccCKCounterB s Int)
- seqToOccCKB :: PBOccCKSeqB -> ST s OccCKSeqB
- type PTOccCKSeqT = Seq (Maybe Text)
- type OccCKSeqT = Seq (Maybe Text, Seq (Int, Int, Maybe Text))
- type STOccCKSeqT s a = STRef s OccCKSeqT
- updateSTOccCKSeqAT :: STOccCKSeqT s (Seq (Maybe Text, Seq (Int, Int, Maybe Text))) -> (Int, Int, Maybe Text) -> ST s ()
- updateSTOccCKSeqBT :: STOccCKSeqT s (Seq (Maybe Text, Seq (Int, Int, Maybe Text))) -> Maybe Text -> ST s ()
- emptySTOccCKSeqT :: ST s (STOccCKSeqT s a)
- type STOccCKILT s a = STRef s (Seq (Maybe Text))
- loadSTOccCKILT :: STOccCKILT s (Maybe Text) -> Seq (Maybe Text) -> ST s ()
- emptySTOccCKILT :: ST s (STOccCKILT s a)
- type STOccCKCounterT s a = STRef s Int
- updateSTOccCKCounterT :: STOccCKCounterT s Int -> Int -> ST s ()
- emptySTOccCKCounterT :: ST s (STOccCKCounterT s Int)
- seqToOccCKT :: PTOccCKSeqT -> ST s OccCKSeqT
- type PBCcSeqB = Seq (Maybe ByteString)
- type CcSeqB = Seq (Int, Maybe ByteString)
- type STCcSeqB s a = STRef s CcSeqB
- updateSTCcSeqB :: STCcSeqB s (Seq (Int, Maybe ByteString)) -> (Int, Maybe ByteString) -> ST s ()
- emptySTCcSeqB :: ST s (STCcSeqB s a)
- type STCcILB s a = STRef s (Seq (Maybe ByteString))
- loadSTCcILB :: STCcILB s (Maybe ByteString) -> Seq (Maybe ByteString) -> ST s ()
- emptySTCcILB :: ST s (STCcILB s a)
- type STCcCounterB s a = STRef s Int
- updateSTCcCounterB :: STCcCounterB s Int -> Int -> ST s ()
- emptySTCcCounterB :: ST s (STCcCounterB s Int)
- seqToCcB :: PBCcSeqB -> ST s CcSeqB
- type PTCcSeqT = Seq (Maybe Text)
- type CcSeqT = Seq (Int, Maybe Text)
- type STCcSeqT s a = STRef s CcSeqT
- updateSTCcSeqT :: STCcSeqT s (Seq (Int, Maybe Text)) -> (Int, Maybe Text) -> ST s ()
- emptySTCcSeqT :: ST s (STCcSeqT s a)
- type STCcILT s a = STRef s (Seq (Maybe Text))
- loadSTCcILT :: STCcILT s (Maybe Text) -> Seq (Maybe Text) -> ST s ()
- emptySTCcILT :: ST s (STCcILT s a)
- type STCcCounterT s a = STRef s Int
- updateSTCcCounterT :: STCcCounterT s Int -> Int -> ST s ()
- emptySTCcCounterT :: ST s (STCcCounterT s Int)
- seqToCcT :: PTCcSeqT -> ST s CcSeqT
- type FFMIndexSeqB = Seq (Maybe ByteString)
- seqFromFMIndexB :: FMIndexB -> FFMIndexSeqB
- type FFMIndexSeqT = Seq (Maybe Text)
- seqFromFMIndexT :: FMIndexT -> FFMIndexSeqT
Documentation
Basic FMIndex (ByteString) data type.
Basic FMIndex (Text) data type.
Basic OccCKB (ByteString) data type.
Constructors
| OccCKB (Seq (Maybe ByteString, Seq (Int, Int, Maybe ByteString))) |
Instances
| Generic OccCKB Source # | |
| Read OccCKB Source # | |
| Show OccCKB Source # | |
| Eq OccCKB Source # | |
| Ord OccCKB Source # | |
| type Rep OccCKB Source # | |
Defined in Data.FMIndex.Internal type Rep OccCKB = D1 ('MetaData "OccCKB" "Data.FMIndex.Internal" "text-compression-0.1.0.15-1U3drBT2M3HZ6U4DNXh3m" 'True) (C1 ('MetaCons "OccCKB" 'PrefixI 'False) (S1 ('MetaSel ('Nothing :: Maybe Symbol) 'NoSourceUnpackedness 'NoSourceStrictness 'DecidedLazy) (Rec0 (Seq (Maybe ByteString, Seq (Int, Int, Maybe ByteString)))))) | |
Basic OccCKT (Text) data type.
Instances
| Generic OccCKT Source # | |
| Read OccCKT Source # | |
| Show OccCKT Source # | |
| Eq OccCKT Source # | |
| Ord OccCKT Source # | |
| type Rep OccCKT Source # | |
Defined in Data.FMIndex.Internal type Rep OccCKT = D1 ('MetaData "OccCKT" "Data.FMIndex.Internal" "text-compression-0.1.0.15-1U3drBT2M3HZ6U4DNXh3m" 'True) (C1 ('MetaCons "OccCKT" 'PrefixI 'False) (S1 ('MetaSel ('Nothing :: Maybe Symbol) 'NoSourceUnpackedness 'NoSourceStrictness 'DecidedLazy) (Rec0 (Seq (Maybe Text, Seq (Int, Int, Maybe Text)))))) | |
Basic C[c] table (ByteString) data type.
Constructors
| CcB (Seq (Int, Maybe ByteString)) |
Basic C[c] table (Text) data type.
type PBOccCKSeqB = Seq (Maybe ByteString) Source #
Abstract PBOccCKSeqB type utilizing a Seq.
type STOccCKSeqB s a = STRef s OccCKSeqB Source #
Abstract data type representing a OccCKSeqB in the (strict) ST monad.
updateSTOccCKSeqAB :: STOccCKSeqB s (Seq (Maybe ByteString, Seq (Int, Int, Maybe ByteString))) -> (Int, Int, Maybe ByteString) -> ST s () Source #
State function to update OccCKSeqB
with each step of the OccCK.
updateSTOccCKSeqBB :: STOccCKSeqB s (Seq (Maybe ByteString, Seq (Int, Int, Maybe ByteString))) -> Maybe ByteString -> ST s () Source #
State function to update OccCKSeqB
with each step of the OccCK.
emptySTOccCKSeqB :: ST s (STOccCKSeqB s a) Source #
State function to create empty STOccCKSeqB type.
type STOccCKILB s a = STRef s (Seq (Maybe ByteString)) Source #
Abstract STOccCKILB and associated state type.
loadSTOccCKILB :: STOccCKILB s (Maybe ByteString) -> Seq (Maybe ByteString) -> ST s () Source #
State function to load list into STOccCKILB.
emptySTOccCKILB :: ST s (STOccCKILB s a) Source #
State function to create empty STOccCKILB type.
type STOccCKCounterB s a = STRef s Int Source #
Abstract STOccCKCounterB and associated state type.
updateSTOccCKCounterB :: STOccCKCounterB s Int -> Int -> ST s () Source #
State function to update STOccCKCounterB.
emptySTOccCKCounterB :: ST s (STOccCKCounterB s Int) Source #
State function to create empty STOccCKCounterB type.
seqToOccCKB :: PBOccCKSeqB -> ST s OccCKSeqB Source #
Strict state monad function.
type PTOccCKSeqT = Seq (Maybe Text) Source #
Abstract PTOccCKSeqT type utilizing a Seq.
type STOccCKSeqT s a = STRef s OccCKSeqT Source #
Abstract data type representing a OccCKSeqT in the (strict) ST monad.
updateSTOccCKSeqAT :: STOccCKSeqT s (Seq (Maybe Text, Seq (Int, Int, Maybe Text))) -> (Int, Int, Maybe Text) -> ST s () Source #
State function to update OccCKSeqT
with each step of the OccCK.
updateSTOccCKSeqBT :: STOccCKSeqT s (Seq (Maybe Text, Seq (Int, Int, Maybe Text))) -> Maybe Text -> ST s () Source #
State function to update OccCKSeqT
with each step of the OccCK.
emptySTOccCKSeqT :: ST s (STOccCKSeqT s a) Source #
State function to create empty STOccCKSeqT type.
type STOccCKILT s a = STRef s (Seq (Maybe Text)) Source #
Abstract STOccCKILT and associated state type.
loadSTOccCKILT :: STOccCKILT s (Maybe Text) -> Seq (Maybe Text) -> ST s () Source #
State function to load list into STOccCKILT.
emptySTOccCKILT :: ST s (STOccCKILT s a) Source #
State function to create empty STOccCKILT type.
type STOccCKCounterT s a = STRef s Int Source #
Abstract STOccCKCounterT and associated state type.
updateSTOccCKCounterT :: STOccCKCounterT s Int -> Int -> ST s () Source #
State function to update STOccCKCounterT.
emptySTOccCKCounterT :: ST s (STOccCKCounterT s Int) Source #
State function to create empty STOccCKCounterT type.
seqToOccCKT :: PTOccCKSeqT -> ST s OccCKSeqT Source #
Strict state monad function.
type STCcSeqB s a = STRef s CcSeqB Source #
Abstract data type representing a CcSeqB in the (strict) ST monad.
updateSTCcSeqB :: STCcSeqB s (Seq (Int, Maybe ByteString)) -> (Int, Maybe ByteString) -> ST s () Source #
State function to update CcSeqB
with each step of the C[c].
type STCcILB s a = STRef s (Seq (Maybe ByteString)) Source #
Abstract STCcILB and associated state type.
loadSTCcILB :: STCcILB s (Maybe ByteString) -> Seq (Maybe ByteString) -> ST s () Source #
State function to load list into STCcILB.
type STCcCounterB s a = STRef s Int Source #
Abstract STCcCounterB and associated state type.
updateSTCcCounterB :: STCcCounterB s Int -> Int -> ST s () Source #
State function to update STCcCounterB.
emptySTCcCounterB :: ST s (STCcCounterB s Int) Source #
State function to create empty STCcCounterT type.
type STCcSeqT s a = STRef s CcSeqT Source #
Abstract data type representing a CcSeqT in the (strict) ST monad.
updateSTCcSeqT :: STCcSeqT s (Seq (Int, Maybe Text)) -> (Int, Maybe Text) -> ST s () Source #
State function to update CcSeqT
with each step of the C[c].
loadSTCcILT :: STCcILT s (Maybe Text) -> Seq (Maybe Text) -> ST s () Source #
State function to load list into STCcILT.
type STCcCounterT s a = STRef s Int Source #
Abstract STCcCounterT and associated state type.
updateSTCcCounterT :: STCcCounterT s Int -> Int -> ST s () Source #
State function to update STCcCounterT.
emptySTCcCounterT :: ST s (STCcCounterT s Int) Source #
State function to create empty STCcCounterT type.
type FFMIndexSeqB = Seq (Maybe ByteString) Source #
Abstract FFMIndexSeqB type utilizing a Seq.
seqFromFMIndexB :: FMIndexB -> FFMIndexSeqB Source #
Simple Inverse FMIndex function.
type FFMIndexSeqT = Seq (Maybe Text) Source #
Abstract FFMIndexSeqT type utilizing a Seq.
seqFromFMIndexT :: FMIndexT -> FFMIndexSeqT Source #
Simple Inverse FMIndex function.