Copyright | (c) Matthew Mosior 2022 |
---|---|
License | BSD-style |
Maintainer | mattm.github@gmail.com |
Portability | portable |
Safe Haskell | Safe-Inferred |
Language | Haskell2010 |
WARNING
This module is considered internal.
The Package Versioning Policy does not apply.
The contents of this module may change in any way whatsoever and without any warning between minor versions of this package.
Authors importing this library are expected to track development closely.
All credit goes to the author(s)/maintainer(s) of the containers library for the above warning text.
Description
Various data structures and custom data types to describe the
Full-text Minute-space index (FM-index)
and the Inverse FM-index implementations, namely seqToOccCKB
, seqToOccCKT
, seqToCcB
, seqToCcT
, seqFromFMIndexB
, and seqFromFMIndexT
.
The FM-index implementations rely heavily upon Seq
provided by the containers,
STRef
and associated functions in the stref library,
and runST
in the Control.Monad.ST library.
Example FM-index Output
The below example is taken from this wikipedia page.
Given the following input, "abracadabra":
and
Given the following Burrows-Wheeler matrix (BWM) of the input "abracadabra":
I | F | L | ||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | $ | a | b | r | a | c | a | d | a | b | r | a |
2 | a | $ | a | b | r | a | c | a | d | a | b | r |
3 | a | b | r | a | $ | a | b | r | a | c | a | d |
4 | a | b | r | a | c | a | d | a | b | r | a | $ |
5 | a | c | a | d | a | b | r | a | $ | a | b | r |
6 | a | d | a | b | r | a | $ | a | b | r | a | c |
7 | b | r | a | $ | a | b | r | a | c | a | d | a |
8 | b | r | a | c | a | d | a | b | r | a | $ | a |
9 | c | a | d | a | b | r | a | $ | a | b | r | a |
10 | d | a | b | r | a | $ | a | b | r | a | c | a |
11 | r | a | $ | a | b | r | a | c | a | d | a | b |
12 | r | a | c | a | d | a | b | r | a | $ | a | b |
The FM-index output of the Burrows-Wheeler transform of the input is:
C[c] of "ard$rcaaaabb"
c | $ | a | b | c | d | r |
C[c] | 0 | 1 | 6 | 8 | 9 | 10 |
and
Occ(c,k) of "ard$rcaaaabb"
a | r | d | $ | r | c | a | a | a | a | b | b | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | |
$ | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
a | 1 | 1 | 1 | 1 | 1 | 1 | 2 | 3 | 4 | 5 | 5 | 5 |
b | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 2 |
c | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
d | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
r | 0 | 1 | 1 | 1 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 |
Synopsis
- newtype FMIndexB = FMIndexB (CcB, OccCKB)
- newtype FMIndexT = FMIndexT (CcT, OccCKT)
- newtype OccCKB = OccCKB (Seq (Maybe ByteString, Seq (Int, Int, Maybe ByteString)))
- newtype OccCKT = OccCKT (Seq (Maybe Text, Seq (Int, Int, Maybe Text)))
- newtype CcB = CcB (Seq (Int, Maybe ByteString))
- newtype CcT = CcT (Seq (Int, Maybe Text))
- type PBOccCKSeqB = Seq (Maybe ByteString)
- type OccCKSeqB = Seq (Maybe ByteString, Seq (Int, Int, Maybe ByteString))
- type STOccCKSeqB s a = STRef s OccCKSeqB
- updateSTOccCKSeqAB :: STOccCKSeqB s (Seq (Maybe ByteString, Seq (Int, Int, Maybe ByteString))) -> (Int, Int, Maybe ByteString) -> ST s ()
- updateSTOccCKSeqBB :: STOccCKSeqB s (Seq (Maybe ByteString, Seq (Int, Int, Maybe ByteString))) -> Maybe ByteString -> ST s ()
- emptySTOccCKSeqB :: ST s (STOccCKSeqB s a)
- type STOccCKILB s a = STRef s (Seq (Maybe ByteString))
- loadSTOccCKILB :: STOccCKILB s (Maybe ByteString) -> Seq (Maybe ByteString) -> ST s ()
- emptySTOccCKILB :: ST s (STOccCKILB s a)
- type STOccCKCounterB s a = STRef s Int
- updateSTOccCKCounterB :: STOccCKCounterB s Int -> Int -> ST s ()
- emptySTOccCKCounterB :: ST s (STOccCKCounterB s Int)
- seqToOccCKB :: PBOccCKSeqB -> ST s OccCKSeqB
- type PTOccCKSeqT = Seq (Maybe Text)
- type OccCKSeqT = Seq (Maybe Text, Seq (Int, Int, Maybe Text))
- type STOccCKSeqT s a = STRef s OccCKSeqT
- updateSTOccCKSeqAT :: STOccCKSeqT s (Seq (Maybe Text, Seq (Int, Int, Maybe Text))) -> (Int, Int, Maybe Text) -> ST s ()
- updateSTOccCKSeqBT :: STOccCKSeqT s (Seq (Maybe Text, Seq (Int, Int, Maybe Text))) -> Maybe Text -> ST s ()
- emptySTOccCKSeqT :: ST s (STOccCKSeqT s a)
- type STOccCKILT s a = STRef s (Seq (Maybe Text))
- loadSTOccCKILT :: STOccCKILT s (Maybe Text) -> Seq (Maybe Text) -> ST s ()
- emptySTOccCKILT :: ST s (STOccCKILT s a)
- type STOccCKCounterT s a = STRef s Int
- updateSTOccCKCounterT :: STOccCKCounterT s Int -> Int -> ST s ()
- emptySTOccCKCounterT :: ST s (STOccCKCounterT s Int)
- seqToOccCKT :: PTOccCKSeqT -> ST s OccCKSeqT
- type PBCcSeqB = Seq (Maybe ByteString)
- type CcSeqB = Seq (Int, Maybe ByteString)
- type STCcSeqB s a = STRef s CcSeqB
- updateSTCcSeqB :: STCcSeqB s (Seq (Int, Maybe ByteString)) -> (Int, Maybe ByteString) -> ST s ()
- emptySTCcSeqB :: ST s (STCcSeqB s a)
- type STCcILB s a = STRef s (Seq (Maybe ByteString))
- loadSTCcILB :: STCcILB s (Maybe ByteString) -> Seq (Maybe ByteString) -> ST s ()
- emptySTCcILB :: ST s (STCcILB s a)
- type STCcCounterB s a = STRef s Int
- updateSTCcCounterB :: STCcCounterB s Int -> Int -> ST s ()
- emptySTCcCounterB :: ST s (STCcCounterB s Int)
- seqToCcB :: PBCcSeqB -> ST s CcSeqB
- type PTCcSeqT = Seq (Maybe Text)
- type CcSeqT = Seq (Int, Maybe Text)
- type STCcSeqT s a = STRef s CcSeqT
- updateSTCcSeqT :: STCcSeqT s (Seq (Int, Maybe Text)) -> (Int, Maybe Text) -> ST s ()
- emptySTCcSeqT :: ST s (STCcSeqT s a)
- type STCcILT s a = STRef s (Seq (Maybe Text))
- loadSTCcILT :: STCcILT s (Maybe Text) -> Seq (Maybe Text) -> ST s ()
- emptySTCcILT :: ST s (STCcILT s a)
- type STCcCounterT s a = STRef s Int
- updateSTCcCounterT :: STCcCounterT s Int -> Int -> ST s ()
- emptySTCcCounterT :: ST s (STCcCounterT s Int)
- seqToCcT :: PTCcSeqT -> ST s CcSeqT
- type FFMIndexSeqB = Seq (Maybe ByteString)
- seqFromFMIndexB :: FMIndexB -> FFMIndexSeqB
- type FFMIndexSeqT = Seq (Maybe Text)
- seqFromFMIndexT :: FMIndexT -> FFMIndexSeqT
Documentation
Basic FMIndex (ByteString
) data type.
Basic FMIndex (Text
) data type.
Basic OccCKB (ByteString
) data type.
OccCKB (Seq (Maybe ByteString, Seq (Int, Int, Maybe ByteString))) |
Instances
Generic OccCKB Source # | |
Read OccCKB Source # | |
Show OccCKB Source # | |
Eq OccCKB Source # | |
Ord OccCKB Source # | |
type Rep OccCKB Source # | |
Defined in Data.FMIndex.Internal type Rep OccCKB = D1 ('MetaData "OccCKB" "Data.FMIndex.Internal" "text-compression-0.1.0.15-1U3drBT2M3HZ6U4DNXh3m" 'True) (C1 ('MetaCons "OccCKB" 'PrefixI 'False) (S1 ('MetaSel ('Nothing :: Maybe Symbol) 'NoSourceUnpackedness 'NoSourceStrictness 'DecidedLazy) (Rec0 (Seq (Maybe ByteString, Seq (Int, Int, Maybe ByteString)))))) |
Basic OccCKT (Text
) data type.
Instances
Generic OccCKT Source # | |
Read OccCKT Source # | |
Show OccCKT Source # | |
Eq OccCKT Source # | |
Ord OccCKT Source # | |
type Rep OccCKT Source # | |
Defined in Data.FMIndex.Internal type Rep OccCKT = D1 ('MetaData "OccCKT" "Data.FMIndex.Internal" "text-compression-0.1.0.15-1U3drBT2M3HZ6U4DNXh3m" 'True) (C1 ('MetaCons "OccCKT" 'PrefixI 'False) (S1 ('MetaSel ('Nothing :: Maybe Symbol) 'NoSourceUnpackedness 'NoSourceStrictness 'DecidedLazy) (Rec0 (Seq (Maybe Text, Seq (Int, Int, Maybe Text)))))) |
Basic C[c] table (ByteString
) data type.
CcB (Seq (Int, Maybe ByteString)) |
Basic C[c] table (Text
) data type.
type PBOccCKSeqB = Seq (Maybe ByteString) Source #
Abstract PBOccCKSeqB
type utilizing a Seq
.
type STOccCKSeqB s a = STRef s OccCKSeqB Source #
Abstract data type representing a OccCKSeqB
in the (strict) ST monad.
updateSTOccCKSeqAB :: STOccCKSeqB s (Seq (Maybe ByteString, Seq (Int, Int, Maybe ByteString))) -> (Int, Int, Maybe ByteString) -> ST s () Source #
State function to update OccCKSeqB
with each step of the OccCK.
updateSTOccCKSeqBB :: STOccCKSeqB s (Seq (Maybe ByteString, Seq (Int, Int, Maybe ByteString))) -> Maybe ByteString -> ST s () Source #
State function to update OccCKSeqB
with each step of the OccCK.
emptySTOccCKSeqB :: ST s (STOccCKSeqB s a) Source #
State function to create empty STOccCKSeqB
type.
type STOccCKILB s a = STRef s (Seq (Maybe ByteString)) Source #
Abstract STOccCKILB
and associated state type.
loadSTOccCKILB :: STOccCKILB s (Maybe ByteString) -> Seq (Maybe ByteString) -> ST s () Source #
State function to load list into STOccCKILB
.
emptySTOccCKILB :: ST s (STOccCKILB s a) Source #
State function to create empty STOccCKILB
type.
type STOccCKCounterB s a = STRef s Int Source #
Abstract STOccCKCounterB
and associated state type.
updateSTOccCKCounterB :: STOccCKCounterB s Int -> Int -> ST s () Source #
State function to update STOccCKCounterB
.
emptySTOccCKCounterB :: ST s (STOccCKCounterB s Int) Source #
State function to create empty STOccCKCounterB
type.
seqToOccCKB :: PBOccCKSeqB -> ST s OccCKSeqB Source #
Strict state monad function.
type PTOccCKSeqT = Seq (Maybe Text) Source #
Abstract PTOccCKSeqT
type utilizing a Seq
.
type STOccCKSeqT s a = STRef s OccCKSeqT Source #
Abstract data type representing a OccCKSeqT
in the (strict) ST monad.
updateSTOccCKSeqAT :: STOccCKSeqT s (Seq (Maybe Text, Seq (Int, Int, Maybe Text))) -> (Int, Int, Maybe Text) -> ST s () Source #
State function to update OccCKSeqT
with each step of the OccCK.
updateSTOccCKSeqBT :: STOccCKSeqT s (Seq (Maybe Text, Seq (Int, Int, Maybe Text))) -> Maybe Text -> ST s () Source #
State function to update OccCKSeqT
with each step of the OccCK.
emptySTOccCKSeqT :: ST s (STOccCKSeqT s a) Source #
State function to create empty STOccCKSeqT
type.
type STOccCKILT s a = STRef s (Seq (Maybe Text)) Source #
Abstract STOccCKILT
and associated state type.
loadSTOccCKILT :: STOccCKILT s (Maybe Text) -> Seq (Maybe Text) -> ST s () Source #
State function to load list into STOccCKILT
.
emptySTOccCKILT :: ST s (STOccCKILT s a) Source #
State function to create empty STOccCKILT
type.
type STOccCKCounterT s a = STRef s Int Source #
Abstract STOccCKCounterT
and associated state type.
updateSTOccCKCounterT :: STOccCKCounterT s Int -> Int -> ST s () Source #
State function to update STOccCKCounterT
.
emptySTOccCKCounterT :: ST s (STOccCKCounterT s Int) Source #
State function to create empty STOccCKCounterT
type.
seqToOccCKT :: PTOccCKSeqT -> ST s OccCKSeqT Source #
Strict state monad function.
type STCcSeqB s a = STRef s CcSeqB Source #
Abstract data type representing a CcSeqB
in the (strict) ST monad.
updateSTCcSeqB :: STCcSeqB s (Seq (Int, Maybe ByteString)) -> (Int, Maybe ByteString) -> ST s () Source #
State function to update CcSeqB
with each step of the C[c].
type STCcILB s a = STRef s (Seq (Maybe ByteString)) Source #
Abstract STCcILB
and associated state type.
loadSTCcILB :: STCcILB s (Maybe ByteString) -> Seq (Maybe ByteString) -> ST s () Source #
State function to load list into STCcILB
.
type STCcCounterB s a = STRef s Int Source #
Abstract STCcCounterB
and associated state type.
updateSTCcCounterB :: STCcCounterB s Int -> Int -> ST s () Source #
State function to update STCcCounterB
.
emptySTCcCounterB :: ST s (STCcCounterB s Int) Source #
State function to create empty STCcCounterT
type.
type STCcSeqT s a = STRef s CcSeqT Source #
Abstract data type representing a CcSeqT
in the (strict) ST monad.
updateSTCcSeqT :: STCcSeqT s (Seq (Int, Maybe Text)) -> (Int, Maybe Text) -> ST s () Source #
State function to update CcSeqT
with each step of the C[c].
loadSTCcILT :: STCcILT s (Maybe Text) -> Seq (Maybe Text) -> ST s () Source #
State function to load list into STCcILT
.
type STCcCounterT s a = STRef s Int Source #
Abstract STCcCounterT
and associated state type.
updateSTCcCounterT :: STCcCounterT s Int -> Int -> ST s () Source #
State function to update STCcCounterT
.
emptySTCcCounterT :: ST s (STCcCounterT s Int) Source #
State function to create empty STCcCounterT
type.
type FFMIndexSeqB = Seq (Maybe ByteString) Source #
Abstract FFMIndexSeqB
type utilizing a Seq
.
seqFromFMIndexB :: FMIndexB -> FFMIndexSeqB Source #
Simple Inverse FMIndex function.
type FFMIndexSeqT = Seq (Maybe Text) Source #
Abstract FFMIndexSeqT
type utilizing a Seq
.
seqFromFMIndexT :: FMIndexT -> FFMIndexSeqT Source #
Simple Inverse FMIndex function.