Copyright | (c) Matthew Mosior 2022 |
---|---|
License | BSD-style |
Maintainer | mattm.github@gmail.com |
Portability | portable |
Safe Haskell | Safe-Inferred |
Language | Haskell2010 |
Full-text Minute-space index (FM-index)
Users will get the most mileage by first applying a BWT to the initial ByteString or Text input before compressing to an FMIndexB or FMIndexT.
To do this, users can use the bytestringToBWTToFMIndexB and bytestringToBWTToFMIndexT functions, as well as the textToBWTToFMIndexB and textToBWTToFMIndexT functions.
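The BWT step that these helpers perform first can be illustrated with a deliberately naive sketch: append a sentinel, sort all cyclic rotations, and keep the last column. This is not the library's implementation. Encoding the sentinel as Nothing (which sorts before every Just) is an assumption that loosely mirrors the Maybe values in the Synopsis signatures, the name bwt is hypothetical, and the quadratic rotation sort is for illustration only. For a ByteString input, Data.ByteString.Char8.unpack would produce the String this sketch expects.

```haskell
import Data.List (sort)

-- Naive Burrows-Wheeler transform (illustration only, O(n^2 log n)).
-- Nothing is the end-of-input sentinel; it sorts before every Just value.
bwt :: Ord a => [a] -> [Maybe a]
bwt xs = map last (sort rotations)
  where
    s = map Just xs ++ [Nothing]
    -- all cyclic rotations of s, one per position
    rotations = take (length s) (iterate rotate s)
    rotate (y : ys) = ys ++ [y]
    rotate [] = []
```

For example, bwt "banana" yields the classic "annb$aa", with Nothing in place of $.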
Operation: Count
The count operation is supported by both sequential (bytestringFMIndexCountS and textFMIndexCountS) and parallel (bytestringFMIndexCountP and textFMIndexCountP) implementations.
The count operations on ByteString, bytestringFMIndexCountS and bytestringFMIndexCountP, are implemented using the countFMIndexB function.
The count operations on Text, textFMIndexCountS and textFMIndexCountP, are implemented using the countFMIndexT function.
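countFMIndexB and countFMIndexT are not shown in this section, so the following is only a generic sketch of how an FM-index count is classically computed (backward search), not their actual code. It walks the pattern from right to left, narrowing a range of sorted rotations using the C table (symbols smaller than x) and Occ (occurrences of x in a BWT prefix); the size of the final range is the number of occurrences, overlapping ones included. The list-based C and Occ and the names bwt, cTable, occ and countOcc are hypothetical.

```haskell
import Data.List (sort)

-- Naive BWT; Nothing is the sentinel and sorts first.
bwt :: Ord a => [a] -> [Maybe a]
bwt xs = map last (sort (take (length s) (iterate rotate s)))
  where
    s = map Just xs ++ [Nothing]
    rotate (y : ys) = ys ++ [y]
    rotate [] = []

-- C(x): number of BWT symbols strictly smaller than x.
cTable :: Ord a => [Maybe a] -> Maybe a -> Int
cTable l x = length (filter (< x) l)

-- Occ(x, i): occurrences of x among the first i BWT symbols.
occ :: Eq a => [Maybe a] -> Maybe a -> Int -> Int
occ l x i = length (filter (== x) (take i l))

-- Backward search: walk the pattern right to left, shrinking the half-open
-- range [sp, ep) of sorted rotations that start with the suffix seen so far.
countOcc :: Ord a => [a] -> [a] -> Int
countOcc pat txt = go (reverse pat) 0 (length l)
  where
    l = bwt txt
    go [] sp ep = ep - sp -- size of the final range
    go (p : ps) sp ep
      | sp' >= ep' = 0 -- pattern does not occur
      | otherwise = go ps sp' ep'
      where
        x = Just p
        sp' = cTable l x + occ l x sp
        ep' = cTable l x + occ l x ep
```

Here countOcc "ana" "banana" evaluates to 2, since the two occurrences overlap.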
Operation: Locate
The locate operation is supported by both sequential (bytestringFMIndexLocateS and textFMIndexLocateS) and parallel (bytestringFMIndexLocateP and textFMIndexLocateP) implementations.
The locate operations on ByteString, bytestringFMIndexLocateS and bytestringFMIndexLocateP, are implemented using the locateFMIndexB function.
The locate operations on Text, textFMIndexLocateS and textFMIndexLocateP, are implemented using the locateFMIndexT function.
Internal
Data.FMIndex.Internal contains efficient and stateful implementations of the FMIndex and Inverse FMIndex algorithms.
Synopsis
- bytestringToBWTToFMIndexB :: ByteString -> FMIndexB
- bytestringToBWTToFMIndexT :: ByteString -> FMIndexT
- textToBWTToFMIndexB :: Text -> FMIndexB
- textToBWTToFMIndexT :: Text -> FMIndexT
- textBWTToFMIndexB :: BWTMatrix Word8 -> TextBWT -> FMIndexB
- bytestringBWTToFMIndexB :: BWTMatrix Word8 -> BWT Word8 -> FMIndexB
- textBWTToFMIndexT :: BWTMatrix Word8 -> TextBWT -> FMIndexT
- bytestringBWTToFMIndexT :: BWTMatrix Word8 -> BWT Word8 -> FMIndexT
- bytestringFromBWTFromFMIndexB :: FMIndexB -> ByteString
- bytestringFromBWTFromFMIndexT :: FMIndexT -> ByteString
- textFromBWTFromFMIndexB :: FMIndexB -> Text
- textFromBWTFromFMIndexT :: FMIndexT -> Text
- textBWTFromFMIndexT :: FMIndexT -> BWT Text
- bytestringBWTFromFMIndexT :: FMIndexT -> BWT ByteString
- textBWTFromFMIndexB :: FMIndexB -> BWT Text
- bytestringBWTFromFMIndexB :: FMIndexB -> BWT ByteString
- textFromFMIndexB :: FMIndexB -> Seq (Maybe Text)
- bytestringFromFMIndexB :: FMIndexB -> Seq (Maybe ByteString)
- textFromFMIndexT :: FMIndexT -> Seq (Maybe Text)
- bytestringFromFMIndexT :: FMIndexT -> Seq (Maybe ByteString)
- bytestringFMIndexCountS :: [ByteString] -> ByteString -> Seq (ByteString, CIntB)
- textFMIndexCountS :: [Text] -> Text -> Seq (Text, CIntT)
- bytestringFMIndexCountP :: [ByteString] -> ByteString -> IO (Seq (ByteString, CIntB))
- textFMIndexCountP :: [Text] -> Text -> IO (Seq (Text, CIntT))
- bytestringFMIndexLocateS :: [ByteString] -> ByteString -> Seq (ByteString, LIntB)
- textFMIndexLocateS :: [Text] -> Text -> Seq (Text, LIntT)
- bytestringFMIndexLocateP :: [ByteString] -> ByteString -> IO (Seq (ByteString, LIntB))
- textFMIndexLocateP :: [Text] -> Text -> IO (Seq (Text, LIntT))
To FMIndex functions
bytestringToBWTToFMIndexB :: ByteString -> FMIndexB
Helper function for converting a ByteString to an FMIndexB via a BWT first.
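One way to picture "converting to an FM-index via a BWT first" is a pipeline that builds the BWT and then precomputes the two tables a backward search needs. The FM record, its fields, bwtBS and toFM below are hypothetical stand-ins, not the library's FMIndexB. C is stored as a Map from symbol to count and Occ as per-symbol prefix-count lists, which trades memory for simple lookups.

```haskell
import qualified Data.ByteString.Char8 as BC
import Data.List (sort)
import qualified Data.Map.Strict as M

-- Hypothetical FM-index record (NOT the library's FMIndexB).
data FM = FM
  { -- C: number of BWT symbols strictly smaller than the key
    fmC :: M.Map (Maybe Char) Int
    -- Occ: (list !! i) = occurrences of the key among the first i BWT symbols
  , fmOcc :: M.Map (Maybe Char) [Int]
  }

-- Naive BWT of a ByteString; Nothing is the sentinel and sorts first.
bwtBS :: BC.ByteString -> [Maybe Char]
bwtBS bs = map last (sort (take (length s) (iterate rotate s)))
  where
    s = map Just (BC.unpack bs) ++ [Nothing]
    rotate (y : ys) = ys ++ [y]
    rotate [] = []

-- ByteString -> BWT -> precomputed C and Occ tables.
toFM :: BC.ByteString -> FM
toFM bs = FM c o
  where
    l = bwtBS bs
    counts = M.fromListWith (+) [(x, 1 :: Int) | x <- l]
    -- Map keys are ascending, so a running total of counts gives C.
    c = M.fromList (zip (M.keys counts) (scanl (+) 0 (M.elems counts)))
    o = M.fromList [(x, scanl (+) 0 [fromEnum (y == x) | y <- l]) | x <- M.keys counts]
```

For "banana", fmC maps Nothing, 'a', 'b' and 'n' to 0, 1, 4 and 5.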
bytestringToBWTToFMIndexT :: ByteString -> FMIndexT
Helper function for converting a ByteString to an FMIndexT via a BWT first.
textToBWTToFMIndexB :: Text -> FMIndexB
textToBWTToFMIndexT :: Text -> FMIndexT
From FMIndex functions
bytestringFromBWTFromFMIndexB :: FMIndexB -> ByteString
Helper function for converting a BWTed FMIndexB back to the original ByteString.
bytestringFromBWTFromFMIndexT :: FMIndexT -> ByteString
Helper function for converting a BWTed FMIndexT back to the original ByteString.
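Getting back to the original input from a BWT is the inverse BWT. A classic way to do it, shown here as a sketch rather than the library's Inverse FMIndex code, follows the LF mapping LF(i) = C(L[i]) + Occ(L[i], i) starting from the row that begins with the sentinel, collecting one symbol per step from right to left. The Nothing-as-sentinel encoding and the name inverseBWT are assumptions, and the repeated list scans make this quadratic.

```haskell
-- Inverse BWT via the LF mapping (illustration only, O(n^2)).
-- Assumes Nothing marks the single end-of-input sentinel.
inverseBWT :: Ord a => [Maybe a] -> [a]
inverseBWT [] = []
inverseBWT l = go 0 []
  where
    -- LF(i) = C(L[i]) + Occ(L[i], i): the row whose first symbol is L[i].
    lf i =
      let x = l !! i
       in length (filter (< x) l) + length (filter (== x) (take i l))
    -- Row 0 starts with the sentinel, so L[0] is the last input symbol;
    -- each LF step moves one symbol to the left in the original input.
    go i acc = case l !! i of
      Nothing -> acc
      Just x -> go (lf i) (x : acc)
```

Applied to the BWT of "banana" ("annb$aa" with Nothing for $), inverseBWT returns "banana".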
bytestringBWTFromFMIndexT :: FMIndexT -> BWT ByteString
Takes an FMIndexT and returns the BWT of ByteStrings.
bytestringBWTFromFMIndexB :: FMIndexB -> BWT ByteString
Takes an FMIndexB and returns the BWT of ByteStrings.
bytestringFromFMIndexB :: FMIndexB -> Seq (Maybe ByteString)
Takes an FMIndexB and returns the original Seq of ByteStrings.
bytestringFromFMIndexT :: FMIndexT -> Seq (Maybe ByteString)
Takes an FMIndexT and returns the original Seq of ByteStrings.
Count operations
bytestringFMIndexCountS :: [ByteString] -> ByteString -> Seq (ByteString, CIntB)
Takes a list of ByteString patterns and an input ByteString, and returns the number of occurrences of each pattern in the input ByteString.
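The [pattern] -> input -> Seq (pattern, result) shape can be illustrated with a brute-force reference counter, which may be handy for cross-checking FM-index results on small inputs. countAll and naiveCount are hypothetical names, Int stands in for CIntB (whose definition is not shown here), and naiveCount scans every suffix instead of using an FM-index. It counts overlapping occurrences, as a backward-search count also does.

```haskell
import qualified Data.ByteString.Char8 as BC
import Data.Sequence (Seq)
import qualified Data.Sequence as Seq

-- Brute-force reference count (NOT an FM-index): overlapping occurrences
-- of pat in txt, found by testing every suffix of txt.
naiveCount :: BC.ByteString -> BC.ByteString -> Int
naiveCount pat txt = length (filter (pat `BC.isPrefixOf`) (BC.tails txt))

-- Hypothetical helper mirroring [pattern] -> input -> Seq (pattern, result);
-- Int stands in for CIntB.
countAll :: [BC.ByteString] -> BC.ByteString -> Seq (BC.ByteString, Int)
countAll pats txt = Seq.fromList [(p, naiveCount p txt) | p <- pats]
```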
bytestringFMIndexCountP :: [ByteString] -> ByteString -> IO (Seq (ByteString, CIntB))
Takes a list of ByteString patterns and an input ByteString, and returns the number of occurrences of each pattern in the input ByteString.
Parallelized and utilizes chunking based on the number of available cores.
When using it, compile with -O2 -threaded -with-rtsopts=-N.
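As a rough sketch of "chunking based on the number of available cores", the patterns can be split into one chunk per capability, each chunk counted on its own thread, and the results collected in pattern order. This uses only forkIO, getNumCapabilities and MVar from base; countAllP, chunksOf and naiveCount are hypothetical names, the brute-force counter stands in for an FM-index count, and the library may distribute work differently. Building with -threaded -with-rtsopts=-N is what lets getNumCapabilities report more than one capability on a multi-core machine.

```haskell
import Control.Concurrent (forkIO, getNumCapabilities)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Exception (evaluate)
import qualified Data.ByteString.Char8 as BC

-- Brute-force stand-in for an FM-index count (overlapping occurrences).
naiveCount :: BC.ByteString -> BC.ByteString -> Int
naiveCount pat txt = length (filter (pat `BC.isPrefixOf`) (BC.tails txt))

-- Split a list into chunks of at most k elements (k >= 1).
chunksOf :: Int -> [a] -> [[a]]
chunksOf _ [] = []
chunksOf k xs = let (a, b) = splitAt k xs in a : chunksOf k b

-- One chunk of patterns per capability, each counted on its own thread;
-- results are collected in the original pattern order.
countAllP :: [BC.ByteString] -> BC.ByteString -> IO [(BC.ByteString, Int)]
countAllP pats txt = do
  cores <- getNumCapabilities
  let size = max 1 ((length pats + cores - 1) `div` cores)
  vars <- mapM spawn (chunksOf size pats)
  concat <$> mapM takeMVar vars
  where
    spawn chunk = do
      v <- newEmptyMVar
      _ <- forkIO $ do
        let res = [(p, naiveCount p txt) | p <- chunk]
        -- Force every count on the worker thread, not later in the caller.
        _ <- evaluate (sum (map snd res))
        -- Sketch only: if this thread threw, v would never be filled.
        putMVar v res
      pure v
```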
Locate operations
bytestringFMIndexLocateS :: [ByteString] -> ByteString -> Seq (ByteString, LIntB)
Takes a list of ByteString patterns and an input ByteString, and returns the index(es) of the occurrences of each pattern in the input ByteString.
The output indices are 1-based and are not sorted.
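A sketch of how locate can sit on top of a count: backward search yields a contiguous range of suffix-array rows, and each row's suffix-array entry is a starting position. Adding 1 gives 1-based indices, and because the rows are in lexicographic suffix order rather than text order, the positions come out unsorted, which is one plausible reason for the note above. The naive suffix array, the BWT derived from it, and the name locateOcc are assumptions, not the library's locateFMIndexB; practical FM-indexes typically store only a sampled suffix array.

```haskell
import Data.List (sortOn, tails)

-- Locate via backward search over a naive suffix array (illustration only).
-- Returns 1-based start positions in suffix-array order, i.e. unsorted.
locateOcc :: Ord a => [a] -> [a] -> [Int]
locateOcc pat txt = [sa !! i + 1 | i <- [sp .. ep - 1]]
  where
    s = map Just txt ++ [Nothing] -- Nothing is the sentinel and sorts first
    n = length s
    -- 0-based suffix start positions, sorted lexicographically
    sa = map fst (sortOn snd (zip [0 ..] (take n (tails s))))
    -- BWT column: the symbol preceding each sorted suffix (cyclically)
    l = [s !! ((j - 1) `mod` n) | j <- sa]
    cOf x = length (filter (< x) l)
    occ x i = length (filter (== x) (take i l))
    -- foldr applies step to the pattern's last symbol first (backward search)
    (sp, ep) = foldr step (0, n) pat
    step p (a, b) =
      let x = Just p
       in (cOf x + occ x a, cOf x + occ x b)
```

Here locateOcc "ana" "banana" gives [4,2]: both 1-based positions, in suffix order.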
bytestringFMIndexLocateP :: [ByteString] -> ByteString -> IO (Seq (ByteString, LIntB))
Takes a list of ByteString patterns and an input ByteString, and returns the index(es) of the occurrences of each pattern in the input ByteString.
The output indices are 1-based and are not sorted.
Parallelized and utilizes chunking based on the number of available cores.
When using it, compile with -O2 -threaded -with-rtsopts=-N.
textFMIndexLocateP :: [Text] -> Text -> IO (Seq (Text, LIntT))
Takes a list of Text patterns and an input Text, and returns the index(es) of the occurrences of each pattern in the input Text.
The output indices are 1-based and are not sorted.
Parallelized and utilizes chunking based on the number of available cores.
When using it, compile with -O2 -threaded -with-rtsopts=-N.