| Copyright | (c) Matthew Mosior 2022 |
|---|---|
| License | BSD-style |
| Maintainer | mattm.github@gmail.com |
| Portability | portable |
| Safe Haskell | Safe-Inferred |
| Language | Haskell2010 |
Data.FMIndex
Description
Full-text Minute-space index (FM-index)
Users will get the most mileage by first transforming the initial
ByteString or Text input to a BWT before compressing to
an FMIndexB or FMIndexT.
To do this, users can use the bytestringToBWTToFMIndexB and bytestringToBWTToFMIndexT functions,
as well as the textToBWTToFMIndexB and textToBWTToFMIndexT functions.
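To make the "BWT first" step concrete, here is a minimal sketch of a naive Burrows-Wheeler transform over a plain String. It is not the library's implementation: the name bwtSketch and the use of '\0' as an end-of-text sentinel are assumptions made for illustration, and sorting all rotations is only practical for small inputs.

```haskell
import Data.List (sort)

-- Naive BWT: append a sentinel, sort every rotation of the text,
-- and keep the last character of each sorted rotation.
-- Assumes the input does not itself contain '\0'.
bwtSketch :: String -> String
bwtSketch s = map last (sort rotations)
  where
    t         = s ++ "\0"
    rotations = take (length t) (iterate rotate t)
    rotate xs = drop 1 xs ++ take 1 xs
```

For example, bwtSketch "banana" groups equal characters into runs ("annb\0aa"), which is what makes the transformed text a good starting point for an FM-index.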
Operation: Count
The count operation is supported by both sequential (bytestringFMIndexCountS and textFMIndexCountS)
and parallel (bytestringFMIndexCountP and textFMIndexCountP) implementations.
The count operations on ByteString, bytestringFMIndexCountS and bytestringFMIndexCountP, are implemented using the countFMIndexB function.
The count operations on Text, textFMIndexCountS and textFMIndexCountP, are implemented using the countFMIndexT function.
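As a rough illustration of what a count over an FM-index computes, the sketch below runs the classic backward search over a naively built BWT. It is a sketch under stated assumptions, not the code behind countFMIndexB or countFMIndexT: the names bwtSketch and countSketch are hypothetical, '\0' is assumed as the sentinel, and the C and Occ tables are recomputed by linear scans rather than stored.

```haskell
import Data.List (sort)

-- Naive BWT with a '\0' sentinel (illustration only).
bwtSketch :: String -> String
bwtSketch s = map last (sort (take (length t) (iterate rotate t)))
  where
    t         = s ++ "\0"
    rotate xs = drop 1 xs ++ take 1 xs

-- Backward search: process the pattern right to left, narrowing the
-- half-open range [lo, hi) of sorted rotations that start with the
-- suffix of the pattern seen so far.
-- Assumes a non-empty pattern that does not contain '\0'.
countSketch :: String -> String -> Int
countSketch pat txt = go (reverse pat) 0 (length l)
  where
    l = bwtSketch txt
    -- C(c): number of characters in the text strictly smaller than c.
    smaller c = length (filter (< c) l)
    -- Occ(c, i): occurrences of c in the first i characters of the BWT.
    occ c i   = length (filter (== c) (take i l))
    go [] lo hi = hi - lo
    go (c : rest) lo hi
      | lo' >= hi' = 0
      | otherwise  = go rest lo' hi'
      where
        lo' = smaller c + occ c lo
        hi' = smaller c + occ c hi
```

With this sketch, countSketch "ana" "banana" evaluates to 2, since the overlapping matches at positions 2 and 4 are both counted.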
Operation: Locate
The locate operation is supported by both sequential (bytestringFMIndexLocateS and textFMIndexLocateS)
and parallel (bytestringFMIndexLocateP and textFMIndexLocateP) implementations.
The locate operations on ByteString, bytestringFMIndexLocateS and bytestringFMIndexLocateP, are implemented using the locateFMIndexB function.
The locate operations on Text, textFMIndexLocateS and textFMIndexLocateP, are implemented using the locateFMIndexT function.
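The following sketch shows one way locate results can come out 1-based and unsorted. It assumes that a reported index is the 1-based start position of a match and walks the suffixes in lexicographic order, as a suffix-array lookup would. The name locateSketch is hypothetical, and the order produced here is not guaranteed to match the order returned by locateFMIndexB or locateFMIndexT.

```haskell
import Data.List (isPrefixOf, sortOn, tails)

-- Pair every suffix with its 1-based start position, sort by suffix,
-- and report the positions whose suffix begins with the pattern.
-- Positions are returned in suffix order, not in text order.
locateSketch :: String -> String -> [Int]
locateSketch pat txt
  | null pat  = []
  | otherwise = [ i | (i, suffix) <- sortedSuffixes, pat `isPrefixOf` suffix ]
  where
    -- init drops the empty suffix that tails always yields last.
    sortedSuffixes = sortOn snd (zip [1 ..] (init (tails (txt ++ "\0"))))
```

Here locateSketch "ana" "banana" yields [4,2] rather than [2,4], so callers that need ascending positions should sort the result themselves.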
Internal
Data.FMIndex.Internal contains efficient and stateful implementations of the FMIndex and Inverse FMIndex algorithms.
Synopsis
- bytestringToBWTToFMIndexB :: ByteString -> FMIndexB
- bytestringToBWTToFMIndexT :: ByteString -> FMIndexT
- textToBWTToFMIndexB :: Text -> FMIndexB
- textToBWTToFMIndexT :: Text -> FMIndexT
- textBWTToFMIndexB :: BWTMatrix Word8 -> TextBWT -> FMIndexB
- bytestringBWTToFMIndexB :: BWTMatrix Word8 -> BWT Word8 -> FMIndexB
- textBWTToFMIndexT :: BWTMatrix Word8 -> TextBWT -> FMIndexT
- bytestringBWTToFMIndexT :: BWTMatrix Word8 -> BWT Word8 -> FMIndexT
- bytestringFromBWTFromFMIndexB :: FMIndexB -> ByteString
- bytestringFromBWTFromFMIndexT :: FMIndexT -> ByteString
- textFromBWTFromFMIndexB :: FMIndexB -> Text
- textFromBWTFromFMIndexT :: FMIndexT -> Text
- textBWTFromFMIndexT :: FMIndexT -> BWT Text
- bytestringBWTFromFMIndexT :: FMIndexT -> BWT ByteString
- textBWTFromFMIndexB :: FMIndexB -> BWT Text
- bytestringBWTFromFMIndexB :: FMIndexB -> BWT ByteString
- textFromFMIndexB :: FMIndexB -> Seq (Maybe Text)
- bytestringFromFMIndexB :: FMIndexB -> Seq (Maybe ByteString)
- textFromFMIndexT :: FMIndexT -> Seq (Maybe Text)
- bytestringFromFMIndexT :: FMIndexT -> Seq (Maybe ByteString)
- bytestringFMIndexCountS :: [ByteString] -> ByteString -> Seq (ByteString, CIntB)
- textFMIndexCountS :: [Text] -> Text -> Seq (Text, CIntT)
- bytestringFMIndexCountP :: [ByteString] -> ByteString -> IO (Seq (ByteString, CIntB))
- textFMIndexCountP :: [Text] -> Text -> IO (Seq (Text, CIntT))
- bytestringFMIndexLocateS :: [ByteString] -> ByteString -> Seq (ByteString, LIntB)
- textFMIndexLocateS :: [Text] -> Text -> Seq (Text, LIntT)
- bytestringFMIndexLocateP :: [ByteString] -> ByteString -> IO (Seq (ByteString, LIntB))
- textFMIndexLocateP :: [Text] -> Text -> IO (Seq (Text, LIntT))
To FMIndex functions
bytestringToBWTToFMIndexB :: ByteString -> FMIndexB
Helper function for converting a ByteString
to an FMIndexB via a BWT first.
bytestringToBWTToFMIndexT :: ByteString -> FMIndexT
Helper function for converting a ByteString
to an FMIndexT via a BWT first.
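textToBWTToFMIndexB :: Text -> FMIndexB
Helper function for converting a Text
to an FMIndexB via a BWT first.
textToBWTToFMIndexT :: Text -> FMIndexT
Helper function for converting a Text
to an FMIndexT via a BWT first.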
From FMIndex functions
bytestringFromBWTFromFMIndexB :: FMIndexB -> ByteString
Helper function for converting a BWTed FMIndexB
back to the original ByteString.
bytestringFromBWTFromFMIndexT :: FMIndexT -> ByteString
Helper function for converting a BWTed FMIndexT
back to the original ByteString.
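To show why the original input can be recovered from a BWT at all, here is a minimal sketch of the textbook naive inverse transform. It is not how the library inverts an FMIndexB or FMIndexT: the name inverseBwtSketch is hypothetical, and the input is assumed to be a BWT that contains exactly one '\0' sentinel.

```haskell
import Data.List (sort)

-- Naive inverse BWT: repeatedly prepend the BWT column to the table
-- and re-sort. After n rounds the table holds all sorted rotations;
-- the row ending in the sentinel is the original text plus sentinel.
inverseBwtSketch :: String -> String
inverseBwtSketch l =
  case filter endsWithSentinel table of
    (row : _) -> init row
    []        -> ""          -- no sentinel found (or empty input)
  where
    n       = length l
    table   = iterate step (replicate n "") !! n
    step rs = sort (zipWith (:) l rs)
    endsWithSentinel row = not (null row) && last row == '\0'
```

For instance, inverseBwtSketch "annb\0aa" reconstructs "banana". The repeated sorting is quadratic or worse, so this is only meant to illustrate the round trip on small inputs.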
bytestringBWTFromFMIndexT :: FMIndexT -> BWT ByteString
Takes an FMIndexT and returns
the BWT of ByteStrings.
bytestringBWTFromFMIndexB :: FMIndexB -> BWT ByteString
Takes an FMIndexB and returns
the BWT of ByteStrings.
bytestringFromFMIndexB :: FMIndexB -> Seq (Maybe ByteString)
Takes an FMIndexB and returns
the original Seq of ByteStrings.
bytestringFromFMIndexT :: FMIndexT -> Seq (Maybe ByteString)
Takes an FMIndexT and returns
the original Seq of ByteStrings.
Count operations
bytestringFMIndexCountS :: [ByteString] -> ByteString -> Seq (ByteString, CIntB)
Takes a list of ByteString patterns
and an input ByteString
and returns the number of occurrences of each pattern
in the input ByteString.
bytestringFMIndexCountP :: [ByteString] -> ByteString -> IO (Seq (ByteString, CIntB))
Takes a list of ByteString patterns
and an input ByteString
and returns the number of occurrences of each pattern
in the input ByteString.
Parallelized, with chunking
based on the number of available cores.
To use it, compile with: -O2 -threaded -with-rtsopts=-N.
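The sketch below illustrates one plausible reading of "chunking based on the number of available cores": the pattern list is split into roughly one chunk per capability and each chunk is counted on its own thread. This is an assumption about the general technique, not the library's code; countAllP, countNaive and chunksOf' are hypothetical names, and a naive substring count stands in for the FM-index count.

```haskell
import Control.Concurrent (forkIO, getNumCapabilities)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Exception (evaluate)
import Data.List (isPrefixOf, tails)

-- Split a list into consecutive chunks of at most k elements.
chunksOf' :: Int -> [a] -> [[a]]
chunksOf' _ [] = []
chunksOf' k xs = let (h, t) = splitAt k xs in h : chunksOf' k t

-- Stand-in for an index-backed count: overlapping occurrences of pat.
countNaive :: String -> String -> Int
countNaive pat txt = length (filter (pat `isPrefixOf`) (tails txt))

-- Count every pattern, one worker thread per chunk of patterns.
-- Results come back in the same order as the input patterns.
countAllP :: [String] -> String -> IO [(String, Int)]
countAllP pats txt = do
  caps <- getNumCapabilities
  let size = max 1 ((length pats + caps - 1) `div` caps)
  vars <- mapM spawn (chunksOf' size pats)
  concat <$> mapM takeMVar vars
  where
    spawn chunk = do
      var <- newEmptyMVar
      _ <- forkIO $ do
        let results = [ (p, countNaive p txt) | p <- chunk ]
        -- Force each count inside the worker so the work is not
        -- deferred to the thread that reads the MVar.
        mapM_ (evaluate . snd) results
        putMVar var results
      return var
```

getNumCapabilities only reports more than one capability when the program is linked with -threaded and run with an -N RTS option, which is why the compile flags above matter for any speedup.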
Locate operations
bytestringFMIndexLocateS :: [ByteString] -> ByteString -> Seq (ByteString, LIntB)
Takes a list of ByteString patterns
and an input ByteString
and returns the indices of the occurrences of each pattern
in the input ByteString.
The output indices are 1-based,
and are not sorted.
bytestringFMIndexLocateP :: [ByteString] -> ByteString -> IO (Seq (ByteString, LIntB))
Takes a list of ByteString patterns
and an input ByteString
and returns the indices of the occurrences of each pattern
in the input ByteString.
The output indices are 1-based,
and are not sorted.
Parallelized, with chunking
based on the number of available cores.
To use it, compile with: -O2 -threaded -with-rtsopts=-N.
textFMIndexLocateP :: [Text] -> Text -> IO (Seq (Text, LIntT))
Takes a list of Text patterns
and an input Text
and returns the indices of the occurrences of each pattern
in the input Text.
The output indices are 1-based,
and are not sorted.
Parallelized, with chunking
based on the number of available cores.
To use it, compile with: -O2 -threaded -with-rtsopts=-N.