text-compression-0.1.0.25: A text compression library.
Copyright: (c) Matthew Mosior 2022
License: BSD-style
Maintainer: mattm.github@gmail.com
Portability: portable
Safe Haskell: Safe-Inferred
Language: Haskell2010

Data.FMIndex

Description

Full-text Minute-space index (FM-index)

Users will get the most mileage by first applying a Burrows-Wheeler transform (BWT) to the initial ByteString or Text input and then building an FMIndexB or FMIndexT from the result.

To do this, users can use the bytestringToBWTToFMIndexB and bytestringToBWTToFMIndexT functions, as well as the textToBWTToFMIndexB and textToBWTToFMIndexT functions.
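To make the "BWT first" step concrete, here is a minimal sketch of a naive Burrows-Wheeler transform on plain lists. It is not the library's implementation: the names `rotations` and `naiveBWT` are hypothetical, and using `Nothing` as the end-of-input sentinel is an assumption made for illustration.

```haskell
import Data.List (sort)

-- | All cyclic rotations of a list, starting with the list itself.
rotations :: [a] -> [[a]]
rotations xs = take (length xs) (iterate rotateLeft xs)
  where
    rotateLeft []       = []
    rotateLeft (y : ys) = ys ++ [y]

-- | Naive BWT: append a sentinel, sort all rotations, keep the last column.
-- 'Nothing' plays the role of the sentinel; it sorts before every 'Just'.
naiveBWT :: Ord a => [a] -> [Maybe a]
naiveBWT xs = map last (sort (rotations (map Just xs ++ [Nothing])))
```

In GHCi, `naiveBWT "banana"` evaluates to `[Just 'a',Just 'n',Just 'n',Just 'b',Nothing,Just 'a',Just 'a']`. Sorting every rotation costs far more than a production BWT would, so this is only suitable for small inputs.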

Operation: Count

The count operation has both sequential (bytestringFMIndexCountS and textFMIndexCountS) and parallel (bytestringFMIndexCountP and textFMIndexCountP) implementations.

The count operations on ByteString, bytestringFMIndexCountS and bytestringFMIndexCountP, are implemented using the countFMIndexB function.

The count operations on Text, textFMIndexCountS and textFMIndexCountP, are implemented using the countFMIndexT function.
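For readers unfamiliar with how an FM-index answers count queries, the following is a minimal sketch of backward search over a naively built BWT. It is an illustration under stated assumptions, not the code behind countFMIndexB or countFMIndexT: the names `bwtOf`, `cTable`, and `countOccurrences` are hypothetical, and the occurrence counts are recomputed by scanning the list rather than read from a precomputed table.

```haskell
import Data.List (sort)
import qualified Data.Map.Strict as M

-- | Last column of the sorted rotations of the input plus a 'Nothing' sentinel.
bwtOf :: Ord a => [a] -> [Maybe a]
bwtOf xs = map last (sort (take (length s) (iterate rot s)))
  where
    s      = map Just xs ++ [Nothing]
    rot ys = drop 1 ys ++ take 1 ys

-- | For each symbol, the number of symbols in the BWT that are strictly smaller.
cTable :: Ord a => [Maybe a] -> M.Map (Maybe a) Int
cTable l = M.fromList (zip (M.keys freq) (scanl (+) 0 (M.elems freq)))
  where
    freq = M.fromListWith (+) [ (x, 1) | x <- l ]

-- | Count occurrences of a pattern in a text by backward search.
-- An empty pattern is reported as 0 by convention in this sketch.
countOccurrences :: Ord a => [a] -> [a] -> Int
countOccurrences pat txt
  | null pat  = 0
  | otherwise = go (reverse (map Just pat)) 0 (length l)
  where
    l       = bwtOf txt
    table   = cTable l
    -- Number of occurrences of x among the first i symbols of the BWT.
    occ x i = length (filter (== x) (take i l))
    -- [lo, hi) is the range of sorted rotations prefixed by the suffix seen so far.
    go [] lo hi = hi - lo
    go (x : xs) lo hi =
      case M.lookup x table of
        Nothing   -> 0
        Just base ->
          let lo' = base + occ x lo
              hi' = base + occ x hi
          in if lo' >= hi' then 0 else go xs lo' hi'
```

For example, `countOccurrences "ana" "banana"` evaluates to `2` and `countOccurrences "x" "banana"` to `0`. The pattern is consumed from its last symbol to its first, which is what makes the search "backward".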

Operation: Locate

The locate operation has both sequential (bytestringFMIndexLocateS and textFMIndexLocateS) and parallel (bytestringFMIndexLocateP and textFMIndexLocateP) implementations.

The locate operations on ByteString, bytestringFMIndexLocateS and bytestringFMIndexLocateP, are implemented using the locateFMIndexB function.

The locate operations on Text, textFMIndexLocateS and textFMIndexLocateP, are implemented using the locateFMIndexT function.
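The shape of a locate result (1-based positions that are not in ascending order) can be illustrated with a naive suffix-array scan. This sketch swaps in a simpler technique than an FM-index and makes no claim about how locateFMIndexB or locateFMIndexT work internally; `locateOccurrences` is a hypothetical name.

```haskell
import Data.List (isPrefixOf, sortOn, tails)

-- | 1-based start positions of every occurrence of a pattern in a text,
-- reported in the lexicographic order of the suffixes they start.
-- An empty pattern yields no positions in this sketch.
locateOccurrences :: Ord a => [a] -> [a] -> [Int]
locateOccurrences pat txt
  | null pat  = []
  | otherwise = [ i | (i, suf) <- suffixArray, pat `isPrefixOf` suf ]
  where
    -- Every non-empty suffix paired with its 1-based start, sorted by suffix.
    suffixArray = sortOn snd (zip [1 ..] (init (tails txt)))
```

For example, `locateOccurrences "ana" "banana"` evaluates to `[4,2]`: the suffix "ana" (position 4) sorts before "anana" (position 2). Callers who need ascending positions can sort the result afterwards.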

Internal

Data.FMIndex.Internal contains efficient and stateful implementations of the FMIndex and Inverse FMIndex algorithms.

To FMIndex functions

bytestringToBWTToFMIndexB :: ByteString -> FMIndexB

Helper function for converting a ByteString to an FMIndexB via a BWT first.

bytestringToBWTToFMIndexT :: ByteString -> FMIndexT

Helper function for converting a ByteString to an FMIndexT via a BWT first.

textToBWTToFMIndexB :: Text -> FMIndexB

Helper function for converting a Text to an FMIndexB via a BWT first.

textToBWTToFMIndexT :: Text -> FMIndexT

Helper function for converting a Text to an FMIndexT via a BWT first.

textBWTToFMIndexB :: BWTMatrix Word8 -> TextBWT -> FMIndexB

Takes a BWTMatrix of Word8s and a TextBWT and generates the FM-index (FMIndexB).

bytestringBWTToFMIndexB :: BWTMatrix Word8 -> BWT Word8 -> FMIndexB

Takes a BWTMatrix of Word8s and a BWT of Word8s and generates the FM-index (FMIndexB).

textBWTToFMIndexT :: BWTMatrix Word8 -> TextBWT -> FMIndexT

Takes a BWTMatrix of Word8s and a TextBWT and generates the FM-index (FMIndexT).

bytestringBWTToFMIndexT :: BWTMatrix Word8 -> BWT Word8 -> FMIndexT

Takes a BWTMatrix of Word8s and a BWT of Word8s and generates the FM-index (FMIndexT).

From FMIndex functions

bytestringFromBWTFromFMIndexB :: FMIndexB -> ByteString

Helper function for converting an FMIndexB that was built via a BWT back to the original ByteString.

bytestringFromBWTFromFMIndexT :: FMIndexT -> ByteString

Helper function for converting an FMIndexT that was built via a BWT back to the original ByteString.

textFromBWTFromFMIndexB :: FMIndexB -> Text

Helper function for converting an FMIndexB that was built via a BWT back to the original Text.

textFromBWTFromFMIndexT :: FMIndexT -> Text

Helper function for converting an FMIndexT that was built via a BWT back to the original Text.

textBWTFromFMIndexT :: FMIndexT -> BWT Text

Takes an FMIndexT and returns the BWT of Texts.

textBWTFromFMIndexB :: FMIndexB -> BWT Text

Takes an FMIndexB and returns the BWT of Texts.

textFromFMIndexB :: FMIndexB -> Seq (Maybe Text)

Takes an FMIndexB and returns the original Seq of Texts.

bytestringFromFMIndexB :: FMIndexB -> Seq (Maybe ByteString)

Takes an FMIndexB and returns the original Seq of ByteStrings.

textFromFMIndexT :: FMIndexT -> Seq (Maybe Text)

Takes an FMIndexT and returns the original Seq of Texts.

bytestringFromFMIndexT :: FMIndexT -> Seq (Maybe ByteString)

Takes an FMIndexT and returns the original Seq of ByteStrings.

Count operations

bytestringFMIndexCountS :: [ByteString] -> ByteString -> Seq (ByteString, CIntB)

Takes a list of ByteString patterns and an input ByteString and returns the number of occurrences of each pattern in the input ByteString.

textFMIndexCountS :: [Text] -> Text -> Seq (Text, CIntT)

Takes a list of Text patterns and an input Text and returns the number of occurrences of each pattern in the input Text.

bytestringFMIndexCountP :: [ByteString] -> ByteString -> IO (Seq (ByteString, CIntB))

Takes a list of ByteString patterns and an input ByteString and returns the number of occurrences of each pattern in the input ByteString. Parallelized, with chunking based on the number of available cores. When using this function, compile with: -O2 -threaded -with-rtsopts=-N.

textFMIndexCountP :: [Text] -> Text -> IO (Seq (Text, CIntT))

Takes a list of Text patterns and an input Text and returns the number of occurrences of each pattern in the input Text. Parallelized, with chunking based on the number of available cores. When using this function, compile with: -O2 -threaded -with-rtsopts=-N.
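The idea of chunking work by the number of available cores can be sketched with base-only concurrency primitives. This is an assumption-laden illustration, not the library's parallel implementation: `chunked`, `naiveCount`, and `countParallel` are hypothetical names, patterns (rather than the input) are what gets chunked here, and counting is done by a plain prefix scan instead of an FM-index.

```haskell
import Control.Concurrent (forkIO, getNumCapabilities)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Exception (evaluate)
import Data.List (isPrefixOf, tails)

-- | Split a list into consecutive chunks of at most n elements.
chunked :: Int -> [a] -> [[a]]
chunked n xs
  | n <= 0 || null xs = []
  | otherwise         = let (h, t) = splitAt n xs in h : chunked n t

-- | Count occurrences of a pattern by scanning every suffix of the text.
naiveCount :: Eq a => [a] -> [a] -> Int
naiveCount pat txt
  | null pat  = 0
  | otherwise = length (filter (pat `isPrefixOf`) (tails txt))

-- | Count each pattern, giving every capability roughly one chunk of patterns.
-- Results come back in the same order as the input patterns.
countParallel :: [String] -> String -> IO [(String, Int)]
countParallel pats txt = do
  caps <- getNumCapabilities
  let size = max 1 ((length pats + caps - 1) `div` caps)
  vars <- mapM spawn (chunked size pats)
  concat <$> mapM takeMVar vars
  where
    spawn chunk = do
      var <- newEmptyMVar
      _ <- forkIO $ do
        let results = [ (p, naiveCount p txt) | p <- chunk ]
        -- Force each count inside the worker so the work is not deferred
        -- to the thread that later reads the result.
        mapM_ (evaluate . snd) results
        putMVar var results
      pure var
```

For example, `countParallel ["ana", "nan", "x"] "banana"` returns `[("ana",2),("nan",1),("x",0)]`. Without `-threaded` and an RTS `-N` setting, `getNumCapabilities` typically reports 1 and the workers run on a single core, which is why those flags matter for the parallel functions.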

Locate operations

bytestringFMIndexLocateS :: [ByteString] -> ByteString -> Seq (ByteString, LIntB)

Takes a list of ByteString patterns and an input ByteString and returns the indices of the occurrences of each pattern in the input ByteString. The output indices are 1-based and are not sorted.

textFMIndexLocateS :: [Text] -> Text -> Seq (Text, LIntT)

Takes a list of Text patterns and an input Text and returns the indices of the occurrences of each pattern in the input Text. The output indices are 1-based and are not sorted.

bytestringFMIndexLocateP :: [ByteString] -> ByteString -> IO (Seq (ByteString, LIntB))

Takes a list of ByteString patterns and an input ByteString and returns the indices of the occurrences of each pattern in the input ByteString. The output indices are 1-based and are not sorted. Parallelized, with chunking based on the number of available cores. When using this function, compile with: -O2 -threaded -with-rtsopts=-N.

textFMIndexLocateP :: [Text] -> Text -> IO (Seq (Text, LIntT))

Takes a list of Text patterns and an input Text and returns the indices of the occurrences of each pattern in the input Text. The output indices are 1-based and are not sorted. Parallelized, with chunking based on the number of available cores. When using this function, compile with: -O2 -threaded -with-rtsopts=-N.