text-compression-0.1.0.22: A text compression library.
Copyright: (c) Matthew Mosior 2022
License: BSD-style
Maintainer: mattm.github@gmail.com
Portability: portable
Safe Haskell: Safe-Inferred
Language: Haskell2010

Data.FMIndex

Description

Full-text Minute-space index (FM-index)

Users will get the most mileage by first applying the BWT to the initial ByteString or Text input and then building a FMIndexB or FMIndexT from the result.
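To make the BWT step concrete, here is a minimal, self-contained sketch of a naive Burrows-Wheeler transform on a String. It is not the library's implementation: the function name naiveBWT is hypothetical, and using Nothing as the end-of-input marker is an assumption that only loosely mirrors the Maybe-wrapped elements in this module's types.

```haskell
import Data.List (sort)

-- | Naive BWT (illustration only, quadratic in the input length):
-- append an end marker, sort all rotations, keep the last symbol of each.
-- 'Nothing' is the assumed end marker; it sorts before every 'Just' value.
naiveBWT :: Ord a => [a] -> [Maybe a]
naiveBWT xs = map last (sort rotations)
  where
    marked    = map Just xs ++ [Nothing]
    n         = length marked
    -- All n rotations of the marked input.
    rotations = [ take n (drop i (cycle marked)) | i <- [0 .. n - 1] ]
```

For example, naiveBWT "banana" evaluates to [Just 'a', Just 'n', Just 'n', Just 'b', Nothing, Just 'a', Just 'a'], which is the familiar "annb$aa" with Nothing in place of "$".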

To do this, users can use the bytestringToBWTToFMIndexB and bytestringToBWTToFMIndexT functions, as well as the textToBWTToFMIndexB and textToBWTToFMIndexT functions.
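A minimal usage sketch of the helper functions, assuming the text-compression and bytestring packages are available. It relies only on the signatures listed in the synopsis of this module; the name roundTrip is hypothetical.

```haskell
import Data.ByteString (ByteString)
import qualified Data.ByteString.Char8 as BSC
import Data.FMIndex (bytestringFromBWTFromFMIndexB, bytestringToBWTToFMIndexB)

-- | Build a FMIndexB via the BWT, then convert it back to a ByteString.
-- 'roundTrip' is a hypothetical name used only for this illustration.
roundTrip :: ByteString -> ByteString
roundTrip = bytestringFromBWTFromFMIndexB . bytestringToBWTToFMIndexB

-- | Sample input for experimenting in GHCi.
sample :: ByteString
sample = BSC.pack "abracadabra"
```

Since bytestringFromBWTFromFMIndexB is documented as converting back to the original ByteString, roundTrip sample would be expected to equal sample.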

The base functions for ByteString, bytestringToFMIndexB and bytestringToFMIndexT, can be used to convert a Seq (Maybe ByteString) to a FMIndexB and a FMIndexT, respectively.

Likewise, the base functions for Text, textToFMIndexB and textToFMIndexT, can be used to convert a Seq (Maybe Text) to a FMIndexB and a FMIndexT, respectively.

There are various other lower-level functions for interacting with the FMIndex implementation on ByteString and Text as well.

Operation: Count

The count operation is supported by both serial (bytestringFMIndexCountS and textFMIndexCountS) and parallel (bytestringFMIndexCountP and textFMIndexCountP) implementations.

The count operations on ByteString, bytestringFMIndexCountS and bytestringFMIndexCountP, are implemented using the countFMIndexB function.

The count operations on Text, textFMIndexCountS and textFMIndexCountP, are implemented using the countFMIndexT function.
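For readers unfamiliar with how an FM-index answers count queries, the following is a minimal sketch of the standard backward-search technique over a naively computed BWT. It uses only base, is deliberately inefficient, and is not the code behind countFMIndexB or countFMIndexT; the names lastColumn and countOccurrences are hypothetical, as is the use of Nothing as the end marker.

```haskell
import Data.List (sort)

-- | Last column of the sorted rotations (a naive BWT).
-- 'Nothing' is the assumed end marker and sorts before every 'Just'.
lastColumn :: String -> [Maybe Char]
lastColumn txt = map last (sort rotations)
  where
    marked    = map Just txt ++ [Nothing]
    n         = length marked
    rotations = [ take n (drop i (cycle marked)) | i <- [0 .. n - 1] ]

-- | Count occurrences of a pattern in a text by backward search.
-- The half-open range [lo, hi) tracks the sorted rotations that start
-- with the part of the pattern processed so far (right to left).
countOccurrences :: String -> String -> Int
countOccurrences pat txt = go (reverse pat) 0 (length l)
  where
    l = lastColumn txt
    -- Number of symbols in the text (end marker included) smaller than c.
    smaller c = length (filter (< Just c) l)
    -- Number of occurrences of c among the first i symbols of the last column.
    occ c i = length (filter (== Just c) (take i l))
    go [] lo hi = hi - lo
    go (c : cs) lo hi
      | lo' >= hi' = 0
      | otherwise  = go cs lo' hi'
      where
        lo' = smaller c + occ c lo
        hi' = smaller c + occ c hi
```

For example, countOccurrences "ana" "banana" evaluates to 2. A practical index precomputes the smaller-symbol and occurrence tables instead of rescanning the last column on every step.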

Operation: Locate

The locate operation is supported by both serial (bytestringFMIndexLocateS and textFMIndexLocateS) and parallel (bytestringFMIndexLocateP and textFMIndexLocateP) implementations.

The locate operations on ByteString, bytestringFMIndexLocateS and bytestringFMIndexLocateP, are implemented using the locateFMIndexB function.

The locate operations on Text, textFMIndexLocateS and textFMIndexLocateP, are implemented using the locateFMIndexT function.
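The locate functions in this module document their output indices as 1-based and not sorted. The sketch below illustrates one way such output can arise: matches are read off in suffix-sorted order rather than text order. It is a naive, base-only illustration and not the code behind locateFMIndexB or locateFMIndexT; the name locateOccurrences and the Nothing end marker are assumptions, and the library's actual output order may differ.

```haskell
import Data.List (isPrefixOf, sortOn, tails)

-- | Report the 1-based start positions of a pattern in a text,
-- in the order of the lexicographically sorted suffixes.
locateOccurrences :: String -> String -> [Int]
locateOccurrences pat txt =
  [ i + 1                               -- 0-based offset to 1-based index
  | (i, suffix) <- sortedSuffixes
  , map Just pat `isPrefixOf` suffix
  ]
  where
    -- 'Nothing' is the assumed end marker; it sorts before every 'Just'.
    marked = map Just txt ++ [Nothing]
    -- Non-empty suffixes paired with their 0-based offsets, sorted by suffix.
    sortedSuffixes = sortOn snd (zip [0 ..] (init (tails marked)))
```

For example, locateOccurrences "ana" "banana" evaluates to [4, 2]: both occurrences are found, but the suffix starting at position 4 sorts before the one starting at position 2.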

Internal

Data.FMIndex.Internal contains efficient and stateful implementations of the FMIndex and Inverse FMIndex algorithms.

Synopsis

To FMIndex functions

bytestringToBWTToFMIndexB :: ByteString -> FMIndexB

Helper function for converting a ByteString to a FMIndexB via a BWT first.

bytestringToBWTToFMIndexT :: ByteString -> FMIndexT

Helper function for converting a ByteString to a FMIndexT via a BWT first.

textToBWTToFMIndexB :: Text -> FMIndexB

Helper function for converting a Text to a FMIndexB via a BWT first.

textToBWTToFMIndexT :: Text -> FMIndexT

Helper function for converting a Text to a FMIndexT via a BWT first.

textBWTToFMIndexB :: BWTMatrix Word8 -> TextBWT -> FMIndexB

Take a BWT of Word8s and generate the FM-index (FMIndexB).

bytestringBWTToFMIndexB :: BWTMatrix Word8 -> BWT Word8 -> FMIndexB

Take a BWT of Word8s and generate the FM-index (FMIndexB).

textBWTToFMIndexT :: BWTMatrix Word8 -> TextBWT -> FMIndexT

Take a BWT of Word8s and generate the FM-index (FMIndexT).

bytestringBWTToFMIndexT :: BWTMatrix Word8 -> BWT Word8 -> FMIndexT

Take a BWT of Word8s and generate the FM-index (FMIndexT).

textToFMIndexB :: BWTMatrix Text -> Seq (Maybe Text) -> FMIndexB

Takes a BWTMatrix Text and a Seq (Maybe Text) and returns the FM-index (FMIndexB).

textToFMIndexT :: BWTMatrix Text -> Seq (Maybe Text) -> FMIndexT

Takes a BWTMatrix Text and a Seq (Maybe Text) and returns the FM-index (FMIndexT).

From FMIndex functions

bytestringFromBWTFromFMIndexB :: FMIndexB -> ByteString

Helper function for converting a BWTed FMIndexB back to the original ByteString.

bytestringFromBWTFromFMIndexT :: FMIndexT -> ByteString

Helper function for converting a BWTed FMIndexT back to the original ByteString.

textFromBWTFromFMIndexB :: FMIndexB -> Text

Helper function for converting a BWTed FMIndexB back to the original Text.

textFromBWTFromFMIndexT :: FMIndexT -> Text

Helper function for converting a BWTed FMIndexT back to the original Text.

textBWTFromFMIndexT :: FMIndexT -> BWT Text

Takes a FMIndexT and returns the BWT of Texts.

textBWTFromFMIndexB :: FMIndexB -> BWT Text

Takes a FMIndexB and returns the BWT of Texts.

textFromFMIndexB :: FMIndexB -> Seq (Maybe Text)

Takes a FMIndexB and returns the original Seq of Texts.

bytestringFromFMIndexB :: FMIndexB -> Seq (Maybe ByteString)

Takes a FMIndexB and returns the original Seq of ByteStrings.

textFromFMIndexT :: FMIndexT -> Seq (Maybe Text)

Takes a FMIndexT and returns the original Seq of Texts.

bytestringFromFMIndexT :: FMIndexT -> Seq (Maybe ByteString)

Takes a FMIndexT and returns the original Seq of ByteStrings.

Count operations

bytestringFMIndexCountS :: [ByteString] -> ByteString -> Seq (ByteString, CIntB)

Takes a list of ByteString patterns and an input ByteString and returns the number of occurrences of each pattern in the input ByteString.

textFMIndexCountS :: [Text] -> Text -> Seq (Text, CIntT)

Takes a list of Text patterns and an input Text and returns the number of occurrences of each pattern in the input Text.

bytestringFMIndexCountP :: [ByteString] -> ByteString -> Seq (ByteString, CIntB)

Takes a list of ByteString patterns and an input ByteString and returns the number of occurrences of each pattern in the input ByteString. Parallelized over all available cores.

textFMIndexCountP :: [Text] -> Text -> Seq (Text, CIntT)

Takes a list of Text patterns and an input Text and returns the number of occurrences of each pattern in the input Text. Parallelized over all available cores.

Locate operations

bytestringFMIndexLocateS :: [ByteString] -> ByteString -> Seq (ByteString, LIntB)

Takes a list of ByteString patterns and an input ByteString and returns the indices of the occurrences of each pattern in the input ByteString. The output indices are 1-based and are not sorted.

textFMIndexLocateS :: [Text] -> Text -> Seq (Text, LIntT)

Takes a list of Text patterns and an input Text and returns the indices of the occurrences of each pattern in the input Text. The output indices are 1-based and are not sorted.

bytestringFMIndexLocateP :: [ByteString] -> ByteString -> Seq (ByteString, LIntB)

Takes a list of ByteString patterns and an input ByteString and returns the indices of the occurrences of each pattern in the input ByteString. The output indices are 1-based and are not sorted. Parallelized over all available cores.

textFMIndexLocateP :: [Text] -> Text -> Seq (Text, LIntT)

Takes a list of Text patterns and an input Text and returns the indices of the occurrences of each pattern in the input Text. The output indices are 1-based and are not sorted. Parallelized over all available cores.