text-compression-0.1.0.16: A text compression library.
Copyright: (c) Matthew Mosior 2022
License: BSD-style
Maintainer: mattm.github@gmail.com
Portability: portable
Safe Haskell: Safe-Inferred
Language: Haskell2010

Data.FMIndex

Description

Full-text Minute-space index (FM-index)

For best results, first apply the Burrows-Wheeler transform (BWT) to the initial ByteString or Text input, and then compress the result to FMIndexB or FMIndexT.

To do this, users can use the bytestringToBWTToFMIndexB and bytestringToBWTToFMIndexT functions, as well as the textToBWTToFMIndexB and textToBWTToFMIndexT functions.
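As a minimal sketch of this workflow, the following builds a FMIndexB from a ByteString via the BWT and then inverts it with bytestringFromBWTFromFMIndexB (documented under "From FMIndex functions"). It assumes the text-compression package is installed and relies only on the signatures listed on this page; the name roundTripsB is hypothetical and chosen for illustration.

```haskell
import Data.ByteString (ByteString)
import qualified Data.ByteString.Char8 as BC
import Data.FMIndex
  ( bytestringFromBWTFromFMIndexB
  , bytestringToBWTToFMIndexB
  )

-- | Hypothetical helper: build a FMIndexB via the BWT, invert it,
-- and report whether the original bytes were recovered.
roundTripsB :: ByteString -> Bool
roundTripsB input =
  bytestringFromBWTFromFMIndexB (bytestringToBWTToFMIndexB input) == input

-- | Sample input used for illustration only.
sampleB :: ByteString
sampleB = BC.pack "banana"
```

Evaluating roundTripsB sampleB in GHCi is one way to check that the pair of helpers behaves as an inverse on a given input.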

The base functions for ByteString, bytestringToFMIndexB and bytestringToFMIndexT, can be used to convert a Seq (Maybe ByteString) to a FMIndexB and FMIndexT, respectively.

Likewise, the base functions for Text, textToFMIndexB and textToFMIndexT, can be used to convert a Seq (Maybe Text) to a FMIndexB and FMIndexT, respectively.

There are various other lower-level functions for interacting with the FMIndex implementation on ByteString and Text as well.

Operations

The count operation is supported by the bytestringFMIndexCount function for ByteStrings and the textFMIndexCount function for Text.
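To illustrate what a count operation over an FM-index computes, here is a self-contained sketch of the textbook backward-search algorithm on plain Strings. It is not this library's implementation: the names bwt, cTable, occ and countOcc are hypothetical, the BWT is built naively from sorted rotations, and the input is assumed not to contain the '\0' character, which is used as an end-of-text sentinel.

```haskell
import Data.List (sort)
import qualified Data.Map.Strict as M

-- | Naive BWT: last column of the sorted rotations of the text.
-- A '\0' sentinel is appended (assumed absent from the input).
bwt :: String -> String
bwt s = map last (sort rotations)
  where
    t         = s ++ "\0"
    n         = length t
    rotations = [ take n (drop i (cycle t)) | i <- [0 .. n - 1] ]

-- | C table: for each symbol c, the number of symbols in the text
-- that are strictly smaller than c.
cTable :: String -> M.Map Char Int
cTable l = M.fromList (zip keys (scanl (+) 0 (map snd freqs)))
  where
    freqs = M.toAscList (M.fromListWith (+) [ (c, 1 :: Int) | c <- l ])
    keys  = map fst freqs

-- | occ c k l: number of occurrences of c in the first k symbols of l.
occ :: Char -> Int -> String -> Int
occ c k = length . filter (== c) . take k

-- | Count occurrences of a pattern in a text by backward search.
-- The half-open range [lo, hi) tracks the sorted rotations that
-- start with the suffix of the pattern processed so far.
countOcc :: String -> String -> Int
countOcc pat txt = go (reverse pat) 0 (length l)
  where
    l  = bwt txt
    ct = cTable l
    go [] lo hi = hi - lo
    go (c : cs) lo hi =
      case M.lookup c ct of
        Nothing   -> 0
        Just base ->
          let lo' = base + occ c lo l
              hi' = base + occ c hi l
          in if lo' >= hi' then 0 else go cs lo' hi'
```

For example, countOcc "ana" "banana" evaluates to 2 and countOcc "nan" "banana" to 1. The occ function here rescans the BWT on every call; a practical FM-index precomputes these ranks.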

Internal

Data.FMIndex.Internal contains efficient and stateful implementations of the FMIndex and Inverse FMIndex algorithms.

Synopsis

To FMIndex functions

bytestringToBWTToFMIndexB :: ByteString -> FMIndexB

Helper function for converting a ByteString to a FMIndexB via a BWT first.

bytestringToBWTToFMIndexT :: ByteString -> FMIndexT

Helper function for converting a ByteString to a FMIndexT via a BWT first.

textToBWTToFMIndexB :: Text -> FMIndexB

Helper function for converting a Text to a FMIndexB via a BWT first.

textToBWTToFMIndexT :: Text -> FMIndexT

Helper function for converting a Text to a FMIndexT via a BWT first.

textBWTToFMIndexB :: BWTMatrix Word8 -> TextBWT -> FMIndexB

Take a BWT of Word8s and generate the FM-index (FMIndexB).

bytestringBWTToFMIndexB :: BWTMatrix Word8 -> BWT Word8 -> FMIndexB

Take a BWT of Word8s and generate the FM-index (FMIndexB).

textBWTToFMIndexT :: BWTMatrix Word8 -> TextBWT -> FMIndexT

Take a BWT of Word8s and generate the FM-index (FMIndexT).

bytestringBWTToFMIndexT :: BWTMatrix Word8 -> BWT Word8 -> FMIndexT

Take a BWT of Word8s and generate the FM-index (FMIndexT).

textToFMIndexB :: BWTMatrix Text -> Seq (Maybe Text) -> FMIndexB

Takes a BWTMatrix of Text and a Seq (Maybe Text) and returns the FM-index (FMIndexB).

textToFMIndexT :: BWTMatrix Text -> Seq (Maybe Text) -> FMIndexT

Takes a BWTMatrix of Text and a Seq (Maybe Text) and returns the FM-index (FMIndexT).

From FMIndex functions

bytestringFromBWTFromFMIndexB :: FMIndexB -> ByteString

Helper function for converting a BWTed FMIndexB back to the original ByteString.

bytestringFromBWTFromFMIndexT :: FMIndexT -> ByteString

Helper function for converting a BWTed FMIndexT back to the original ByteString.

textFromBWTFromFMIndexB :: FMIndexB -> Text

Helper function for converting a BWTed FMIndexB back to the original Text.

textFromBWTFromFMIndexT :: FMIndexT -> Text

Helper function for converting a BWTed FMIndexT back to the original Text.
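The Text helpers can be exercised the same way. The sketch below pairs textToBWTToFMIndexT with textFromBWTFromFMIndexT and checks several inputs at once. It assumes the text-compression package is installed; roundTripsT, allRoundTrip and sampleTexts are hypothetical names, and the sample strings are illustrative only.

```haskell
import Data.Text (Text)
import qualified Data.Text as T
import Data.FMIndex (textFromBWTFromFMIndexT, textToBWTToFMIndexT)

-- | Hypothetical helper: True when inverting the FMIndexT built
-- via the BWT yields the original Text.
roundTripsT :: Text -> Bool
roundTripsT input =
  textFromBWTFromFMIndexT (textToBWTToFMIndexT input) == input

-- | Check a batch of inputs in one go.
allRoundTrip :: [Text] -> Bool
allRoundTrip = all roundTripsT

-- | Illustrative inputs (not taken from the library's test suite).
sampleTexts :: [Text]
sampleTexts = map T.pack ["banana", "abracadabra", "mississippi"]
```

Calling allRoundTrip sampleTexts gives a quick sanity check before relying on the index for larger inputs.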

textBWTFromFMIndexT :: FMIndexT -> BWT Text

Takes a FMIndexT and returns the BWT of Texts.

textBWTFromFMIndexB :: FMIndexB -> BWT Text

Takes a FMIndexB and returns the BWT of Texts.

textFromFMIndexB :: FMIndexB -> Seq (Maybe Text)

Takes a FMIndexB and returns the original Seq of Texts.

bytestringFromFMIndexB :: FMIndexB -> Seq (Maybe ByteString)

Takes a FMIndexB and returns the original Seq of ByteStrings.

textFromFMIndexT :: FMIndexT -> Seq (Maybe Text)

Takes a FMIndexT and returns the original Seq of Texts.

bytestringFromFMIndexT :: FMIndexT -> Seq (Maybe ByteString)

Takes a FMIndexT and returns the original Seq of ByteStrings.

Count operations

bytestringFMIndexCount :: ByteString -> ByteString -> CIntB

Takes a pattern (ByteString) and an input ByteString and returns the number of occurrences of the pattern in the input ByteString.

textFMIndexCount :: Text -> Text -> CIntT

Takes a pattern (Text) and an input Text and returns the number of occurrences of the pattern in the input Text.
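A short sketch of calling both count functions, mainly to show the argument order (pattern first, input second). It assumes the text-compression package is installed. The bindings bananaCountB and bananaCountT are hypothetical, and their type signatures are deliberately left to inference because this page does not show where CIntB and CIntT are exported from or how they are represented.

```haskell
import qualified Data.ByteString.Char8 as BC
import qualified Data.Text as T
import Data.FMIndex (bytestringFMIndexCount, textFMIndexCount)

-- Pattern "ana" counted in the input "banana" (ByteString variant).
-- The result has type CIntB; the signature is omitted on purpose.
bananaCountB = bytestringFMIndexCount (BC.pack "ana") (BC.pack "banana")

-- The same query through the Text variant; the result has type CIntT.
bananaCountT = textFMIndexCount (T.pack "ana") (T.pack "banana")
```

How the result is inspected depends on the definitions of CIntB and CIntT, so consult those types before pattern matching on or printing the values.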