BiobaseFasta- Iteratee-based FASTA parser

Safe Haskell: Safe-Inferred





type FastaFunction z
 = FastaHeader       -- the > header
 -> StartPos         -- where in the original sequence to start
 -> WindowSize       -- how many characters we are looking at
 -> PeekSize         -- this many characters are from the next window (peeking into)
 -> TrailSequence    -- trailing last window-size characters
 -> FastaData        -- the actual sequence data
 -> z                -- and what we return as result

This is the type of the conversion function from FASTA data to the data z. Make certain that all input is used strictly; BangPatterns are the easiest way to do this. In order, the function expects the current FASTA header, the starting position of the data segment within the full FASTA data, the window size, the peek size, the trailing sequence, and finally the data segment itself.

If you need the conversion to run in constant time, do not use the convenience functions; instead, replace the final conversion to a strict stream with your own conversion (or output) function.
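As an illustration of a strict conversion function, the following is a minimal sketch and not part of the library. The type synonyms are copied locally from this page so that the block compiles on its own. The name gcPerWindow and the choice of a list as the result type z are hypothetical, and the sketch assumes that the data segment holds the window followed by the peeked characters.

```haskell
{-# LANGUAGE BangPatterns #-}

import qualified Data.ByteString.Char8 as B
import Data.ByteString.Char8 (ByteString)

-- Local copies of the synonyms documented on this page, so the sketch
-- compiles without the library installed.
type FastaHeader   = ByteString
type FastaData     = ByteString
type TrailSequence = ByteString
type StartPos      = Int
type WindowSize    = Int
type PeekSize      = Int

type FastaFunction z
  = FastaHeader -> StartPos -> WindowSize -> PeekSize
  -> TrailSequence -> FastaData -> z

-- Hypothetical conversion function: for each window, report the header,
-- the start position, and the number of G/C characters in the window.
-- Every argument carries a bang pattern so that all input is forced.
gcPerWindow :: FastaFunction [(FastaHeader, StartPos, Int)]
gcPerWindow !hdr !start !wsize !_peek !_trail !dat =
  let !window = B.take wsize dat            -- assumption: peeked characters follow the window
      !gc     = B.length (B.filter isGC window)
  in  [(hdr, start, gc)]
  where
    isGC c = c `elem` "GCgc"
```

A list is used for z here because it is a Monoid; whether it also satisfies the Nullable constraint required by the functions below is assumed rather than checked.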

type StartPos = Int

Starting position in FASTA entry.

type FastaHeader = ByteString

Current header (the line starting with >)

type FastaData = ByteString

FASTA data

type WindowSize = Int


type PeekSize = Int

How many characters to peek forward

type TrailSequence = ByteString

Last window-size characters as a bytestring

Conversion from FASTA to data of type z.

rollingIter :: (Monad m, Functor m, Nullable z, Monoid z) => (StartPos -> WindowSize -> PeekSize -> TrailSequence -> FastaData -> z) -> WindowSize -> PeekSize -> Enumeratee ByteString z m a

Takes a bytestring sequence, applies f to each window of WindowSize characters, and returns the combined results z.
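The windowing idea can be modelled with a pure function over a strict bytestring. This is only a sketch under stated assumptions, not the iteratee-based implementation: the names rollingWindows and rollingPure are hypothetical, start positions are assumed to be 0-based, the trail is assumed to be the previous window, and each data segment is assumed to contain the window plus up to PeekSize characters of the next one.

```haskell
import qualified Data.ByteString.Char8 as B
import Data.ByteString.Char8 (ByteString)

-- Local copies of the synonyms documented on this page.
type FastaData     = ByteString
type TrailSequence = ByteString
type StartPos      = Int
type WindowSize    = Int
type PeekSize      = Int

-- Cut the input into consecutive windows of size w. Each element carries
-- the (assumed 0-based) start position, the previous window as trail, and
-- the window extended by up to p peeked characters.
rollingWindows :: WindowSize -> PeekSize -> ByteString
               -> [(StartPos, TrailSequence, FastaData)]
rollingWindows w p input
  | w <= 0    = []                          -- guard against a non-advancing window
  | otherwise = go 0 B.empty input
  where
    go pos trail bs
      | B.null bs = []
      | otherwise =
          let window = B.take w bs          -- the window proper
              dat    = B.take (w + p) bs    -- window plus peeked characters
          in  (pos, trail, dat) : go (pos + w) window (B.drop w bs)

-- Pure counterpart with the same argument order as rollingIter:
-- apply f to every window and combine the results monoidally.
rollingPure :: Monoid z
            => (StartPos -> WindowSize -> PeekSize -> TrailSequence -> FastaData -> z)
            -> WindowSize -> PeekSize -> ByteString -> z
rollingPure f w p =
  mconcat . map (\(s, t, d) -> f s w p t d) . rollingWindows w p
```

Because this model holds the whole input in memory, it is useful for reasoning about what f receives, not as a replacement for the streaming version.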

eneeFasta :: (Monad m, Functor m, Nullable z, NullPoint z, Monoid z) => FastaFunction z -> WindowSize -> PeekSize -> Enumeratee ByteString z m a

Outer enumeratee. See the two convenience functions for how to use it (just like any enumeratee, basically).

The FASTA function f manipulates small stretches of FASTA data. Its arguments, such as the FASTA header, the FASTA data, and the start position, are all filled in by eneeFasta.

Next comes the window size (how many characters to read at once), followed by the peek size (the number of characters to read in addition).

The work is actually done by rollingIter.
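To make the role of the outer layer concrete, here is a pure sketch of how FASTA text can be split into entries, each consisting of a header line and its concatenated sequence data. It is an assumption about the general shape of the task, not a description of how eneeFasta is implemented; the name splitFasta is hypothetical, and the sketch works on a whole strict bytestring rather than on a stream.

```haskell
import qualified Data.ByteString.Char8 as B
import Data.ByteString.Char8 (ByteString)

-- Split FASTA text into (header, sequence) pairs. The header keeps its
-- leading '>'; the sequence lines of an entry are concatenated.
splitFasta :: ByteString -> [(ByteString, ByteString)]
splitFasta = go . filter (not . B.null) . B.lines
  where
    go [] = []
    go (l:ls)
      | isHeader l =
          let (sq, rest) = break isHeader ls   -- sequence lines up to the next header
          in  (l, B.concat sq) : go rest
      | otherwise = go ls                      -- data before the first header is skipped
    -- Safe to call B.head: empty lines were filtered out above.
    isHeader x = B.head x == '>'
```

Each resulting sequence could then be cut into windows and handed, together with its header, to a conversion function of type FastaFunction z.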

Convenience functions: final data is returned strictly.

fromFile :: (Monoid z, Nullable z) => FastaFunction z -> Int -> Int -> FilePath -> IO z

From an uncompressed file.

fromFileZip :: (Monoid z, Nullable z) => FastaFunction z -> Int -> Int -> FilePath -> IO z

From a gzip-compressed file.