BiobaseFasta-0.3.0.0: streaming FASTA parser

Safe Haskell: None
Language: Haskell2010

Biobase.Fasta.Streaming

Description

Streaming FASTA handling via the streaming library.

The functions in this module should run in constant memory.

TODO: Check with unit tests whether this actually holds.

Documentation

data FindHeader

Control structure for streamingFasta.

Constructors

FindHeader 

Fields

HasHeader 

Fields
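Judging by the constructor names, FindHeader presumably tracks whether the parser is still looking for a header line or is already inside a record. That idea can be sketched as a two-state machine over the lines of a strict ByteString. This is a hypothetical simplification, not the library's code: ParseState, Searching, InRecord and records are illustrative names, and the real fields of FindHeader and HasHeader are not reproduced.

```haskell
import qualified Data.ByteString.Char8 as B
import Data.ByteString.Char8 (ByteString)

-- Hypothetical stand-in for FindHeader; the real constructors carry
-- fields that are not documented here.
data ParseState
  = Searching                        -- no header seen yet (cf. FindHeader)
  | InRecord ByteString [ByteString] -- header plus its sequence lines, newest first (cf. HasHeader)

-- | Split FASTA text into (header, sequence) pairs by stepping the
-- two-state machine over the input lines.
records :: ByteString -> [(ByteString, ByteString)]
records = go Searching . B.lines
  where
    go st [] = flush st
    go st (l : ls)
      -- a new header closes the previous record (if any) and opens a new one
      | B.pack ">" `B.isPrefixOf` l = flush st ++ go (InRecord (B.drop 1 l) []) ls
      | otherwise = case st of
          Searching      -> go Searching ls               -- skip text before the first header
          InRecord h acc -> go (InRecord h (l : acc)) ls  -- collect sequence lines
    -- emit the record collected so far
    flush Searching        = []
    flush (InRecord h acc) = [(h, B.concat (reverse acc))]
```

Unlike streamingFasta, this sketch reads the whole input as one strict ByteString and keeps each record in memory, so it does not run in constant memory.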

fastaUid :: Lens' (SequenceIdentifier w) ByteString

Lens into the unique id, i.e. the first word of the header.
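A lens of this kind both reads the first word and replaces it while leaving the rest of the header intact. The sketch below works on a raw header ByteString rather than on SequenceIdentifier, and defines its own van Laarhoven Lens' type plus minimal view/set helpers so it needs no lens dependency; firstWord is a hypothetical name, not the library's implementation of fastaUid.

```haskell
{-# LANGUAGE RankNTypes #-}

import qualified Data.ByteString.Char8 as B
import Data.ByteString.Char8 (ByteString)
import Data.Char (isSpace)
import Data.Functor.Const (Const (..))
import Data.Functor.Identity (Identity (..))

-- van Laarhoven lens type, the same shape as Lens' in the lens package.
type Lens' s a = forall f. Functor f => (a -> f a) -> s -> f s

-- | Hypothetical lens onto the first whitespace-delimited word of a
-- header line (without the leading '>'). Setting replaces only that word.
firstWord :: Lens' ByteString ByteString
firstWord f hdr =
  let (w, rest) = B.break isSpace hdr
  in  fmap (\w' -> B.append w' rest) (f w)

-- Read the focused value.
view :: Lens' s a -> s -> a
view l = getConst . l Const

-- Replace the focused value.
set :: Lens' s a -> a -> s -> s
set l x = runIdentity . l (const (Identity x))
```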

streamingFasta

Arguments

:: Monad m 
=> HeaderSize

Maximal length of the header. Setting it to 20 000 is fine; it only guards against extremely long header lines.

-> OverlapSize

How much of the current window to carry over to the next step. If set larger than the current size, at most the current size is carried over. (But see the TODO at overlappedFasta.)

-> CurrentSize

The size of each window to be processed.

-> ByteString m r

A streaming ByteString of FASTA input.

-> Stream (Of (BioSequenceWindow w ty k)) m r

The outgoing stream of processed windows.
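The interplay of OverlapSize and CurrentSize can be illustrated on a plain in-memory sequence. This sketch is not the library's implementation: Window and windows are hypothetical names, headers and streaming are ignored, and the sequence is assumed to be a strict ByteString already stripped of newlines. Each window pairs the tail carried over from the previous chunk with the current chunk; because the tail is cut from the previous chunk only, it can never exceed the chunk's length.

```haskell
import qualified Data.ByteString.Char8 as B
import Data.ByteString.Char8 (ByteString)

-- | Hypothetical window: the carried-over tail plus the current chunk.
data Window = Window { overlap :: ByteString, current :: ByteString }
  deriving (Eq, Show)

-- | Cut a sequence into chunks of at most @cur@ characters, each carrying
-- the last @ovl@ characters of the previous chunk.
windows :: Int -> Int -> ByteString -> [Window]
windows ovl cur = go B.empty
  where
    go prev s
      | cur <= 0 || B.null s = []  -- guard against a non-advancing split
      | otherwise =
          let (c, rest) = B.splitAt cur s
              -- last ovl characters of the previous chunk; B.drop with a
              -- non-positive count keeps everything, which caps the overlap
              o = B.drop (B.length prev - ovl) prev
          in  Window o c : go c rest
```

With an overlap of 1 and a current size of 2 (the parameters used in the streamingFasta example), the sequence ACGTA yields the windows ("", "AC"), ("C", "GT") and ("T", "A").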

Fully stream a FASTA file without ever exceeding a constant amount of memory. Each window is yielded down the stream as a BioSequenceWindow for continued streaming.

r4 = toList . streamingFasta (HeaderSize 2) (OverlapSize 1) (CurrentSize 2) . S8.fromStrict $ BS.pack t0