| Safe Haskell | None |
|---|---|
| Language | Haskell2010 |
Biobase.Fasta.Streaming
Description
Streaming Fasta handling via the streaming library.
The functions in here should be streaming in constant memory.
A typical, slightly complicated is this:
forEach :: forall r . Stream (ByteString m) m r -> m (Stream (Of ()) m r)
forEach dna = do
-- extract the header, but at most 123 characters, dropping the rest
hdr SP.:> dta ← extractHeader (Just 123) dna
-- create windows ws of a particular type. Include the prefix, the suffix, and make each window 10 characters long
let ws = (streamedWindows True True (Just 10) (SequenceIdentifier hdr) PlusStrand dta :: SP.Stream (SP.Of (BioSequenceWindow DNA DNA 0)) m r)
-- count the number of characters in dna, get the return value, print each window
count SP.:> r ← SP.mapM_ (liftIO . print) . bswSeqLength $ SP.copy ws
liftIO $ print count
liftIO $ putStrLn ""
-- yield one vacuous () result, return the remainder r from dna.
return $ SP.yield () *> return r
TODO Check if this is actually true with some unit tests.
Synopsis
- streamedFasta :: Monad m => ByteStream m r -> Stream (Stream (ByteStream m) m) m r
- streamOfStreamedFasta :: forall m r. Monad m => ByteStream m r -> Stream (Stream (ByteStream m) m) m r
- splitFasta :: Monad m => Stream (ByteStream m) m r -> Stream (ByteStream m) m (Stream (ByteStream m) m r)
- collapseData :: Monad m => Stream (ByteStream m) m r -> Stream (ByteStream m) m r
- reChunkBS :: Monad m => Int -> Stream (ByteStream m) m r -> Stream (ByteStream m) m r
- chunksToWindows :: Monad m => SequenceIdentifier w -> Strand -> Stream (ByteStream m) m r -> Stream (Of (Location w FwdPosition (BioSequence ty))) m r
- streamedWindows :: Monad m => Maybe Int -> Maybe Int -> Maybe Int -> SequenceIdentifier w -> Strand -> Stream (ByteStream m) m r -> Stream (Of (PIS w FwdPosition (BioSequence ty))) m r
- streamLocationLength :: (Monad m, ModifyLocation posTy seqTy) => Stream (Of (Location i posTy seqTy)) m r -> m (Of Int r)
- extractHeader :: Monad m => Maybe Int -> Stream (ByteStream m) m r -> m (Of ByteString (Stream (ByteStream m) m r))
Documentation
streamedFasta :: Monad m => ByteStream m r -> Stream (Stream (ByteStream m) m) m r Source #
streamOfStreamedFasta Source #
Arguments
| :: forall m r. Monad m | |
| => ByteStream m r | |
| -> Stream (Stream (ByteStream m) m) m r |
Here each individual fasta file will be a stream.
TODO Once this works, streamingFasta should be S.concats . streamOfStreamedFasta ...
splitFasta :: Monad m => Stream (ByteStream m) m r -> Stream (ByteStream m) m (Stream (ByteStream m) m r) Source #
Given a 'Stream (ByteString m) m r' which is a Stream of lines, split
off the first Fasta entry.
collapseData :: Monad m => Stream (ByteStream m) m r -> Stream (ByteStream m) m r Source #
reChunkBS :: Monad m => Int -> Stream (ByteStream m) m r -> Stream (ByteStream m) m r Source #
Rechunk a stream of bytestrings.
chunksToWindows :: Monad m => SequenceIdentifier w -> Strand -> Stream (ByteStream m) m r -> Stream (Of (Location w FwdPosition (BioSequence ty))) m r Source #
Assuming a "rechunked" stream of bytestrings, create sequence windows.
Arguments
| :: Monad m | |
| => Maybe Int | |
| -> Maybe Int | |
| -> Maybe Int | desired size or a single huge |
| -> SequenceIdentifier w | |
| -> Strand | |
| -> Stream (ByteStream m) m r | |
| -> Stream (Of (PIS w FwdPosition (BioSequence ty))) m r |
Make it possible to take a fasta stream and produce a stream of
BioSequenceWindows. This is a convenience function around
'withSuffix . withPrefix . chunksToWindows . reChunks'.
In case of a Nothing window size, a single huge Fasta entry is produced
(and materialized!).
TODO In case of Nothing window size, we use the collapseData function
which has one check too many, and will be slightly slower. However, the
check should be once per ByteString.
streamLocationLength :: (Monad m, ModifyLocation posTy seqTy) => Stream (Of (Location i posTy seqTy)) m r -> m (Of Int r) Source #
Get the full length of a stream of BioSequenceWindows, counted in
characters in each bswSequence.
To use, start with bswSeqLength $ SP.copy xs. Then consume this stream
normally. It still provides a Stream of BioSequenceWindowss. However,
the return type is now not just r, but it provides Int SP.:> r, where
the Int provides the total length of characters within this Fasta entry.
This value may then be used to fully update negative strand information.
extractHeader :: Monad m => Maybe Int -> Stream (ByteStream m) m r -> m (Of ByteString (Stream (ByteStream m) m r)) Source #
As a first function, the header should be extracted from a Fasta stream. Since headers may be
malformed / malicious, we make it possible to