biohazard-1.0.3: bioinformatics support library

Description

Parsers for BAM and SAM.

TONOTDO:

• Reader for gzipped/bzipped/bgzf'ed SAM. Storing SAM is a bad idea, so why would anyone ever want to compress, much less index it?
• Proper support for the "=" symbol. It's completely alien to the ususal representation of sequences.
Synopsis

# Documentation

data Block Source #

One BGZF block: virtual offset and contents. Could also be a block of an uncompressed file, if we want to support indexing of uncompressed BAM or some silliness like that.

Constructors

 Block Fields
Instances
 Source # Instance detailsMethods(<>) :: Block -> Block -> Block #stimes :: Integral b => b -> Block -> Block # Source # Instance detailsMethodsmappend :: Block -> Block -> Block #mconcat :: [Block] -> Block # Source # Instance detailsMethods Source # Instance detailsMethods

Decompress a BGZF stream into a stream of Bytess.

Like compressBgzf', with sensible defaults.

decodeBam :: Monad m => (BamMeta -> Iteratee [BamRaw] m a) -> Iteratee Block m (Iteratee [BamRaw] m a) Source #

Decode a BAM stream into raw entries. Note that the entries can be unpacked using decodeBamEntry. Also note that this is an Enumeratee in spirit, only the BamMeta and Refs need to get passed separately.

Checks if a file contains BAM in any of the common forms, then decompresses it appropriately. We support plain BAM, Bgzf'd BAM, and Gzip'ed BAM.

The recommendation for these functions is to use decodeAnyBam (or decodeAnyBamFile) for any code that can handle BamRaw input, but decodeAnyBamOrSam (or decodeAnyBamOrSamFile) for code that needs BamRec. That way, SAM is supported automatically, and seeking will be supported if possible.

isBam :: MonadIO m => Iteratee Bytes m (Maybe (BamrawEnumeratee m a)) Source #

Tests if a data stream is a Bam file. Recognizes plain Bam, gzipped Bam and bgzf'd Bam. If a file is recognized as Bam, a decoder (suitable Enumeratee) for it is returned. This uses iLookAhead internally, so it shouldn't consume anything from the stream.

isPlainBam :: MonadIO m => Iteratee Bytes m (Maybe (BamrawEnumeratee m a)) Source #

Tests if a data stream is a Bam file. Recognizes plain Bam, gzipped Bam and bgzf'd Bam. If a file is recognized as Bam, a decoder (suitable Enumeratee) for it is returned. This uses iLookAhead internally, so it shouldn't consume anything from the stream.

isGzipBam :: MonadIO m => Iteratee Bytes m (Maybe (BamrawEnumeratee m a)) Source #

Tests if a data stream is a Bam file. Recognizes plain Bam, gzipped Bam and bgzf'd Bam. If a file is recognized as Bam, a decoder (suitable Enumeratee) for it is returned. This uses iLookAhead internally, so it shouldn't consume anything from the stream.

isBgzfBam :: MonadIO m => Iteratee Bytes m (Maybe (BamrawEnumeratee m a)) Source #

Tests if a data stream is a Bam file. Recognizes plain Bam, gzipped Bam and bgzf'd Bam. If a file is recognized as Bam, a decoder (suitable Enumeratee) for it is returned. This uses iLookAhead internally, so it shouldn't consume anything from the stream.

decodeSam :: Monad m => (BamMeta -> Iteratee [BamRec] m a) -> Iteratee Bytes m (Iteratee [BamRec] m a) Source #

Iteratee-style parser for SAM files, designed to be compatible with the BAM parsers. Parses plain uncompressed SAM, nothing else. Since it is supposed to work the same way as the BAM parser, it requires the presense of the SQ header lines. These are stripped from the header text and turned into the symbol table.

decodeSam' :: Monad m => Refs -> Enumeratee Bytes [BamRec] m a Source #

Parser for SAM that doesn't look for a header. Has the advantage that it doesn't stall on a pipe that never delivers data. Has the disadvantage that it never reads the header and therefore needs a list of allowed RNAMEs.

Checks if a file contains BAM in any of the common forms, then decompresses it appropriately. If the stream doesn't contain BAM at all, it is instead decoded as SAM. Since SAM is next to impossible to recognize reliably, we don't even try. Any old junk is decoded as SAM and will fail later.