Safe Haskell | None |
---|---|
Language | Haskell2010 |
Parser for FastA/FastQ, `ByteStream` style, written such that it
works well with module `Bio.Bam`.
Input streams are broken into numbered lines, then into records.
Records can start with empty lines, which are ignored, or random
junk, which is ignored but results in a warning, followed by a
header indicating either a FastA record (begins with `>` or `;`) or
a FastQ record (begins with `@`). More description lines beginning
with `;` are allowed and silently ignored. All following lines not
starting with `+`, `>`, `;` or `@` are sequence lines. Only in a
FastQ record, this is followed by a separator line starting with
`+`, which is ignored, and exactly as many quality lines as there
were sequence lines. A missing separator results in a warning and
the record being parsed without quality scores.
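The line roles described above can be sketched as a classification by first character. This is only an illustration on plain `String` values: the names `LineKind` and `classify` are hypothetical and are not part of `Bio.Bam.Fastq`, whose actual implementation may differ.

```haskell
-- Hypothetical illustration; not the code used by Bio.Bam.Fastq.
data LineKind
    = Blank        -- empty line, ignored between records
    | FastaHeader  -- starts with '>' or ';'
    | FastqHeader  -- starts with '@'
    | Separator    -- starts with '+'
    | Other        -- sequence, quality or junk, depending on context
    deriving (Eq, Show)

-- Classify a single line by its first character only.
classify :: String -> LineKind
classify []        = Blank
classify ('>' : _) = FastaHeader
classify (';' : _) = FastaHeader
classify ('@' : _) = FastqHeader
classify ('+' : _) = Separator
classify _         = Other
```

The first character alone does not decide how a line is used. A `;` line after a header is a description line, and a quality line with offset 33 may itself begin with `@` or `+`, which is presumably why the parser reads exactly as many quality lines as there were sequence lines instead of looking for the next header.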
In sequence lines, IUPAC-IUB ambiguity codes are converted to
`Nucleotides` and white space is skipped silently. Any other character
becomes an unknown base ('=' in SAM) and a warning is emitted. Note
that downstream tools are unlikely to handle the resulting unknown
bases and/or empty records gracefully. If the quality lines do not
have the same total length as the sequence lines (this includes
missing quality lines due to end-of-stream), a warning is emitted
and the record receives no quality scores (just as if it were a
FastA record). Otherwise, if the quality lines have a different
layout than the sequence lines, a warning is emitted, but they are
still used.
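The two quality checks described above can be sketched as follows. This is a minimal illustration, assuming the lines contain no white space; `QualCheck` and `checkQuals` are hypothetical names, and the mapping of the two failure cases to the warnings `IncoherentQualities` and `IncongruentQualities` is inferred from their descriptions, not taken from the module's source.

```haskell
-- Hypothetical illustration; not the code used by Bio.Bam.Fastq.
data QualCheck
    = Coherent     -- same line layout, qualities are used
    | Incongruent  -- same total length, different layout: warn, still use
    | Incoherent   -- different total length: warn, drop qualities
    deriving (Eq, Show)

-- Compare sequence lines against quality lines of one record.
checkQuals :: [String] -> [String] -> QualCheck
checkQuals seqLines qualLines
    | sum seqLens /= sum qualLens = Incoherent
    | seqLens /= qualLens         = Incongruent
    | otherwise                   = Coherent
  where
    seqLens  = map length seqLines
    qualLens = map length qualLines
```

For example, `checkQuals ["ACGT", "AC"] ["IIIIII"]` yields `Incongruent`, while `checkQuals ["ACGT"] []` yields `Incoherent`, which covers qualities missing at end-of-stream.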
Quality scores must be stored as raw bytes with offset 33. (Other variants, like 454's ASCII qualities and Solexa's raw bytes with offset 64, are difficult to detect and extinct in the wild anyway.) If the second word of the header stores multiple fields, we try to extract Illumina's "QC failed" flag and either an index sequence or a read group name from it.
Other flags are commonly encoded into the sequence names. We do not
handle those here, but most of the conventions at MPI EVAN are dealt
with by `removeWarts`.
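As an illustration of the offset-33 quality encoding mentioned above, a quality line can be decoded by subtracting 33 from each character code. The helper `decodeQual` is a hypothetical name used only for this sketch; it works on `String` and is not how `Bio.Bam.Fastq` stores qualities internally.

```haskell
import Data.Char (ord)

-- Hypothetical helper: decode one quality line stored with offset 33.
-- Each character encodes the score (character code - 33).
decodeQual :: String -> [Int]
decodeQual = map (\c -> ord c - 33)
```

With this sketch, `decodeQual "!I5"` gives `[0, 40, 20]`, since `'!'`, `'I'` and `'5'` have the codes 33, 73 and 53.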
Synopsis
- parseFastq :: MonadLog m => ByteStream m r -> Stream (Of BamRec) m r
- data EmptyRecord = EmptyRecord !Int !Bytes
- data IncoherentQualities = IncoherentQualities !Int !Bytes
- data IncongruentQualities = IncongruentQualities !Int !Bytes
- data JunkFound = JunkFound !Int !Bytes
- data QualitiesMissing = QualitiesMissing !Int !Bytes
- data SequenceHasGaps = SequenceHasGaps !Int !Bytes
Documentation
parseFastq :: MonadLog m => ByteStream m r -> Stream (Of BamRec) m r
data EmptyRecord
Instances
- Show EmptyRecord (defined in Bio.Bam.Fastq): showsPrec :: Int -> EmptyRecord -> ShowS; show :: EmptyRecord -> String; showList :: [EmptyRecord] -> ShowS
- Exception EmptyRecord (defined in Bio.Bam.Fastq)
data IncoherentQualities
Emitted when a quality record does not fit the sequence record.
Instances
- Show IncoherentQualities (defined in Bio.Bam.Fastq): showsPrec :: Int -> IncoherentQualities -> ShowS; show :: IncoherentQualities -> String; showList :: [IncoherentQualities] -> ShowS
- Exception IncoherentQualities (defined in Bio.Bam.Fastq)
data IncongruentQualities
Emitted when a quality record has a different layout than the sequence.
Instances
- Show IncongruentQualities (defined in Bio.Bam.Fastq): showsPrec :: Int -> IncongruentQualities -> ShowS; show :: IncongruentQualities -> String; showList :: [IncongruentQualities] -> ShowS
- Exception IncongruentQualities (defined in Bio.Bam.Fastq)
data JunkFound
Emitted when random text is found instead of a header.
Instances
- Show JunkFound
- Exception JunkFound (defined in Bio.Bam.Fastq): toException :: JunkFound -> SomeException; fromException :: SomeException -> Maybe JunkFound; displayException :: JunkFound -> String
data QualitiesMissing
Emitted when a quality separator was expected, but not found.
Instances
- Show QualitiesMissing (defined in Bio.Bam.Fastq): showsPrec :: Int -> QualitiesMissing -> ShowS; show :: QualitiesMissing -> String; showList :: [QualitiesMissing] -> ShowS
- Exception QualitiesMissing (defined in Bio.Bam.Fastq)
data SequenceHasGaps
Emitted when a sequence record contains strange characters.
Instances
- Show SequenceHasGaps (defined in Bio.Bam.Fastq): showsPrec :: Int -> SequenceHasGaps -> ShowS; show :: SequenceHasGaps -> String; showList :: [SequenceHasGaps] -> ShowS
- Exception SequenceHasGaps (defined in Bio.Bam.Fastq)