Safe Haskell | Safe-Inferred |
---|
Parse BLAST XML output.
If you use a recent version of NCBI BLAST and specify XML output (blastall -m 7),
this module should be able to parse the result into a hierarchical BlastResult
structure.
While the process may consume a bit of memory, the parsing is lazy,
and files of several gigabytes can be parsed (see the
xml2x tool for an example). To parse XML, we use TagSoup.
- readXML :: FilePath -> IO BlastResult
- type SeqId = ByteString
- data Strand
- data Aux
- data BlastResult = BlastResult {
  - blastprogram :: !ByteString
  - blastversion :: !ByteString
  - blastdate :: !ByteString
  - blastreferences :: !ByteString
  - database :: !ByteString
  - dbsequences :: !Integer
  - dbchars :: !Integer
  - results :: [BlastRecord]
  }
- data BlastRecord = BlastRecord {}
- data BlastHit = BlastHit {}
- data BlastMatch = BlastMatch {}
Documentation
readXML :: FilePath -> IO BlastResult
Parse BLAST results in XML format.
type SeqId = ByteString
The sequence id, i.e. the first word of the header field.
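To illustrate what "the first word of the header field" means, the following minimal sketch extracts an identifier from a header line. The helper `seqIdFromHeader` and the sample header are made up for this example and are not part of the module. A strict `ByteString` is assumed here; the library may use the lazy variant.

```haskell
import qualified Data.ByteString.Char8 as B

-- Local alias mirroring the documented type (strictness is an assumption).
type SeqId = B.ByteString

-- Hypothetical helper: the first whitespace-delimited word of a header,
-- or the empty string if the header contains no words.
seqIdFromHeader :: B.ByteString -> SeqId
seqIdFromHeader header =
  case B.words header of
    (w:_) -> w
    []    -> B.empty

main :: IO ()
main = B.putStrLn (seqIdFromHeader (B.pack "seq42 made-up description text"))
```

Everything after the first run of whitespace is treated as free-text description and discarded.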
data Strand
The Strand indicates the direction of the match, i.e. the plain sequence or
its reverse complement.
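The distinction between the plain sequence and its reverse complement can be sketched as follows. The type `Orientation` and its constructors are hypothetical stand-ins, since the real constructors of Strand are not shown on this page, and `matchedSequence` is an illustrative helper rather than a library function.

```haskell
-- Hypothetical stand-in for Strand; the constructor names are assumptions.
data Orientation = Forward | ReverseComplement
  deriving (Eq, Show)

-- Complement a single nucleotide; other symbols (e.g. N) pass through unchanged.
complement :: Char -> Char
complement 'A' = 'T'
complement 'T' = 'A'
complement 'C' = 'G'
complement 'G' = 'C'
complement c   = c

-- Return the sequence as read in the given orientation.
matchedSequence :: Orientation -> String -> String
matchedSequence Forward           = id
matchedSequence ReverseComplement = reverse . map complement

main :: IO ()
main = mapM_ (\o -> putStrLn (matchedSequence o "ATGC")) [Forward, ReverseComplement]
```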
data Aux
The Aux field in the BLAST output includes match information that depends on the BLAST flavor (blastn, blastx, or blastp). This data structure captures those variations.
data BlastResult
A BlastResult is the root of the hierarchy.
data BlastRecord
Each query sequence generates a BlastRecord.
data BlastHit
Each match between a query and a target sequence (or subject)
is a BlastHit.
data BlastMatch
A BlastHit may contain multiple separate matches (typically when
an indel causes a frameshift that blastx is unable to bridge).
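The nesting described above (a result holds records, a record holds hits, a hit holds matches) can be walked with an ordinary list comprehension. The sketch below uses simplified stand-in types, because the fields of BlastRecord, BlastHit and BlastMatch are not listed on this page. All type names, field names and sample values here are hypothetical and chosen only for illustration.

```haskell
import qualified Data.ByteString.Char8 as B

-- Simplified stand-ins for the documented hierarchy (all fields are assumptions).
data Match  = Match  { matchEValue :: Double }                               -- cf. BlastMatch
data Hit    = Hit    { hitSubject :: B.ByteString, hitMatches :: [Match] }   -- cf. BlastHit
data Record = Record { recQuery :: B.ByteString, recHits :: [Hit] }          -- cf. BlastRecord
data Result = Result { resRecords :: [Record] }                              -- cf. BlastResult

-- For every hit with at least one match, report the query, the subject,
-- and the smallest e-value among that hit's matches.
bestPerHit :: Result -> [(B.ByteString, B.ByteString, Double)]
bestPerHit res =
  [ (recQuery r, hitSubject h, minimum (map matchEValue ms))
  | r <- resRecords res
  , h <- recHits r
  , let ms = hitMatches h
  , not (null ms)
  ]

-- Made-up sample data: one query with two hits, one query with no hits.
sample :: Result
sample = Result
  [ Record (B.pack "query1")
      [ Hit (B.pack "subjA") [Match 1e-30, Match 2e-5]
      , Hit (B.pack "subjB") [Match 0.5]
      ]
  , Record (B.pack "query2") []
  ]

main :: IO ()
main = mapM_ print (bestPerHit sample)
```

With the real module, the value at the top of the hierarchy would come from readXML, and the traversal would use the library's own field names instead of the stand-ins above.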