This module implements a hierarchical data structure for BLAST results. An alternative flat structure is available in the Bio.Alignment.BlastFlat module.
BLAST is a tool for searching (biological) sequences for similarity. This library is tested against NCBI-blast version 2.2.14. Several independent versions of BLAST exist, so expect some incompatibilities if you're using a different BLAST version.
For parsing BLAST results, the XML format (blastall -m 7) is by far the most robust choice, and is implemented in the Bio.Alignment.BlastXML module.
The format is straightforward (and non-recursive). For more information on BLAST, check http://www.ncbi.nlm.nih.gov/Education/BLASTinfo/information3.html
- type SeqId = ByteString
- data Strand
- data Aux
- data BlastResult = BlastResult {
- blastprogram :: !ByteString
- blastversion :: !ByteString
- blastdate :: !ByteString
- blastreferences :: !ByteString
- database :: !ByteString
- dbsequences :: !Integer
- dbchars :: !Integer
- results :: [BlastRecord]
- data BlastRecord = BlastRecord {}
- data BlastHit = BlastHit {}
- data BlastMatch = BlastMatch {}
Documentation
type SeqId = ByteString
The sequence id, i.e. the first word of the header field.
data Strand
The Strand indicates the direction of the match, i.e. the plain sequence or its reverse complement.
data Aux
The Aux field in the BLAST output includes match information that depends on the BLAST flavor (blastn, blastx, or blastp). This data structure captures those variations.
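As a rough illustration of how flavor-dependent match information can be modelled, the sketch below uses a sum type. The constructor names (Plus, Minus, Strands, Frame, NoAux) and the helper describeAux are assumptions made for this example and are not necessarily the module's actual definitions. It assumes that nucleotide searches report a strand pair, translated searches report a reading frame, and protein searches report neither.

```haskell
-- Hypothetical sketch: constructor names are illustrative assumptions,
-- not necessarily those exported by this module.

-- | Direction of a match: the plain sequence or its reverse complement.
data Strand = Plus | Minus
  deriving (Show, Eq)

-- | Flavor-dependent match information.
data Aux
  = Strands Strand Strand  -- nucleotide search: query and subject strand
  | Frame Strand Int       -- translated search: strand and reading frame
  | NoAux                  -- protein search: neither strand nor frame
  deriving (Show, Eq)

-- | Render the auxiliary information as a short human-readable string.
describeAux :: Aux -> String
describeAux (Strands q s) = "strands " ++ show q ++ "/" ++ show s
describeAux (Frame s f)   = "frame " ++ sign s ++ show f
  where
    sign Plus  = "+"
    sign Minus = "-"
describeAux NoAux         = "none"
```

Pattern matching on such a type forces callers to handle every flavor explicitly, which is the main reason to prefer it over a set of optional fields.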
data BlastResult
A BlastResult is the root of the hierarchy.
data BlastRecord
Each query sequence generates a BlastRecord.
data BlastHit
Each match between a query and a target sequence (or subject) is a BlastHit.
data BlastMatch
A BlastHit may contain multiple separate matches (typically when an indel causes a frameshift that blastx is unable to bridge).
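To make the nesting of records, hits, and matches concrete, here is a minimal, self-contained sketch of how such a hierarchy can be walked. The cut-down record types and their field names (query, hits, subject, matches, bits, eValue) are assumptions made for this example, since the synopsis elides the real fields; flatten and matchCount are hypothetical helpers, not functions exported by this module.

```haskell
import qualified Data.ByteString.Char8 as B

type SeqId = B.ByteString

-- Hypothetical, cut-down versions of the records; the field names are
-- assumptions for this sketch, not the module's actual definitions.
data BlastMatch = BlastMatch
  { bits   :: Double
  , eValue :: Double
  } deriving (Show, Eq)

data BlastHit = BlastHit
  { subject :: SeqId
  , matches :: [BlastMatch]
  } deriving (Show, Eq)

data BlastRecord = BlastRecord
  { query :: SeqId
  , hits  :: [BlastHit]
  } deriving (Show, Eq)

-- | Flatten the hierarchy into one (query, subject, match) triple per match.
-- Hits without matches contribute nothing to the result.
flatten :: [BlastRecord] -> [(SeqId, SeqId, BlastMatch)]
flatten recs =
  [ (query r, subject h, m)
  | r <- recs
  , h <- hits r
  , m <- matches h
  ]

-- | Total number of matches over all records and hits.
matchCount :: [BlastRecord] -> Int
matchCount = length . flatten
```

A list comprehension keeps the traversal order the same as the nesting: records first, then hits within a record, then matches within a hit.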