blastxml-0.2: Library for reading Blast XML output

Safe Haskell: Safe-Inferred

Bio.BlastXML

Description

Parse blast XML output.

If you use a recent version of NCBI BLAST and specify XML output (blastall -m 7), this module should be able to parse the result into a hierarchical BlastResult structure.

Although parsing may consume some memory, it is lazy, so files of several gigabytes can be processed (see the xml2x tool for an example). The XML itself is parsed with TagSoup.
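Lazy parsing keeps memory bounded only if the records are consumed incrementally. The following minimal sketch shows one way to do that with a strict left fold. It uses a hypothetical Record type instead of the library's own types, and it assumes that the records can be taken from the parsed result as an ordinary list.

```haskell
import Data.List (foldl')

-- Hypothetical stand-in for a parsed query record: only the number
-- of hits matters for this sketch.
newtype Record = Record { nHits :: Int }

-- Strict left fold: the running total is forced at every step, and
-- each record can be garbage collected once it has been counted
-- (provided nothing else holds on to the list), so a lazily produced
-- list is never resident in memory all at once.
totalHits :: [Record] -> Int
totalHits = foldl' (\acc r -> acc + nHits r) 0
```

By contrast, calling length on the list and then making a second pass over it would keep every record alive between the two passes.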

Documentation

readXML :: FilePath -> IO BlastResult

Parse a BLAST output file in XML format.

type SeqId = ByteString

The sequence id, i.e. the first word of the header field.
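The following minimal sketch applies that rule with a hypothetical helper, headerToSeqId, which returns the first whitespace-delimited word of a header. The library's own parser may treat edge cases differently.

```haskell
import Data.ByteString (ByteString)
import qualified Data.ByteString.Char8 as B

type SeqId = ByteString

-- Hypothetical helper: the first whitespace-delimited word of a
-- header line, or an empty id if the header is blank.
headerToSeqId :: ByteString -> SeqId
headerToSeqId header = case B.words header of
  (w : _) -> w
  []      -> B.empty
```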

data Strand

The Strand indicates the direction of the match, i.e. the plain sequence or its reverse complement.

Constructors

Plus 
Minus 
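For nucleotide sequences, reading on the Minus strand means taking the reverse complement. The sketch below defines a local copy of Strand and adds a hypothetical helper, onStrand. It complements only the four bases (in either case) and leaves every other character, such as N or a gap symbol, unchanged.

```haskell
import Data.ByteString (ByteString)
import qualified Data.ByteString.Char8 as B
import Data.Maybe (fromMaybe)

-- Local copy of the documented type.
data Strand = Plus | Minus deriving (Eq, Show)

-- Complement one base; characters outside ACGT/acgt are returned as-is.
complementBase :: Char -> Char
complementBase c = fromMaybe c (lookup c (zip "ACGTacgt" "TGCAtgca"))

-- Hypothetical helper: the sequence as read on the given strand.
onStrand :: Strand -> ByteString -> ByteString
onStrand Plus  s = s
onStrand Minus s = B.reverse (B.map complementBase s)
```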

data Aux

The Aux field in the BLAST output includes match information that depends on the BLAST flavor (blastn, blastx, or blastp). This data structure captures those variations.

Constructors

Strands !Strand !Strand

blastn

Frame !Strand !Int

blastx
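Code that consumes a match can branch on the two constructors. The sketch below defines local copies of Strand and Aux and adds a hypothetical helper, describeAux. The documentation above does not say which strand in Strands belongs to the query and which to the subject, so the helper prints them in their original order without interpreting them.

```haskell
-- Local copies of the documented types.
data Strand = Plus | Minus deriving (Eq, Show)

data Aux
  = Strands !Strand !Strand  -- blastn
  | Frame !Strand !Int       -- blastx
  deriving (Eq, Show)

-- Hypothetical helper: a one-line, human-readable summary.
describeAux :: Aux -> String
describeAux (Strands a b) = "blastn, strands " ++ show a ++ "/" ++ show b
describeAux (Frame s f)   = "blastx, strand " ++ show s ++ ", frame " ++ show f
```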

data BlastRecord

Each query sequence generates a BlastRecord.

Constructors

BlastRecord 

Fields

query :: !SeqId
 
qlength :: !Int
 
hits :: [BlastHit]
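Because the output is organized per query, many per-query questions reduce to list operations over the records. The sketch below uses reduced local copies of the documented types, in which BlastHit keeps only its subject field, together with two hypothetical helpers. It assumes that a query without hits still produces a record with an empty hits list.

```haskell
import Data.ByteString (ByteString)
import qualified Data.ByteString.Char8 as B

type SeqId = ByteString

-- Cut-down stand-ins for the documented types.
newtype BlastHit = BlastHit { subject :: SeqId } deriving Show

data BlastRecord = BlastRecord
  { query   :: !SeqId
  , qlength :: !Int
  , hits    :: [BlastHit]
  } deriving Show

-- Hypothetical helper: queries whose record has an empty hit list.
unmatchedQueries :: [BlastRecord] -> [SeqId]
unmatchedQueries rs = [query r | r <- rs, null (hits r)]

-- Hypothetical helper: each query paired with the subjects it hit.
querySubjects :: BlastRecord -> (SeqId, [SeqId])
querySubjects r = (query r, map subject (hits r))

-- Small example input.
sampleRecords :: [BlastRecord]
sampleRecords =
  [ BlastRecord (B.pack "q1") 120 [BlastHit (B.pack "s1"), BlastHit (B.pack "s2")]
  , BlastRecord (B.pack "q2") 80 []
  ]
```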
 

data BlastHit

Each match between a query and a target sequence (or subject) is a BlastHit.

Constructors

BlastHit 

Fields

subject :: !SeqId
 
slength :: !Int
 
matches :: [BlastMatch]
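A common step is to rank or filter hits by their best match. The sketch below uses reduced local copies of the types, in which BlastMatch keeps only bits and e_val. The helpers bestEValue and significantHits are hypothetical.

```haskell
import Data.ByteString (ByteString)
import qualified Data.ByteString.Char8 as B

-- Cut-down stand-in: BlastMatch keeps only its two scores.
data BlastMatch = BlastMatch { bits :: !Double, e_val :: !Double }
  deriving Show

data BlastHit = BlastHit
  { subject :: !ByteString
  , slength :: !Int
  , matches :: [BlastMatch]
  } deriving Show

-- Hypothetical helper: the smallest e-value among a hit's matches,
-- or Nothing for a hit without matches.
bestEValue :: BlastHit -> Maybe Double
bestEValue h = case map e_val (matches h) of
  [] -> Nothing
  es -> Just (minimum es)

-- Hypothetical helper: hits with at least one match at or below the cutoff.
significantHits :: Double -> [BlastHit] -> [BlastHit]
significantHits cutoff = filter (maybe False (<= cutoff) . bestEValue)

-- Small example input.
sampleHits :: [BlastHit]
sampleHits =
  [ BlastHit (B.pack "s1") 900 [BlastMatch 120.5 1e-30, BlastMatch 40.1 1e-5]
  , BlastHit (B.pack "s2") 450 [BlastMatch 25.0 0.5]
  , BlastHit (B.pack "s3") 300 []
  ]
```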
 

data BlastMatch

A BlastHit may contain multiple separate matches (typically when an indel causes a frameshift that blastx is unable to bridge).
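When a frameshift splits an alignment, each piece becomes a separate BlastMatch within the same BlastHit. The sketch below sorts the pieces by query position and reports how far apart neighboring pieces are. It uses a reduced local BlastMatch that has only the coordinate fields, and the helpers alongQuery and queryGaps are hypothetical. Each range is normalized with min/max in case some matches report their coordinates in descending order.

```haskell
import Data.List (sortOn)

-- Cut-down stand-in with only the coordinate fields.
data BlastMatch = BlastMatch
  { q_from :: !Int, q_to :: !Int, h_from :: !Int, h_to :: !Int }
  deriving (Eq, Show)

-- Lower and upper query coordinate, whatever the reported order.
qLo, qHi :: BlastMatch -> Int
qLo m = min (q_from m) (q_to m)
qHi m = max (q_from m) (q_to m)

-- Hypothetical helper: matches ordered by their start on the query.
alongQuery :: [BlastMatch] -> [BlastMatch]
alongQuery = sortOn qLo

-- Hypothetical helper: query positions between consecutive pieces
-- (negative when two pieces overlap).
queryGaps :: [BlastMatch] -> [Int]
queryGaps ms = zipWith (\a b -> qLo b - qHi a - 1) sorted (drop 1 sorted)
  where sorted = alongQuery ms
```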

Constructors

BlastMatch 

Fields

bits :: !Double
 
e_val :: !Double
 
identity :: (Int, Int)
 
q_from :: !Int
 
q_to :: !Int
 
h_from :: !Int
 
h_to :: !Int
 
qseq :: !ByteString
 
hseq :: !ByteString
 
aux :: !Aux
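As a worked example using these fields, the sketch below computes percent identity. It assumes, without confirmation from this page, that identity holds the pair (identical positions, alignment length), matching the Hsp_identity and Hsp_align-len elements of BLAST XML. The helper identicalColumns recomputes the first number from qseq and hseq as a cross-check. The types are local copies and both helpers are hypothetical.

```haskell
import Data.ByteString (ByteString)
import qualified Data.ByteString.Char8 as B

-- Local copies of the documented types.
data Strand = Plus | Minus deriving (Eq, Show)
data Aux = Strands !Strand !Strand | Frame !Strand !Int deriving (Eq, Show)

data BlastMatch = BlastMatch
  { bits :: !Double, e_val :: !Double
  , identity :: (Int, Int)
  , q_from :: !Int, q_to :: !Int
  , h_from :: !Int, h_to :: !Int
  , qseq :: !ByteString, hseq :: !ByteString
  , aux :: !Aux
  } deriving Show

-- Percent identity, ASSUMING identity = (identical positions, alignment length).
pctIdentity :: BlastMatch -> Double
pctIdentity m = case identity m of
  (_, 0) -> 0
  (i, n) -> 100 * fromIntegral i / fromIntegral n

-- Identical columns counted from the aligned sequences; gap
-- columns ('-') never count as identical.
identicalColumns :: BlastMatch -> Int
identicalColumns m =
  length [() | (a, b) <- B.zip (qseq m) (hseq m), a == b, a /= '-']
```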
 
