bio-0.4.7: A bioinformatics library

Bio.Alignment.BlastData

Description

This module implements a hierarchical data structure for BLAST results. An alternative flat structure is available in the Bio.Alignment.BlastFlat module.

BLAST is a tool for searching (biological) sequences for similarity. This library is tested against NCBI-blast version 2.2.14. Several independent versions of BLAST exist, so expect some incompatibilities if you are using a different BLAST version.

For parsing BLAST results, the XML format (blastall -m 7) is by far the most robust choice, and is implemented in the Bio.Alignment.BlastXML module.

The format is straightforward (and non-recursive). For more information on BLAST, check http://www.ncbi.nlm.nih.gov/Education/BLASTinfo/information3.html

Documentation

type SeqId = ByteString

The sequence id, i.e. the first word of the header field.

data Strand

The Strand indicates the direction of the match, i.e. the plain sequence or its reverse complement.

Constructors

Plus 
Minus 

data Aux

The Aux field in the BLAST output includes match information that depends on the BLAST flavor (blastn, blastx, or blastp). This data structure captures those variations.

Constructors

Strands !Strand !Strand

blastn

Frame !Strand !Int

blastx
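
As a minimal sketch of how the two Aux constructors might be consumed, the following pattern match renders each variant as text. The Strand and Aux declarations are restated locally from the listing above so that the example compiles without the bio package; describeAux is a hypothetical helper and not part of the library, and the derived instances are an assumption, since the instance lists are not shown here.

```haskell
-- Local copies of the documented types (restated here, not imported).
data Strand = Plus | Minus
  deriving (Show, Eq)  -- derived instances are an assumption

data Aux
  = Strands !Strand !Strand  -- blastn
  | Frame !Strand !Int       -- blastx
  deriving (Show, Eq)

-- Hypothetical helper: render the flavor-specific match information.
describeAux :: Aux -> String
describeAux (Strands a b) = "blastn, strands " ++ show a ++ "/" ++ show b
describeAux (Frame s n)   = "blastx, frame " ++ show n ++ " on " ++ show s ++ " strand"
```

Matching on the constructor is what lets code that handles both blastn and blastx results stay in one function, because the compiler can warn when a variant is left unhandled.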

data BlastRecord

Each query sequence generates a BlastRecord.

Constructors

BlastRecord 

Fields

query :: !SeqId
 
qlength :: !Int
 
hits :: [BlastHit]
 

data BlastHit

Each match between a query and a target sequence (or subject) is a BlastHit.

Constructors

BlastHit 

Fields

subject :: !SeqId
 
slength :: !Int
 
matches :: [BlastMatch]
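
To illustrate the hierarchy (a record holds hits, a hit holds matches), here is a small sketch that walks a BlastRecord and produces one tuple per match. The record types are restated locally from the field listings; BlastMatch is trimmed to two fields for brevity, the choice of the lazy ByteString variant is an assumption, and flatten and example are hypothetical names that do not come from the library.

```haskell
import qualified Data.ByteString.Lazy.Char8 as B

-- Assumption: which ByteString variant SeqId uses is not stated above.
type SeqId = B.ByteString

-- Trimmed local copy: only two of the documented fields are kept.
data BlastMatch = BlastMatch
  { bits  :: !Double
  , e_val :: !Double
  } deriving (Show, Eq)

data BlastHit = BlastHit
  { subject :: !SeqId
  , slength :: !Int
  , matches :: [BlastMatch]
  } deriving (Show, Eq)

data BlastRecord = BlastRecord
  { query   :: !SeqId
  , qlength :: !Int
  , hits    :: [BlastHit]
  } deriving (Show, Eq)

-- Hypothetical helper: one (query, subject, match) triple per match.
flatten :: BlastRecord -> [(SeqId, SeqId, BlastMatch)]
flatten r = [ (query r, subject h, m) | h <- hits r, m <- matches h ]

-- Made-up data for illustration only.
example :: BlastRecord
example = BlastRecord (B.pack "q1") 300
  [ BlastHit (B.pack "s1") 500 [BlastMatch 50.0 1e-10, BlastMatch 30.0 1e-3]
  , BlastHit (B.pack "s2") 200 []
  ]
```

A hit with no matches contributes nothing to the flattened list, so flatten example yields two triples, both with subject s1.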
 

data BlastMatch

A BlastHit may contain multiple separate matches (typically when an indel causes a frameshift that blastx is unable to bridge).

Constructors

BlastMatch 

Fields

bits :: !Double
 
e_val :: !Double
 
identity :: (Int, Int)
 
q_from :: !Int
 
q_to :: !Int
 
h_from :: !Int
 
h_to :: !Int
 
aux :: !Aux
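
The fields above lend themselves to simple post-processing. The sketch below restates the types locally and adds three hypothetical helpers that are not part of the library. It assumes that identity holds (identical positions, alignment length) and that the coordinates are 1-based and inclusive; neither is stated in this documentation, so check both against your BLAST output before relying on them.

```haskell
-- Local copies of the documented types (restated here, not imported).
data Strand = Plus | Minus
  deriving (Show, Eq)

data Aux
  = Strands !Strand !Strand  -- blastn
  | Frame !Strand !Int       -- blastx
  deriving (Show, Eq)

data BlastMatch = BlastMatch
  { bits     :: !Double
  , e_val    :: !Double
  , identity :: (Int, Int)
  , q_from   :: !Int
  , q_to     :: !Int
  , h_from   :: !Int
  , h_to     :: !Int
  , aux      :: !Aux
  } deriving (Show, Eq)

-- Assumption: identity = (identical positions, alignment length).
percentIdentity :: BlastMatch -> Double
percentIdentity m
  | len == 0  = 0
  | otherwise = 100 * fromIntegral ident / fromIntegral len
  where (ident, len) = identity m

-- Keep only matches whose e-value is at or below the cutoff.
significant :: Double -> [BlastMatch] -> [BlastMatch]
significant cutoff = filter ((<= cutoff) . e_val)

-- Assumption: 1-based inclusive coordinates; abs covers reversed ranges.
querySpan :: BlastMatch -> Int
querySpan m = abs (q_to m - q_from m) + 1
```

Taking the e-value cutoff as a parameter rather than hard-coding it keeps the filter reusable across searches of different database sizes.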
 
