bio-0.5.3: A bioinformatics library

Safe HaskellSafe-Inferred

Bio.Alignment.QAlign

Contents

Description

Quality-aware alignments

Generally, quality data are ignored for alignment/pattern searching like Smith-Waterman, Needleman-Wunsch, or BLAST(p|n|x). I believe that accounting for quality will at the very least affect things like BLAST statistics, and e.g. is crucial for good EST annotation using Blastx.

This module performs sequences alignments, takes quality values into account.

See also http://bioinformatics.oxfordjournals.org/cgi/content/abstract/btn052v1.

Synopsis

Smith-Waterman

Locally optimal alignment with affine gaps, i.e. best infix match.

local_score :: QualMx t Double -> (Double, Double) -> Sequence t -> Sequence t -> DoubleSource

Calculate local edit distance (Smith-Waterman alignment score)

local_align :: QualMx t Double -> (Double, Double) -> Sequence t -> Sequence t -> (Double, EditList)Source

Calculate local alignment (Smith-Waterman) (can we replace uncurry max' with fst - a local alignment must always end on a subst, no?)

Needleman-Wunsch

Globally optimal alignment with affine gaps, the whole sequences are matched.

global_score :: QualMx t Double -> (Double, Double) -> Sequence t -> Sequence t -> DoubleSource

Calculate global edit distance (Needleman-Wunsch alignment score)

global_align :: QualMx t Double -> (Double, Double) -> Sequence t -> Sequence t -> (Double, EditList)Source

Calculate global alignment (Needleman-Wunsch)

Overlapping alignment.

The suffix of one sequence matches a prefix of another.

overlap_score :: QualMx t Double -> (Double, Double) -> Sequence t -> Sequence t -> DoubleSource

Calucalte best overlap score, where gaps at the edges are free The starting point is like for local score (0 cost for initial indels), the result is the maximum anywhere in the last column or bottom row of the matrix.

overlap_align :: QualMx t Double -> (Double, Double) -> Sequence t -> Sequence t -> (Double, EditList)Source

Calucalte best overlap score, where gaps at the edges are free The starting point is like for local score (0 cost for initial indels), the result is the maximum anywhere in the last column or bottom row of the matrix.

Matrix construction

Interactive testing of alignments