bio-0.5.3: A bioinformatics library

Safe Haskell: Safe-Inferred




Data structures and helper functions for calculating alignments

There are two ways to view an alignment: either as a list of edits (i.e., insertions, deletions, or substitutions), or as a set of sequences with inserted gaps.

The edit list approach is perhaps the more restrictive model, and it doesn't generalize to multiple alignments.

The gap approach is more general, and probably more commonly used by other software (see e.g. the ACE file format).


Data types for gap-based alignments

data Dir




Helper functions

extractGaps :: SeqData -> (SeqData, Gaps)

Gaps are coded as '*' characters; this function removes them and returns the sequence along with the list of gap positions. Note that gaps are positioned relative to the *gapped* sequence (in contrast to stmassembler/Cluster.hs).
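As a rough illustration of the behaviour described above, the following sketch uses plain String and [Int] as hypothetical stand-ins for the library's SeqData and Gaps types, and assumes 0-based positions. The name extractGapsSketch is made up for this example; it is not the library's implementation.

```haskell
-- Hypothetical stand-ins for the library's SeqData and Gaps types.
type SeqDataS = String
type GapsS = [Int]

-- Remove '*' gap characters and report where they were.
-- Assumption: positions are 0-based offsets into the *gapped* input.
extractGapsSketch :: SeqDataS -> (SeqDataS, GapsS)
extractGapsSketch s =
  ( filter (/= '*') s
  , [i | (i, c) <- zip [0 ..] s, c == '*']
  )
```

Because the positions index the gapped input, consecutive gaps get consecutive positions: for "AC*GT**A" the sketch reports gaps at 2, 5 and 6.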

Data types for edit-based alignments

data Edit

An Edit is either the insertion, the deletion, or the replacement of a character.


Ins Chr 
Del Chr 
Repl Chr Chr 


type EditList = [Edit]

An alignment is a sequence of edits.

type SubstMx t a = (Chr, Chr) -> a

A substitution matrix gives scores for replacing a character with another. Typically, it will be symmetric. It is type-tagged with the alphabet - Nuc or Amino.
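A minimal sketch of what a value of this type might look like, assuming Int scores and a simple match/mismatch scheme. The empty Nuc declaration here is a local placeholder for the alphabet tag, and simpleMx is a hypothetical name, not something the library is known to export.

```haskell
import Data.Word (Word8)

type Chr = Word8
type SubstMx t a = (Chr, Chr) -> a

-- Placeholder alphabet tag; it only appears at the type level.
data Nuc

-- Symmetric match/mismatch scoring: +1 for identical characters, -1 otherwise.
simpleMx :: SubstMx Nuc Int
simpleMx (x, y)
  | x == y = 1
  | otherwise = -1
```

Since the tag parameter does not occur on the right-hand side of the synonym, it serves only as documentation of the intended alphabet.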

type Selector a = [(a, Edit)] -> a

A Selector consists of a zero element and a function that chooses a possible Edit operation and generates an updated result.
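The type as shown is a function from candidate (result, Edit) pairs to a single result. One plausible instance, sketched here under the assumption that results are plain scores, keeps the best score and discards the edit. The name maxScore is hypothetical.

```haskell
import Data.Word (Word8)

type Chr = Word8

data Edit = Ins Chr | Del Chr | Repl Chr Chr
  deriving (Show, Eq)

type Selector a = [(a, Edit)] -> a

-- Choose the highest-scoring candidate and forget which edit produced it.
-- Note: partial, like maximum; it fails on an empty candidate list.
maxScore :: Ord a => Selector a
maxScore = maximum . map fst
```

A selector that also records the chosen Edit would need a richer result type, for example a pair of score and edit list.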

type Chr = Word8

The sequence element type, used in alignments.

Helper functions

columns :: Selector a -> a -> Sequence b -> Sequence b -> [[a]]

Calculate a set of columns containing scores. This represents the columns of the alignment matrix, but will only require linear space for score calculation.
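To illustrate the column-by-column idea, here is a simplified sketch for global alignment scores. It assumes plain Strings instead of Sequence values, Int scores, a fixed linear gap penalty, and a hard-coded maximum in place of a Selector. The names scoreColumns, finalScore and matchScore are hypothetical, and this is not the library's implementation.

```haskell
-- Each column is computed from the previous column only, so a consumer
-- that walks the result lazily needs to keep just one column alive.
scoreColumns :: Int -> ((Char, Char) -> Int) -> String -> String -> [[Int]]
scoreColumns gap mx s1 s2 = scanl nextCol firstCol s2
  where
    -- Column 0: every prefix of s1 aligned against the empty string.
    firstCol = scanl (\acc _ -> acc + gap) 0 s1
    -- Build the column for character y from the previous column.
    nextCol prev y = col
      where
        col = (head prev + gap) : zipWith3 cell s1 (zip prev (tail prev)) col
        -- diag: replacement, left: gap in s1, up: gap in s2.
        cell x (diag, left) up =
          maximum [diag + mx (x, y), left + gap, up + gap]

-- The global alignment score is the last cell of the last column.
finalScore :: Int -> ((Char, Char) -> Int) -> String -> String -> Int
finalScore gap mx s1 s2 = last (last (scoreColumns gap mx s1 s2))

-- Simple scoring used for the example: +1 match, -1 mismatch.
matchScore :: (Char, Char) -> Int
matchScore (x, y) = if x == y then 1 else -1
```

With a gap penalty of -1, finalScore (-1) matchScore "ACGT" "AGT" evaluates to 2: three matches and one gap.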

eval :: SubstMx t a -> a -> Edit -> a

Evaluate an Edit based on a SubstMx and a gap penalty.
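Given the signature, one natural reading is that insertions and deletions cost the gap penalty while replacements are looked up in the matrix. The sketch below follows that reading; it is an assumption about the behaviour, and evalSketch is a hypothetical name with locally redefined types.

```haskell
import Data.Word (Word8)

type Chr = Word8

data Edit = Ins Chr | Del Chr | Repl Chr Chr
  deriving (Show, Eq)

type SubstMx t a = (Chr, Chr) -> a

-- Assumed semantics: gaps score the penalty, replacements score via the matrix.
evalSketch :: SubstMx t a -> a -> Edit -> a
evalSketch _ gap (Ins _) = gap
evalSketch _ gap (Del _) = gap
evalSketch mx _ (Repl x y) = mx (x, y)
```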

isRepl :: Edit -> Bool

True if the Edit is a Repl.

on :: (t1 -> t1 -> t) -> (t2 -> t1) -> t2 -> t2 -> t

toStrings :: EditList -> (String, String)

Turn an alignment into sequences with - representing gaps. As a check, filtering out the - characters should return the original sequences, provided - isn't part of the sequence alphabet.
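A sketch of how such a conversion could work, with the types redefined locally. It assumes that Ins places the gap in the first sequence and Del places it in the second; the library may use the opposite convention. The names toStringsSketch, toChar, fromChar and exampleEdits are hypothetical.

```haskell
import Data.Char (chr, ord)
import Data.Word (Word8)

type Chr = Word8

data Edit = Ins Chr | Del Chr | Repl Chr Chr
  deriving (Show, Eq)

type EditList = [Edit]

toChar :: Chr -> Char
toChar = chr . fromIntegral

-- Only meaningful for characters in the 0..255 range.
fromChar :: Char -> Chr
fromChar = fromIntegral . ord

-- Assumption: Ins c puts a gap in the first sequence and c in the second;
-- Del c does the reverse.
toStringsSketch :: EditList -> (String, String)
toStringsSketch = foldr step ("", "")
  where
    step (Ins c) (s1, s2) = ('-' : s1, toChar c : s2)
    step (Del c) (s1, s2) = (toChar c : s1, '-' : s2)
    step (Repl a b) (s1, s2) = (toChar a : s1, toChar b : s2)

exampleEdits :: EditList
exampleEdits =
  [ Repl (fromChar 'A') (fromChar 'A')
  , Del (fromChar 'C')
  , Ins (fromChar 'G')
  , Repl (fromChar 'T') (fromChar 'C')
  ]
```

Under these assumptions, toStringsSketch exampleEdits yields ("AC-T", "A-GC"), and filtering out the - characters recovers "ACT" and "AGC".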