Data structures and helper functions for calculating alignments
There are two ways to view an alignment: either as a list of edits (i.e., insertions, deletions, or substitutions), or as a set of sequences with inserted gaps.
The edit list approach is perhaps the more restrictive model, and it doesn't generalize to multiple alignments.
The gap approach is more general, and probably more commonly used by other software (see e.g. the ACE file format).
- data Sequence = Seq SeqLabel SeqData (Maybe QualData)
- type Gaps = [Offset]
- type Alignment = [(Offset, Strand, Sequence, Gaps)]
- extractGaps :: SeqData -> (SeqData, Gaps)
- insertGaps :: Char -> (SeqData, Gaps) -> SeqData
- data Edit
- type EditList = [Edit]
- type SubstMx t a = (Chr, Chr) -> a
- type Selector a = [(a, Edit)] -> a
- type Chr = Word8
- columns :: Selector a -> a -> Sequence -> Sequence -> [[a]]
- eval :: SubstMx t a -> a -> Edit -> a
- isRepl :: Edit -> Bool
- on :: (t1 -> t1 -> t) -> (t2 -> t1) -> t2 -> t2 -> t
- showalign :: EditList -> [Char]
- toStrings :: EditList -> (String, String)
Data types for gap-based alignments
Gaps are coded as '*' characters; extractGaps removes them and returns the sequence along with the list of gap positions.
Note that gaps are positioned relative to the *gapped* sequence (in contrast to stmassembler/Cluster.hs).
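The round trip between a gapped sequence and its (sequence, gap positions) pair can be sketched as follows. This is a simplified illustration, not the library's code: it uses String and Int in place of SeqData and Offset, assumes zero-based positions listed in ascending order, and the names extractGapsS and insertGapsS are hypothetical.

```haskell
-- Simplified stand-in for the library's Gaps type (which uses Offset, not Int).
type GapsS = [Int]

-- | Remove '*' gap characters and record where they were.
-- Positions are zero-based (an assumption) and relative to the gapped sequence.
extractGapsS :: String -> (String, GapsS)
extractGapsS s =
  ( filter (/= '*') s
  , [ i | (i, c) <- zip [0 ..] s, c == '*' ] )

-- | Re-insert a gap character at the recorded positions.
-- Assumes the positions are sorted in ascending order.
insertGapsS :: Char -> (String, GapsS) -> String
insertGapsS g (s, gaps) = go 0 s gaps
  where
    go :: Int -> String -> GapsS -> String
    go _ rest [] = rest
    go i rest (p : ps)
      | i == p = g : go (i + 1) rest ps   -- a gap belongs at this position
    go i (c : cs) ps = c : go (i + 1) cs ps
    go _ [] _ = []                        -- positions past the end are dropped
```

Because positions refer to the gapped sequence, re-inserting the gaps with the same gap character reproduces the original input.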
Data types for edit-based alignments
An Edit is either the insertion, the deletion, or the replacement of a character.
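A minimal sketch of how such an edit type might look, and how an edit list maps to a pair of gapped strings. The constructor names Ins, Del and Repl, the use of Char rather than Chr, the '-' gap character, and the choice of which string receives the gap are all assumptions made for illustration; the synopsis does not show the constructors.

```haskell
-- Hypothetical constructors; the synopsis only lists "data Edit".
data Edit
  = Ins Char        -- character present only in the second sequence
  | Del Char        -- character present only in the first sequence
  | Repl Char Char  -- character in the first replaced by one in the second
  deriving (Show, Eq)

type EditList = [Edit]

-- | True only for replacements.
isRepl :: Edit -> Bool
isRepl (Repl _ _) = True
isRepl _          = False

-- | Render an edit list as two gapped strings of equal length.
toStrings :: EditList -> (String, String)
toStrings = unzip . map row
  where
    row (Ins c)    = ('-', c)
    row (Del c)    = (c, '-')
    row (Repl a b) = (a, b)
```

Each edit contributes exactly one alignment column, so both output strings always have the same length as the edit list.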
A substitution matrix gives scores for replacing a character with another. Typically, it will be symmetric. It is type-tagged with the alphabet - Nuc or Amino.
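As an illustration, a substitution matrix can be modelled as a plain function on character pairs. The sketch below drops the phantom alphabet tag and uses Char instead of Chr; simpleMx, evalEdit and scoreEdits are hypothetical names, and treating the second argument of the scoring function as a flat gap penalty is an assumption about how eval is meant to be used.

```haskell
-- Hypothetical edit type, repeated here so the sketch is self-contained.
data Edit = Ins Char | Del Char | Repl Char Char

-- Substitution matrix as a function; the alphabet type tag is omitted.
type SubstMxS a = (Char, Char) -> a

-- | A symmetric toy matrix: +1 for a match, -1 for a mismatch.
simpleMx :: SubstMxS Int
simpleMx (x, y) = if x == y then 1 else -1

-- | Score one edit: replacements consult the matrix,
-- insertions and deletions cost the given gap penalty.
evalEdit :: SubstMxS a -> a -> Edit -> a
evalEdit mx _   (Repl x y) = mx (x, y)
evalEdit _  gap _          = gap

-- | Total score of an edit list.
scoreEdits :: SubstMxS Int -> Int -> [Edit] -> Int
scoreEdits mx gap = sum . map (evalEdit mx gap)
```

With a gap penalty of -2, the list Repl 'A' 'A', Del 'C', Repl 'G' 'T' scores 1 - 2 - 1 = -2.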
A Selector consists of a zero element and a function that chooses a possible Edit operation and generates an updated result.
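One way to read this: the zero element is the result when no candidate is better, and the function picks among the candidate (result, edit) pairs. The sketch below assumes each pair already carries the updated score for taking that edit; the names globalSel and localSel and the Edit constructors are hypothetical.

```haskell
-- Hypothetical edit type, repeated here so the sketch is self-contained.
data Edit = Ins Char | Del Char | Repl Char Char

type Selector a = [(a, Edit)] -> a

-- | Global-style selection: take the best candidate score.
-- The zero element is minBound, so any real candidate wins.
globalSel :: Selector Int
globalSel = foldr (max . fst) minBound

-- | Local-style selection: the zero element is 0,
-- so the selected score never drops below zero.
localSel :: Selector Int
localSel = foldr (max . fst) 0
```

Changing only the zero element switches between the two behaviours, which is presumably why it is treated as part of the selector.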
Calculate a set of columns containing scores. This represents the columns of the alignment matrix, but requires only linear space for score calculation.
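The linear-space idea can be sketched with a score-only global alignment in the Needleman-Wunsch style, where each column is built from the previous one and then discarded. This is not the library's columns function: it works on plain Strings, hard-codes maximum as the selection rule, and the names globalScore and matchMx are hypothetical.

```haskell
import Data.List (foldl')

-- | Toy substitution matrix: +1 for a match, -1 for a mismatch.
matchMx :: (Char, Char) -> Int
matchMx (x, y) = if x == y then 1 else -1

-- | Global alignment score of xs against ys with a flat gap penalty.
-- Only one column of the alignment matrix is kept at a time.
globalScore :: ((Char, Char) -> Int) -> Int -> String -> String -> Int
globalScore mx gap xs ys = last (foldl' nextCol firstCol ys)
  where
    -- Column 0: every prefix of xs aligned against the empty string.
    firstCol = take (length xs + 1) (iterate (+ gap) 0)

    -- Build the column for character y from the previous column.
    nextCol [] _ = []
    nextCol prev@(p0 : ps) y = scanl step (p0 + gap) (zip3 xs prev ps)
      where
        -- up:   cell above in the new column (gap in ys)
        -- diag: cell up-left in the previous column (replacement)
        -- left: cell to the left in the previous column (gap in xs)
        step up (x, diag, left) =
          maximum [diag + mx (x, y), left + gap, up + gap]
```

For example, globalScore matchMx (-2) "ACGT" "AGT" evaluates to 1 (three matches and one gap). Recovering the actual edit list would additionally need traceback information, which this sketch does not keep.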