BiobaseInfernal - Infernal data structures and tools



Infernal CMs.



data CM

A datatype representing Infernal covariance models. This is a new representation, incompatible with the one formerly found in Biobase. The most important difference is that lookups are mapped onto efficient data structures, currently PrimitiveArray.

Each State of a covariance model has up to 6 transition scores, hence we need s*6 cells for transitions, where s is the number of states.
Each State of a covariance model has up to 16 emission scores, so we need s*16 cells for emissions, with unused cells set to a very high score.
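
The dense layout described above can be sketched as follows. This is a minimal illustration only: it uses Data.Array from the array package as a stand-in for PrimitiveArray, and the names unused, emptyTable, setRow, lookupScore and cellCount, as well as the sentinel value 1e9, are assumptions made for this sketch and not part of BiobaseInfernal.

```haskell
import Data.Array (Array, bounds, listArray, rangeSize, (!), (//))

-- Assumed sentinel for unused cells; the library may use a different value.
unused :: Double
unused = 1e9

-- Upper bounds on per-state scores, as stated in the documentation.
maxTransitions, maxEmissions :: Int
maxTransitions = 6
maxEmissions = 16

-- | Dense table indexed by (state, slot); a stand-in for PrimArray (Int, Int) Double.
type Table = Array (Int, Int) Double

-- | Allocate an s-by-width table with every cell marked unused.
emptyTable :: Int -> Int -> Table
emptyTable s width =
  listArray ((0, 0), (s - 1, width - 1)) (replicate (s * width) unused)

-- | Write the scores of one state into the leading cells of its row.
-- Scores beyond the row width are dropped.
setRow :: Int -> [Double] -> Table -> Table
setRow state scores t =
  t // [((state, k), x) | (k, x) <- zip [0 .. hi] scores]
  where
    (_, (_, hi)) = bounds t

-- | Look up a score; 'Nothing' marks an unused cell.
lookupScore :: Table -> Int -> Int -> Maybe Double
lookupScore t state k
  | x >= unused = Nothing
  | otherwise = Just x
  where
    x = t ! (state, k)

-- | Total number of cells, i.e. s * width.
cellCount :: Table -> Int
cellCount = rangeSize . bounds
```

With s = 3 states, emptyTable 3 maxTransitions allocates 18 cells and emptyTable 3 maxEmissions allocates 48, matching the s*6 and s*16 figures above.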

On top of these basic structures, we then place additional high-level constructs.

paths are the allowed transitions. This can save a check for whether a transition is encoded with a forbidden score.
localBegin and localEnd are local entry and exit strategies. A localBegin is a transition score to certain states; all such transitions are listed in begins. A localEnd is a transition score to a local end state.
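
One possible reading of how begins and localBegin fit together is sketched below. It assumes that localBegin holds one score per state and that begins holds the indices of the states that can be entered locally; the source does not confirm this indexing. Plain lists replace Vector to keep the sketch dependency-free, and localBeginScores is a hypothetical helper.

```haskell
-- | Pair every state listed in 'begins' with its local-begin score.
-- ASSUMPTION: 'localBegin' is indexed by state number. Indices that fall
-- outside the score list are skipped rather than causing an error.
localBeginScores :: [Int] -> [Double] -> [(Int, Double)]
localBeginScores begins localBegin =
  [ (s, localBegin !! s)
  | s <- begins
  , s >= 0
  , s < length localBegin
  ]
```

Iterating over begins only, instead of scanning every state, is the same idea as paths: the permitted targets are enumerated up front, so no per-state test against a forbidden score is needed.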

NOTE that trustedCutoff > gathering > noiseCutoff
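
The ordering in the note can be checked mechanically. The sketch below is an illustration under assumptions: BitScore is modelled as a newtype over Double (the library's actual definition may differ), and cutoffsOrdered is a hypothetical helper. Because noiseCutoff is optional, a missing value is treated as imposing no constraint.

```haskell
-- Stand-in for the library's BitScore type (assumed representation).
newtype BitScore = BitScore Double
  deriving (Eq, Ord, Show)

-- | True when trustedCutoff > gathering > noiseCutoff holds.
-- A missing noise cutoff only requires trustedCutoff > gathering.
cutoffsOrdered :: BitScore -> BitScore -> Maybe BitScore -> Bool
cutoffsOrdered trusted gath noise =
  trusted > gath && maybe True (gath >) noise
```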

TODO as with other projects, we should not use Doubles but Score and Probability newtypes.
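
A rough sketch of what such newtypes could look like is given below. The names Score and Probability come from the TODO; the conversion functions and the log-odds convention (base-2 logarithm of a probability relative to a null probability) are assumptions chosen for illustration, not the library's design.

```haskell
-- Distinct wrappers so that scores and probabilities cannot be mixed up.
newtype Score = Score Double
  deriving (Eq, Ord, Show)

newtype Probability = Probability Double
  deriving (Eq, Ord, Show)

-- | Log-odds score in bits of a probability against a null probability.
-- ASSUMPTION: base-2 log-odds; first argument is the null probability.
probToScore :: Probability -> Probability -> Score
probToScore (Probability nullP) (Probability p) =
  Score (logBase 2 (p / nullP))

-- | Inverse of 'probToScore' for the same null probability.
scoreToProb :: Probability -> Score -> Probability
scoreToProb (Probability nullP) (Score s) =
  Probability (nullP * 2 ** s)

-- | Scores combine by addition (probabilities would combine by multiplication).
addScore :: Score -> Score -> Score
addScore (Score a) (Score b) = Score (a + b)
```

With these wrappers, passing a Probability where a Score is expected becomes a compile-time type error instead of a silent numerical bug.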




name :: ModelIdentification

name of the model, as in tRNA

accession :: ModelAccession

accession identifier of the form RFxxxxx

trustedCutoff :: BitScore

lowest score of true member

gathering :: BitScore

all scores at or above gathering score are in the full alignment

noiseCutoff :: Maybe BitScore

highest score NOT included as member

transition :: PrimArray (Int, Int) Double
emission :: PrimArray (Int, Int) Double
paths :: Vector (Vector Double)
localBegin :: Vector Double
begins :: Vector Int
localEnd :: Vector Double
nodes :: Vector (Vector Int)


type ID2CM = Map ModelIdentification CM

Map of model names to individual CMs.

type AC2CM = Map ModelAccession CM

Map of model accession numbers to individual CMs.
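
As a hedged illustration of how the two maps might be populated from a list of models, the sketch below uses a cut-down CM record holding only name and accession. The String-based newtypes and the helpers byName and byAccession are assumptions for this example; the library's own types carry more fields and may be defined differently.

```haskell
import qualified Data.Map as M

-- Simplified stand-ins for the library's identifier types (assumed).
newtype ModelIdentification = ModelIdentification String
  deriving (Eq, Ord, Show)

newtype ModelAccession = ModelAccession String
  deriving (Eq, Ord, Show)

-- Cut-down model record: only the two fields used as keys.
data CM = CM
  { name :: ModelIdentification
  , accession :: ModelAccession
  }
  deriving (Eq, Show)

type ID2CM = M.Map ModelIdentification CM
type AC2CM = M.Map ModelAccession CM

-- | Index models by name; later duplicates replace earlier ones.
byName :: [CM] -> ID2CM
byName cms = M.fromList [(name cm, cm) | cm <- cms]

-- | Index models by accession; later duplicates replace earlier ones.
byAccession :: [CM] -> AC2CM
byAccession cms = M.fromList [(accession cm, cm) | cm <- cms]
```

A lookup such as M.lookup (ModelAccession "RF00005") (byAccession cms) then returns the matching model wrapped in Just, or Nothing when the accession is absent.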