bioinformatics-toolkit-0.3.2: A collection of bioinformatics tools

Safe Haskell: None





data PWM

A k x 4 position weight matrix (PWM) for motifs.




subPWM :: Int -> Int -> PWM -> PWM

Extract a sub-PWM given a zero-indexed starting position and a length.
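The slicing described above can be sketched on a plain list-of-rows matrix. This is only an illustration: `Matrix` and `subMatrix` are hypothetical names, not the library's `PWM` type or its `subPWM`, and the argument order (start, then length) is taken from the description.

```haskell
-- Hypothetical stand-in for the library's PWM: one row per motif position,
-- each row holding the weights for A, C, G, T (the column order is an assumption).
type Matrix = [[Double]]

-- Keep `len` rows, starting at the zero-indexed row `start`.
subMatrix :: Int -> Int -> Matrix -> Matrix
subMatrix start len = take len . drop start
```

With this sketch, `subMatrix 1 2` keeps the second and third rows of its input.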

rcPWM :: PWM -> PWM

Reverse complement of a PWM.
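As a rough sketch of what a reverse complement means for a matrix: if the columns are ordered A, C, G, T (an assumption here), reversing the row order reverses the motif, and reversing each row swaps A with T and C with G. `Matrix` and `reverseComplement` are hypothetical names, not the library's API.

```haskell
-- Hypothetical stand-in for the library's PWM: rows are motif positions,
-- columns are assumed to be in A, C, G, T order.
type Matrix = [[Double]]

-- Reverse the positions, then reverse each row so that the A and T columns
-- (and the C and G columns) trade places.
reverseComplement :: Matrix -> Matrix
reverseComplement = map reverse . reverse
```

Applying `reverseComplement` twice gives back the original matrix.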

gcContentPWM :: PWM -> Double

GC content of PWM.
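One plausible reading of the GC content of a matrix is the mean, over all positions, of the C and G frequencies. The sketch below assumes exactly that, with rows that are frequencies in A, C, G, T order; the library's own definition may differ, and `Matrix` and `gcContent` are hypothetical names.

```haskell
-- Hypothetical stand-in for the library's PWM: rows are motif positions,
-- each row holding frequencies for A, C, G, T (an assumption).
type Matrix = [[Double]]

-- Average of (C + G) over all positions.
-- Rows that do not have exactly four entries are skipped by the pattern match.
gcContent :: Matrix -> Double
gcContent m = sum [c + g | [_, c, g, _] <- m] / fromIntegral (length m)
```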

newtype Bkgd

Background model, which consists of single-nucleotide frequencies and dinucleotide frequencies.


BG (Double, Double, Double, Double) 


Default Bkgd

def :: Bkgd

toPWM :: [ByteString] -> PWM

Build a PWM from a matrix.

ic :: PWM -> Int -> Double

Information content of a position in a PWM. (Not implemented)
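Since the function is marked as not implemented, the following is only a sketch of the textbook definition, not the library's code. It assumes a uniform background, so the information content of a position with frequencies p is 2 + sum of p * log2 p (in bits), with zero frequencies contributing nothing. `Matrix`, `infoContent` and `infoContentAt` are hypothetical names.

```haskell
-- Hypothetical stand-in for the library's PWM: rows are motif positions,
-- each row holding frequencies for A, C, G, T (an assumption).
type Matrix = [[Double]]

-- Information content (in bits) of one row under a uniform background.
-- Zero frequencies are filtered out, following the convention 0 * log 0 = 0.
infoContent :: [Double] -> Double
infoContent row = 2 + sum [p * logBase 2 p | p <- row, p > 0]

-- Information content of the zero-indexed position `i`.
-- Like (!!), this fails if `i` is out of range.
infoContentAt :: Matrix -> Int -> Double
infoContentAt m i = infoContent (m !! i)
```

A uniform row carries 0 bits and a row with a single non-zero entry carries 2 bits.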

scores :: Bkgd -> PWM -> DNA a -> [Double]

Get the scores of a long sequence at each position.
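To illustrate scoring at each position, here is a minimal sliding-window sketch. It assumes a log-odds score (the sum over motif positions of log(p / q), where q is the background frequency of the observed base), which is a common choice but may not be exactly what the library computes. `Matrix`, `Background`, `windowScore` and `allScores` are hypothetical names, and the sequence is a plain `String` rather than the library's `DNA` type.

```haskell
import Data.List (tails)

-- Hypothetical stand-ins: rows are motif positions with frequencies for
-- A, C, G, T; the background holds the frequencies of A, C, G, T.
type Matrix = [[Double]]
type Background = (Double, Double, Double, Double)

-- Column index of a base (upper case only in this sketch).
baseIndex :: Char -> Int
baseIndex 'A' = 0
baseIndex 'C' = 1
baseIndex 'G' = 2
baseIndex 'T' = 3
baseIndex c = error ("unsupported base: " ++ [c])

-- Log-odds score of one window that has the same length as the motif.
windowScore :: Background -> Matrix -> String -> Double
windowScore (a, c, g, t) m w =
  sum [log (row !! i / ([a, c, g, t] !! i)) | (row, b) <- zip m w, let i = baseIndex b]

-- Score every full-length window of the sequence, left to right.
allScores :: Background -> Matrix -> String -> [Double]
allScores bg m s = map (windowScore bg m) windows
  where
    k = length m
    windows = takeWhile ((== k) . length) (map (take k) (tails s))
```

A sequence of length n scored against a motif of length k yields n - k + 1 scores.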

scores' :: Monad m => Bkgd -> PWM -> DNA a -> Source m Double

A streaming version of scores.

score :: Bkgd -> PWM -> DNA a -> Double

optimalScore :: Bkgd -> PWM -> Double

The best possible matching score of a PWM.
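Under a log-odds scoring scheme (an assumption, as the page does not spell out the score), the best possible score is obtained by picking the highest-scoring base independently at each position. `Matrix`, `Background` and `bestScore` below are hypothetical names used only for this sketch.

```haskell
-- Hypothetical stand-ins: rows are motif positions with frequencies for
-- A, C, G, T; the background holds the frequencies of A, C, G, T.
type Matrix = [[Double]]
type Background = (Double, Double, Double, Double)

-- Sum, over all positions, of the largest log-odds value in that row.
bestScore :: Background -> Matrix -> Double
bestScore (a, c, g, t) m =
  sum [maximum (zipWith (\p q -> log (p / q)) row [a, c, g, t]) | row <- m]
```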

newtype CDF

The cumulative distribution function in the form of (x, P(X <= x)).


CDF (Vector (Double, Double)) 

cdf' :: CDF -> Double -> Double

The inverse of cdf.

truncateCDF :: Double -> CDF -> CDF

Truncate the CDF by a value, in order to reduce the memory usage.

scoreCDF :: Bkgd -> PWM -> CDF

Approximate the CDF of motif matching scores.

pValueToScore :: Double -> Bkgd -> PWM -> Double

Calculate the minimum motif matching score that would produce a k-mer with a p-value less than the given number. This score can then be used to search for motif occurrences with a significant p-value.

pValueToScoreExact :: Double -> Bkgd -> PWM -> Double

The first argument is the desired p-value.

Unlike pValueToScore, this version computes the exact score, but it is much slower and impractical for long motifs.
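To see why an exact computation becomes impractical for long motifs, consider a brute-force sketch that enumerates all 4^k k-mers, computes each one's log-odds score and background probability, and accumulates probabilities from the best score downwards. This is not the library's implementation: the log-odds score, the plain-list representation, and the names `kmerTable` and `exactThreshold` are all assumptions made for illustration.

```haskell
import Data.Function (on)
import Data.List (groupBy, sortOn)
import Data.Ord (Down (..))

-- Hypothetical stand-ins: rows are motif positions with frequencies for
-- A, C, G, T; the background holds the frequencies of A, C, G, T.
type Matrix = [[Double]]
type Background = (Double, Double, Double, Double)

-- (score, background probability) for every possible k-mer: 4^k entries.
kmerTable :: Background -> Matrix -> [(Double, Double)]
kmerTable (a, c, g, t) = foldr step [(0, 1)]
  where
    bg = [a, c, g, t]
    step row rest = [(log (p / q) + s, q * pr) | (p, q) <- zip row bg, (s, pr) <- rest]

-- Smallest score s with P(score >= s) < pval, or Nothing if even the best
-- score is not significant. Tied scores are merged by exact floating-point
-- equality, which is a simplification.
exactThreshold :: Double -> Background -> Matrix -> Maybe Double
exactThreshold pval bg m =
  case takeWhile ((< pval) . snd) tailProbs of
    [] -> Nothing
    xs -> Just (fst (last xs))
  where
    sorted = sortOn (Down . fst) (kmerTable bg m)
    grouped = [(s, p + sum (map snd rest)) | ((s, p) : rest) <- groupBy ((==) `on` fst) sorted]
    tailProbs = zip (map fst grouped) (scanl1 (+) (map snd grouped))
```

The table grows by a factor of four with every additional motif position, which is the cost an approximate CDF avoids.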

toIUPAC :: PWM -> DNA IUPAC

Convert a PWM to a consensus sequence; see D. R. Cavener (1987).
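The consensus rules usually attributed to Cavener can be sketched as follows: a single base is assigned when its frequency exceeds 50% and is more than twice that of the second most frequent base; otherwise the two most frequent bases are co-assigned (as a two-letter IUPAC code) when their combined frequency exceeds 75%; otherwise the position is N. The code assumes rows are frequencies in A, C, G, T order, and `consensusChar` and `consensus` are hypothetical names; the library may handle ties and thresholds differently.

```haskell
import Data.List (sort, sortOn)
import Data.Ord (Down (..))

-- Hypothetical stand-in for the library's PWM: rows are motif positions,
-- each row holding frequencies for A, C, G, T (an assumption).
type Matrix = [[Double]]

-- IUPAC code for an alphabetically sorted pair of distinct bases.
twoBase :: String -> Char
twoBase "AG" = 'R'
twoBase "CT" = 'Y'
twoBase "CG" = 'S'
twoBase "AT" = 'W'
twoBase "GT" = 'K'
twoBase "AC" = 'M'
twoBase _ = 'N'

-- Consensus symbol for one position, following the rules described above.
-- If neither guard holds, the match falls through to the 'N' case.
consensusChar :: [Double] -> Char
consensusChar row =
  case sortOn (Down . fst) (zip row "ACGT") of
    (p1, b1) : (p2, b2) : _
      | p1 > 0.5 && p1 > 2 * p2 -> b1
      | p1 + p2 > 0.75 -> twoBase (sort [b1, b2])
    _ -> 'N'

-- Consensus string for the whole matrix.
consensus :: Matrix -> String
consensus = map consensusChar
```

For example, a row of (0.4, 0.05, 0.5, 0.05) has no single dominant base, but A and G together exceed 75%, so the sketch yields R.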