bioinformatics-toolkit-0.10.0: A collection of bioinformatics tools
Safe Haskell: None
Language: Haskell2010

Bio.Data.Bed.Utils

Documentation

fetchSeq :: BioSeq DNA a => Genome -> BED -> IO (Either String (DNA a))

Retrieve the sequence of the region given by a BED record. The result is wrapped in Either, so a Left String signals failure.

clipBed

Arguments

:: (BEDLike b, Monad m) 
=> [(ByteString, Int)]

Chromosome sizes

-> ConduitT b b m () 

data CutoffMotif

A motif with a predefined cutoff score. All intermediate data structures needed for motif scanning are stored alongside it.

mkCutoffMotif

Arguments

:: Bkgd 
-> Double

p-value

-> Motif 
-> CutoffMotif 

scanMotif :: (BEDLike b, MonadIO m) => Genome -> [CutoffMotif] -> ConduitT b BED m ()

The motif score lies in [0, 1000] and is computed as (1 / (1 + exp (-(-logP - 5)))) * 1000.
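The score formula above is a logistic rescaling, which can be sketched as a pure function. This is only an illustration: the name `motifScore` is hypothetical and not part of this module, and `logP` is passed through exactly as it appears in the formula, since the base of the logarithm is not stated here.

```haskell
-- | Logistic rescaling of a log p-value into the range [0, 1000].
-- Hypothetical helper; mirrors the documented formula term by term.
motifScore :: Double -> Double
motifScore logP = (1 / (1 + exp (negate x))) * 1000
  where
    -- Shifted argument: x = -logP - 5, so x = 0 when logP = -5.
    x = negate logP - 5
```

Under this definition a `logP` of -5 maps to the midpoint score of 500, and smaller (more negative) values of `logP` approach 1000.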

monoColonalize :: Monad m => ConduitT BED BED m ()

Process a sorted BED stream, keeping only monoclonal tags.

baseMap

Arguments

:: PrimMonad m 
=> [(ByteString, Int)]

chromosomes and their sizes

-> ConduitT BED o m BaseMap 

Count the tags (starting positions) at each position in the genome.
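The idea of counting tags by their starting position can be illustrated with plain lists. This is a minimal sketch, not the module's implementation: the `Tag` type and `countStarts` are hypothetical stand-ins, and the sketch ignores strand and the chromosome sizes that baseMap takes as an argument.

```haskell
import Data.List (group, sort)

-- | A tag reduced to the two fields used here: chromosome name and
-- start position. Hypothetical stand-in for a BED record.
type Tag = (String, Int)

-- | Count how many tags start at each (chromosome, position) pair.
-- Positions with no tags are simply absent from the result.
countStarts :: [Tag] -> [(Tag, Int)]
countStarts tags = [ (t, length g) | g@(t : _) <- group (sort tags) ]
```

For example, two tags starting at position 10 on "chr1" and one at position 5 on "chr2" would give a count of 2 for the first position and 1 for the second.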

rpkmBed :: (PrimMonad m, BEDLike b, Vector v Double) => [b] -> ConduitT BED o m (v Double)

Calculate RPKM (read counts per kilobase per million reads) on a set of unique regions. The regions (in BED format) are kept in memory; the tag stream is not. Only the start positions of tags are counted.
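The RPKM normalization itself is simple arithmetic and can be sketched as follows. The function name `rpkm` and its argument order are hypothetical, and the sketch assumes that the total is the number of all tags in the stream, which this page does not spell out.

```haskell
-- | Reads per kilobase per million reads for a single region.
-- Hypothetical helper: count / (length / 1000) / (total / 1e6).
rpkm
  :: Int     -- ^ region length in base pairs (assumed > 0)
  -> Int     -- ^ total number of tags (assumed > 0)
  -> Int     -- ^ number of tags whose start falls in the region
  -> Double
rpkm regionLen totalTags count =
  fromIntegral count * 1e9
    / (fromIntegral regionLen * fromIntegral totalTags)
```

As a worked case, 50 tags in a 2000 bp region out of 1,000,000 tags gives 50 / 2 / 1 = 25.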

rpkmSortedBed :: (PrimMonad m, BEDLike b, Vector v Double) => Sorted (Vector b) -> ConduitT BED o m (v Double)

Calculate RPKM on a set of regions. The regions must be sorted; the Sorted data type is used to remind users to sort their data.

countTagsBed :: (PrimMonad m, BEDLike b, Vector v Int) => [b] -> ConduitT BED o m (v Int, Int)

countTagsBinBed

Arguments

:: (Integral a, PrimMonad m, Vector v a, BEDLike b) 
=> Int

bin size

-> [b]

regions

-> ConduitT BED o m ([v a], Int) 

Divide each region into consecutive bins, count the tags in each bin, and return the per-bin counts together with the total number of tags. Note: a tag counts as overlapping a region only if its starting position lies within the region. For conventional interval overlap, use countTagsBinBed'.
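The binning rule can be sketched with plain lists for a single region on one chromosome. The names `Region` and `binCounts` are hypothetical, regions are taken to be half-open as in the BED format, and the treatment of a final bin shorter than the bin size is an assumption of this sketch rather than documented behaviour.

```haskell
-- | Half-open region [start, end) on one chromosome (hypothetical type).
type Region = (Int, Int)

-- | Count tag start positions in consecutive bins of width 'binSize'.
-- Assumes binSize > 0. Starts outside the region are ignored.
binCounts :: Int -> Region -> [Int] -> [Int]
binCounts binSize (s, e) starts =
  [ length [ p | p <- inRegion, (p - s) `div` binSize == i ]
  | i <- [0 .. nBins - 1]
  ]
  where
    -- Round up so a trailing partial bin is kept (an assumption).
    nBins    = (e - s + binSize - 1) `div` binSize
    -- Only the start position decides membership in the region.
    inRegion = [ p | p <- starts, p >= s, p < e ]
```

With a bin size of 10 and the region (100, 125), the starts 100, 105, 110 and 124 fall into bins 0, 0, 1 and 2, while 99 and 125 lie outside the region.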

countTagsBinBed'

Arguments

:: (Integral a, PrimMonad m, Vector v a, BEDLike b1, BEDLike b2) 
=> Int

bin size

-> [b1]

regions

-> ConduitT b2 o m ([v a], Int) 

Same as countTagsBinBed, except that tags are treated as complete intervals instead of single points.
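The difference between the two overlap rules can be shown with two predicates on half-open intervals. Both names are hypothetical and serve only to contrast the point-based rule of countTagsBinBed with the interval-based rule described for countTagsBinBed'.

```haskell
-- | Half-open interval [start, end), as in the BED format.
type Interval = (Int, Int)

-- | Point rule: the tag counts only if its start lies in the region.
startInside :: Interval -> Interval -> Bool
startInside (rs, re) (ts, _) = ts >= rs && ts < re

-- | Interval rule: the tag counts if it shares at least one base
-- with the region.
overlaps :: Interval -> Interval -> Bool
overlaps (rs, re) (ts, te) = ts < re && rs < te
```

A tag spanning (90, 110) against the region (100, 200) is rejected by the point rule, because it starts before the region, but accepted by the interval rule.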

peakCluster

Arguments

:: (BEDLike b, Monad m) 
=> [b]

peaks

-> Int

radius

-> Int

cutoff

-> ConduitT o BED m () 

Cluster peaks.