bio-0.4.6: A bioinformatics library
Bio.Sequence.SFF_filters
Contents
Discarding filters
Trimming filters
Utility functions
Description
This module implements a number of the read filters used in the Titanium pipeline.
Synopsis
type DiscardFilter = ReadBlock -> Bool
filter_mixed :: DiscardFilter
filter_key :: DiscardFilter
filter_empty :: DiscardFilter
filter_dots :: DiscardFilter
filter_length :: Int -> DiscardFilter
type TrimFilter = ReadBlock -> ReadBlock
filter_qual20 :: TrimFilter
filter_sigint :: TrimFilter
sigint :: ReadBlock -> Int
qual20 :: ReadBlock -> Int
dlength :: [a] -> Double
avg :: Integral a => [a] -> Double
clipFlows :: ReadBlock -> Int -> ReadBlock
clipSeq :: ReadBlock -> Int -> ReadBlock
Discarding filters
type DiscardFilter = ReadBlock -> Bool
DiscardFilters determine whether a read is to be retained or discarded
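Because a DiscardFilter is just a predicate, several filters can be combined into one. The following is a minimal sketch, not the library's code: `ToyRead` is a hypothetical stand-in for `ReadBlock`, `allOf`, `nonEmpty`, `minFlows` and `keepGood` are illustrative names, and it assumes that `True` means "retain the read", which the documentation above does not state explicitly.

```haskell
-- Hypothetical stand-in for the library's ReadBlock, for illustration only.
data ToyRead = ToyRead { flows :: [Double], bases :: String }
  deriving (Show, Eq)

-- Same shape as the documented type; assumed here: True = retain.
type DiscardFilter = ToyRead -> Bool

-- Combine filters: a read is retained only if every filter accepts it.
allOf :: [DiscardFilter] -> DiscardFilter
allOf fs r = all ($ r) fs

-- Illustrative filter: reject reads with no called bases.
nonEmpty :: DiscardFilter
nonEmpty = not . null . bases

-- Illustrative filter: require at least n flows.
minFlows :: Int -> DiscardFilter
minFlows n r = length (flows r) >= n

-- Apply the combined filter to a list of reads.
keepGood :: [ToyRead] -> [ToyRead]
keepGood = filter (allOf [nonEmpty, minFlows 3])
```

With the real module, the same pattern would presumably apply to a list such as `[filter_key, filter_empty, filter_length 186]`, provided the polarity assumption holds.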
filter_mixed :: DiscardFilter
filter_key :: DiscardFilter
filter_empty :: DiscardFilter
filter_dots :: DiscardFilter
filter_length :: Int -> DiscardFilter

3.2.2.1.2 The dots filter discards sequences where the last positive flow is before flow 84, and sequences with >5% dots (i.e. three successive noise values) before the last positive flow. (Interpreted as: 5% of the called sequence length is Ns?)

3.2.2.1.3 The mixed filter discards sequences with more than 70% positive flows. It also discards sequences with 30% noise, 20% middle (0.45..0.75), or <30% positive flows.

The length filter discards a read if the number of untrimmed flows is less than n (n=186 for Titanium).
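The percentage thresholds of the mixed filter can be sketched over a plain list of flow values. This is an illustration under stated assumptions rather than the module's implementation: flows are taken to be normalised signal intensities as `Double`, "positive" is assumed to mean above the middle band (> 0.75), "20% middle" is read as "20% or more", and the noise criterion is left out because its threshold is not given above. All names are hypothetical.

```haskell
-- Fraction of values satisfying a predicate (0 for an empty list).
fraction :: (a -> Bool) -> [a] -> Double
fraction p xs
  | null xs   = 0
  | otherwise = fromIntegral (length (filter p xs)) / fromIntegral (length xs)

-- "Middle" flows: the 0.45..0.75 band quoted in the description.
isMiddle :: Double -> Bool
isMiddle v = v >= 0.45 && v <= 0.75

-- Assumption: a flow counts as positive when it lies above the middle band.
isPositive :: Double -> Bool
isPositive v = v > 0.75

-- True = retain (assumed polarity). Rejects reads with more than 70% or
-- less than 30% positive flows, or with 20% or more middle flows.
mixedOk :: [Double] -> Bool
mixedOk fs = pos <= 0.70 && pos >= 0.30 && mid < 0.20
  where
    pos = fraction isPositive fs
    mid = fraction isMiddle fs

-- Length criterion: retain only reads with at least n flows.
lengthOk :: Int -> [Double] -> Bool
lengthOk n fs = length fs >= n
```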

Trimming filters
type TrimFilter = ReadBlock -> ReadBlock
TrimFilters modify the read, typically trimming it for quality
filter_qual20 :: TrimFilter
filter_sigint :: TrimFilter
sigint :: ReadBlock -> Int
3.2.2.1.4 Signal intensity trim - trim back until <3% borderline flows (0.5..0.7). Then trim borderline values or dots from the end (use a window).
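The first step of the signal intensity trim (trim back until fewer than 3% of flows are borderline) might look roughly like the sketch below. It works on a plain list of normalised flow values, treats the 0.5..0.7 band as inclusive, and does not attempt the second, window-based step. `borderline` and `sigintLen` are hypothetical names, not part of the module.

```haskell
import Data.List (inits)

-- Borderline flows: the 0.5..0.7 band quoted in the description
-- (inclusive bounds are an assumption).
borderline :: Double -> Bool
borderline v = v >= 0.5 && v <= 0.7

-- Length of the longest prefix in which fewer than 3% of the flows are
-- borderline; 0 if no non-empty prefix qualifies.
-- Quadratic in the number of flows, which is fine for a sketch.
sigintLen :: [Double] -> Int
sigintLen fs =
  case filter ok (reverse (drop 1 (inits fs))) of
    (p:_) -> length p
    []    -> 0
  where
    ok p = fromIntegral (length (filter borderline p))
             < 0.03 * (fromIntegral (length p) :: Double)
```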
qual20 :: ReadBlock -> Int
3.2.2.1.7 Quality score trimming trims using a 10-base window until a Q20 average is found.
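One way to read the window rule is: shrink the read from its right end until the last 10 bases average at least Q20. The sketch below implements that reading on a plain list of quality scores. It is an interpretation, not the module's code; `mean` and `qualTrimLen` are hypothetical names, and reads shorter than the window are assumed to be trimmed to length 0.

```haskell
-- Mean of a list of quality scores (NaN for an empty list).
mean :: [Int] -> Double
mean qs = fromIntegral (sum qs) / fromIntegral (length qs)

-- Number of bases to keep: step back from the right end one base at a
-- time until the final w-base window has a mean quality of at least q.
-- Returns 0 if no full window qualifies.
qualTrimLen :: Int -> Double -> [Int] -> Int
qualTrimLen w q quals = go (length quals)
  where
    go n
      | n < w                                   = 0
      | mean (take w (drop (n - w) quals)) >= q = n
      | otherwise                               = go (n - 1)
```

For example, `qualTrimLen 10 20 quals` corresponds to the 10-base window and Q20 threshold quoted above.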
Utility functions
dlength :: [a] -> Double
List length as a double (eliminates many instances of fromIntegral)
avg :: Integral a => [a] -> Double
Calculate average of a list
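Definitions consistent with these two signatures could look as follows. This is a plausible sketch, not necessarily the library's source. Converting each element before summing is a deliberate choice here, so that narrow integral types do not overflow during the sum.

```haskell
-- List length as a Double.
dlength :: [a] -> Double
dlength = fromIntegral . length

-- Arithmetic mean of a list of integral values.
-- Each element is converted to Double before summing, so the sum cannot
-- overflow the element type. The result is NaN for an empty list.
avg :: Integral a => [a] -> Double
avg xs = sum (map fromIntegral xs) / dlength xs
```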
clipFlows :: ReadBlock -> Int -> ReadBlock
Translate a number of flows to position in sequence, and update clipping data accordingly
clipSeq :: ReadBlock -> Int -> ReadBlock
Update clip_qual_right if more severe than previous value
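The two clipping helpers can be illustrated on simplified data. The sketch assumes the usual SFF conventions that the flow index stores, for each base, the increment from the previous base's flow position, and that a `clip_qual_right` of 0 means no right clip has been set. `Clip`, `basesInFlows`, `tightenRight` and `clipAtFlow` are hypothetical names, and the new clip position is assumed to be positive.

```haskell
-- Hypothetical stand-in for the right quality clip stored in a read header.
newtype Clip = Clip { clipQualRight :: Int }
  deriving (Show, Eq)

-- Number of bases called within the first n flows, given the per-base
-- flow index increments.
basesInFlows :: [Int] -> Int -> Int
basesInFlows incs n = length (takeWhile (<= n) (scanl1 (+) incs))

-- Replace the right clip only if the new position is more severe,
-- i.e. further left, or if no clip has been set yet (value 0).
tightenRight :: Clip -> Int -> Clip
tightenRight c n
  | cur == 0 || n < cur = Clip n
  | otherwise           = c
  where
    cur = clipQualRight c

-- Clip at a flow position: translate flows to bases, then tighten.
clipAtFlow :: [Int] -> Clip -> Int -> Clip
clipAtFlow incs c n = tightenRight c (basesInFlows incs n)
```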
Produced by Haddock version 2.6.1