This implements a number of filters used in the Titanium pipeline
- type DiscardFilter = ReadBlock -> Bool
- filter_mixed :: DiscardFilter
- filter_key :: DiscardFilter
- filter_empty :: DiscardFilter
- filter_dots :: DiscardFilter
- filter_length :: Int -> DiscardFilter
- type TrimFilter = ReadBlock -> ReadBlock
- filter_qual20 :: TrimFilter
- filter_sigint :: TrimFilter
- sigint :: ReadBlock -> Int
- qual20 :: ReadBlock -> Int
- dlength :: [a] -> Double
- avg :: Integral a => [a] -> Double
- clipFlows :: ReadBlock -> Int -> ReadBlock
- clipSeq :: ReadBlock -> Int -> ReadBlock
Discarding filters
type DiscardFilter = ReadBlock -> Bool
DiscardFilters determine whether a read is to be retained or discarded
filter_length :: Int -> DiscardFilter
- 2.2.1.2 The dots filter discards sequences where the last positive flow is before flow 84, and sequences with >5% dots (i.e. three successive noise values) before the last positive flow. (Interpreted as 5% of called sequence length is Ns?)
- 2.2.1.3 The mixed filter discards sequences with more than 70% positive flows. Also, discard with 30% noise, 20% middle (0.45..0.75) or <30% positive.
Discard a read if the number of untrimmed flows is less than n (n=186 for Titanium)
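The length and mixed filters described above can be sketched as simple predicates. This is only an illustration under stated assumptions, not the module's implementation: `SimpleRead` is a hypothetical stand-in for `ReadBlock`, `True` is assumed to mean "retain", the 0.5 threshold for a positive flow is a guess, and only the 70% clause of the mixed filter is covered.

```haskell
-- Simplified stand-in for ReadBlock (assumption): only the untrimmed flow values.
newtype SimpleRead = SimpleRead { flowValues :: [Double] }

-- Assumption: True means "retain", False means "discard".
type Discard = SimpleRead -> Bool

-- Retain reads with at least n untrimmed flows (n = 186 for Titanium).
lengthOk :: Int -> Discard
lengthOk n r = length (flowValues r) >= n

-- Fraction of values satisfying a predicate; 0 for an empty list.
fraction :: (Double -> Bool) -> [Double] -> Double
fraction _ [] = 0
fraction p xs = fromIntegral (length (filter p xs)) / fromIntegral (length xs)

-- Retain reads with at most 70% positive flows.
-- Assumption: a flow counts as positive when its value exceeds 0.5.
mixedOk :: Discard
mixedOk r = fraction (> 0.5) (flowValues r) <= 0.70

-- A read is retained only if every filter in the list retains it.
retainedBy :: [Discard] -> Discard
retainedBy fs r = all ($ r) fs
```

Because each filter is an ordinary function, a pipeline stage could be written as `filter (retainedBy [lengthOk 186, mixedOk]) reads`.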
Trimming filters
type TrimFilter = ReadBlock -> ReadBlock
TrimFilters modify the read, typically trimming it for quality
sigint :: ReadBlock -> Int
- 2.2.1.4 Signal intensity trim - trim back until <3% borderline flows (0.5..0.7). Then trim borderline values or dots from the end (use a window).
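One possible reading of the signal intensity trim, as a minimal sketch: it works on a plain list of flow values rather than a `ReadBlock`, treats the 0.5..0.7 range as inclusive, and replaces the windowed second step with simply dropping trailing borderline values (dots are not handled). The names `borderline` and `sigintPos` are hypothetical.

```haskell
import Data.List (dropWhileEnd)

-- A flow is borderline when its value lies in 0.5..0.7 (inclusive, by assumption).
borderline :: Double -> Bool
borderline x = x >= 0.5 && x <= 0.7

-- Number of flows to keep after the trim.
sigintPos :: [Double] -> Int
sigintPos fs = length (dropWhileEnd borderline (take cut fs))
  where
    -- Step 1: shorten from the end until fewer than 3% of flows are borderline.
    cut = go (length fs) (length (filter borderline fs)) (reverse fs)

    go :: Int -> Int -> [Double] -> Int
    go n b rest
      | n == 0 = 0
      | (fromIntegral b :: Double) < 0.03 * fromIntegral n = n
      | otherwise = case rest of
          (x:xs) -> go (n - 1) (if borderline x then b - 1 else b) xs
          []     -> 0
```

Step 2 is the `dropWhileEnd` call: once the 3% criterion holds, any borderline values still sitting at the end of the kept prefix are removed as well.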
qual20 :: ReadBlock -> Int
- 2.2.1.7 Quality score trimming trims using a 10-base window until a Q20 average is found.
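The windowed quality trim can be sketched as follows, under the assumption that the window slides backwards from the 3' end and that the read is cut at the end of the first window whose mean quality is at least 20. The functions `windowAvg` and `qual20Pos` are hypothetical and operate on a plain list of quality scores rather than a `ReadBlock`; reads shorter than the window are assumed to trim to 0.

```haskell
-- Mean of a list of quality scores; 0 for an empty list.
windowAvg :: [Int] -> Double
windowAvg [] = 0
windowAvg xs = fromIntegral (sum xs) / fromIntegral (length xs)

-- Window size in bases, as given in the description.
windowSize :: Int
windowSize = 10

-- Number of bases to keep: the end position of the last full window
-- (scanning from the 3' end) whose average quality reaches Q20.
qual20Pos :: [Int] -> Int
qual20Pos qs = go (length qs)
  where
    go n
      | n < windowSize = 0  -- assumption: no full window left, trim everything
      | windowAvg (take windowSize (drop (n - windowSize) qs)) >= 20 = n
      | otherwise = go (n - 1)
```

For example, a read of 30 bases at Q30 followed by 20 bases at Q5 would be cut at position 34, since a window ending there holds six Q30 and four Q5 bases (mean 20).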