This implements a number of filters used in the Titanium pipeline, based on published documentation.
- type DiscardFilter = ReadBlock -> Bool
- discard_empty :: DiscardFilter
- discard_key :: String -> DiscardFilter
- discard_dots :: Double -> DiscardFilter
- discard_mixed :: DiscardFilter
- discard_length :: Int -> DiscardFilter
- type TrimFilter = ReadBlock -> ReadBlock
- trim_sigint :: TrimFilter
- sigint :: ReadBlock -> Int
- trim_primer :: String -> TrimFilter
- find_primer :: String -> ReadBlock -> Int
- trim_qual20 :: Int -> TrimFilter
- qual20 :: Int -> ReadBlock -> Int
- dlength :: [a] -> Double
- avg :: Integral a => [a] -> Double
- clipFlows :: ReadBlock -> Int -> ReadBlock
- clipSeq :: ReadBlock -> Int -> ReadBlock
- flx_linker :: [Char]
- ti_linker :: [Char]
- rna_adapter :: [Char]
- rna_adapter2 :: [Char]
- rna_adapter3 :: [Char]
- rapid_adapter :: [Char]
- ti_adapter_b :: [Char]
DiscardFilters determine whether a read is to be retained or discarded.
Discard sequences that don't have the given key tag (typically TCAG) at the start of the read.
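As a minimal sketch of the key check, the following uses a hypothetical `SimpleRead` type in place of the library's `ReadBlock`, and a hypothetical `hasKey` predicate. The case-insensitive comparison is an assumption, and the text does not say whether a `DiscardFilter` returns True for "retain" or for "discard", so the predicate is named after what it tests.

```haskell
import Data.Char (toUpper)
import Data.List (isPrefixOf)

-- Hypothetical stand-in for ReadBlock: only the called bases.
newtype SimpleRead = SimpleRead { readBases :: String }

-- True when the read starts with the key tag.
-- Assumption: the comparison ignores case.
hasKey :: String -> SimpleRead -> Bool
hasKey key r = map toUpper key `isPrefixOf` map toUpper (readBases r)
```

With this, `filter (hasKey "TCAG") reads` would keep only the reads that carry the key.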
- The dots filter discards reads where the last positive flow is before flow 84, and reads with >5% dots (i.e. three successive noise values) before the last positive flow. The percentage can be given as a parameter.
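One possible reading of the dots rule, sketched on a plain list of flow values rather than a `ReadBlock`. All names here are hypothetical. The noise threshold of 0.5 and the way dots are counted (each run of n successive noise flows counts as n `div` 3 dots, divided by the number of flows up to the last positive flow) are assumptions, not taken from the published documentation.

```haskell
import Data.List (group)

-- Assumption: a flow value below 0.5 counts as noise.
isNoise :: Double -> Bool
isNoise = (< 0.5)

-- Flows up to and including the last positive (non-noise) flow.
upToLastPositive :: [Double] -> [Double]
upToLastPositive = reverse . dropWhile isNoise . reverse

-- Assumption: a run of n successive noise flows contributes n `div` 3 dots.
countDots :: [Double] -> Int
countDots fs =
  sum [ length run `div` 3 | run <- group (map isNoise fs), and (take 1 run) ]

-- True when the read fails the dots rule: the last positive flow lies
-- before flow 84, or the fraction of dots exceeds maxFrac.
failsDots :: Double -> [Double] -> Bool
failsDots maxFrac fs =
  n < 84 || fromIntegral (countDots kept) / fromIntegral n > maxFrac
  where
    kept = upToLastPositive fs
    n    = length kept
```

Calling `failsDots 0.05` corresponds to the 5% default mentioned above.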
Discard a read if the number of untrimmed flows is less than n (n = 186 for Titanium).
TrimFilters modify the read, typically trimming it for quality.
- Signal intensity trim: trim back until <3% of the flows are borderline (0.5..0.7). Then trim borderline values or dots from the end (use a window).
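The two steps can be sketched on a plain list of flow values. This is a simplified illustration with hypothetical names, not the library's `sigint`: it takes the longest prefix whose borderline fraction is below the threshold, and replaces the windowed end trim with simply dropping trailing flows at or below 0.7.

```haskell
-- A flow is borderline when it falls in the 0.5..0.7 range.
isBorderline :: Double -> Bool
isBorderline x = x >= 0.5 && x <= 0.7

-- Fraction of borderline flows (0 for an empty list).
borderlineFrac :: [Double] -> Double
borderlineFrac [] = 0
borderlineFrac fs =
  fromIntegral (length (filter isBorderline fs)) / fromIntegral (length fs)

-- Number of flows to keep.
-- Step 1: longest prefix with fewer than maxFrac borderline flows.
-- Step 2 (simplified, no window): drop trailing borderline or noise flows.
sigintLength :: Double -> [Double] -> Int
sigintLength maxFrac fs = length (dropTail prefix)
  where
    candidates = [ take n fs | n <- [length fs, length fs - 1 .. 0] ]
    prefix = case filter ((< maxFrac) . borderlineFrac) candidates of
               (p:_) -> p
               []    -> []
    dropTail = reverse . dropWhile (<= 0.7) . reverse
```

For 100 clean flows followed by 10 borderline ones, `sigintLength 0.03` trims back to the 100 clean flows.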
- Primer filter: this looks for the B-adaptor at the end of the read. The 454 implementation isn't very effective at finding mutated adaptors.
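To illustrate how a search might tolerate mutated adaptors, here is a sketch that allows a fixed number of mismatches. It is not the 454 method and not necessarily what `find_primer` does. The names `mismatches` and `findAdapter`, the mismatch budget `k`, and the minimum overlap are hypothetical, and the sequences in the usage note are made up.

```haskell
import Data.List (findIndex, tails)

-- Number of mismatching positions over the overlap of two strings.
mismatches :: String -> String -> Int
mismatches a b = length (filter id (zipWith (/=) a b))

-- First position in the read where the adapter aligns with at most k
-- mismatches, requiring at least minOverlap overlapping bases so that
-- an adapter truncated by the end of the read can still be found.
findAdapter :: Int -> Int -> String -> String -> Maybe Int
findAdapter k minOverlap adapter sq = findIndex ok (tails sq)
  where
    ok suffix = length (zip adapter suffix) >= minOverlap
             && mismatches adapter suffix <= k
```

With the made-up adapter "CTGAG", `findAdapter 1 4 "CTGAG" "ACGTACGTCTCAG"` locates the adapter at position 8 despite one substituted base.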
- Quality score trimming: trims using a 10-base window until a Q20 average is found.
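A windowed quality trim of this kind could look roughly like the sketch below, which works on a plain list of quality scores. The names `mean` and `qual20Length` are hypothetical. Trimming from the right end and returning 0 when no full window remains are assumptions.

```haskell
-- Arithmetic mean of a list of quality scores (0 for an empty list).
mean :: [Int] -> Double
mean [] = 0
mean xs = fromIntegral (sum xs) / fromIntegral (length xs)

-- Number of bases to keep: shorten the read from the right until the
-- last w bases average at least Q20.
-- Assumption: if fewer than w bases remain, nothing is kept.
qual20Length :: Int -> [Int] -> Int
qual20Length w qs = go (length qs)
  where
    go n
      | n < w                                 = 0
      | mean (take w (drop (n - w) qs)) >= 20 = n
      | otherwise                             = go (n - 1)
```

For 50 bases at Q30 followed by 20 bases at Q5, `qual20Length 10` keeps 54 bases: the last window that still averages Q20 contains six Q30 bases and four Q5 bases.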
Translate a number of flows to a position in the sequence, and update the clipping data accordingly.
Update clip_qual_right if the new value is more severe than the previous one.
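Both clipping helpers can be sketched on plain values. The names `basesWithin` and `tightenClip` are hypothetical. The sketch assumes that each base carries the increment in flow position since the previous base, as in the SFF flow index, and that a clip value of 0 means "not set".

```haskell
-- Number of bases called within the first n flows.
-- flowIdx holds, per base, the flow increment since the previous base.
basesWithin :: Int -> [Int] -> Int
basesWithin n flowIdx = length (takeWhile (<= n) (scanl1 (+) flowIdx))

-- Keep the more severe (smaller) right clip.
-- Assumption: 0 means that no clip has been set.
tightenClip :: Int -> Int -> Int
tightenClip old new
  | old == 0  = new
  | new == 0  = old
  | otherwise = min old new
```

For example, with flow increments `[1,2,0,3,1]` the cumulative flow positions are 1, 3, 3, 6, 7, so `basesWithin 5` gives 3 bases.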