Safe Haskell: None
Bio.Sequence.SFF_filters
Description
This implements a number of filters used in the Titanium pipeline, based on published documentation.
- type DiscardFilter = ReadBlock -> Bool
- discard_empty :: DiscardFilter
- discard_key :: String -> DiscardFilter
- discard_dots :: Double -> DiscardFilter
- discard_mixed :: DiscardFilter
- discard_length :: Int -> DiscardFilter
- type TrimFilter = ReadBlock -> ReadBlock
- trim_sigint :: TrimFilter
- sigint :: ReadBlock -> Int
- trim_primer :: String -> TrimFilter
- find_primer :: String -> ReadBlock -> Int
- trim_qual20 :: Int -> TrimFilter
- qual20 :: Int -> ReadBlock -> Int
- dlength :: [a] -> Double
- avg :: Integral a => [a] -> Double
- clipFlows :: ReadBlock -> Int -> ReadBlock
- clipSeq :: ReadBlock -> Int -> ReadBlock
- flx_linker :: String
- ti_adapter_b :: String
- rapid_adapter :: String
- rna_adapter3 :: String
- rna_adapter2 :: String
- rna_adapter :: String
- ti_linker :: String
Discarding filters
type DiscardFilter = ReadBlock -> Bool
DiscardFilters determine whether a read is to be retained or discarded
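Because a DiscardFilter is an ordinary predicate, several filters can be combined with standard list functions. The following is a minimal sketch, not the library's code: `SimpleRead` is a hypothetical stand-in for `ReadBlock` that holds only the called bases, and all function names are invented for illustration. The documentation does not state which Boolean value means "discard"; the sketch assumes `True` means the read is discarded.

```haskell
import Data.List (isPrefixOf)

-- Hypothetical stand-in for ReadBlock: only the called bases.
newtype SimpleRead = SimpleRead { readBases :: String }

-- Same shape as DiscardFilter, over the stand-in type.
-- Assumption: True means "discard this read".
type SimpleDiscard = SimpleRead -> Bool

-- Discard reads with no bases.
isEmpty :: SimpleDiscard
isEmpty = null . readBases

-- Discard reads that do not start with the given key tag.
lacksKey :: String -> SimpleDiscard
lacksKey key r = not (key `isPrefixOf` readBases r)

-- A read is discarded if any filter in the list rejects it.
anyDiscard :: [SimpleDiscard] -> SimpleDiscard
anyDiscard fs r = any ($ r) fs

-- Keep only the reads that pass every filter.
keepReads :: [SimpleDiscard] -> [SimpleRead] -> [SimpleRead]
keepReads fs = filter (not . anyDiscard fs)
```

With these definitions, `keepReads [isEmpty, lacksKey "TCAG"]` keeps only the non-empty reads that start with the key.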
discard_empty :: DiscardFilter
This filter discards empty sequences.
discard_key :: String -> DiscardFilter
Discard sequences that don't have the given key tag (typically TCAG) at the start of the read.
discard_dots :: Double -> DiscardFilter
- 2.2.1.2 The dots filter discards sequences where the last positive flow is before flow 84, and sequences whose flows contain >5% dots (i.e. three successive noise values) before the last positive flow. The percentage can be given as a parameter.
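One possible reading of the dots rule is sketched below. It works on a plain list of flow signals rather than a `ReadBlock`, and every name in it is hypothetical. It assumes that a flow is positive when its signal is at least 0.5, that each run of n successive noise flows counts as n `div` 3 dots, and that the percentage is measured against the number of flows up to the last positive flow. The library may define these details differently.

```haskell
import Data.List (group)

-- Assumption: a flow counts as positive when its signal is at least 0.5.
isPositive :: Double -> Bool
isPositive = (>= 0.5)

-- Flows up to and including the last positive flow.
upToLastPositive :: [Double] -> [Double]
upToLastPositive = reverse . dropWhile (not . isPositive) . reverse

-- Assumption: a run of n successive noise flows contributes n `div` 3 dots.
countDots :: [Double] -> Int
countDots fs = sum [ length g `div` 3 | g@(False : _) <- group (map isPositive fs) ]

-- True when the read fails the filter: the last positive flow lies before
-- flow 84, or the dot count exceeds maxFrac of the flows considered.
failsDots :: Double -> [Double] -> Bool
failsDots maxFrac fs =
    length used < 84
      || fromIntegral (countDots used) > maxFrac * fromIntegral (length used)
  where
    used = upToLastPositive fs
```

Passing `0.05` as `maxFrac` corresponds to the 5% threshold mentioned above.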
discard_length :: Int -> DiscardFilter
Discard a read if the number of untrimmed flows is less than n (n=186 for Titanium)
Trimming filters
type TrimFilter = ReadBlock -> ReadBlock
TrimFilters modify the read, typically trimming it for quality
trim_sigint :: TrimFilter
- 2.2.1.4 Signal intensity trim: trim back until fewer than 3% of the flows are borderline (signal 0.5..0.7). Then trim borderline values or dots from the end (use a window).
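The first step of the signal intensity trim can be sketched as a search for the longest prefix of flows with a sufficiently low borderline fraction. This is only an illustration on a plain list of flow signals, with hypothetical names. It assumes the borderline range 0.5..0.7 is inclusive at both ends, and it leaves out the second step (trimming borderline values or dots from the end with a window).

```haskell
-- Assumption: borderline means a signal in the closed range 0.5..0.7.
isBorderline :: Double -> Bool
isBorderline x = x >= 0.5 && x <= 0.7

-- Number of flows to keep: the longest prefix whose fraction of
-- borderline flows is strictly below maxFrac (0 if no prefix qualifies).
sigintLength :: Double -> [Double] -> Int
sigintLength maxFrac fs = go (length fs)
  where
    go 0 = 0
    go n
      | borderlineFrac (take n fs) < maxFrac = n
      | otherwise = go (n - 1)
    -- Only called on non-empty prefixes, so the division is safe.
    borderlineFrac xs =
      fromIntegral (length (filter isBorderline xs)) / fromIntegral (length xs)
```

The search recomputes the fraction for every candidate prefix, which is quadratic in the number of flows; that keeps the sketch short and is acceptable for reads of a few hundred flows.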
trim_primer :: String -> TrimFilter
- 2.2.1.5 Primer filter: this looks for the B-adaptor at the end of the read. The 454 implementation isn't very effective at finding mutated adaptors.
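A minimal sketch of exact adaptor matching is shown below. It is not the library's `find_primer`: it works on plain strings, the name `findPrimerExact` and the `minOverlap` parameter are hypothetical, and it only accepts exact matches, so a mutated adaptor would be missed. The adaptor may be cut off by the end of the read, so the sketch compares only the overlapping part.

```haskell
import Data.List (findIndex, tails)
import Data.Maybe (fromMaybe)

-- Position of the first suffix of the read that agrees exactly with the
-- start of the primer over at least minOverlap bases.
-- Returns the read length when no such suffix exists (nothing to trim).
findPrimerExact :: Int -> String -> String -> Int
findPrimerExact minOverlap primer sq =
    fromMaybe (length sq) (findIndex matches (tails sq))
  where
    matches suffix =
      let n = min (length suffix) (length primer)  -- length of the overlap
      in n >= minOverlap && take n suffix == take n primer

-- Keep only the bases before the match.
trimPrimerExact :: Int -> String -> String -> String
trimPrimerExact minOverlap primer sq =
  take (findPrimerExact minOverlap primer sq) sq
```

A larger `minOverlap` reduces spurious matches near the end of the read at the cost of missing adaptors that are mostly cut off.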
find_primer :: String -> ReadBlock -> Int
trim_qual20 :: Int -> TrimFilter
- 2.2.1.7 Quality score trimming: trims using a 10-base window until a Q20 average is found.
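The windowed quality trim can be sketched on a plain list of quality scores. The code below is an illustration with hypothetical names, not the library's `qual20`. It assumes that trimming proceeds from the right end, that the window is the last w bases of the retained part (or fewer if the retained part is shorter), and that an average of exactly 20 is acceptable.

```haskell
-- Mean of a list of integral values; 0 for the empty list (assumption).
mean :: Integral a => [a] -> Double
mean [] = 0
mean xs = fromIntegral (sum xs) / fromIntegral (length xs)

-- Number of bases to keep: shrink the read from the right until the
-- last w retained qualities average at least 20.
keepLength :: Int -> [Int] -> Int
keepLength w qs = go (length qs)
  where
    go 0 = 0
    go n
      | mean (window n) >= 20 = n
      | otherwise = go (n - 1)
    -- The (up to) w qualities ending at position n.
    window n = drop (max 0 (n - w)) (take n qs)
```

With `w = 10` this corresponds to the 10-base window described above.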
Utility functions
clipFlows :: ReadBlock -> Int -> ReadBlock
Translate a number of flows to position in sequence, and update clipping data accordingly
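The translation from flows to sequence positions can be sketched as follows, assuming the SFF convention that the flow index of each base is stored as an increment relative to the previous base. The function name and the plain-list representation are hypothetical; the library operates on a `ReadBlock`.

```haskell
-- Number of called bases produced by the first n flows.
-- flowIdx holds per-base flow increments, so a running sum gives the
-- flow number of each base.
flowsToBases :: [Int] -> Int -> Int
flowsToBases flowIdx n = length (takeWhile (<= n) (scanl1 (+) flowIdx))
```

For increments `[1,2,0,3,1]` the running sums are `[1,3,3,6,7]`, so the first 5 flows cover 3 bases. An increment of 0 corresponds to a further base called from the same flow (a homopolymer).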
clipSeq :: ReadBlock -> Int -> ReadBlock
Update clip_qual_right if more severe than previous value
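"More severe" here presumably means a smaller right clip position. A minimal sketch of such an update is given below, using a hypothetical `Clip` record in place of the read header. It assumes the SFF convention that a right clip of 0 means no clipping has been set; whether the library treats 0 this way is not stated in this documentation.

```haskell
-- Hypothetical stand-in for the clipping fields of a read header.
data Clip = Clip { clipLeft :: Int, clipRight :: Int }
  deriving (Eq, Show)

-- Move the right clip to n only if that trims more than the current value.
-- Assumption: clipRight == 0 means "no clip set", so any n replaces it.
tightenRight :: Clip -> Int -> Clip
tightenRight c n
  | clipRight c == 0 = c { clipRight = n }
  | otherwise        = c { clipRight = min n (clipRight c) }
```

Because the update only ever reduces the clip position, trimming filters built this way can be applied in any order without one undoing another.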