bio-0.5.3: A bioinformatics library

Safe Haskell: None

Bio.Sequence.SFF_filters

Description

This implements a number of filters used in the Titanium pipeline, based on published documentation.

Discarding filters

type DiscardFilter = ReadBlock -> Bool

A DiscardFilter determines whether a read is retained or discarded.
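
Because a DiscardFilter is an ordinary predicate, several filters can be combined into one. The following is a minimal sketch of that idea, not the library's code: `SimpleRead`, `SimpleDiscard`, `allFilters`, `nonEmpty`, `hasKey` and `keepGood` are hypothetical names, `SimpleRead` is a stand-in for ReadBlock that models only the called bases, and it assumes that True means "retain the read".

```haskell
import Data.List (isPrefixOf)

-- Hypothetical stand-in for ReadBlock: only the called bases are modelled.
newtype SimpleRead = SimpleRead { readBases :: String }

-- Assumption: True means "retain the read".
type SimpleDiscard = SimpleRead -> Bool

-- Retain a read only if every filter in the list accepts it.
allFilters :: [SimpleDiscard] -> SimpleDiscard
allFilters fs r = all ($ r) fs

-- Accept reads that contain at least one base.
nonEmpty :: SimpleDiscard
nonEmpty = not . null . readBases

-- Accept reads that start with the given key tag.
hasKey :: String -> SimpleDiscard
hasKey key = isPrefixOf key . readBases

-- Keep the reads that pass both filters.
keepGood :: [SimpleRead] -> [SimpleRead]
keepGood = filter (allFilters [nonEmpty, hasKey "TCAG"])
```

With the real types, the same pattern would apply to a list such as `[discard_empty, discard_key "TCAG"]`, provided the True-means-retain assumption holds.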

discard_empty :: DiscardFilter

This filter discards empty sequences.

discard_key :: String -> DiscardFilter

Discard sequences that don't have the given key tag (typically TCAG) at the start of the read.

discard_dots :: Double -> DiscardFilter

2.2.1.2 The dots filter discards sequences where the last positive flow is before flow 84, and sequences with >5% dots (i.e. three successive noise values) before the last positive flow. The percentage can be given as a parameter.
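
One possible reading of the dots rule can be sketched on a plain list of flow values. This is an illustration under stated assumptions rather than the library's implementation: all names are hypothetical, a flow is taken to be positive when its normalised signal is at least 0.5, each run of noise flows contributes one dot per three flows, the dot fraction is measured against the number of flows up to the last positive flow, and True means "retain".

```haskell
import Data.List (group)

-- Assumption: a flow is "positive" when its normalised signal is at least 0.5.
isPositive :: Double -> Bool
isPositive = (>= 0.5)

-- 1-based index of the last positive flow, or 0 when there is none.
lastPositive :: [Double] -> Int
lastPositive fs = case [i | (i, f) <- zip [1 ..] fs, isPositive f] of
  [] -> 0
  is -> last is

-- Assumption: each run of noise flows yields one dot per three successive flows.
countDots :: [Double] -> Int
countDots fs =
  sum [length run `div` 3 | run@(False : _) <- group (map isPositive fs)]

-- True means "retain". maxDots is a fraction, e.g. 0.05 for 5%.
keepByDots :: Double -> [Double] -> Bool
keepByDots maxDots fs =
  lastPos >= 84 && fromIntegral dots / fromIntegral lastPos <= maxDots
  where
    lastPos = lastPositive fs
    dots = countDots (take lastPos fs)
```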

discard_mixed :: DiscardFilter

2.2.1.3 The mixed filter discards sequences with more than 70% positive flows. Also, discard with noise,20% middle (0.45..0.75) or <30% positive.

discard_length :: Int -> DiscardFilter

Discard a read if the number of untrimmed flows is less than n (n=186 for Titanium).

Trimming filters

type TrimFilter = ReadBlock -> ReadBlock

A TrimFilter modifies the read, typically trimming it for quality.

trim_sigint :: TrimFilter

2.2.1.4 Signal intensity trim: trim back until <3% borderline flows (0.5..0.7). Then trim borderline values or dots from the end (use a window).

trim_primer :: String -> TrimFilter

2.2.1.5 Primer filter. This looks for the B-adaptor at the end of the read. The 454 implementation isn't very effective at finding mutated adaptors.

trim_qual20 :: Int -> TrimFilter

2.2.1.7 Quality score trimming: trims using a 10-base window until a Q20 average is found.
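
The windowed trim can be illustrated on a bare list of quality scores. This is a sketch under assumptions, not the library's code: `avgQ` and `qualClip` are hypothetical names, trimming is assumed to proceed from the right end of the read, and a read shorter than the window is assumed to be trimmed away entirely.

```haskell
-- Average of a list of quality scores (0 for the empty list).
avgQ :: [Int] -> Double
avgQ [] = 0
avgQ qs = fromIntegral (sum qs) / fromIntegral (length qs)

-- Number of bases to keep: shrink from the right until the last `w`
-- retained bases have an average quality of at least `q`.
-- Assumption: if no full window remains, nothing is kept (result 0).
qualClip :: Int -> Double -> [Int] -> Int
qualClip w q quals = go (length quals)
  where
    go n
      | n < w = 0
      | avgQ (take w (drop (n - w) quals)) >= q = n
      | otherwise = go (n - 1)
```

For example, `qualClip 10 20 (replicate 20 30 ++ replicate 10 5)` evaluates to 24: the window ending at base 24 holds six scores of 30 and four of 5, which averages exactly 20.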

Utility functions

dlength :: [a] -> Double

List length as a Double (eliminates many instances of fromIntegral).

avg :: Integral a => [a] -> Double

Calculate the average of a list.
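
Definitions consistent with the two signatures above could look like the following. The primed names mark these as illustrative versions, not the library's source.

```haskell
-- List length as a Double.
dlength' :: [a] -> Double
dlength' = fromIntegral . length

-- Arithmetic mean of a list of integral values.
-- Note: for the empty list this evaluates 0/0, which is NaN.
avg' :: Integral a => [a] -> Double
avg' xs = fromIntegral (sum xs) / dlength' xs
```

For instance, `avg' [1, 2, 3, 4 :: Int]` gives 2.5.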

clipFlows :: ReadBlock -> Int -> ReadBlock

Translate a number of flows to a position in the sequence, and update the clipping data accordingly.
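
The flow-to-base translation can be sketched on the per-base flow index alone. This assumes the SFF convention that each base stores the increment in flow position since the previous base; `basesWithinFlows` is a hypothetical helper, not part of the module.

```haskell
-- Number of bases called within the first n flows.
-- flowIdx holds, per base, the increment in flow position since the previous base,
-- so the running sum gives the flow in which each base was called.
basesWithinFlows :: [Int] -> Int -> Int
basesWithinFlows flowIdx n = length (takeWhile (<= n) (scanl1 (+) flowIdx))
```

With `flowIdx = [1, 2, 0, 3, 1]` the bases fall in flows 1, 3, 3, 6 and 7, so `basesWithinFlows flowIdx 5` is 3.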

clipSeq :: ReadBlock -> Int -> ReadBlock

Update clip_qual_right if the new value is more severe than the previous one.
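
The "more severe" rule amounts to never moving the right clip point outwards. A minimal sketch on bare integers follows; `tightenRight` is a hypothetical name, and the treatment of 0 as "no clip set" follows the SFF file format convention and is an assumption about how this module handles it.

```haskell
-- Combine the stored right clip with a proposed one, keeping the more severe.
-- Assumption: a stored value of 0 means that no right clip has been set yet.
tightenRight :: Int -> Int -> Int
tightenRight old new
  | old == 0 = new
  | otherwise = min old new
```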

Data