bio-0.3.3.4: A bioinformatics library
Bio.Sequence.SeqData
Contents
Data structure
Accessor functions
Adding information to header
Converting to and from [Char]
Nucleotide functionality
Protein functionality
Description

Data structures for manipulating (biological) sequences.

Generally supports both nucleotide and protein sequences, although some functions, like revcompl, only make sense for nucleotides.

Synopsis
data Sequence = Seq !SeqData !SeqData !(Maybe QualData)
type Offset = Int64
type SeqData = ByteString
type Qual = Word8
type QualData = ByteString
(!) :: Sequence -> Offset -> Char
seqlength :: Sequence -> Offset
seqlabel :: Sequence -> SeqData
seqheader :: Sequence -> SeqData
seqdata :: Sequence -> SeqData
(?) :: Sequence -> Offset -> Qual
hasqual :: Sequence -> Bool
seqqual :: Sequence -> QualData
appendHeader :: Sequence -> String -> Sequence
setHeader :: Sequence -> String -> Sequence
fromStr :: String -> SeqData
toStr :: SeqData -> String
compl :: Char -> Char
revcompl :: Sequence -> Sequence
data Amino
= Ala
| Arg
| Asn
| Asp
| Cys
| Gln
| Glu
| Gly
| His
| Ile
| Leu
| Lys
| Met
| Phe
| Pro
| Ser
| Thr
| Tyr
| Trp
| Val
| STP
| Asx
| Glx
| Xle
| Xaa
translate :: Sequence -> Offset -> [Amino]
fromIUPAC :: SeqData -> [Amino]
toIUPAC :: [Amino] -> SeqData
Data structure
A sequence is a header, sequence data itself, and optional quality data. All items are lazy bytestrings. The Offset type can be used for indexing.
data Sequence
A sequence consists of a header, the sequence data itself, and optional quality data.
Constructors
Seq !SeqData !SeqData !(Maybe QualData)
header and actual sequence
type Offset = Int64
An offset, index, or length of a SeqData
type SeqData = ByteString
The basic data type used in Sequences
Quality data is normally associated with nucleotide sequences
type Qual = Word8
Basic type for quality data. Range 0..255. Typical Phred output is in the range 6..50, with 20 as the line in the sand separating good from bad.
type QualData = ByteString
Quality data is a Qual vector, currently implemented as a ByteString.
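To make the Phred scale concrete: a Phred score Q conventionally encodes an error probability of 10^(-Q/10), so the threshold of 20 mentioned above corresponds to a 1% chance that a base call is wrong. The following is a minimal sketch using only the bytestring package. The type synonyms are local stand-ins that mirror the documented ones, and errorProb and countGood are hypothetical helper names, not part of this module.

```haskell
import qualified Data.ByteString.Lazy as BL
import Data.Int (Int64)
import Data.Word (Word8)

-- Local stand-ins mirroring the documented synonyms (not imported from the library).
type Qual = Word8
type QualData = BL.ByteString

-- Conventional Phred relation: P(error) = 10^(-Q/10).
errorProb :: Qual -> Double
errorProb q = 10 ** (negate (fromIntegral q) / 10)

-- Hypothetical helper: count positions whose quality is at least the threshold.
countGood :: Qual -> QualData -> Int64
countGood threshold = BL.length . BL.filter (>= threshold)
```

For example, countGood 20 (BL.pack [6, 20, 35, 19]) evaluates to 2, and errorProb 20 is approximately 0.01.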
Accessor functions
(!) :: Sequence -> Offset -> Char
Read the character at the specified position in the sequence.
seqlength :: Sequence -> Offset
Return sequence length.
seqlabel :: Sequence -> SeqData
Return sequence label (first word of header).
seqheader :: Sequence -> SeqData
Return full header.
seqdata :: Sequence -> SeqData
Return the sequence data.
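The accessors above can be pictured with a small self-contained sketch over lazy ByteStrings. The Sequence type here is a local stand-in with the documented constructor shape, and charAt, lengthOf and labelOf are hypothetical names used for illustration. Zero-based offsets are an assumption carried over from ByteString indexing; the documentation above does not state the base.

```haskell
import qualified Data.ByteString.Lazy.Char8 as B
import Data.Int (Int64)

-- Local stand-in with the documented shape: header, sequence data, optional quality.
data Sequence = Seq B.ByteString B.ByteString (Maybe B.ByteString)

-- Character at a position, assuming 0-based offsets as in ByteString indexing.
charAt :: Sequence -> Int64 -> Char
charAt (Seq _ d _) i = B.index d i

-- Length of the sequence data.
lengthOf :: Sequence -> Int64
lengthOf (Seq _ d _) = B.length d

-- Label: the header up to (not including) the first space.
labelOf :: Sequence -> B.ByteString
labelOf (Seq h _ _) = B.takeWhile (/= ' ') h
```

With s = Seq (B.pack "seq1 sample read") (B.pack "ACGT") Nothing, labelOf s yields "seq1", lengthOf s yields 4, and charAt s 2 yields 'G'.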
(?) :: Sequence -> Offset -> Qual
hasqual :: Sequence -> Bool
Check whether the sequence has associated quality data.
seqqual :: Sequence -> QualData
Return the quality data, or raise an error if none exists. Use hasqual first if in doubt.
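Since quality data is optional, callers that cannot guarantee its presence may prefer a total lookup over a partial one. The sketch below shows one way to write such a guard; qualAt is a hypothetical name and the Sequence type is a local stand-in, so this is not how the library itself is implemented.

```haskell
import qualified Data.ByteString.Lazy as BL
import Data.Int (Int64)
import Data.Word (Word8)

-- Local stand-in with the documented shape: header, sequence data, optional quality.
data Sequence = Seq BL.ByteString BL.ByteString (Maybe BL.ByteString)

-- Total lookup: Nothing when quality is absent or the offset is out of range.
qualAt :: Sequence -> Int64 -> Maybe Word8
qualAt (Seq _ _ mq) i = do
  q <- mq
  if i >= 0 && i < BL.length q
    then Just (BL.index q i)
    else Nothing
```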
Adding information to header
appendHeader :: Sequence -> String -> Sequence
setHeader :: Sequence -> String -> Sequence
Modify the header by appending text, or by replacing all but the sequence label (i.e. first word).
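The two header operations can be sketched on a bare header string as follows. The helper names appendText and replaceAfterLabel are hypothetical, and the single-space separator is an assumption; the documentation above does not specify how the pieces are joined.

```haskell
import qualified Data.ByteString.Lazy.Char8 as B

-- Headers are treated as plain lazy ByteStrings here.
type Header = B.ByteString

-- Append text to the header; the single-space separator is an assumption.
appendText :: Header -> String -> Header
appendText h extra = B.concat [h, B.pack " ", B.pack extra]

-- Keep the label (first word) and replace everything after it.
replaceAfterLabel :: Header -> String -> Header
replaceAfterLabel h new =
  B.concat [B.takeWhile (/= ' ') h, B.pack " ", B.pack new]
```

Under these assumptions, appendText (B.pack "seq1 sample") "len=4" gives "seq1 sample len=4", and replaceAfterLabel (B.pack "seq1 sample read") "trimmed" gives "seq1 trimmed".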
Converting to and from [Char]
fromStr :: String -> SeqData
Convert a String to SeqData
toStr :: SeqData -> String
Convert a SeqData to a String
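For a lazy ByteString representation, conversions of this kind plausibly amount to packing and unpacking characters. The sketch below assumes exactly that; packSeq and unpackSeq are hypothetical names standing in for the documented functions. Note that the Char8 interface truncates each character to 8 bits, which is harmless for nucleotide and amino acid alphabets.

```haskell
import qualified Data.ByteString.Lazy.Char8 as B

-- Assumed equivalent of a String-to-SeqData conversion.
packSeq :: String -> B.ByteString
packSeq = B.pack

-- Assumed equivalent of a SeqData-to-String conversion.
unpackSeq :: B.ByteString -> String
unpackSeq = B.unpack
```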
Nucleotide functionality
Nucleotide sequences contain the alphabet [A,C,G,T]. IUPAC specifies an extended nucleotide alphabet with wildcards, but it is not supported at this point.
compl :: Char -> Char
Complement a single character, i.e. identify the nucleotide it can hybridize with. Note that for multiple nucleotides, you usually want the reverse complement (see revcompl for that).
revcompl :: Sequence -> Sequence
Calculate the reverse complement. This is only relevant for the nucleotide alphabet, and it leaves other characters unmodified.
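As an illustration of the behaviour described above, a reverse complement can be sketched as a character-wise complement applied to the reversed data. The names complSketch and revcomplSketch are hypothetical, the sketch works on bare sequence data rather than on a Sequence, and the handling of lowercase letters is an assumption.

```haskell
import qualified Data.ByteString.Lazy.Char8 as B

-- Complement of one nucleotide; anything outside A, C, G, T
-- (upper or lower case) is returned unchanged.
complSketch :: Char -> Char
complSketch c = case c of
  'A' -> 'T'
  'T' -> 'A'
  'C' -> 'G'
  'G' -> 'C'
  'a' -> 't'
  't' -> 'a'
  'c' -> 'g'
  'g' -> 'c'
  _   -> c

-- Reverse the data, then complement each character.
revcomplSketch :: B.ByteString -> B.ByteString
revcomplSketch = B.map complSketch . B.reverse
```

For example, revcomplSketch (B.pack "AACG") gives "CGTT", and the non-nucleotide character in "ANG" passes through unchanged, giving "CNT".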
Protein functionality
Proteins are chains of amino acids, represented by the IUPAC alphabet.
data Amino
Constructors
Ala
Arg
Asn
Asp
Cys
Gln
Glu
Gly
His
Ile
Leu
Lys
Met
Phe
Pro
Ser
Thr
Tyr
Trp
Val
STP
Asx
Glx
Xle
Xaa
translate :: Sequence -> Offset -> [Amino]
Translate a nucleotide sequence into the corresponding protein sequence. This works rather blindly, with no attempt to identify ORFs or otherwise QA the result.
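A "blind" translation of this kind can be sketched by reading consecutive codons and looking each one up in the standard genetic code. The sketch below makes several assumptions: the Offset argument is taken to be the starting position of the first codon, an incomplete trailing codon is dropped, and the result is a String of one-letter codes rather than [Amino]. The names standardCode, codon and translateSketch are hypothetical.

```haskell
import qualified Data.ByteString.Lazy.Char8 as B
import Data.Char (toUpper)
import Data.Int (Int64)
import Data.List (elemIndex)

-- Standard genetic code, one letter per codon, with codons enumerated
-- in base order T, C, A, G at each position ('*' marks a stop codon).
standardCode :: String
standardCode =
  "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

-- Translate one codon; 'X' if any base is not A, C, G or T.
codon :: Char -> Char -> Char -> Char
codon a b c =
  case mapM base [a, b, c] of
    Just [i, j, k] -> standardCode !! (16 * i + 4 * j + k)
    _              -> 'X'
  where
    base x = elemIndex (toUpper x) "TCAG"

-- Translate from a starting offset, dropping an incomplete trailing codon.
-- No attempt is made to find start codons or to stop at stop codons.
translateSketch :: B.ByteString -> Int64 -> String
translateSketch s off = go (B.unpack (B.drop off s))
  where
    go (a:b:c:rest) = codon a b c : go rest
    go _            = []
```

For example, translateSketch (B.pack "ATGGCCTAA") 0 gives "MA*", while starting at offset 1 shifts the reading frame and gives "WP".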
fromIUPAC :: SeqData -> [Amino]
Convert a sequence in IUPAC format to a list of amino acids.
toIUPAC :: [Amino] -> SeqData
Convert a list of amino acids to a sequence in IUPAC format.
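The conversion between one-letter IUPAC codes and the Amino constructors can be sketched with a lookup table aligned to the constructor order. The Amino type below is a local copy of the documented constructors. Several details are assumptions rather than documented behaviour: '*' for STP, mapping unrecognised characters to Xaa, and accepting uppercase letters only. The names codes, fromCodes and toCodes are hypothetical.

```haskell
import qualified Data.ByteString.Lazy.Char8 as B
import Data.Maybe (fromMaybe)

-- Local copy of the documented constructors, in the documented order.
data Amino
  = Ala | Arg | Asn | Asp | Cys | Gln | Glu | Gly | His | Ile
  | Leu | Lys | Met | Phe | Pro | Ser | Thr | Tyr | Trp | Val
  | STP | Asx | Glx | Xle | Xaa
  deriving (Show, Eq, Enum, Bounded)

-- One-letter IUPAC codes, aligned with the constructor order above.
codes :: String
codes = "ARNDCQEGHILKMFPSTYWV*BZJX"

-- Unrecognised characters map to Xaa (an assumption of this sketch).
fromCodes :: B.ByteString -> [Amino]
fromCodes = map (\c -> fromMaybe Xaa (lookup c table)) . B.unpack
  where
    table = zip codes [minBound .. maxBound]

-- Each constructor maps back to the letter at its enumeration index.
toCodes :: [Amino] -> B.ByteString
toCodes = B.pack . map (\a -> codes !! fromEnum a)
```

For example, fromCodes (B.pack "MA*") gives [Met,Ala,STP], and toCodes maps that list back to "MA*".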
Produced by Haddock version 2.4.2