Description

Data structures for manipulating biological sequences. Both nucleotide and
protein sequences are generally supported, although some functions, such as
revcompl, only make sense for nucleotides.

Data structure

A sequence consists of a header, the sequence data itself, and optional quality data.
Sequences are type-tagged to identify them as nucleotide, amino acid,
or unknown type.
All components are lazy ByteStrings. The Offset type is used for indexing.

A sequence consists of a header, the sequence data itself, and optional quality data.
The type parameter is a phantom type that separates nucleotide and amino acid sequences.

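The description above can be made concrete with a minimal sketch. It assumes the representation the text describes (header, data, and optional quality as lazy ByteStrings) plus an Int64-based Offset, and uses empty data declarations as phantom tags. The constructor name Seq and the helpers nucSeq, aminoSeq, gcCount, and residues are hypothetical, not this module's API. In the module itself, Amino is the full amino acid type listed under Protein functionality; here it is an empty tag for brevity.

```haskell
import qualified Data.ByteString.Lazy as BL
import qualified Data.ByteString.Lazy.Char8 as BC
import Data.Int (Int64)
import Data.Word (Word8)

-- Hypothetical stand-ins mirroring the description above.
type Offset   = Int64          -- offset, index, or length into a SeqData
type SeqData  = BC.ByteString  -- lazy bytestring holding header or residues
type Qual     = Word8          -- one quality score (0..255)
type QualData = BL.ByteString  -- one Qual per residue

-- Empty phantom tags: no values, they exist only at the type level.
data Nuc
data Amino
data Unknown

-- Header, sequence data, optional quality data; 't' is the phantom tag.
data Sequence t = Seq !SeqData !SeqData !(Maybe QualData)

-- Constructors that fix the tag.
nucSeq :: String -> String -> Sequence Nuc
nucSeq h d = Seq (BC.pack h) (BC.pack d) Nothing

aminoSeq :: String -> String -> Sequence Amino
aminoSeq h d = Seq (BC.pack h) (BC.pack d) Nothing

-- A function can demand a particular tag...
gcCount :: Sequence Nuc -> Int64
gcCount (Seq _ d _) = BC.count 'G' d + BC.count 'C' d

-- ...or stay polymorphic in it.
residues :: Sequence t -> Offset
residues (Seq _ d _) = BC.length d
```

Passing `aminoSeq "p1" "MKV"` to gcCount is rejected by the type checker, even though both kinds of sequence share the same runtime representation.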
An offset, index, or length of a SeqData.

The basic data type used in Sequences.

Quality data is normally associated with nucleotide sequences.

Basic type for quality data, with range 0..255. Typical Phred output is in
the range 6..50, with 20 as the line in the sand separating good base calls from bad.

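The Phred scale relates a score Q to the probability P that the base call is wrong via Q = -10 log10 P, so Q20 corresponds to a 1% error rate and Q30 to 0.1%. The sketch below converts scores and applies the Q20 cut-off; phredToErrorProb, isGoodCall, and fractionGood are illustrative names, not part of this module.

```haskell
import Data.Word (Word8)

-- Phred scale: Q = -10 * log10 P, hence P = 10 ** (-Q / 10).
phredToErrorProb :: Word8 -> Double
phredToErrorProb q = 10 ** negate (fromIntegral q / 10)

-- Hypothetical helper using the conventional Q20 threshold.
isGoodCall :: Word8 -> Bool
isGoodCall q = q >= 20

-- Fraction of calls at or above the threshold (0 for no calls).
fractionGood :: [Word8] -> Double
fractionGood [] = 0
fractionGood qs =
  fromIntegral (length (filter isGoodCall qs)) / fromIntegral (length qs)
```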
Quality data is a Qual vector, currently implemented as a ByteString.

Accessor functions

Read the character at the specified position in the sequence.

Return the sequence length.

Return the sequence label (the first word of the header).

Return the full header.

Return the sequence data.

Check whether the sequence has associated quality data.

Return the quality data, or raise an error if none exists. Use hasqual if in doubt.

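A minimal sketch of the accessors, assuming a three-field Sequence like the one described under Data structure. Only hasqual is named in the text above; the other names and all signatures are assumptions. Indexing is 0-based and partial here: an out-of-range position raises an error, just as seqqual does when there is no quality data.

```haskell
import qualified Data.ByteString.Lazy as BL
import qualified Data.ByteString.Lazy.Char8 as BC
import Data.Int (Int64)
import Data.Maybe (isJust)

type Offset = Int64

-- Stand-in: header, residues, optional per-residue quality scores.
data Sequence t = Seq !BC.ByteString !BC.ByteString !(Maybe BL.ByteString)

-- Character at a 0-based position.
(!) :: Sequence t -> Offset -> Char
(Seq _ d _) ! i = BC.index d i

seqlength :: Sequence t -> Offset
seqlength (Seq _ d _) = BC.length d

-- Label = first whitespace-separated word of the header.
seqlabel :: Sequence t -> BC.ByteString
seqlabel (Seq h _ _) = case BC.words h of
  (w : _) -> w
  []      -> BC.empty

seqheader :: Sequence t -> BC.ByteString
seqheader (Seq h _ _) = h

seqdata :: Sequence t -> BC.ByteString
seqdata (Seq _ d _) = d

hasqual :: Sequence t -> Bool
hasqual (Seq _ _ q) = isJust q

seqqual :: Sequence t -> BL.ByteString
seqqual (Seq _ _ (Just q)) = q
seqqual (Seq _ _ Nothing)  = error "seqqual: no quality data"
```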
Adding information to header

Modify the header by appending text, or by replacing
everything except the sequence label (i.e. the first word).

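One plausible reading of the two header operations, sketched with a stand-in Sequence that omits quality data. The names appendHeader and setHeader, and the single space used as a separator, are assumptions rather than this module's documented behaviour.

```haskell
import qualified Data.ByteString.Lazy.Char8 as BC

-- Stand-in: header and residues only.
data Sequence t = Seq !BC.ByteString !BC.ByteString

-- Append text to the existing header, separated by one space (assumed).
appendHeader :: Sequence t -> String -> Sequence t
appendHeader (Seq h d) txt = Seq (BC.append h (BC.pack (' ' : txt))) d

-- Keep the label (first word) and replace everything after it.
setHeader :: Sequence t -> String -> Sequence t
setHeader (Seq h d) txt = Seq (BC.append label (BC.pack (' ' : txt))) d
  where
    label = case BC.words h of
      (w : _) -> w
      []      -> BC.empty

header :: Sequence t -> String
header (Seq h _) = BC.unpack h
```

With a header `r1 old`, appending `len=4` gives `r1 old len=4`, while setting `new desc` gives `r1 new desc`.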
Converting to and from [Char]

Convert a String to SeqData.

Convert a SeqData to a String.

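Assuming SeqData is a lazy Char8 ByteString, as described under Data structure, the two conversions reduce to pack and unpack. The names fromStr and toStr are illustrative. Char8 packing keeps only the low 8 bits of each Char, which is harmless for ASCII sequence alphabets but would truncate other characters.

```haskell
import qualified Data.ByteString.Lazy.Char8 as BC

type SeqData = BC.ByteString

-- String -> SeqData; each Char is truncated to 8 bits.
fromStr :: String -> SeqData
fromStr = BC.pack

-- SeqData -> String.
toStr :: SeqData -> String
toStr = BC.unpack
```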
Sequence utilities

Return a sequence with all internal storage freshly copied and
with sequence and quality data present as a single chunk.
By freshly copying internal storage, defragSeq allows garbage
collection of the original data source from which the sequence was
read; otherwise, retaining just a short sequence name can keep an
entire sequence file buffer alive.
By compacting sequence data into a single chunk, defragSeq avoids
linear-time traversal of sequence chunks during random access into
sequence data.

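A minimal sketch of the defragmentation idea using the bytestring package: toStrict gathers all chunks into one strict buffer, and B.copy guarantees that the result no longer shares memory with the input, so the original file buffer can be collected. The stand-in Sequence type and the helpers defrag and chunkCount are hypothetical.

```haskell
import qualified Data.ByteString as B
import qualified Data.ByteString.Lazy as BL

-- Stand-in: header, residues, optional quality, all lazy ByteStrings.
data Sequence t = Seq !BL.ByteString !BL.ByteString !(Maybe BL.ByteString)

-- Copy a lazy ByteString into one freshly allocated chunk.
-- toStrict concatenates the chunks, but for a single-chunk input it can
-- return the existing buffer unchanged, so B.copy forces a new allocation.
defrag :: BL.ByteString -> BL.ByteString
defrag = BL.fromStrict . B.copy . BL.toStrict

defragSeq :: Sequence t -> Sequence t
defragSeq (Seq h d q) = Seq (defrag h) (defrag d) (fmap defrag q)

-- Number of internal chunks (0 for an empty string).
chunkCount :: BL.ByteString -> Int
chunkCount = length . BL.toChunks
```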
Map over sequences, treating them as a sequence of (Char, Word8) pairs.
This works on sequences without quality data, as long as the function does not
try to examine the quality value.
The current implementation is not very efficient.

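A list-based sketch of the mapping behaviour described above, assuming a three-field Sequence stand-in. When quality data is absent, each pair carries an error thunk in the quality position, so a function that only touches the residue works while one that inspects the quality fails. Going through intermediate lists is also what makes this kind of implementation slow. The names seqmap and upcase here are illustrative.

```haskell
import qualified Data.ByteString.Lazy as BL
import qualified Data.ByteString.Lazy.Char8 as BC
import Data.Char (toUpper)
import Data.Word (Word8)

-- Stand-in: header, residues, optional per-residue quality scores.
data Sequence t = Seq !BC.ByteString !BC.ByteString !(Maybe BL.ByteString)

-- Map over (residue, quality) pairs via lists (simple, not fast).
-- Without quality data every pair carries an error thunk as its quality,
-- which only fires if the mapped function forces it.
seqmap :: ((Char, Word8) -> (Char, Word8)) -> Sequence t -> Sequence t
seqmap f (Seq h d mq) = Seq h (BC.pack cs) (fmap (const (BL.pack qs)) mq)
  where
    qsIn     = maybe (repeat noQual) BL.unpack mq
    noQual   = error "seqmap: sequence has no quality data"
    (cs, qs) = unzip (map f (zip (BC.unpack d) qsIn))

-- Touches only the residue, so it is safe on sequences without quality.
upcase :: Sequence t -> Sequence t
upcase = seqmap (\(c, q) -> (toUpper c, q))
```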
Phantom type functionality: unchecked conversion between sequence types.

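Because the tag is a phantom type, an unchecked conversion only rebuilds the value at a different type and never inspects the residues. The names castSeq, castToNuc, castToAmino, and nucLength are hypothetical stand-ins, and quality data is omitted for brevity.

```haskell
import qualified Data.ByteString.Lazy.Char8 as BC

-- Empty phantom tags.
data Nuc
data Amino
data Unknown

-- Stand-in: header and residues only.
data Sequence t = Seq !BC.ByteString !BC.ByteString

-- Unchecked: the contents are not examined; only the tag changes.
castSeq :: Sequence a -> Sequence b
castSeq (Seq h d) = Seq h d

castToNuc :: Sequence a -> Sequence Nuc
castToNuc = castSeq

castToAmino :: Sequence a -> Sequence Amino
castToAmino = castSeq

-- Example consumer that only accepts nucleotide sequences.
nucLength :: Sequence Nuc -> Int
nucLength (Seq _ d) = fromIntegral (BC.length d)
```

Nothing stops `castToNuc` from being applied to a protein sequence; the conversion is the caller's promise, not a check.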
Nucleotide functionality

Nucleotide sequences use the alphabet [A,C,G,T].
IUPAC specifies an extended nucleotide alphabet with wildcards, but
it is not supported at this point.

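A small validity check for the plain alphabet, useful before applying operations that assume [A,C,G,T]. It treats lowercase letters as equivalent (an assumption) and rejects IUPAC wildcards such as N or R; isPlainNuc and nonPlainPositions are illustrative names.

```haskell
import qualified Data.ByteString.Lazy.Char8 as BC
import Data.Char (toUpper)

-- True when every residue is A, C, G, or T (case-insensitive).
isPlainNuc :: BC.ByteString -> Bool
isPlainNuc = BC.all ((`elem` "ACGT") . toUpper)

-- 0-based positions of residues outside the plain alphabet.
nonPlainPositions :: BC.ByteString -> [Int]
nonPlainPositions s =
  [i | (i, c) <- zip [0 ..] (BC.unpack s), toUpper c `notElem` "ACGT"]
```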
Complement a single character, i.e. identify the nucleotide it
can hybridize with. Note that for multiple nucleotides you usually
want the reverse complement (see revcompl).

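A sketch of single-character complementation over the plain [A,C,G,T] alphabet, preserving case (an assumption) and returning any other character unchanged. The two string helpers, which are illustrative names, show why the reverse complement is usually what is wanted: complementing alone yields the partner strand read 3' to 5', and reversing it gives the conventional 5' to 3' reading.

```haskell
-- Watson-Crick complement; case is preserved and any other character
-- (including IUPAC wildcards) is returned unchanged.
complement :: Char -> Char
complement c = case c of
  'A' -> 'T'; 'T' -> 'A'; 'C' -> 'G'; 'G' -> 'C'
  'a' -> 't'; 't' -> 'a'; 'c' -> 'g'; 'g' -> 'c'
  _   -> c

-- Partner strand, read 3' to 5'.
complementOnly :: String -> String
complementOnly = map complement

-- Partner strand, read 5' to 3' (the reverse complement).
reverseComplement :: String -> String
reverseComplement = reverse . map complement
```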
Calculate the reverse complement.
This is only relevant for the nucleotide alphabet;
other characters are left unmodified.

Calculate the reverse complement for SeqData only.

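A sketch of both reverse-complement variants with the lazy Char8 API: BC.map applies the complement and BC.reverse reverses the result. For a whole sequence this sketch also reverses the quality scores so that each score stays with its base; that behaviour, the stand-in types, and the names revcomplData and revcomplSeq are assumptions.

```haskell
import qualified Data.ByteString.Lazy as BL
import qualified Data.ByteString.Lazy.Char8 as BC

data Nuc

-- Stand-in: header, residues, optional per-residue quality scores.
data Sequence t = Seq !BC.ByteString !BC.ByteString !(Maybe BL.ByteString)

complement :: Char -> Char
complement c = case c of
  'A' -> 'T'; 'T' -> 'A'; 'C' -> 'G'; 'G' -> 'C'
  'a' -> 't'; 't' -> 'a'; 'c' -> 'g'; 'g' -> 'c'
  _   -> c

-- Reverse complement of the residues alone (the SeqData-only variant).
revcomplData :: BC.ByteString -> BC.ByteString
revcomplData = BC.reverse . BC.map complement

-- Whole sequence: header kept, residues reverse-complemented,
-- quality reversed so each score follows its base.
revcomplSeq :: Sequence Nuc -> Sequence Nuc
revcomplSeq (Seq h d q) = Seq h (revcomplData d) (fmap BL.reverse q)
```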
For type tagging sequences (protein sequences use Amino below).

Protein functionality

Proteins are chains of amino acids, represented by the IUPAC alphabet.

Constructors: Ala, Arg, Asn, Asp, Cys, Gln, Glu, Gly, His, Ile, Leu, Lys, Met, Phe, Pro, Ser, Thr, Tyr, Trp, Val, STP, Asx, Glx, Xle, Xaa
Translate a nucleotide sequence into the corresponding protein
sequence. This works rather blindly, with no attempt to identify ORFs
(open reading frames) or otherwise quality-check the result.

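A blind translation sketch in the spirit of the description: it reads codons from the first base using the standard genetic code, emits '*' for stop codons without stopping, maps codons containing any other character to 'X', and drops a trailing partial codon. It returns one-letter codes in a String rather than the Amino type; translateBlind and the compact table encoding are assumptions, not this module's implementation.

```haskell
import Data.Char (toUpper)

-- Standard genetic code; codons ordered with T, C, A, G at each position.
codonTable :: String
codonTable = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

baseIndex :: Char -> Maybe Int
baseIndex c = case toUpper c of
  'T' -> Just 0
  'U' -> Just 0
  'C' -> Just 1
  'A' -> Just 2
  'G' -> Just 3
  _   -> Nothing

translateCodon :: Char -> Char -> Char -> Char
translateCodon a b c =
  case (baseIndex a, baseIndex b, baseIndex c) of
    (Just i, Just j, Just k) -> codonTable !! (16 * i + 4 * j + k)
    _                        -> 'X'

-- No ORF search: every complete codon from position 0 is translated.
translateBlind :: String -> String
translateBlind (a : b : c : rest) = translateCodon a b c : translateBlind rest
translateBlind _                  = []
```

Other reading frames can be obtained by dropping one or two leading bases before translating.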
Convert a sequence in IUPAC format to a list of amino acids.

Convert a list of amino acids to a sequence in IUPAC format.

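A sketch of the two IUPAC conversions using the constructor names listed above and the standard IUPAC one-letter codes (B for Asx, Z for Glx, J for Xle, X for Xaa). Using '*' for STP, mapping unrecognized letters to Xaa, and working on String instead of SeqData are assumptions.

```haskell
import Data.Char (toUpper)
import Data.Maybe (fromMaybe)

data Amino = Ala | Arg | Asn | Asp | Cys | Gln | Glu | Gly | His | Ile
           | Leu | Lys | Met | Phe | Pro | Ser | Thr | Tyr | Trp | Val
           | STP | Asx | Glx | Xle | Xaa
           deriving (Show, Eq)

-- One-letter codes; '*' for STP is an assumption.
iupacCodes :: [(Amino, Char)]
iupacCodes =
  [ (Ala, 'A'), (Arg, 'R'), (Asn, 'N'), (Asp, 'D'), (Cys, 'C'), (Gln, 'Q')
  , (Glu, 'E'), (Gly, 'G'), (His, 'H'), (Ile, 'I'), (Leu, 'L'), (Lys, 'K')
  , (Met, 'M'), (Phe, 'F'), (Pro, 'P'), (Ser, 'S'), (Thr, 'T'), (Tyr, 'Y')
  , (Trp, 'W'), (Val, 'V'), (STP, '*'), (Asx, 'B'), (Glx, 'Z'), (Xle, 'J')
  , (Xaa, 'X') ]

toIUPAC :: [Amino] -> String
toIUPAC = map (\a -> fromMaybe 'X' (lookup a iupacCodes))

-- Case-insensitive; unknown letters become Xaa.
fromIUPAC :: String -> [Amino]
fromIUPAC = map (\c -> fromMaybe Xaa (lookup (toUpper c) byCode))
  where
    byCode = [(ch, a) | (a, ch) <- iupacCodes]
```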
Display a nicely formatted sequence.

A simple function to display a sequence: it generates the sequence string and
calls putStrLn.

Return a properly formatted and possibly highlighted string
representation of a sequence. Highlighting is done using ANSI escape
sequences.

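A sketch of formatted display with optional ANSI highlighting. The FASTA-style '>' header line, the wrapping width parameter, and the colour assigned to each base are assumptions, since the text above does not specify them. An ANSI code of the form ESC[<n>m sets a colour and ESC[0m resets it.

```haskell
import Data.List (intercalate)

-- Split into lines of at most n characters; n <= 0 disables wrapping.
chunksOf :: Int -> String -> [String]
chunksOf n s
  | null s    = []
  | n <= 0    = [s]
  | otherwise = a : chunksOf n b
  where
    (a, b) = splitAt n s

-- Wrap each plain base in an ANSI colour code (colours are arbitrary).
highlight :: Char -> String
highlight c = case lookup c colours of
    Just code -> "\ESC[" ++ code ++ "m" ++ [c] ++ "\ESC[0m"
    Nothing   -> [c]
  where
    colours = [('A', "32"), ('C', "34"), ('G', "33"), ('T', "31")]

-- '>' header line, then residues wrapped at 'width', optionally coloured.
seqToStr :: Bool -> Int -> String -> String -> String
seqToStr colour width header residues =
  intercalate "\n" (('>' : header) : map render (chunksOf width residues))
  where
    render = if colour then concatMap highlight else id

putSeqLn :: Bool -> Int -> String -> String -> IO ()
putSeqLn colour width header residues =
  putStrLn (seqToStr colour width header residues)
```

In a terminal, `putSeqLn True 60 "read1" "ACGTN"` prints the header followed by one coloured residue line; N is printed without colour.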
Default type for sequences.

Produced by Haddock version 2.6.1