Data structures for manipulating (biological) sequences.
Generally supports both nucleotide and protein sequences, although some functions,
like revcompl, only make sense for nucleotides.
|A sequence is a header, sequence data itself, and optional quality data.
Sequences are type-tagged to identify them as nucleotide, amino acid,
or unknown type.
All items are lazy bytestrings. The Offset type can be used for indexing.
|A sequence consists of a header, the sequence data itself, and optional quality data.
The type parameter is a phantom type to separate nucleotide and amino acid sequences
|An offset, index, or length of a SeqData
|The basic data type used in Sequences
|Quality data is normally associated with nucleotide sequences
|Basic type for quality data. Range 0..255. Typical Phred output is in
the range 6..50, with 20 as the line in the sand separating good from bad.
|Quality data is a Qual vector, currently implemented as a ByteString.
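The types described above can be pictured with the following minimal sketch. It is an illustration under stated assumptions rather than the module's actual source: the constructor name Seq, the field order, and the empty tag types Nuc, Amino and Unknown are placeholders chosen for this example.

```haskell
import qualified Data.ByteString.Lazy.Char8 as B
import Data.Int (Int64)
import Data.Word (Word8)

-- Type synonyms mirroring the descriptions above (assumed definitions).
type Offset   = Int64          -- offset, index, or length of a SeqData
type SeqData  = B.ByteString   -- lazy bytestring holding header or residues
type Qual     = Word8          -- a single quality value, range 0..255
type QualData = B.ByteString   -- one quality byte per residue

-- Placeholder phantom tags; they have no values and exist only at the type level.
data Nuc
data Amino
data Unknown

-- Header, sequence data, optional quality data. The parameter t never
-- appears on the right-hand side, which is what makes it a phantom type.
data Sequence t = Seq SeqData SeqData (Maybe QualData)

-- Length of the sequence data.
seqlength :: Sequence t -> Offset
seqlength (Seq _ d _) = B.length d

-- Character at a given position (no bounds checking in this sketch).
(!) :: Sequence t -> Offset -> Char
(Seq _ d _) ! i = B.index d i

-- Unchecked conversion between tags: only the phantom parameter changes.
castSeq :: Sequence a -> Sequence b
castSeq (Seq h d q) = Seq h d q
```

Because the tag exists only in the type, a function declared for Sequence Nuc is rejected by the compiler when applied to a Sequence Amino, at no runtime cost.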
|Read the character at the specified position in the sequence.
|Return sequence length.
|Return sequence label (first word of header)
|Return full header.
|Return the sequence data.
|Check whether the sequence has associated quality data.
|Return the quality data, or raise an error if none exists. Use hasqual if in doubt.
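A minimal sketch of how the quality accessors might behave, assuming the optional quality data is stored as a Maybe value. The Sequence definition here is a stand-in for illustration, and goodBases is a hypothetical helper (not part of the documented API) that applies the Phred threshold of 20 mentioned earlier; treating a score of exactly 20 as good is an assumption.

```haskell
import qualified Data.ByteString.Lazy as BW
import Data.Int (Int64)
import Data.Maybe (isJust)

type QualData = BW.ByteString

-- Stand-in sequence type: header, sequence data, optional quality data.
data Sequence t = Seq BW.ByteString BW.ByteString (Maybe QualData)

-- True when quality data is attached.
hasqual :: Sequence t -> Bool
hasqual (Seq _ _ q) = isJust q

-- Partial accessor: fails at runtime when no quality data is present.
seqqual :: Sequence t -> QualData
seqqual (Seq _ _ (Just q)) = q
seqqual _                  = error "seqqual: sequence has no quality data"

-- Hypothetical helper: number of positions with quality >= 20.
goodBases :: QualData -> Int64
goodBases = BW.length . BW.filter (>= 20)
```

Guarding calls to seqqual with hasqual, or pattern matching on the Maybe directly, avoids the runtime error.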
|Adding information to header
|Modify the header by appending text, or by replacing
all but the sequence label (i.e. first word).
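The header operations above can be sketched on plain header strings. The function names labelOf, appendToHeader and replaceComment are hypothetical and chosen for this example, and the single space used as a separator is an assumption.

```haskell
import qualified Data.ByteString.Lazy.Char8 as B

-- The label is the first word of the header.
labelOf :: B.ByteString -> B.ByteString
labelOf = B.takeWhile (/= ' ')

-- Append text to the end of the existing header.
appendToHeader :: String -> B.ByteString -> B.ByteString
appendToHeader txt h = B.concat [h, B.pack (' ' : txt)]

-- Keep the label, replace everything after it.
replaceComment :: String -> B.ByteString -> B.ByteString
replaceComment txt h = B.concat [labelOf h, B.pack (' ' : txt)]
```

For the header "seq1 human mRNA", appending "len=4" gives "seq1 human mRNA len=4", while replacing gives "seq1 len=4".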
|Converting to and from [Char]
|Convert a String to SeqData
|Convert a SeqData to a String
|Returns a sequence with all internal storage freshly copied and
with sequence and quality data present as a single chunk.
By freshly copying internal storage, defragSeq allows garbage
collection of the original data source whence the sequence was
read; otherwise, use of just a short sequence name can cause an
entire sequence file buffer to be retained.
By compacting sequence data into a single chunk, defragSeq avoids
linear-time traversal of sequence chunks during random access into
the sequence data.
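The two effects described above, copying and compaction, can be sketched for a single lazy bytestring as follows. The name defragData is hypothetical, and this is not necessarily how defragSeq itself is implemented.

```haskell
import qualified Data.ByteString.Char8 as S
import qualified Data.ByteString.Lazy.Char8 as B

-- Collapse all chunks into one strict chunk and copy it, so the result
-- shares no memory with the buffer the data was originally read from.
-- S.concat may return its argument unchanged when there is only one
-- chunk, hence the explicit S.copy (at the price of a second copy in
-- the multi-chunk case).
defragData :: B.ByteString -> B.ByteString
defragData = B.fromChunks . (: []) . S.copy . S.concat . B.toChunks
```

After this step, indexing touches a single chunk instead of walking the chunk list.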
|Map over sequences, treating them as a sequence of (Char, Word8) pairs.
This will work on sequences without quality data, as long as the function doesn't
try to examine the quality value.
The current implementation is not very efficient.
|Phantom type functionality, unchecked conversion between sequence types
|Nucleotide sequences contain the alphabet [A,C,G,T].
IUPAC specifies an extended nucleotide alphabet with wildcards, but
it is not supported at this point.
|Complement a single character, that is, identify the nucleotide it
can hybridize with. Note that for multiple nucleotides, you usually
want the reverse complement (see revcompl for that).
|Calculate the reverse complement.
This is only relevant for the nucleotide alphabet,
and it leaves other characters unmodified.
|Calculate the reverse complement for SeqData only.
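As an illustration of complementing and reverse complementing, here is a minimal sketch that works directly on sequence data. The names complChar and revcomplData are hypothetical, and the handling of lower-case letters is an assumption; characters outside [A,C,G,T] pass through unchanged, as described above.

```haskell
import qualified Data.ByteString.Lazy.Char8 as B

-- Complement of a single nucleotide; anything else is left untouched.
complChar :: Char -> Char
complChar 'A' = 'T'
complChar 'T' = 'A'
complChar 'C' = 'G'
complChar 'G' = 'C'
complChar 'a' = 't'
complChar 't' = 'a'
complChar 'c' = 'g'
complChar 'g' = 'c'
complChar c   = c

-- Reverse the data, then complement each character.
revcomplData :: B.ByteString -> B.ByteString
revcomplData = B.map complChar . B.reverse
```

Applied to "AACGTN" this yields "NACGTT". Applying it twice returns the original data.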
|For type tagging sequences (protein sequences use Amino below)
|Proteins are chains of amino acids, represented by the IUPAC alphabet.
|Translate a nucleotide sequence into the corresponding protein
sequence. This works rather blindly, with no attempt to identify ORFs
or otherwise QA the result.
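A minimal sketch of such blind translation, assuming the standard genetic code and a reading frame chosen by a start offset. The names translateCodon and translateFrom are hypothetical, the result is a plain String of one-letter codes rather than the module's amino acid type, and mapping codons with unrecognized characters to 'X' is an assumption.

```haskell
import qualified Data.ByteString.Lazy.Char8 as B
import Data.Char (toUpper)
import Data.Int (Int64)
import Data.List (elemIndex)

-- Standard genetic code, codons ordered with bases T, C, A, G at each
-- of the three positions; '*' marks a stop codon.
codonTable :: String
codonTable = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

baseIndex :: Char -> Maybe Int
baseIndex c = elemIndex (toUpper c) "TCAG"

-- Translate one codon; unknown characters give 'X' (an assumption).
translateCodon :: Char -> Char -> Char -> Char
translateCodon a b c =
  case (baseIndex a, baseIndex b, baseIndex c) of
    (Just i, Just j, Just k) -> codonTable !! (16 * i + 4 * j + k)
    _                        -> 'X'

-- Translate from the given offset to the end, codon by codon. Stop
-- codons do not end translation, and a trailing partial codon is dropped.
translateFrom :: Int64 -> B.ByteString -> String
translateFrom off = go . B.unpack . B.drop off
  where
    go (a : b : c : rest) = translateCodon a b c : go rest
    go _                  = []
```

With the input "ATGGCCTAA", offset 0 gives "MA*" and offset 1 gives "WP". Nothing here checks for a start codon or an open reading frame.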
|Convert a sequence in IUPAC format to a list of amino acids.
|Convert a list of amino acids to a sequence in IUPAC format.
|Display a nicely formatted sequence.
|A simple function to display a sequence: we generate the sequence string and
call putStrLn.
|Returns a properly formatted and probably highlighted string
representation of a sequence. Highlighting is done using ANSI escape sequences.
|Default type for sequences