Description |
This is a meta-module that imports and re-exports sequence-related functionality.
It encompasses the Bio.Sequence.SeqData, Bio.Sequence.Fasta, and Bio.Sequence.TwoBit modules.
|
|
Synopsis |
|
|
|
|
Data structures etc (Bio.Sequence.SeqData)
|
|
|
A sequence consists of a header, the sequence data itself, and optional quality data.
The type parameter is a phantom type used to separate nucleotide and amino acid sequences.
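
To illustrate how a phantom type parameter keeps the two kinds of sequence apart, here is a minimal standalone sketch. The names `TaggedSeq`, `Protein`, `gcCount`, and `castTag` are hypothetical and not part of the library, and plain `String` stands in for the library's `SeqData`.

```haskell
-- Hypothetical stand-ins for illustration only; not the library's definitions.
data Nuc      -- tag for nucleotide sequences (has no values)
data Protein  -- tag for amino acid sequences (has no values)

-- The parameter t never appears in a field, which makes it a phantom type.
data TaggedSeq t = TaggedSeq
  { tsHeader  :: String       -- full header line
  , tsData    :: String       -- sequence data
  , tsQuality :: Maybe [Int]  -- optional quality data
  } deriving (Eq, Show)

-- Accepts nucleotide sequences only; a TaggedSeq Protein is rejected at compile time.
gcCount :: TaggedSeq Nuc -> Int
gcCount = length . filter (`elem` "GCgc") . tsData

-- Changing the tag requires rebuilding the value explicitly.
castTag :: TaggedSeq a -> TaggedSeq b
castTag (TaggedSeq h d q) = TaggedSeq h d q
```

With these definitions, `gcCount (TaggedSeq "s" "GATTACCA" Nothing)` evaluates to 3, while applying `gcCount` to a value of type `TaggedSeq Protein` fails to type-check.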
|
|
|
|
|
An offset, index, or length of a SeqData
|
|
|
The basic data type used in Sequences
|
|
|
Basic type for quality data. Range 0..255. Typical Phred output is in
the range 6..50, with 20 as the conventional threshold separating good from bad.
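
As a reminder of what these numbers mean, a Phred score Q corresponds to an estimated base-calling error probability of 10^(-Q/10). The sketch below uses hypothetical helper names (`phredToErrorProb`, `isGoodBase`) that are not part of the library.

```haskell
import Data.Word (Word8)

-- Estimated error probability for a Phred score: P = 10^(-Q/10).
-- Hypothetical helper, not a library function.
phredToErrorProb :: Word8 -> Double
phredToErrorProb q = 10 ** (negate (fromIntegral q) / 10)

-- Applies the threshold of 20 mentioned above (1 error in 100 calls).
isGoodBase :: Word8 -> Bool
isGoodBase = (>= 20)
```

A score of 20 therefore corresponds to an error probability of about 0.01, and a score of 30 to about 0.001.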
|
|
|
Quality data is a Qual vector, currently implemented as a ByteString.
|
|
Accessor functions
|
|
|
Return sequence length.
|
|
|
Return sequence label (first word of header)
|
|
|
Return full header.
|
|
|
Return the sequence data.
|
|
|
Return the quality data, or raise an error if none exists. Use hasqual if in doubt.
|
|
|
Read the character at the specified position in the sequence.
|
|
|
|
|
Modify the header by appending text, or by replacing
all but the sequence label (i.e. first word).
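
The relationship between the label and the rest of the header can be sketched on plain strings as follows. The function names are hypothetical, and joining with a single space is an assumption; the library's own functions operate on its `Sequence` type.

```haskell
import Data.Char (isSpace)

-- The label is the first whitespace-delimited word of the header.
label :: String -> String
label = takeWhile (not . isSpace)

-- Append extra text to an existing header (separator is an assumption).
appendToHeader :: String -> String -> String
appendToHeader hdr extra = hdr ++ " " ++ extra

-- Keep the label, replace everything after it.
replaceDescription :: String -> String -> String
replaceDescription hdr desc = label hdr ++ " " ++ desc
```

For example, `replaceDescription "seq1 Homo sapiens" "reviewed"` yields `"seq1 reviewed"`.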
|
|
Converting to and from String.
|
|
|
Convert a String to SeqData
|
|
|
Convert a SeqData to a String
|
|
Nucleotide functionality.
|
|
|
Complement a single character. I.e. identify the nucleotide it
can hybridize with. Note that for multiple nucleotides, you usually
want the reverse complement (see revcompl for that).
|
|
|
Calculate the reverse complement.
This is only relevant for the nucleotide alphabet,
and it leaves other characters unmodified.
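
The behaviour described above can be sketched on plain strings. This is a standalone illustration with hypothetical names (`complChar`, `revcomplStr`), not the library's implementation; handling of lower-case input is an assumption.

```haskell
-- Complement of a single nucleotide; any other character is returned unchanged.
complChar :: Char -> Char
complChar c = case c of
  'A' -> 'T'
  'T' -> 'A'
  'C' -> 'G'
  'G' -> 'C'
  'a' -> 't'
  't' -> 'a'
  'c' -> 'g'
  'g' -> 'c'
  _   -> c

-- Reverse complement: reverse the sequence, then complement each position.
revcomplStr :: String -> String
revcomplStr = map complChar . reverse
```

Here `revcomplStr "ACGTN"` gives `"NACGT"`: the `N` is carried through unmodified.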
|
|
|
Calculate the reverse complement for SeqData only.
|
|
|
For type tagging sequences (protein sequences use Amino below).
|
|
|
|
|
Protein sequence functionality
|
|
|
Constructors: Ala, Arg, Asn, Asp, Cys, Gln, Glu, Gly, His, Ile, Leu, Lys, Met, Phe, Pro, Ser, Thr, Tyr, Trp, Val, STP, Asx, Glx, Xle, Xaa
|
|
|
Translate a nucleotide sequence into the corresponding protein
sequence. This works rather blindly, with no attempt to identify ORFs
or otherwise QA the result.
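
A "blind" translation of this kind can be sketched with the standard genetic code laid out in the usual TCAG order. This is a standalone illustration on upper-case DNA strings with one-letter output; the names are hypothetical, and mapping unrecognised codons to `X` and dropping a trailing partial codon are assumptions, not documented library behaviour.

```haskell
import Data.List (elemIndex)

-- Standard genetic code, indexed by 16*b1 + 4*b2 + b3 with T=0, C=1, A=2, G=3.
codonTable :: String
codonTable = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

baseIndex :: Char -> Maybe Int
baseIndex c = elemIndex c "TCAG"

-- Translate one codon; anything outside T, C, A, G yields 'X' (assumption).
translateCodon :: Char -> Char -> Char -> Char
translateCodon a b c =
  case (baseIndex a, baseIndex b, baseIndex c) of
    (Just i, Just j, Just k) -> codonTable !! (16 * i + 4 * j + k)
    _                        -> 'X'

-- Translate from the first base onward, codon by codon, with no ORF detection.
-- A trailing partial codon is silently dropped.
translateStr :: String -> String
translateStr (a : b : c : rest) = translateCodon a b c : translateStr rest
translateStr _                  = []
```

For instance, `translateStr "ATGGCCTAA"` gives `"MA*"`: the stop codon is translated like any other and translation does not halt there.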
|
|
|
Convert a sequence in IUPAC format to a list of amino acids.
|
|
|
Convert a list of amino acids to a sequence in IUPAC format.
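
The two conversions can be sketched with a lookup string aligned to the constructor order listed above. The names `fromIUPACStr` and `toIUPACStr` are hypothetical, and the choices of `*` for `STP` and of `Xaa` for unrecognised letters are assumptions about a reasonable mapping rather than documented library behaviour.

```haskell
-- Local copy of the constructors listed above, for a self-contained example.
data Amino
  = Ala | Arg | Asn | Asp | Cys | Gln | Glu | Gly | His | Ile
  | Leu | Lys | Met | Phe | Pro | Ser | Thr | Tyr | Trp | Val
  | STP | Asx | Glx | Xle | Xaa
  deriving (Show, Eq, Enum, Bounded)

-- One-letter codes in the same order as the constructors.
iupacLetters :: String
iupacLetters = "ARNDCQEGHILKMFPSTYWV*BZJX"

toIUPACStr :: [Amino] -> String
toIUPACStr = map (\a -> iupacLetters !! fromEnum a)

-- Unknown letters map to Xaa (assumption).
fromIUPACStr :: String -> [Amino]
fromIUPACStr = map (\c -> maybe Xaa id (lookup c table))
  where
    table = zip iupacLetters [minBound .. maxBound]
```

With this mapping, `fromIUPACStr "MA*"` is `[Met, Ala, STP]` and `toIUPACStr` inverts it.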
|
|
|
|
Other utility functions
|
|
|
Returns a sequence with all internal storage freshly copied and
with sequence and quality data present as a single chunk.
By freshly copying internal storage, defragSeq allows garbage
collection of the original data source whence the sequence was
read; otherwise, use of just a short sequence name can cause an
entire sequence file buffer to be retained.
By compacting sequence data into a single chunk, defragSeq avoids
linear-time traversal of sequence chunks during random access into
sequence data.
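
The underlying idea can be illustrated directly on a lazy ByteString using the `bytestring` package. This is a sketch of the technique, not the library's `defragSeq`; `defragLazy`, `chunkCount`, and `example` are hypothetical names.

```haskell
import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as BC
import qualified Data.ByteString.Lazy as BL

-- Collapse all chunks into one and copy it, so the result no longer
-- references the buffers the original chunks were sliced from.
defragLazy :: BL.ByteString -> BL.ByteString
defragLazy = BL.fromStrict . B.copy . BL.toStrict

chunkCount :: BL.ByteString -> Int
chunkCount = length . BL.toChunks

-- A two-chunk lazy ByteString to experiment with.
example :: BL.ByteString
example = BL.fromChunks [BC.pack "ACGT", BC.pack "TTGA"]
```

Here `chunkCount example` is 2 and `chunkCount (defragLazy example)` is 1, while the contents compare equal. The explicit `B.copy` matters for inputs that already consist of a single chunk, which would otherwise still share the original buffer.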
|
|
|
Map over sequences, treating them as a sequence of (Char, Word8) pairs.
This will work on sequences without quality data, as long as the function doesn't
try to examine the quality value.
The current implementation is not very efficient.
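
One way such a map can tolerate missing quality data is to rely on laziness: each character is paired with a quality value that only fails if it is actually demanded. The sketch below works on plain lists with hypothetical names (`mapPairs`, `upcase`) and is not the library's implementation.

```haskell
import Data.Char (toUpper)
import Data.Word (Word8)

-- Map over (character, quality) pairs. Without quality data, each character
-- is paired with a thunk that raises an error only if f inspects it.
mapPairs :: ((Char, Word8) -> (Char, Word8))
         -> String -> Maybe [Word8] -> (String, Maybe [Word8])
mapPairs f s mq =
  case mq of
    Just qs -> let ps = map f (zip s qs)
               in (map fst ps, Just (map snd ps))
    Nothing -> (map (fst . f . withoutQual) s, Nothing)
  where
    withoutQual c = (c, error "mapPairs: no quality data")

-- Touches only the character, so it is safe on sequences without quality.
upcase :: (Char, Word8) -> (Char, Word8)
upcase (c, q) = (toUpper c, q)
```

`mapPairs upcase "acgt" Nothing` returns `("ACGT", Nothing)` without ever evaluating the placeholder quality.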
|
|
File IO
|
|
Generic sequence reading
|
|
|
Read nucleotide sequences in any format - Fasta, SFF, FastQ, 2bit, PHD...
|
|
|
Read protein sequences in any supported format (i.e. Fasta)
|
|
The Fasta file format (Bio.Sequence.Fasta)
|
|
|
Lazily read sequences from a FASTA-formatted file
|
|
|
Lazily read sequences from a handle.
|
|
|
Write sequences to a FASTA-formatted file.
Line length is 60.
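
The output layout can be sketched as a header line starting with `>` followed by the sequence wrapped at 60 characters. `toFastaLines` and `chunksOf60` are hypothetical helpers working on plain strings, not library functions.

```haskell
-- Split a string into pieces of at most 60 characters.
chunksOf60 :: String -> [String]
chunksOf60 [] = []
chunksOf60 xs = let (h, t) = splitAt 60 xs in h : chunksOf60 t

-- One FASTA record as a list of lines: '>' plus header, then wrapped data.
toFastaLines :: String -> String -> [String]
toFastaLines header sq = ('>' : header) : chunksOf60 sq
```

A 130-residue sequence thus produces a header line followed by lines of 60, 60, and 10 characters; `unlines (toFastaLines h s)` gives text ready to be written out.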
|
|
|
Write sequences in FASTA format to a handle.
|
|
Quality data
|
|
Not part of the Fasta format, and treated separately.
|
|
|
Read quality data for sequences from a file.
|
|
|
Write quality data for sequences to a file.
|
|
|
|
|
Read sequences and their associated quality data. Will error if
the sequences and qualities do not match one-to-one, in order.
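
The one-to-one requirement can be sketched as a pairwise merge of two record lists. This illustration is hypothetical throughout: `pairSeqQual` is not a library function, records are simplified to (label, data) tuples, it reports mismatches with `Either` rather than raising an error, and checking both labels and lengths is an assumption about what "match" means.

```haskell
-- Pair each sequence record with the quality record in the same position.
pairSeqQual :: [(String, String)]   -- (label, sequence data)
            -> [(String, [Int])]    -- (label, quality values)
            -> Either String [(String, String, [Int])]
pairSeqQual [] [] = Right []
pairSeqQual ((l, s) : ss) ((l', q) : qs)
  | l /= l'              = Left ("label mismatch: " ++ l ++ " vs " ++ l')
  | length s /= length q = Left ("length mismatch for " ++ l)
  | otherwise            = fmap ((l, s, q) :) (pairSeqQual ss qs)
pairSeqQual _ _ = Left "different number of sequence and quality records"
```

Note that returning `Either` forces the whole input before any result is available, so this sketch trades the laziness of the file readers for explicit error reporting.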
|
|
|
Write sequence and quality data simultaneously.
This may be more laziness-friendly.
|
|
|
|
The FastQ format (Bio.Sequence.FastQ)
|
|
|
|
|
|
|
|
|
|
The phd file format (Bio.Sequence.Phd)
|
|
These contain base (nucleotide) calling information,
and are generated by phred.
|
|
|
Parse a .phd file, extracting the contents as a Sequence
|
|
|
Parse .phd contents from a handle
|
|
TwoBit file format support (Bio.Sequence.TwoBit)
|
|
Used by BLAT and related tools.
|
|
|
Parse a (lazy) ByteString as sequences in the 2bit format.
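
For orientation, the 2bit format stores DNA packed four bases per byte, most significant bits first, with T = 00, C = 01, A = 10, and G = 11. The sketch below decodes only that packed payload; it ignores the file header, the sequence index, and the N-block and mask-block records, and `decodeByte` and `decodeBases` are hypothetical names rather than library functions.

```haskell
import Data.Bits (shiftR, (.&.))
import Data.Word (Word8)

-- Unpack one byte into four bases, highest-order pair first.
decodeByte :: Word8 -> String
decodeByte w =
  [ "TCAG" !! fromIntegral ((w `shiftR` s) .&. 3) | s <- [6, 4, 2, 0] ]

-- Decode n bases from packed bytes; padding bases in the last byte are dropped.
decodeBases :: Int -> [Word8] -> String
decodeBases n = take n . concatMap decodeByte
```

The byte `0x1B` (binary 00 01 10 11) decodes to `"TCAG"`.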
|
|
|
Read sequences from a file in 2bit format and
unmarshall/deserialize into Sequence format.
|
|
|
Read sequences from a file handle in the 2bit format and
unmarshall/deserialize into Sequence format.
|
|
Hashing functionality (Bio.Sequence.HashWord)
|
|
Packing words from sequences into integral data types
|
|
|
This is a record containing a set of hashing functions.
Constructors: HF, with the fields
hash :: SeqData -> Offset -> Maybe k | calculates the hash at a given offset in the sequence
hashes :: SeqData -> [(k, Offset)] | calculates all hashes from a sequence, and their indices
ksort :: [k] -> [k] | for sorting hashes
|
|
|
|
|
|
contigous constructs an Int or Integer from a contiguous k-word.
|
|
|
Like contigous, but returns the same hash for a word and its reverse complement.
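
The idea behind both hashes can be sketched on plain strings: pack each nucleotide into two bits, and for the strand-independent variant take the smaller of the hashes of the word and of its reverse complement. The names, the A=0, C=1, G=2, T=3 encoding, and the use of `min` are assumptions for illustration; the library may encode and canonicalise differently.

```haskell
-- Two-bit code per nucleotide (encoding is an assumption).
baseCode :: Char -> Maybe Integer
baseCode c = lookup c (zip "ACGT" [0 ..])

-- Pack a word into an integer, failing on any non-ACGT character.
packWord :: String -> Maybe Integer
packWord = foldl step (Just 0)
  where
    step acc c = do
      h <- acc
      v <- baseCode c
      return (4 * h + v)

-- Hash of the k-word starting at the given offset, if it fits in the sequence.
hashAt :: Int -> String -> Int -> Maybe Integer
hashAt k s off =
  let w = take k (drop off s)
  in if off >= 0 && length w == k then packWord w else Nothing

revComp :: String -> String
revComp = map compl . reverse
  where
    compl 'A' = 'T'
    compl 'T' = 'A'
    compl 'C' = 'G'
    compl 'G' = 'C'
    compl x   = x

-- Same value for a word and its reverse complement.
canonicalHash :: String -> Maybe Integer
canonicalHash w = do
  f <- packWord w
  r <- packWord (revComp w)
  return (min f r)
```

With this encoding, `packWord "ACGT"` is `Just 27`, and `canonicalHash "AAC"` and `canonicalHash "GTT"` both give `Just 1`.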
|
|
|
Like rcontig, but ignoring monomers (i.e. arbitrarily long runs of a single nucleotide
are treated the same as a single nucleotide).
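
Ignoring monomer runs amounts to collapsing each run of identical nucleotides to a single character before the word is hashed. A minimal sketch on strings, with `collapseRuns` as a hypothetical helper; how the library interleaves this with hashing and offsets is not shown here.

```haskell
import Data.List (group)

-- Replace every run of identical characters by a single occurrence.
collapseRuns :: String -> String
collapseRuns = concatMap (take 1) . group
```

For example, `collapseRuns "AAACCGTTTT"` gives `"ACGT"`, so sequences differing only in homopolymer lengths produce the same words.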
|
|
Entropy calculations
|
|
|
|
|
|
|
Produced by Haddock version 2.6.1