Bio.Sequence

Complement a single character, i.e. identify the nucleotide it can hybridize with. Note that for multiple nucleotides you usually want the reverse complement (see revcompl).

revcompl :: Sequence -> Sequence

Calculate the reverse complement. This is only relevant for the nucleotide alphabet, and it leaves other characters unmodified.
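
As an illustration of what the reverse complement computes, here is a minimal sketch on plain Strings. The names complChar and revcomplStr are hypothetical stand-ins, not the library's own functions (which operate on the Sequence type). The sketch assumes only A, C, G and T in either case are complemented; every other character is left unmodified.

```haskell
-- Hypothetical stand-ins operating on String, not on the library's Sequence type.

-- Complement a single nucleotide character; anything else passes through unchanged.
complChar :: Char -> Char
complChar c = case c of
  'A' -> 'T'
  'T' -> 'A'
  'C' -> 'G'
  'G' -> 'C'
  'a' -> 't'
  't' -> 'a'
  'c' -> 'g'
  'g' -> 'c'
  _   -> c

-- Reverse the sequence, then complement every character.
revcomplStr :: String -> String
revcomplStr = map complChar . reverse
```

For example, revcomplStr "AACGT" evaluates to "ACGTT".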

Protein sequence functionality

data Amino

Constructors

Ala
Arg
Asn
Asp
Cys
Gln
Glu
Gly
His
Ile
Leu
Lys
Met
Phe
Pro
Ser
Thr
Tyr
Trp
Val
STP
Asx
Glx
Xle
Xaa

Instances

Eq Amino

Show Amino

translate :: Sequence -> Offset -> [Amino]

Translate a nucleotide sequence into the corresponding protein sequence. This works rather blindly, with no attempt to identify ORFs or otherwise QA the result.
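
To make "blind" translation concrete, the following sketch walks a String codon by codon from a given offset using the standard genetic code. It is an assumption-laden illustration, not the library's implementation: translateStr, translateCodon, baseIndex and codeTable are hypothetical names, the result uses one-letter codes instead of the Amino type, only upper-case T, C, A and G are recognised, and an incomplete final codon is simply dropped.

```haskell
import Data.List (elemIndex)

-- Hypothetical sketch: one-letter amino acid codes instead of the Amino type,
-- and a plain String instead of Sequence. '*' marks a stop codon, 'X' a codon
-- containing a character other than T, C, A or G.

-- Standard genetic code, indexed by codon with bases ordered T, C, A, G.
codeTable :: String
codeTable = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

-- Position of a base in the T, C, A, G ordering, if it is one of those four.
baseIndex :: Char -> Maybe Int
baseIndex b = elemIndex b "TCAG"

-- Look up a single codon in the table.
translateCodon :: Char -> Char -> Char -> Char
translateCodon a b c =
  case (,,) <$> baseIndex a <*> baseIndex b <*> baseIndex c of
    Just (i, j, k) -> codeTable !! (16 * i + 4 * j + k)
    Nothing        -> 'X'

-- Translate from the given offset with no ORF detection; stop codons are
-- emitted as '*' and translation continues past them.
translateStr :: String -> Int -> String
translateStr s off = go (drop off s)
  where
    go (a:b:c:rest) = translateCodon a b c : go rest
    go _            = []
```

For example, translateStr "CATGGCCTAA" 1 skips the leading C and yields "MA*".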

fromIUPAC :: SeqData -> [Amino]

Convert a sequence in IUPAC format to a list of amino acids.

toIUPAC :: [Amino] -> SeqData

Convert a list of amino acids to a sequence in IUPAC format.

File formats

The Fasta file format (Bio.Sequence.Fasta)

readFasta :: FilePath -> IO [Sequence]

Lazily read sequences from a FASTA-formatted file.

hReadFasta :: Handle -> IO [Sequence]

Lazily read sequences from a handle.

writeFasta :: FilePath -> [Sequence] -> IO ()

Write sequences to a FASTA-formatted file. Line length is 60.
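
The fixed line length can be pictured with a small sketch that renders one record as FASTA text, wrapping the sequence at a given width (60 to match the description above). The helper names chunk and formatFasta are hypothetical, and the record is represented as two plain Strings, not as the library's Sequence type.

```haskell
-- Hypothetical helpers: plain Strings stand in for the library's Sequence type.

-- Split a string into pieces of at most n characters.
-- A non-positive width leaves the input in one piece.
chunk :: Int -> String -> [String]
chunk n xs
  | null xs   = []
  | n <= 0    = [xs]
  | otherwise = let (h, t) = splitAt n xs in h : chunk n t

-- Render one record: a '>' header line followed by the wrapped sequence lines.
formatFasta :: Int -> String -> String -> String
formatFasta width header sq = unlines (('>' : header) : chunk width sq)
```

With a width of 60, a 70-residue sequence is written as a header line, one line of 60 characters and one line of 10.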

hWriteFasta :: Handle -> [Sequence] -> IO ()

Write sequences in FASTA format to a handle.

Quality data

Quality data are not part of the Fasta format and are treated separately.

readQual :: FilePath -> IO [Sequence]

Read quality data for sequences from a file.

writeQual :: FilePath -> [Sequence] -> IO ()

Write quality data for sequences to a file.

hWriteQual :: Handle -> [Sequence] -> IO ()

readFastaQual :: FilePath -> FilePath -> IO [Sequence]

Read sequences and their associated quality data. Raises an error if the sequences and qualities do not match one-to-one, in order.
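
The one-to-one requirement can be sketched as an in-order pairing of two record lists. Everything here is an assumption made for illustration: SeqRec, QualRec and pairSeqQual are hypothetical, "matching" is taken to mean equal names and equal lengths, and the sketch returns an Either value where the library raises an error (which also means the sketch is not lazy).

```haskell
-- Hypothetical record types standing in for parsed FASTA and quality entries.
type SeqRec  = (String, String)  -- (name, residues)
type QualRec = (String, [Int])   -- (name, one quality value per residue)

-- Pair the two lists in order, reporting the first mismatch as a Left value.
pairSeqQual :: [SeqRec] -> [QualRec] -> Either String [(String, String, [Int])]
pairSeqQual [] [] = Right []
pairSeqQual ((n, s) : ss) ((m, q) : qs)
  | n /= m               = Left ("name mismatch: " ++ n ++ " vs " ++ m)
  | length s /= length q = Left ("length mismatch in " ++ n)
  | otherwise            = ((n, s, q) :) <$> pairSeqQual ss qs
pairSeqQual _ _ = Left "different number of sequence and quality records"
```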

writeFastaQual :: FilePath -> FilePath -> [Sequence] -> IO ()

Write sequence and quality data simultaneously. This may be more laziness-friendly.

hWriteFastaQual :: Handle -> Handle -> [Sequence] -> IO ()

The FastQ format (Bio.Sequence.FastQ)

readFastQ :: FilePath -> IO [Sequence]

writeFastQ :: FilePath -> [Sequence] -> IO ()

hReadFastQ :: Handle -> IO [Sequence]

hWriteFastQ :: Handle -> [Sequence] -> IO ()

The phd file format (Bio.Sequence.Phd)

These contain base (nucleotide) calling information, and are generated by phred.

readPhd :: FilePath -> IO Sequence

Parse a .phd file, extracting the contents as a Sequence.

hReadPhd :: Handle -> IO Sequence

Parse .phd contents from a handle.

TwoBit file format support (Bio.Sequence.TwoBit)

Used by BLAT and related tools.

decode2Bit :: ByteString -> [Sequence]

Parse a (lazy) ByteString as sequences in the 2bit format.
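
The core of the 2bit format is its packing of four bases per byte. The sketch below decodes a single packed byte and nothing more: it ignores the file header, the sequence index, and the N and mask blocks that a full parser has to handle. The name unpackByte is hypothetical and is not part of the library.

```haskell
import Data.Bits (shiftR, (.&.))
import Data.Word (Word8)

-- Hypothetical helper: decode one packed byte into four bases, taking the
-- most significant bit pair first.
-- Base encoding used by the 2bit format: T = 00, C = 01, A = 10, G = 11.
unpackByte :: Word8 -> String
unpackByte w = [ base ((w `shiftR` s) .&. 3) | s <- [6, 4, 2, 0] ]
  where
    base 0 = 'T'
    base 1 = 'C'
    base 2 = 'A'
    base _ = 'G'
```

For example, the byte 0x1B (bit pattern 00 01 10 11) decodes to "TCAG".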

read2Bit :: FilePath -> IO [Sequence]

Extract sequences from a file in 2bit format.

hRead2Bit :: Handle -> IO [Sequence]

Extract sequences in the 2bit format from a handle.

Hashing functionality (Bio.Sequence.HashWord)

Packing words from sequences into integral data types

data HashF k

A record containing a set of hashing functions.

Constructors

hash :: SeqData -> Offset -> Maybe k	calculates the hash at a given offset in the sequence
hashes :: SeqData -> [(k, Offset)]	calculate all hashes from a sequence, and their indices
ksort :: [k] -> [k]	for sorting hashes
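
To show how the first two fields relate, here is a sketch that packs words of length k into an Int at two bits per base. The encoding (A = 0, C = 1, G = 2, T = 3), the names baseCode, hashAt and hashesOf, and the use of String in place of SeqData are all assumptions made for illustration; the library's actual packing may differ. Returning Nothing for a word that contains another character or runs past the end is one plausible reading of the Maybe in the hash field's type.

```haskell
-- Hypothetical 2-bits-per-base encoding; the library's own encoding may differ.
baseCode :: Char -> Maybe Int
baseCode c = case c of
  'A' -> Just 0
  'C' -> Just 1
  'G' -> Just 2
  'T' -> Just 3
  _   -> Nothing

-- Hash of the k-word starting at the given offset, or Nothing if the word
-- runs past the end of the sequence or contains a character other than ACGT.
hashAt :: Int -> String -> Int -> Maybe Int
hashAt k s off
  | off < 0 || length word < k = Nothing
  | otherwise                  = foldl step (Just 0) word
  where
    word = take k (drop off s)
    step acc c = (\h d -> 4 * h + d) <$> acc <*> baseCode c

-- All hashable k-words of a sequence, each paired with its offset.
hashesOf :: Int -> String -> [(Int, Int)]
hashesOf k s = [ (h, i) | i <- [0 .. length s - k], Just h <- [hashAt k s i] ]
```

With k = 3, hashesOf 3 "ACGT" yields [(6,0),(27,1)]: the word ACG packs to 6 and CGT to 27.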