bio-0.5: A bioinformatics library




This module incorporates functionality for reading and writing sequence data in the Fasta format. Each sequence consists of a header (with a > prefix) and a set of lines containing the sequence data.

As Fasta is used for both amino acids and nucleotides, the resulting Sequences are type-tagged with Unknown. If you know the type of sequence you are reading, use castToAmino or castToNuc.
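The idea of type-tagging can be illustrated with a small phantom-type sketch. This is a simplified stand-in and not the library's own definition: `TaggedSeq`, `Amino`, `Nuc`, `toAmino`, `toNuc` and `gcCount` are hypothetical names chosen for illustration, and the real `Sequence` type carries more than a plain string.

```haskell
-- Hypothetical tag types: they have no values and exist only at the type level.
data Unknown
data Amino
data Nuc

-- The parameter 't' is a phantom: it does not appear on the right-hand side.
newtype TaggedSeq t = TaggedSeq String
  deriving (Show, Eq)

-- Re-tagging changes only the type, never the underlying data.
toAmino :: TaggedSeq Unknown -> TaggedSeq Amino
toAmino (TaggedSeq s) = TaggedSeq s

toNuc :: TaggedSeq Unknown -> TaggedSeq Nuc
toNuc (TaggedSeq s) = TaggedSeq s

-- A function that only accepts nucleotide sequences.
gcCount :: TaggedSeq Nuc -> Int
gcCount (TaggedSeq s) = length (filter (`elem` "GCgc") s)
```

With this arrangement, passing an untagged or amino-tagged value to `gcCount` is rejected at compile time, which is the kind of safety the casts are presumably meant to provide.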


Reading and writing plain FASTA files

readFasta :: FilePath -> IO [Sequence Unknown]

Lazily read sequences from a FASTA-formatted file.

writeFasta :: FilePath -> [Sequence a] -> IO ()

Write sequences to a FASTA-formatted file. Line length is 60.
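As a rough illustration of the output layout, the following sketch renders one record as a header line followed by sequence data wrapped at 60 characters. It works on plain `String` values and is not the library's implementation; `chunksOf'` and `fastaLines` are hypothetical helper names.

```haskell
-- Split a string into chunks of at most n characters.
chunksOf' :: Int -> String -> [String]
chunksOf' n s
  | n <= 0    = error "chunk size must be positive"
  | null s    = []
  | otherwise = let (h, t) = splitAt n s in h : chunksOf' n t

-- Render one record: a header line with a '>' prefix, then the wrapped data.
fastaLines :: String -> String -> [String]
fastaLines header sq = ('>' : header) : chunksOf' 60 sq
```

For a 130-residue sequence this yields one header line and three data lines of 60, 60 and 10 characters.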

hReadFasta :: Handle -> IO [Sequence Unknown]

Lazily read sequences from a handle.

hWriteFasta :: Handle -> [Sequence a] -> IO ()

Write sequences in FASTA format to a handle.

Reading and writing quality files

readQual :: FilePath -> IO [Sequence Unknown]

Read quality data for sequences from a file.

writeQual :: FilePath -> [Sequence a] -> IO ()

Write quality data for sequences to a file.

Combining FASTA and quality files

readFastaQual :: FilePath -> FilePath -> IO [Sequence Unknown]

Read sequences and their associated quality data. Raises an error if the sequences and qualities do not match one-to-one, in the same order.
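The one-to-one requirement can be sketched as a positional zip that rejects any mismatch. This is an illustration under stated assumptions, not the library's code: records are modelled as hypothetical `(name, data)` pairs, `zipSeqQual` is a made-up name, and the per-record length check is an extra assumption that the documentation above does not mention.

```haskell
import Data.Word (Word8)

-- Hypothetical record shapes: (name, residues) and (name, quality values).
type NamedSeq  = (String, String)
type NamedQual = (String, [Word8])

-- Pair records positionally; fail on any mismatch in name, length, or count.
zipSeqQual :: [NamedSeq] -> [NamedQual] -> Either String [(String, String, [Word8])]
zipSeqQual [] [] = Right []
zipSeqQual ((n, s) : ss) ((m, q) : qs)
  | n /= m               = Left ("name mismatch: " ++ n ++ " vs " ++ m)
  | length s /= length q = Left ("length mismatch in " ++ n)
  | otherwise            = ((n, s, q) :) <$> zipSeqQual ss qs
zipSeqQual _ _ = Left "different number of sequence and quality records"
```

Returning `Either` makes the failure explicit, but it also means the whole input must be checked before any result is available. A version that calls `error` on the first mismatch, as the description above suggests, can hand out results lazily.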

writeFastaQual :: FilePath -> FilePath -> [Sequence a] -> IO ()

Write sequence and quality data simultaneously. This may be more laziness-friendly.

Counting sequences in a FASTA file

Helper function for reading your own sequences

mkSeqs :: [ByteString] -> [Sequence Unknown]

Convert a list of FASTA-formatted lines into a list of sequences. Blank lines are ignored. Comment lines starting with # are allowed between sequences (and ignored). Lines starting with > initiate a new sequence.
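The grouping rule described here can be sketched on plain `String` lines. This is a minimal illustration rather than the library's implementation: `groupRecords` and its helpers are hypothetical names, `String` stands in for `ByteString`, `#` is assumed to be the comment marker, and comments are dropped wherever they appear, not only between sequences.

```haskell
import Data.Char (isSpace)

-- A line that starts a new record.
isHeader :: String -> Bool
isHeader ('>' : _) = True
isHeader _         = False

-- Assumption: '#' marks a comment line.
isComment :: String -> Bool
isComment ('#' : _) = True
isComment _         = False

-- Group lines into (header, sequence data) pairs.
-- Blank and comment lines are dropped; '>' starts a new record.
groupRecords :: [String] -> [(String, String)]
groupRecords = go . filter keep
  where
    keep l = not (all isSpace l) && not (isComment l)
    go [] = []
    go (l : ls)
      | isHeader l = let (body, rest) = break isHeader ls
                     in (drop 1 l, concat body) : go rest
      | otherwise  = go ls  -- data before the first header is skipped
```

Each result pairs the header text (without the > prefix) with the concatenated data lines that follow it, up to the next header.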

Data structures

type Qual = Word8

Basic type for quality data. Range 0..255. Typical Phred output is in the range 6..50, with 20 as the line in the sand separating good from bad.
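To make the threshold concrete: a Phred score Q corresponds to an error probability P through Q = -10 * log10 P, so a score of 20 means roughly a 1 in 100 chance that the base call is wrong. The sketch below shows the conversion; `errorProb` and `isGood` are hypothetical helpers and not part of the module.

```haskell
import Data.Word (Word8)

type Qual = Word8

-- Phred definition: Q = -10 * log10 P, hence P = 10 ** (-Q / 10).
errorProb :: Qual -> Double
errorProb q = 10 ** (negate (fromIntegral q) / 10)

-- Hypothetical filter using the threshold of 20 mentioned above.
isGood :: Qual -> Bool
isGood = (>= 20)
```

Converting with `fromIntegral` before negating matters, because negating a `Word8` directly would wrap around instead of producing a negative number.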