bio-0.5: A bioinformatics library

Bio.Alignment.Bowtie

Description

This module provides a data type to represent an alignment produced by the Bowtie short-read alignment tool (see http://bowtie-bio.sourceforge.net/index.shtml).

The simple accessors recapitulate the details of the Bowtie alignment output. The position of the alignment is given by the "0-based offset into the reference sequence where leftmost character of the alignment occurs". Thus, for forward-strand alignments this is the 5' end of the query sequence while for reverse-complement alignments this is the 3' end of the query sequence. Similarly, the query sequence and query quality are shown in reference forward strand orientation, and thus may be reverse complemented.

Data type and basic accessors

data Align

Constructors

Align 

Fields

name :: !SeqName

Name of the query sequence

strand :: !Strand

Strand of the alignment on the reference sequence

refname :: !SeqName

Name of the reference sequence

leftoffset :: !Offset

Zero-based offset of the left-most aligned position in the reference

sequ :: !SeqData

Query sequence, in the reference forward strand orientation

qual :: !QualData

Query quality, in the reference forward strand orientation

mismatches :: ![Mismatch]

List of mismatches between the query and the reference

data Mismatch

Representation of a single mismatch in a Bowtie alignment

Constructors

Mismatch 

Fields

mmoffset :: !Offset

Offset of the mismatch site from the 5' end of the query

refbase :: !Char

Reference nucleotide

readbase :: !Char

Query nucleotide

length :: Align -> Offset

Returns the length of the query sequence

nmismatch :: Align -> Int

Returns the number of mismatches in the alignment

querySequ :: Align -> SeqData

Query sequence as given in the query file

queryQual :: Align -> QualData

Query quality as given in the query file
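Because sequ and qual are stored in reference forward-strand orientation, recovering the query-file orientation for a reverse-complement alignment means reverse-complementing the sequence and reversing the qualities. The following is a minimal sketch of that idea, not the library's implementation: the Strand type, the plain String representation, and the names complement, toQuerySeq and toQueryQual are all stand-ins assumed for illustration.

```haskell
import Data.Maybe (fromMaybe)

-- Stand-in for the library's strand type (assumed for this sketch).
data Strand = Fwd | RevCompl deriving (Eq, Show)

-- Complement one nucleotide; characters outside ACGT/acgt pass through unchanged.
complement :: Char -> Char
complement c = fromMaybe c (lookup c (zip "ACGTacgt" "TGCAtgca"))

-- Sequence in query-file orientation, given the sequence in reference
-- forward-strand orientation and the strand of the alignment.
toQuerySeq :: Strand -> String -> String
toQuerySeq Fwd      s = s
toQuerySeq RevCompl s = reverse (map complement s)

-- Qualities have no complement, so they are only reversed.
toQueryQual :: Strand -> String -> String
toQueryQual Fwd      q = q
toQueryQual RevCompl q = reverse q
```

For a forward-strand alignment both functions return their input unchanged.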

Sequence positions of alignments

refCLoc :: Align -> ContigLoc

As refCSeqLoc but without the reference sequence name.

refCSeqLoc :: Align -> ContigSeqLoc

Returns the sequence location covered by the query in the alignment. This will be a sequence location on the reference sequence and may run on the forward or the reverse complement strand.

refSeqLoc :: Align -> SeqLoc

Returns the sequence location covered by the query, like refCSeqLoc, but as a SeqLoc location.

refSeqPos :: Align -> SeqPos

Returns the sequence position of the start of the query sequence alignment. This will include the strand of the alignment and will not be the same as the position computed from leftoffset when the alignment is on the reverse complement strand.
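The difference between the start position and leftoffset can be sketched with plain integers. This example assumes that "start of the query sequence alignment" means the reference position of the query's 5' end, which is how the module description reads; the Strand type and the function startPos are hypothetical stand-ins, not part of the library.

```haskell
-- Stand-in for the library's strand type (assumed for this sketch).
data Strand = Fwd | RevCompl deriving (Eq, Show)

-- Reference position of the query's 5' end, paired with the strand.
-- Arguments: strand, zero-based left offset, query length.
startPos :: Strand -> Int -> Int -> (Int, Strand)
startPos strand leftOff len = case strand of
  -- Forward strand: the leftmost aligned base is the 5' end of the query.
  Fwd      -> (leftOff, Fwd)
  -- Reverse complement: the leftmost base is the 3' end, so the 5' end
  -- sits at the rightmost aligned position.
  RevCompl -> (leftOff + len - 1, RevCompl)
```

Under these assumptions, a 25-base query with left offset 100 starts at position 100 on the forward strand and at position 124 on the reverse complement strand.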

mismatchSeqPos :: Align -> Mismatch -> SeqPos

Sequence position of a mismatch on the reference sequence.
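Since mmoffset counts from the 5' end of the query while leftoffset counts from the left end of the alignment on the reference, the two combine differently on each strand. The sketch below shows one way the arithmetic could work; it is an illustration under that reading of the field descriptions, and the Strand type and the function mismatchRefOffset are hypothetical names rather than the library's code.

```haskell
-- Stand-in for the library's strand type (assumed for this sketch).
data Strand = Fwd | RevCompl deriving (Eq, Show)

-- Zero-based reference offset of a mismatch.
-- Arguments: strand, left offset of the alignment, query length,
-- and mismatch offset measured from the 5' end of the query.
mismatchRefOffset :: Strand -> Int -> Int -> Int -> Int
mismatchRefOffset Fwd      leftOff _   mmOff = leftOff + mmOff
-- On the reverse complement strand the 5' end of the query is the
-- rightmost aligned base, so the offset counts leftward from there.
mismatchRefOffset RevCompl leftOff len mmOff = leftOff + len - 1 - mmOff
```

For example, with left offset 100 and a 25-base query, a mismatch at query offset 3 would fall at reference offset 103 on the forward strand and 121 on the reverse complement strand.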

Parsing Bowtie output

parse :: ByteString -> Either String Align

Parses a line of Bowtie output to produce an Align
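To make the input format concrete, here is a self-contained sketch of parsing one line of Bowtie's default tab-separated output (read name, strand, reference name, zero-based offset, sequence, qualities, a count column, and comma-separated mismatch descriptors such as 12:A>G). It is not the module's parser: it works on String rather than ByteString, uses simplified stand-in types with Int and String fields, and the names parseLine, parseStrand, parseMismatch and splitOn are assumptions made for illustration.

```haskell
import Text.Read (readMaybe)

-- Simplified stand-ins for the library's types (assumed for this sketch).
data Strand = Fwd | RevCompl deriving (Eq, Show)

data Mismatch = Mismatch { mmoffset :: Int, refbase :: Char, readbase :: Char }
  deriving (Eq, Show)

data Align = Align
  { name :: String, strand :: Strand, refname :: String, leftoffset :: Int
  , sequ :: String, qual :: String, mismatches :: [Mismatch] }
  deriving (Eq, Show)

-- Split on a delimiter, keeping empty fields.
splitOn :: Char -> String -> [String]
splitOn d s = case break (== d) s of
  (field, [])       -> [field]
  (field, _ : rest) -> field : splitOn d rest

parseStrand :: String -> Either String Strand
parseStrand "+" = Right Fwd
parseStrand "-" = Right RevCompl
parseStrand s   = Left ("bad strand: " ++ show s)

-- Parse a descriptor of the form "offset:ref>read", e.g. "12:A>G".
parseMismatch :: String -> Either String Mismatch
parseMismatch s = case break (== ':') s of
  (off, [':', r, '>', q]) | Just n <- readMaybe off -> Right (Mismatch n r q)
  _ -> Left ("bad mismatch: " ++ show s)

-- Parse one tab-separated line; the seventh column is ignored and the
-- eighth (mismatch descriptors) may be empty or absent.
parseLine :: String -> Either String Align
parseLine line = case splitOn '\t' line of
  (n : s : r : o : sq : ql : _ : rest) -> do
    st  <- parseStrand s
    off <- maybe (Left ("bad offset: " ++ show o)) Right (readMaybe o)
    mms <- mapM parseMismatch
             (concatMap (filter (not . null) . splitOn ',') (take 1 rest))
    Right (Align n st r off sq ql mms)
  _ -> Left "too few fields"
```

Returning Either String, as parse does, lets a caller report a malformed line without aborting the rest of the file.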

Other utilities

sameRead :: Align -> Align -> Bool

Returns True when two alignments were derived from the same sequencing read. Because Bowtie writes alignments of query sequences in their order in the query file, all alignments of a given read are adjacent in the output, and the list of alignments for each read can be gathered with

 groupBy sameRead
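The grouping idiom can be tried without the library using a small stand-in record. In this sketch the Hit type, its fields, and the functions sameReadName and hitsByRead are hypothetical, and it assumes that two alignments come from the same read exactly when their query names match, which may differ from how sameRead is actually defined.

```haskell
import Data.Function (on)
import Data.List (groupBy)

-- Stand-in for Align carrying only the fields needed here (assumed).
data Hit = Hit { readName :: String, hitRef :: String, hitOffset :: Int }
  deriving (Eq, Show)

-- Assumption: hits belong to the same read when their query names match.
sameReadName :: Hit -> Hit -> Bool
sameReadName = (==) `on` readName

-- Group consecutive hits by read. groupBy only merges adjacent elements,
-- so this relies on the input keeping Bowtie's output order.
hitsByRead :: [Hit] -> [[Hit]]
hitsByRead = groupBy sameReadName
```

Because groupBy only compares neighbouring elements, sorting or shuffling the alignments before grouping would split a read's alignments across several groups.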