biostockholm-0.1: Reading and writing Stockholm files (multiple sequence alignment, used by Rfam and Infernal).

Bio.Sequence.Stockholm

Data types

data Stockholm

A Stockholm 1.0 formatted file represented in memory.

data Ann d

A generic annotation.

Constructors

Ann 

Fields

feature :: !d
 
text :: !ByteString
 

Instances

Typeable1 Ann 
Eq d => Eq (Ann d) 
Ord d => Ord (Ann d) 
Show d => Show (Ann d) 
NFData (Ann d) 

data FileAnnotation

Possible file annotations.

Constructors

AC

Accession number: Accession number in form PFxxxxx.version or PBxxxxxx.

ID

Identification: One word name for family.

DE

Definition: Short description of family.

AU

Author: Authors of the entry.

SE

Source of seed: The source suggesting the seed members belong to one family.

GA

Gathering method: Search threshold to build the full alignment.

TC

Trusted Cutoff: Lowest sequence score and domain score of match in the full alignment.

NC

Noise Cutoff: Highest sequence score and domain score of match not in full alignment.

TP

Type: Type of family (presently Family, Domain, Motif or Repeat).

SQ

Sequence: Number of sequences in alignment.

AM

Alignment Method: The order in which ls and fs hits are aligned to the model to build the full alignment.

DC

Database Comment: Comment about database reference.

DR

Database Reference: Reference to external database.

RC

Reference Comment: Comment about literature reference.

RN

Reference Number: Reference Number.

RM

Reference Medline: Eight digit medline UI number.

RT

Reference Title: Reference Title.

RA

Reference Author: Reference Author.

RL

Reference Location: Journal location.

PI

Previous identifier: Record of all previous ID lines.

KW

Keywords: Keywords.

CC

Comment: Comments.

NE

Pfam accession: Indicates a nested domain.

NL

Location: Location of nested domains - sequence ID, start and end of insert.

F_Other !ByteString

Other file annotation.

data SequenceAnnotation

Possible sequence annotations.

Constructors

S_AC

Accession number

S_DE

Description

S_DR

Database reference

OS

Organism (species)

OC

Organism classification (clade, etc.)

LO

Look (Color, etc.)

S_Other !ByteString

Other sequence annotation.

data ColumnAnnotation a

Possible column annotations. The phantom type parameter a is either InFile or InSeq.

Constructors

SS

Secondary structure.

SA

Surface accessibility.

TM

TransMembrane.

PP

Posterior probability.

LI

LIgand binding.

AS

Active site.

PAS

AS - Pfam predicted.

SAS

AS - from SwissProt.

IN

INtron (in or after).

C_Other !ByteString

Other column annotation.

data InFile

Phantom type for ColumnAnnotations of the whole file.

Instances

ClmnAnnLoc InFile 

data InSeq

Phantom type for ColumnAnnotations of a single sequence.

Instances

ClmnAnnLoc InSeq 
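
The phantom parameter described above can be illustrated with a small self-contained sketch. It does not import this package: the types are local stand-ins restricted to three constructors, and the class method markup is hypothetical, since the methods of ClmnAnnLoc are not documented here. The "#=GC" and "#=GR" strings are the Stockholm markers for per-column annotations of the whole alignment and of a single sequence, respectively.

```haskell
-- Local stand-ins for the documented phantom types (no constructors needed).
data InFile
data InSeq

-- A three-constructor subset of the documented ColumnAnnotation type.
-- The parameter 'a' appears in no constructor: it is a phantom type.
data ColumnAnnotation a = SS | SA | PP
  deriving (Eq, Show)

-- Hypothetical class: the method 'markup' is invented for this sketch
-- and is not taken from the library.
class ClmnAnnLoc a where
  markup :: ColumnAnnotation a -> String

instance ClmnAnnLoc InFile where
  markup _ = "#=GC"   -- per-column annotation of the whole alignment

instance ClmnAnnLoc InSeq where
  markup _ = "#=GR"   -- per-residue annotation of a single sequence

-- The same constructor, tagged with two different locations.
consensusStructure :: ColumnAnnotation InFile
consensusStructure = SS

residueStructure :: ColumnAnnotation InSeq
residueStructure = SS

main :: IO ()
main = do
  putStrLn (markup consensusStructure)
  putStrLn (markup residueStructure)
  -- consensusStructure == residueStructure  -- rejected by the type checker
```

Because the location lives only in the type, mixing a file-level annotation with a sequence-level one is caught at compile time, at no runtime cost.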

findAnn :: Eq d => d -> [Ann d] -> Maybe ByteString

Find an annotation. For example, you may use findAnn SS to find the secondary structure of a Stockholm file.
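
A minimal sketch of how a lookup with this signature might behave. To stay self-contained it re-declares Ann locally, uses a hypothetical three-constructor ColumnAnn type in place of ColumnAnnotation, and assumes findAnn returns the text of the first matching annotation. The library's actual implementation may differ.

```haskell
import           Data.ByteString.Char8 (ByteString)
import qualified Data.ByteString.Char8 as B

-- Local stand-in mirroring the documented Ann record.
data Ann d = Ann { feature :: !d, text :: !ByteString }
  deriving (Eq, Show)

-- Hypothetical, simplified feature type (phantom parameter omitted).
data ColumnAnn = SS | SA | PP
  deriving (Eq, Show)

-- Assumed behaviour: the text of the first annotation whose feature matches.
findAnn :: Eq d => d -> [Ann d] -> Maybe ByteString
findAnn f anns = lookup f [ (feature a, text a) | a <- anns ]

-- Made-up annotation values, for illustration only.
exampleAnns :: [Ann ColumnAnn]
exampleAnns =
  [ Ann PP (B.pack "99988877")
  , Ann SS (B.pack "<<<..>>>")
  ]

main :: IO ()
main = do
  print (findAnn SS exampleAnns)  -- an SS annotation is present
  print (findAnn SA exampleAnns)  -- no SA annotation in the list
```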

Parsing

parseStockholm :: StockholmExc e => ByteString -> [Exceptional e Stockholm]

parseStockholm parses a file in Stockholm 1.0 format.

Each file must be completely read before it is used because the Stockholm format allows information to be given in any part of the file. However, there may be multiple "Stockholm files" concatenated in a single "filesystem file". These multiple files are read independently, which is why we return a list of Exceptionals.

If you prefer to read the whole file in one go, use sequence (parseStockholm input), which will fail if any family fails.
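
The difference between the all-or-nothing and the per-family views can be sketched without the library. Here Either stands in for Exceptional, String stands in for both the error type and the parsed Stockholm value, and the results list is invented for illustration.

```haskell
import Data.Either (partitionEithers)

-- Hypothetical per-family results, as a parser returning a list might
-- produce them. Either is a stand-in for Exceptional.
results :: [Either String String]
results =
  [ Right "family 1"
  , Left  "header is missing"
  , Right "family 3"
  ]

-- All-or-nothing: the first failure aborts the whole batch.
allOrNothing :: Either String [String]
allOrNothing = sequence results

-- Lenient: collect the errors and the successfully parsed families
-- separately, in that order.
lenient :: ([String], [String])
lenient = partitionEithers results

main :: IO ()
main = do
  print allOrNothing
  print lenient
```

Keeping the list form is useful when one malformed family in a large concatenated file should not discard the others.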

class StockholmExc e where

Exceptions that may happen while parsing a Stockholm file.

Methods

emptyFileExc :: e

File is empty.

headerExc :: e

Header is missing.

malformedAnnExc :: ByteString -> e

Malformed annotation. The line is passed as argument.

unknownAnnTypeExc :: Char -> e

Unknown annotation type.

malformedSeqDataExc :: ByteString -> e

Malformed sequence line. The line is passed as argument.
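
Since the parser is polymorphic in its error type, a caller chooses how errors are represented by providing an instance. The sketch below copies the class locally from the signatures above so that it compiles on its own. ParseError and describe are hypothetical names and are not part of the library.

```haskell
import           Data.ByteString.Char8 (ByteString)
import qualified Data.ByteString.Char8 as B

-- Local copy of the documented class, so the sketch is self-contained.
class StockholmExc e where
  emptyFileExc        :: e
  headerExc           :: e
  malformedAnnExc     :: ByteString -> e
  unknownAnnTypeExc   :: Char -> e
  malformedSeqDataExc :: ByteString -> e

-- Hypothetical error type with one constructor per class method.
data ParseError
  = EmptyFile
  | MissingHeader
  | MalformedAnnotation ByteString
  | UnknownAnnotationType Char
  | MalformedSequenceData ByteString
  deriving (Eq, Show)

instance StockholmExc ParseError where
  emptyFileExc        = EmptyFile
  headerExc           = MissingHeader
  malformedAnnExc     = MalformedAnnotation
  unknownAnnTypeExc   = UnknownAnnotationType
  malformedSeqDataExc = MalformedSequenceData

-- Hypothetical helper that renders an error for the user.
describe :: ParseError -> String
describe EmptyFile                 = "file is empty"
describe MissingHeader             = "header is missing"
describe (MalformedAnnotation l)   = "malformed annotation: " ++ B.unpack l
describe (UnknownAnnotationType c) = "unknown annotation type: " ++ [c]
describe (MalformedSequenceData l) = "malformed sequence line: " ++ B.unpack l

main :: IO ()
main = do
  print (headerExc :: ParseError)
  putStrLn (describe (unknownAnnTypeExc 'X'))
```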

Printing

prettyPrintStockholm :: Stockholm -> ByteString

Pretty-prints a Stockholm file. We follow Rfam preferences and do not wrap lines.