biostockholm-0.2: Parsing and rendering of Stockholm files (used by Pfam, Rfam and Infernal).

Bio.Sequence.Stockholm

Data types

data Stockholm

A Stockholm 1.0 formatted file represented in memory.

data Ann d

A generic annotation.

Constructors

Ann 

Fields

feature :: !d
 
text :: !ByteString
 

Instances

Typeable1 Ann 
Eq d => Eq (Ann d) 
Ord d => Ord (Ann d) 
Show d => Show (Ann d) 
NFData (Ann d) 

data FileAnnotation

Possible file annotations.

Constructors

AC

Accession number: Accession number in form PFxxxxx.version or PBxxxxxx.

ID

Identification: One word name for family.

DE

Definition: Short description of family.

AU

Author: Authors of the entry.

SE

Source of seed: The source suggesting the seed members belong to one family.

GA

Gathering method: Search threshold to build the full alignment.

TC

Trusted Cutoff: Lowest sequence score and domain score of match in the full alignment.

NC

Noise Cutoff: Highest sequence score and domain score of match not in full alignment.

TP

Type: Type of family (presently Family, Domain, Motif or Repeat).

SQ

Sequence: Number of sequences in alignment.

AM

Alignment Method: The order in which ls and fs hits are aligned to the model to build the full alignment.

DC

Database Comment: Comment about database reference.

DR

Database Reference: Reference to external database.

RC

Reference Comment: Comment about literature reference.

RN

Reference Number: Reference Number.

RM

Reference Medline: Eight digit medline UI number.

RT

Reference Title: Reference Title.

RA

Reference Author: Reference Author.

RL

Reference Location: Journal location.

PI

Previous identifier: Record of all previous ID lines.

KW

Keywords: Keywords.

CC

Comment: Comments.

NE

Pfam accession: Indicates a nested domain.

NL

Location: Location of nested domains - sequence ID, start and end of insert.

F_Other !ByteString

Other file annotation.
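
As a rough illustration of how these codes relate to a file on disk: in the Stockholm format, per-file annotations appear on lines of the form "#=GF <code> <text>". The sketch below is not the library's parser. It uses local stand-in types (a three-constructor subset of FileAnnotation, plus Ann) and two hypothetical helpers, toFileAnnotation and parseGF, to show one way such a line could be mapped to an Ann FileAnnotation.

```haskell
import qualified Data.ByteString.Char8 as B

-- Local stand-ins so the sketch compiles on its own. The real types live in
-- Bio.Sequence.Stockholm; FileAnnotation has many more constructors there.
data FileAnnotation = AC | ID | DE | F_Other !B.ByteString
  deriving (Eq, Ord, Show)

data Ann d = Ann { feature :: !d, text :: !B.ByteString }
  deriving (Eq, Ord, Show)

-- Hypothetical helper: map a two-letter code to a constructor, falling back
-- to F_Other for any code this sketch does not know about.
toFileAnnotation :: B.ByteString -> FileAnnotation
toFileAnnotation code = case B.unpack code of
  "AC" -> AC
  "ID" -> ID
  "DE" -> DE
  _    -> F_Other code

-- Hypothetical helper: split one "#=GF <code> <text>" line into an Ann.
-- Returns Nothing for lines that are not per-file annotations.
parseGF :: B.ByteString -> Maybe (Ann FileAnnotation)
parseGF line
  | B.pack "#=GF " `B.isPrefixOf` line =
      let body         = B.dropWhile (== ' ') (B.drop 5 line)
          (code, rest) = B.break (== ' ') body
      in if B.null code
           then Nothing
           else Just (Ann (toFileAnnotation code) (B.dropWhile (== ' ') rest))
  | otherwise = Nothing
```

For instance, under these assumptions parseGF applied to "#=GF ID   7tm_1" yields an Ann whose feature is ID and whose text is "7tm_1", while an unrecognised code ends up wrapped in F_Other.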

data SequenceAnnotation

Possible sequence annotations.

Constructors

S_AC

Accession number

S_DE

Description

S_DR

Database reference

OS

Organism (species)

OC

Organism classification (clade, etc.)

LO

Look (Color, etc.)

S_Other !ByteString

Other sequence annotation.

data ColumnAnnotation a

Possible column annotations. The phantom type parameter is either InFile or InSeq.

Constructors

SS

Secondary structure.

SA

Surface accessibility.

TM

TransMembrane.

PP

Posterior probability.

LI

LIgand binding.

AS

Active site.

PAS

AS - Pfam predicted.

SAS

AS - from SwissProt.

IN

INtron (in or after).

C_Other !ByteString

Other column annotation.

data InFile

Phantom type for ColumnAnnotations of the whole file.

Instances

ClmnFeatureLoc InFile 

data InSeq

Phantom type for ColumnAnnotations of a single sequence.

Instances

ClmnFeatureLoc InSeq 
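
The role of the phantom parameter can be sketched as follows. The declarations are local stand-ins that mirror the documented ones (with only a subset of the constructors, and without the ClmnFeatureLoc class), and describeFileAnn is a hypothetical function added purely for illustration. The point is that the same constructor, SS, can be tagged as belonging to the whole file or to a single sequence, and the type checker keeps the two apart.

```haskell
import qualified Data.ByteString.Char8 as B

-- Local stand-ins mirroring the documented declarations (assumption: the
-- real library defines these in Bio.Sequence.Stockholm).
data InFile
data InSeq

data ColumnAnnotation a = SS | SA | C_Other !B.ByteString
  deriving (Eq, Ord, Show)

data Ann d = Ann { feature :: !d, text :: !B.ByteString }
  deriving (Eq, Show)

-- A per-file (consensus) secondary structure annotation.
fileSS :: Ann (ColumnAnnotation InFile)
fileSS = Ann SS (B.pack "<<<..>>>")

-- The same feature, but attached to one sequence.
seqSS :: Ann (ColumnAnnotation InSeq)
seqSS = Ann SS (B.pack "<<<..>>>")

-- Hypothetical function that accepts only per-file column annotations.
describeFileAnn :: Ann (ColumnAnnotation InFile) -> String
describeFileAnn (Ann f t) = show f ++ " (per file): " ++ B.unpack t
```

In this sketch describeFileAnn fileSS type-checks, whereas describeFileAnn seqSS is rejected at compile time, even though both values are built from the same constructor and carry no runtime tag.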

findAnn :: Eq d => d -> [Ann d] -> Maybe ByteString

Find an annotation. For example, you may use findAnn SS to find the secondary structure annotation of a Stockholm file.
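
A minimal sketch of a lookup with the same type signature is shown below. It is a stand-alone reimplementation, not the library's code: Ann and Feature are local stand-ins, findAnnSketch is a hypothetical name, and returning the text of the first matching annotation is an assumption, since the documentation does not say which match is chosen when a feature occurs more than once.

```haskell
import qualified Data.ByteString.Char8 as B
import Data.List (find)

-- Local stand-in for the documented Ann record.
data Ann d = Ann { feature :: !d, text :: !B.ByteString }
  deriving (Eq, Show)

-- Simplified stand-in for a feature type such as ColumnAnnotation.
data Feature = SS | SA
  deriving (Eq, Show)

-- Hypothetical lookup: the text of the first annotation whose feature
-- equals the requested one, or Nothing if there is no such annotation.
findAnnSketch :: Eq d => d -> [Ann d] -> Maybe B.ByteString
findAnnSketch x = fmap text . find ((== x) . feature)

-- Example annotation list with made-up contents.
example :: [Ann Feature]
example =
  [ Ann SA (B.pack "9998877564")
  , Ann SS (B.pack "<<<<..>>>>")
  ]
```

With these definitions, findAnnSketch SS example evaluates to Just "<<<<..>>>>", and looking up a feature in an empty list gives Nothing.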

Parsing

parseStockholm :: ResourceThrow m => Conduit ByteString m Stockholm

parseStockholm parses a stream of files in Stockholm 1.0 format.

Each file must be read completely before it is used, because the Stockholm format allows information to be given in any part of the file. However, there may be multiple "Stockholm files" concatenated in a single "filesystem file". These files are read independently of each other. If you need to process large Stockholm files, consider using the streaming interface in Bio.Sequence.Stockholm.Stream.
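
To make the idea of several Stockholm files inside one filesystem file concrete: in the Stockholm format each alignment ends with a line containing only "//". The sketch below only illustrates that boundary and is not how parseStockholm is implemented. splitRecords and sample are hypothetical names, and the sample alignments are made up.

```haskell
import qualified Data.ByteString.Char8 as B

-- Hypothetical helper: group the lines of the input into records, where each
-- record is terminated by a line consisting solely of "//". A trailing run of
-- lines without a terminator is kept only if it contains a non-empty line.
splitRecords :: B.ByteString -> [[B.ByteString]]
splitRecords = go . B.lines
  where
    go [] = []
    go ls = case break (== B.pack "//") ls of
      (body, _ : rest) -> body : go rest
      (body, [])       -> [body | any (not . B.null) body]

-- Two made-up alignments concatenated in a single input.
sample :: B.ByteString
sample = B.unlines (map B.pack
  [ "# STOCKHOLM 1.0"
  , "seq1 ACGU"
  , "//"
  , "# STOCKHOLM 1.0"
  , "seq2 GGCC"
  , "//"
  ])
```

Here splitRecords sample produces two records of two lines each. Each record could then be handled on its own, which is the sense in which concatenated files are independent.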

Printing

renderStockholm :: ResourceUnsafeIO m => Conduit Stockholm m ByteString

Pretty-prints a Stockholm file.

Lazy I/O

lazyParseStockholm :: ByteString -> [Stockholm]

Use lazy I/O to parse a stream of files in Stockholm 1.0 format. We recommend using parseStockholm instead.

lazyRenderStockholm :: [Stockholm] -> ByteString

Use lazy I/O to render a list of Stockholm values into a stream of files in Stockholm 1.0 format. We recommend using renderStockholm instead.