Parsing and pretty printing of files in Stockholm 1.0 format. See:
- data Stockholm = Stockholm [Ann FileAnnotation] [Ann (ColumnAnnotation InFile)] [StockholmSeq]
- data StockholmSeq = StSeq !SeqLabel !SeqData [Ann SequenceAnnotation] [Ann (ColumnAnnotation InSeq)]
- data Ann d = Ann { feature :: !d, text :: !ByteString }
- data FileAnnotation
- data SequenceAnnotation
- data ColumnAnnotation a
- data InFile
- data InSeq
- findAnn :: Eq d => d -> [Ann d] -> Maybe ByteString
- parseStockholm :: StockholmExc e => ByteString -> [Exceptional e Stockholm]
- class StockholmExc e where
- emptyFileExc :: e
- headerExc :: e
- malformedAnnExc :: ByteString -> e
- unknownAnnTypeExc :: Char -> e
- malformedSeqDataExc :: ByteString -> e
- prettyPrintStockholm :: Stockholm -> ByteString
Data types
data Stockholm
A Stockholm 1.0 formatted file represented in memory.
data StockholmSeq
A sequence in Stockholm 1.0 format.
data Ann d
A generic annotation.
Ann
feature :: !d
text :: !ByteString
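To illustrate how an annotation list can be queried, here is a minimal sketch of a lookup with the same type as findAnn. It is not the library's implementation: Ann is mirrored locally so the block compiles on its own, FileTag is a hypothetical cut-down stand-in for FileAnnotation, findAnnSketch and exampleAnns are illustrative names, and returning the first match is an assumption. The annotation values are placeholders.

```haskell
import qualified Data.ByteString.Char8 as B
import Data.List (find)

-- Local mirror of the documented 'Ann' type, so the sketch compiles alone.
data Ann d = Ann
  { feature :: !d
  , text    :: !B.ByteString
  } deriving (Eq, Show)

-- Hypothetical cut-down stand-in for 'FileAnnotation' (three tags only).
data FileTag = AC | ID | DE
  deriving (Eq, Show)

-- Assumed behaviour: return the text of the first annotation whose
-- feature equals the requested one, or Nothing if there is none.
findAnnSketch :: Eq d => d -> [Ann d] -> Maybe B.ByteString
findAnnSketch d = fmap text . find ((== d) . feature)

-- Placeholder annotations, as they might appear in a family header.
exampleAnns :: [Ann FileTag]
exampleAnns =
  [ Ann AC (B.pack "PF00001.1")
  , Ann ID (B.pack "EXAMPLE")
  ]
```

With these definitions, looking up AC in exampleAnns yields the accession text, while looking up DE yields Nothing because no such annotation is present.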
data FileAnnotation
Possible file annotations.
AC | Accession number: Accession number in form PFxxxxx.version or PBxxxxxx. |
ID | Identification: One word name for family. |
DE | Definition: Short description of family. |
AU | Author: Authors of the entry. |
SE | Source of seed: The source suggesting the seed members belong to one family. |
GA | Gathering method: Search threshold to build the full alignment. |
TC | Trusted Cutoff: Lowest sequence score and domain score of match in the full alignment. |
NC | Noise Cutoff: Highest sequence score and domain score of match not in full alignment. |
TP | Type: Type of family (presently Family, Domain, Motif or Repeat). |
SQ | Sequence: Number of sequences in alignment. |
AM | Alignment Method: The order ls and fs hits are aligned to the model to build the full align. |
DC | Database Comment: Comment about database reference. |
DR | Database Reference: Reference to external database. |
RC | Reference Comment: Comment about literature reference. |
RN | Reference Number: Reference Number. |
RM | Reference Medline: Eight digit medline UI number. |
RT | Reference Title: Reference Title. |
RA | Reference Author: Reference Author. |
RL | Reference Location: Journal location. |
PI | Previous identifier: Record of all previous ID lines. |
KW | Keywords: Keywords. |
CC | Comment: Comments. |
NE | Pfam accession: Indicates a nested domain. |
NL | Location: Location of nested domains - sequence ID, start and end of insert. |
F_Other !ByteString | Other file annotation. |
data SequenceAnnotation
Possible sequence annotations.
data ColumnAnnotation a
SS | Secondary structure. |
SA | Surface accessibility. |
TM | TransMembrane. |
PP | Posterior probability. |
LI | LIgand binding. |
AS | Active site. |
PAS | AS - Pfam predicted. |
SAS | AS - from SwissProt. |
IN | INtron (in or after). |
C_Other !ByteString | Other column annotation. |
Instances
Typeable1 ColumnAnnotation
Eq (ColumnAnnotation a)
Ord (ColumnAnnotation a)
Show (ColumnAnnotation a)
ClmnAnnLoc a => IsAnnotation (ColumnAnnotation a)
Parsing
parseStockholm :: StockholmExc e => ByteString -> [Exceptional e Stockholm]
parseStockholm parses a file in Stockholm 1.0 format.
Each file must be read completely before it is used, because
the Stockholm format allows information to be given in any
part of the file. However, there may be multiple
"Stockholm files" concatenated in a single "filesystem
file". These files are parsed independently, which
is why we return a list of Exceptionals.
If you prefer to read the whole file in one go, use
sequence (parseStockholm input), which will fail if any
family fails.
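The difference between handling each family separately and collapsing the list with sequence can be sketched without the library. In this sketch Either String is a stand-in for Exceptional e, and Family is a stand-in for a parsed Stockholm value; both stand-ins, and the names allOrNothing, perFamily and exampleBatch, are assumptions made for illustration only.

```haskell
import Data.Either (partitionEithers)

-- Hypothetical stand-ins: 'Either String' plays the role of
-- 'Exceptional e', and 'Family' the role of a parsed 'Stockholm' value.
type Family = String
type Result = Either String Family

-- All-or-nothing: the first failure aborts the whole batch.
allOrNothing :: [Result] -> Either String [Family]
allOrNothing = sequence

-- Per-family: keep every success and collect the failures separately.
perFamily :: [Result] -> ([String], [Family])
perFamily = partitionEithers

-- A batch in which the second family failed to parse.
exampleBatch :: [Result]
exampleBatch = [Right "family1", Left "header missing", Right "family3"]
```

Applied to exampleBatch, allOrNothing gives back only the first error, whereas perFamily keeps the two families that parsed and reports the failure alongside them. Which of the two fits depends on whether one broken family should invalidate the whole input.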
class StockholmExc e where
Exceptions that may happen while parsing a Stockholm file.
emptyFileExc :: e
File is empty.
headerExc :: e
Header is missing.
malformedAnnExc :: ByteString -> e
Malformed annotation. The line is passed as argument.
unknownAnnTypeExc :: Char -> e
Unknown annotation type.
malformedSeqDataExc :: ByteString -> e
Malformed sequence line. The line is passed as argument.
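Because the parser is polymorphic in the exception type, an application can supply its own error type by writing an instance of this class. The following is a minimal sketch of such an instance. The class is copied locally from the signatures above so that the block compiles on its own (real code would import it from the library), and ParseError and describe are hypothetical names, not part of the library.

```haskell
import qualified Data.ByteString.Char8 as B

-- Local copy of the documented class so the sketch compiles alone;
-- real code would import it from the library instead.
class StockholmExc e where
  emptyFileExc        :: e
  headerExc           :: e
  malformedAnnExc     :: B.ByteString -> e
  unknownAnnTypeExc   :: Char -> e
  malformedSeqDataExc :: B.ByteString -> e

-- Hypothetical application error type, one constructor per failure.
data ParseError
  = EmptyFile
  | MissingHeader
  | BadAnnotation B.ByteString
  | UnknownAnnType Char
  | BadSequenceLine B.ByteString
  deriving (Eq, Show)

instance StockholmExc ParseError where
  emptyFileExc        = EmptyFile
  headerExc           = MissingHeader
  malformedAnnExc     = BadAnnotation
  unknownAnnTypeExc   = UnknownAnnType
  malformedSeqDataExc = BadSequenceLine

-- Turn an error into a message suitable for a log or the terminal.
describe :: ParseError -> String
describe EmptyFile           = "file is empty"
describe MissingHeader       = "header is missing"
describe (BadAnnotation l)   = "malformed annotation: " ++ B.unpack l
describe (UnknownAnnType c)  = "unknown annotation type: " ++ [c]
describe (BadSequenceLine l) = "malformed sequence line: " ++ B.unpack l
```

Keeping the offending line or character inside the error value, as the class methods allow, makes it possible to report exactly which part of the input was rejected.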
Printing
prettyPrintStockholm :: Stockholm -> ByteString
Pretty-prints a Stockholm file. We follow Rfam preferences and do not wrap lines.