uniprot-kb-0.1.1.0: UniProt-KB format parser

Safe Haskell: Safe
Language: Haskell2010

Bio.Uniprot.Type


Documentation

data Kingdom

Which taxonomic kingdom an organism belongs to.

Constructors

Archea

A for archaea (=archaebacteria)

Bacteria

B for bacteria (=prokaryota or eubacteria)

Eukaryota

E for eukaryota (=eukarya)

Virus

V for viruses and phages (=viridae)

Other

O for others (such as artificial sequences)
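The one-letter codes above map directly onto the constructors. The following is a minimal sketch, not part of the package: it redeclares `Kingdom` locally (same constructor names as documented here, including the spelling `Archea`) and adds two hypothetical helpers, `kingdomFromCode` and `kingdomCode`.

```haskell
-- Local copy of the documented Kingdom type so the sketch compiles on its own.
data Kingdom = Archea | Bacteria | Eukaryota | Virus | Other
  deriving (Eq, Show, Enum, Bounded)

-- Hypothetical helper: map a one-letter kingdom code to a constructor.
kingdomFromCode :: Char -> Maybe Kingdom
kingdomFromCode c = case c of
    'A' -> Just Archea
    'B' -> Just Bacteria
    'E' -> Just Eukaryota
    'V' -> Just Virus
    'O' -> Just Other
    _   -> Nothing

-- Hypothetical inverse, useful when writing the code back out.
kingdomCode :: Kingdom -> Char
kingdomCode k = case k of
    Archea    -> 'A'
    Bacteria  -> 'B'
    Eukaryota -> 'E'
    Virus     -> 'V'
    Other     -> 'O'
```

Returning `Maybe` keeps unknown codes explicit instead of silently mapping them to `Other`.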

data Organism

Controlled vocabulary of species.

Constructors

Organism 

Fields

data Status

The status distinguishes the fully annotated entries in the Swiss-Prot section of the UniProt Knowledgebase from the computer-annotated entries in the TrEMBL section. It is indicated in the first (ID) line of each entry.

Constructors

Reviewed

Entries that have been manually reviewed and annotated by UniProtKB curators

Unreviewed

Computer-annotated entries that have not been reviewed by UniProtKB curators

data ID

IDentification

Constructors

ID 

Fields

  • entryName :: Text

    This name is a useful means of identifying a sequence but, unlike the accession number, it is not a stable identifier.

  • status :: Status

    The status of the entry (Reviewed or Unreviewed).

  • seqLength :: Int

    The length of the molecule, i.e. the total number of amino acids in the sequence. This number includes positions that are reported to be present but have not been determined (coded as X).
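The three fields correspond to the three items of the ID line. A minimal sketch, assuming the usual layout `ID   <entry name>   <status>;   <length> AA.` with whitespace-separated items; `Status` and `ID` are redeclared locally to mirror the documented types, and `parseIDLine` is a hypothetical helper, not the package's parser.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import qualified Data.Text as T
import Data.Text (Text)
import Text.Read (readMaybe)

-- Local mirrors of the documented Status and ID types.
data Status = Reviewed | Unreviewed
  deriving (Eq, Show)

data ID = ID
  { entryName :: Text
  , status    :: Status
  , seqLength :: Int
  } deriving (Eq, Show)

-- Hypothetical helper: parse a line such as
-- "ID   CYC_HUMAN               Reviewed;         105 AA."
parseIDLine :: Text -> Maybe ID
parseIDLine line = case T.words line of
    ["ID", name, st, len, "AA."] ->
      ID name <$> parseStatus st <*> readMaybe (T.unpack len)
    _ -> Nothing
  where
    -- The status word keeps its trailing semicolon after splitting on spaces.
    parseStatus s = case s of
      "Reviewed;"   -> Just Reviewed
      "Unreviewed;" -> Just Unreviewed
      _             -> Nothing
```

Splitting on whitespace makes the parser indifferent to the column padding of the flat file.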

Instances

Eq ID
  (==), (/=) :: ID -> ID -> Bool

Ord ID
  compare :: ID -> ID -> Ordering
  (<), (<=), (>), (>=) :: ID -> ID -> Bool
  max, min :: ID -> ID -> ID

Show ID
  showsPrec :: Int -> ID -> ShowS
  show :: ID -> String
  showList :: [ID] -> ShowS

newtype AC

ACcession numbers. Accession numbers provide a stable way of identifying entries from release to release. For consistency it is sometimes necessary to change the names of entries, for example to ensure that related entries have similar names, but an accession number is always conserved and therefore allows unambiguous citation of entries. Researchers who wish to cite entries in their publications should always cite the first accession number, commonly referred to as the 'primary accession number'. The remaining 'secondary accession numbers' are sorted alphanumerically.

Constructors

AC 

Fields

Instances

Eq AC
  (==), (/=) :: AC -> AC -> Bool

Ord AC
  compare :: AC -> AC -> Ordering
  (<), (<=), (>), (>=) :: AC -> AC -> Bool
  max, min :: AC -> AC -> AC

Show AC
  showsPrec :: Int -> AC -> ShowS
  show :: AC -> String
  showList :: [AC] -> ShowS
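Because the primary accession number is simply the first one listed, extracting it from raw AC lines takes little code. A sketch, assuming accession numbers are separated by `;` on lines that start with `AC`; the helpers are hypothetical and work on plain `Text`, since the field of the `AC` constructor is not listed on this page.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import qualified Data.Text as T
import Data.Text (Text)

-- Hypothetical helper: collect the accession numbers from raw AC lines
-- such as "AC   P12345; Q67890;", in the order they appear.
accessions :: [Text] -> [Text]
accessions acLines =
  [ acc
  | line <- acLines
  , Just body <- [T.stripPrefix "AC" line]   -- skip lines that are not AC lines
  , acc <- map T.strip (T.splitOn ";" body)
  , not (T.null acc)                         -- drop the empty piece after the last ';'
  ]

-- The first accession number is the primary one (the one to cite);
-- any others are secondary.
primaryAccession :: [Text] -> Maybe Text
primaryAccession accs = case accs of
    primary : _ -> Just primary
    []          -> Nothing
```

Keeping the numbers in file order matters here: sorting them would lose which one is primary.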

data DT

DaTe: the dates of creation and last modification of the database entry.

Constructors

DT 

Fields

  • dbIntegrationDate :: Text

    Indicates when the entry first appeared in the database.

  • dbName :: Text

    Indicates in which section of UniProtKB, Swiss-Prot or TrEMBL, the entry can be found.

  • seqVersionDate :: Text

    Indicates when the sequence data was last modified.

  • seqVersion :: Int

    The sequence version number of an entry is incremented by one when the amino acid sequence shown in the sequence record is modified.

  • entryVersionDate :: Text

    Indicates when data other than the sequence was last modified.

  • entryVersion :: Int

    The entry version number is incremented by one whenever any data in the flat file representation of the entry is modified.
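In the flat file these six fields come from three DT lines. A minimal sketch, assuming lines of the form `DT   01-JAN-1988, integrated into UniProtKB/Swiss-Prot.`, `DT   15-JUL-1999, sequence version 2.` and `DT   07-JAN-2015, entry version 200.`, in that order. `DT` is redeclared locally to mirror the documented record (dates stay as `Text`, as documented), and `splitDT`/`parseDT` are hypothetical helpers.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import qualified Data.Text as T
import Data.Text (Text)
import Text.Read (readMaybe)

-- Local mirror of the documented DT record.
data DT = DT
  { dbIntegrationDate :: Text
  , dbName            :: Text
  , seqVersionDate    :: Text
  , seqVersion        :: Int
  , entryVersionDate  :: Text
  , entryVersion      :: Int
  } deriving (Eq, Show)

-- Split one DT line into its date and its description, without the final period.
splitDT :: Text -> Maybe (Text, Text)
splitDT line = do
  body <- T.stripPrefix "DT" line
  let (date, rest) = T.breakOn "," (T.strip body)
  desc <- T.stripPrefix "," rest
  pure (date, T.dropWhileEnd (== '.') (T.strip desc))

-- Assemble a DT from the three lines in the order they appear in an entry.
parseDT :: [Text] -> Maybe DT
parseDT [l1, l2, l3] = do
  (d1, integ) <- splitDT l1
  db          <- T.stripPrefix "integrated into UniProtKB/" integ
  (d2, sv)    <- splitDT l2
  sVer        <- T.stripPrefix "sequence version " sv >>= readMaybe . T.unpack
  (d3, ev)    <- splitDT l3
  eVer        <- T.stripPrefix "entry version " ev >>= readMaybe . T.unpack
  pure (DT d1 db d2 sVer d3 eVer)
parseDT _ = Nothing
```

Comparing `seqVersion` (rather than `entryVersion`) between two releases tells whether the amino acid sequence itself changed.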

Instances

Eq DT
  (==), (/=) :: DT -> DT -> Bool

Ord DT
  compare :: DT -> DT -> Ordering
  (<), (<=), (>), (>=) :: DT -> DT -> Bool
  max, min :: DT -> DT -> DT

Show DT
  showsPrec :: Int -> DT -> ShowS
  show :: DT -> String
  showList :: [DT] -> ShowS

data Name

Constructors

Name 

Fields

Instances

Eq Name
  (==), (/=) :: Name -> Name -> Bool

Ord Name
  compare :: Name -> Name -> Ordering
  (<), (<=), (>), (>=) :: Name -> Name -> Bool
  max, min :: Name -> Name -> Name

Show Name
  showsPrec :: Int -> Name -> ShowS
  show :: Name -> String
  showList :: [Name] -> ShowS

data Flag

Constructors

Fragment

The complete sequence is not determined.

Fragments

The complete sequence is not determined; the sequence shown is made up of more than one fragment.

Precursor

The sequence displayed does not correspond to the mature form of the protein.
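Flags appear on a DE line of their own. A sketch, assuming the layout `DE   Flags: Precursor;` with `;`-terminated values; `Flag` is redeclared locally and `parseFlags` is a hypothetical helper that returns every recognised flag and skips anything else.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import qualified Data.Text as T
import Data.Text (Text)
import Data.Maybe (fromMaybe)

-- Local copy of the documented Flag type.
data Flag = Fragment | Fragments | Precursor
  deriving (Eq, Show)

-- Hypothetical helper: read the flags from a line such as "DE   Flags: Precursor;".
-- Lines that are not Flags lines yield an empty list.
parseFlags :: Text -> [Flag]
parseFlags line = case T.stripPrefix "Flags:" body of
    Nothing   -> []
    Just rest -> [ f | w <- T.splitOn ";" rest, Just f <- [toFlag (T.strip w)] ]
  where
    body = T.strip (fromMaybe line (T.stripPrefix "DE" line))
    toFlag w = case w of
      "Fragment"  -> Just Fragment
      "Fragments" -> Just Fragments
      "Precursor" -> Just Precursor
      _           -> Nothing
```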

data DE

DEscription - general descriptive information about the sequence stored.

Constructors

DE 

Fields

  • recName :: Maybe Name

    The name recommended by the UniProt consortium.

  • altNames :: [AltName]

    Synonyms of the recommended name.

  • subNames :: [Name]

    Names provided by the submitter of the underlying nucleotide sequence.

  • includes :: [DE]

    Used when a protein is known to include multiple functional domains, each of which is described by a different name.

  • contains :: [DE]

    Used when the functional domains of an enzyme are cleaved, but the catalytic activity can only be observed when the individual chains reorganize in a complex.

  • flags :: Maybe Flag

    Flags whether the entry is a precursor and/or a fragment.
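Since `includes` and `contains` are themselves `DE` values, a description forms a tree, and collecting names is a recursive walk. A sketch with locally redeclared types: the fields of `Name` are not listed on this page, so it is replaced by a hypothetical newtype wrapping a single name, and `altNames` is omitted for brevity. `allRecNames` and `isFragment` are hypothetical helpers.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.Maybe (maybeToList)
import Data.Text (Text)

-- Hypothetical stand-in: the real Name type's fields are not shown on this page.
newtype Name = Name Text
  deriving (Eq, Show)

data Flag = Fragment | Fragments | Precursor
  deriving (Eq, Show)

-- Local mirror of DE (altNames left out for brevity).
data DE = DE
  { recName  :: Maybe Name
  , subNames :: [Name]
  , includes :: [DE]
  , contains :: [DE]
  , flags    :: Maybe Flag
  } deriving (Eq, Show)

-- Pre-order walk: this level's recommended name first, then the names found
-- under the included domains, then those under the contained chains.
allRecNames :: DE -> [Name]
allRecNames de =
  maybeToList (recName de) ++ concatMap allRecNames (includes de ++ contains de)

-- True when the entry is flagged as a fragment (one or several).
isFragment :: DE -> Bool
isFragment de = flags de `elem` [Just Fragment, Just Fragments]
```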

Instances

Eq DE
  (==), (/=) :: DE -> DE -> Bool

Ord DE
  compare :: DE -> DE -> Ordering
  (<), (<=), (>), (>=) :: DE -> DE -> Bool
  max, min :: DE -> DE -> DE

Show DE
  showsPrec :: Int -> DE -> ShowS
  show :: DE -> String
  showList :: [DE] -> ShowS

data GN

Gene Name - the name(s) of the gene(s) that code for the stored protein sequence.

Constructors

GN 

Fields

  • geneName :: Maybe Text

    The name used to represent a gene.

  • synonyms :: [Text]

    Other (unofficial) names of a gene.

  • orderedLocusNames :: [Text]

    A name used to represent an ORF in a completely sequenced genome or chromosome.

  • orfNames :: [Text]

    A name temporarily attributed by a sequencing project to an open reading frame.
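Only `geneName` is optional and the lists may be empty, so an application that needs one label per gene has to choose a fallback. A sketch of one such policy, which is an assumption rather than a UniProt rule: use the gene name, else the first ordered locus name, else the first ORF name. `GN` is redeclared locally; `displayName` and `allNames` are hypothetical.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.Maybe (listToMaybe, maybeToList)
import Data.Text (Text)

-- Local mirror of the documented GN record.
data GN = GN
  { geneName          :: Maybe Text
  , synonyms          :: [Text]
  , orderedLocusNames :: [Text]
  , orfNames          :: [Text]
  } deriving (Eq, Show)

-- One label for display; synonyms are deliberately skipped because they are unofficial.
displayName :: GN -> Maybe Text
displayName gn =
  listToMaybe (maybeToList (geneName gn) ++ orderedLocusNames gn ++ orfNames gn)

-- Every name under which the gene might be searched, official name first.
allNames :: GN -> [Text]
allNames gn =
  maybeToList (geneName gn) ++ synonyms gn ++ orderedLocusNames gn ++ orfNames gn
```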

Instances

Eq GN
  (==), (/=) :: GN -> GN -> Bool

Ord GN
  compare :: GN -> GN -> Ordering
  (<), (<=), (>), (>=) :: GN -> GN -> Bool
  max, min :: GN -> GN -> GN

Show GN
  showsPrec :: Int -> GN -> ShowS
  show :: GN -> String
  showList :: [GN] -> ShowS

newtype OS

Organism Species - the organism which was the source of the stored sequence.

Constructors

OS 

Fields

Instances

Eq OS
  (==), (/=) :: OS -> OS -> Bool

Ord OS
  compare :: OS -> OS -> Ordering
  (<), (<=), (>), (>=) :: OS -> OS -> Bool
  max, min :: OS -> OS -> OS

Show OS
  showsPrec :: Int -> OS -> ShowS
  show :: OS -> String
  showList :: [OS] -> ShowS

data Plastid

An enumeration of possible plastid types, based on either taxonomic lineage or photosynthetic capacity.

Constructors

PlastidSimple

The term Plastid is used when the capacities of the organism are unclear, for example in the parasitic plants of the Cuscuta lineage, where young tissue is sometimes photosynthetic.

PlastidApicoplast

Apicoplasts are the plastids found in Apicomplexa parasites such as Eimeria, Plasmodium and Toxoplasma; they are not photosynthetic.

PlastidChloroplast

Chloroplasts are the plastids found in all land plants and algae with the exception of the glaucocystophyte algae (see below). Chloroplasts in green tissue are photosynthetic; in other tissues they may not be photosynthetic and then may also have secondary information relating to subcellular location (e.g. amyloplasts, chromoplasts).

PlastidOrganellarChromatophore

Organellar chromatophores are the photosynthetic organelles of the amoeba Paulinella chromatophora; they derive from a cyanobacterial endosymbiont acquired independently of the chloroplasts of plants and algae.

PlastidCyanelle

Cyanelles are the plastids found in the glaucocystophyte algae. They are also photosynthetic, but retain a vestigial cell wall between their two envelope membranes.

PlastidNonPhotosynthetic

Non-photosynthetic plastid is used when the plastid derives from a photosynthetic lineage but is missing essential genes. Examples are Aneura mirabilis (a liverwort), Epifagus virginiana (a higher plant) and Helicosporidium (a green alga).

data OG

OrGanelle - indicates whether the gene coding for a protein originates from mitochondria, a plastid, a nucleomorph or a plasmid.

Constructors

Hydrogenosome

Hydrogenosomes are membrane-enclosed redox organelles found in some anaerobic unicellular eukaryotes; they contain hydrogenase and produce hydrogen and ATP by glycolysis. They are thought to have evolved from mitochondria. Most hydrogenosomes lack a genome, but some (e.g. those of the anaerobic ciliate Nyctotherus ovalis) have retained a rudimentary genome.

Mitochondrion

Mitochondria are redox-active membrane-bound organelles found in the cytoplasm of most eukaryotic cells. They are the site of the reactions of oxidative phosphorylation, which result in the formation of ATP.

Nucleomorph

Nucleomorphs are reduced vestigial nuclei found in the plastids of cryptomonad and chlorarachniophyte algae. These plastids originate from engulfed eukaryotic phototrophs.

Plasmid [Text]

Plasmid with a specific name. If an entry reports the sequence of a protein that is identical in several plasmids, the names of all these plasmids are listed.

Plastid Plastid

Plastids are classified based on either their taxonomic lineage or in some cases on their photosynthetic capacity.

Instances

Eq OG
  (==), (/=) :: OG -> OG -> Bool

Ord OG
  compare :: OG -> OG -> Ordering
  (<), (<=), (>), (>=) :: OG -> OG -> Bool
  max, min :: OG -> OG -> OG

Show OG
  showsPrec :: Int -> OG -> ShowS
  show :: OG -> String
  showList :: [OG] -> ShowS
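The `OG` constructors correspond to the organelle names written on the OG line, with plastids carrying a `Plastid` sub-type. A sketch that parses a single-line OG value such as `Plastid; Chloroplast.`; the exact label spellings are assumed from the constructor names, multi-line plasmid lists are not handled, and `Plastid`/`OG` are redeclared locally. `parseOG` is hypothetical.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import qualified Data.Text as T
import Data.Text (Text)

-- Local copies of the documented Plastid and OG types.
data Plastid
  = PlastidSimple
  | PlastidApicoplast
  | PlastidChloroplast
  | PlastidOrganellarChromatophore
  | PlastidCyanelle
  | PlastidNonPhotosynthetic
  deriving (Eq, Show)

data OG
  = Hydrogenosome
  | Mitochondrion
  | Nucleomorph
  | Plasmid [Text]
  | Plastid Plastid
  deriving (Eq, Show)

-- Hypothetical helper: parse one OG value; the trailing period is ignored.
parseOG :: Text -> Maybe OG
parseOG raw = case map T.strip (T.splitOn ";" (T.dropWhileEnd (== '.') (T.strip raw))) of
    ["Hydrogenosome"] -> Just Hydrogenosome
    ["Mitochondrion"] -> Just Mitochondrion
    ["Nucleomorph"]   -> Just Nucleomorph
    ["Plastid"]       -> Just (Plastid PlastidSimple)
    ["Plastid", sub]  -> Plastid <$> plastidType sub
    [single] | Just name <- T.stripPrefix "Plasmid " single -> Just (Plasmid [name])
    _ -> Nothing
  where
    -- Assumed sub-type labels, one per Plastid constructor.
    plastidType sub = case sub of
      "Apicoplast"                 -> Just PlastidApicoplast
      "Chloroplast"                -> Just PlastidChloroplast
      "Organellar chromatophore"   -> Just PlastidOrganellarChromatophore
      "Cyanelle"                   -> Just PlastidCyanelle
      "Non-photosynthetic plastid" -> Just PlastidNonPhotosynthetic
      _                            -> Nothing
```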

newtype OC

Organism Classification - the taxonomic classification of the source organism.

Constructors

OC 

Fields

Instances

Eq OC
  (==), (/=) :: OC -> OC -> Bool

Ord OC
  compare :: OC -> OC -> Ordering
  (<), (<=), (>), (>=) :: OC -> OC -> Bool
  max, min :: OC -> OC -> OC

Show OC
  showsPrec :: Int -> OC -> ShowS
  show :: OC -> String
  showList :: [OC] -> ShowS

data OX

Organism taxonomy cross-reference - indicates the identifier of a specific organism in a taxonomic database.

Constructors

OX 

Fields

Instances

Eq OX
  (==), (/=) :: OX -> OX -> Bool

Ord OX
  compare :: OX -> OX -> Ordering
  (<), (<=), (>), (>=) :: OX -> OX -> Bool
  max, min :: OX -> OX -> OX

Show OX
  showsPrec :: Int -> OX -> ShowS
  show :: OX -> String
  showList :: [OX] -> ShowS

data OH

Organism Host - indicates the host organism(s) susceptible to infection by a virus. Appears only in viral entries.

Constructors

OH 

Fields

Instances

Eq OH
  (==), (/=) :: OH -> OH -> Bool

Ord OH
  compare :: OH -> OH -> Ordering
  (<), (<=), (>), (>=) :: OH -> OH -> Bool
  max, min :: OH -> OH -> OH

Show OH
  showsPrec :: Int -> OH -> ShowS
  show :: OH -> String
  showList :: [OH] -> ShowS

data Token

Reference comment token.

Constructors

STRAIN 
PLASMID 
TRANSPOSON 
TISSUE 

data Reference

Reference lines.

Constructors

Reference 

Fields

  • rn :: Int

    Reference Number - a sequential number assigned to each reference citation in an entry.

  • rp :: Text

    Reference Position - the extent of the work relevant to the entry carried out by the authors.

  • rc :: [(Token, Text)]

    Reference Comment - comments relevant to the reference cited.

  • rx :: [(BibliographicDB, Text)]

    Reference cross-reference - the identifier assigned to a specific reference in a bibliographic database.

  • rg :: Maybe Text

    Reference Group - the consortium name associated with a given citation.

  • ra :: [Text]

    Reference Author - authors of the paper (or other work) cited.

  • rt :: Maybe Text

    Reference Title - the title of the paper (or other work) cited, reproduced as exactly as the limitations of the computer character set allow.

  • rl :: Text

    Reference Location - the conventional citation information for the reference.
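The `rc` field pairs each `Token` with its value. A sketch that builds that shape from the body of an RC line such as `STRAIN=Sprague-Dawley; TISSUE=Liver;`, assuming `TOKEN=value` items separated by `;` and values that contain no `;`. `Token` is redeclared locally with a derived `Read` instance so its constructor names can be recognised; unknown tokens are skipped. `parseRC` is hypothetical.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import qualified Data.Text as T
import Data.Text (Text)
import Text.Read (readMaybe)

-- Local copy of the documented Token type; Read is derived for parsing.
data Token = STRAIN | PLASMID | TRANSPOSON | TISSUE
  deriving (Eq, Show, Read)

-- Hypothetical helper producing the [(Token, Text)] shape of the rc field.
parseRC :: Text -> [(Token, Text)]
parseRC body =
  [ (tok, T.strip value)
  | item <- T.splitOn ";" body
  , let (key, rest) = T.breakOn "=" (T.strip item)
  , Just value <- [T.stripPrefix "=" rest]          -- items without '=' are skipped
  , Just tok   <- [readMaybe (T.unpack key)]        -- unknown token names are skipped
  ]
```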

type Topic = Text

The comment blocks are arranged according to what we designate as topics.

data CC

Free-text comments on the entry, used to convey any useful information.

Constructors

CC 

Fields

Instances

Eq CC
  (==), (/=) :: CC -> CC -> Bool

Ord CC
  compare :: CC -> CC -> Ordering
  (<), (<=), (>), (>=) :: CC -> CC -> Bool
  max, min :: CC -> CC -> CC

Show CC
  showsPrec :: Int -> CC -> ShowS
  show :: CC -> String
  showList :: [CC] -> ShowS
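CC lines group free text under topics, each block opening with `-!- TOPIC:`. A sketch that splits raw CC lines into `(Topic, Text)` pairs, assuming that layout; `parseCC` is a hypothetical helper that returns pairs because the fields of the `CC` constructor are not listed on this page. It does not special-case text after the last topic (such as the copyright notice that closes Swiss-Prot entries), which would be appended to that topic.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import qualified Data.Text as T
import Data.Text (Text)
import Data.Maybe (fromMaybe)

type Topic = Text

-- Hypothetical helper: each "-!- TOPIC: text" line opens a block and
-- continuation lines are appended to the current block.
parseCC :: [Text] -> [(Topic, Text)]
parseCC = reverse . map finish . foldl step []
  where
    -- The accumulator holds the newest block first, with its lines in reverse order.
    step acc line =
      let body = T.strip (fromMaybe line (T.stripPrefix "CC" line))
      in case T.stripPrefix "-!- " body of
           Just start ->
             let (topic, rest) = T.breakOn ":" start
             in (topic, [T.strip (T.drop 1 rest)]) : acc
           Nothing -> case acc of
             (topic, parts) : older -> (topic, body : parts) : older
             []                     -> acc   -- text before the first topic is ignored
    finish (topic, parts) = (topic, T.unwords (reverse parts))
```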

data DR

Database cross-Reference - pointers to information in external data resources that is related to UniProtKB entries.

Constructors

DR 

Fields

Instances

Eq DR
  (==), (/=) :: DR -> DR -> Bool

Ord DR
  compare :: DR -> DR -> Ordering
  (<), (<=), (>), (>=) :: DR -> DR -> Bool
  max, min :: DR -> DR -> DR

Show DR
  showsPrec :: Int -> DR -> ShowS
  show :: DR -> String
  showList :: [DR] -> ShowS
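Each DR line names an external resource followed by `;`-separated identifiers, where `-` marks an empty optional column. A sketch, assuming that layout and a trailing period; because the fields of the `DR` constructor are not listed on this page, it returns a hypothetical `CrossRef` record instead, built by the hypothetical `parseDR`.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import qualified Data.Text as T
import Data.Text (Text)

-- Hypothetical record, not the package's DR type.
data CrossRef = CrossRef
  { database    :: Text    -- resource abbreviation, e.g. "EMBL"
  , identifiers :: [Text]  -- primary identifier followed by any extra columns ("-" kept as-is)
  } deriving (Eq, Show)

parseDR :: Text -> Maybe CrossRef
parseDR line = do
  body <- T.stripPrefix "DR" line
  let fields = filter (not . T.null)
             . map T.strip
             . T.splitOn ";"
             $ T.dropWhileEnd (== '.') (T.strip body)
  -- Require a resource name and at least one identifier.
  case fields of
    db : ids@(_ : _) -> Just (CrossRef db ids)
    _                -> Nothing
```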

data PE

Protein Existence - the evidence currently available for the existence of a protein. Because most protein sequences are derived from the translation of nucleotide sequences and are mere predictions, the PE line indicates what evidence supports the existence of the protein.

Instances

Eq PE
  (==), (/=) :: PE -> PE -> Bool

Ord PE
  compare :: PE -> PE -> Ordering
  (<), (<=), (>), (>=) :: PE -> PE -> Bool
  max, min :: PE -> PE -> PE

Show PE
  showsPrec :: Int -> PE -> ShowS
  show :: PE -> String
  showList :: [PE] -> ShowS
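UniProt defines five protein existence levels, numbered from 1 (evidence at protein level) to 5 (uncertain), and the PE line starts with that number. A sketch that reads the number; the constructors below are hypothetical, since this page does not list the real ones, and `parsePE` is likewise hypothetical.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import qualified Data.Text as T
import Data.Text (Text)
import Text.Read (readMaybe)

-- Hypothetical constructors, declared in level order 1..5.
data PE
  = EvidenceAtProteinLevel     -- 1
  | EvidenceAtTranscriptLevel  -- 2
  | InferredFromHomology       -- 3
  | Predicted                  -- 4
  | Uncertain                  -- 5
  deriving (Eq, Ord, Show, Enum, Bounded)

-- Parse "PE   1: Evidence at protein level;" by its leading number.
parsePE :: Text -> Maybe PE
parsePE line = do
  body <- T.stripPrefix "PE" line
  n <- readMaybe (T.unpack (T.takeWhile (/= ':') (T.strip body)))
  if n >= 1 && n <= 5 then Just (toEnum (n - 1)) else Nothing
```

Declaring the constructors in level order means the derived `Ord` treats a smaller value as stronger evidence, so `minimum` over several entries picks the best-supported one.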

newtype KW

KeyWord - information that can be used to generate indexes of the sequence entries based on functional, structural, or other categories.

Constructors

KW 

Fields

Instances

Eq KW
  (==), (/=) :: KW -> KW -> Bool

Ord KW
  compare :: KW -> KW -> Ordering
  (<), (<=), (>), (>=) :: KW -> KW -> Bool
  max, min :: KW -> KW -> KW

Show KW
  showsPrec :: Int -> KW -> ShowS
  show :: KW -> String
  showList :: [KW] -> ShowS

data FT

Feature Table - a means of annotating the sequence data.

Constructors

FT 

Fields

Instances

Eq FT
  (==), (/=) :: FT -> FT -> Bool

Ord FT
  compare :: FT -> FT -> Ordering
  (<), (<=), (>), (>=) :: FT -> FT -> Bool
  max, min :: FT -> FT -> FT

Show FT
  showsPrec :: Int -> FT -> ShowS
  show :: FT -> String
  showList :: [FT] -> ShowS

data SQ

SeQuence header - sequence data and a quick summary of its content.

Constructors

SQ 

Fields

  • length :: Int

    Length of the sequence in amino acids.

  • molWeight :: Int

    Molecular weight rounded to the nearest mass unit (Dalton).

  • crc64 :: Text

    Sequence 64-bit CRC (Cyclic Redundancy Check) value.

  • sequence :: Text

    Sequence of the protein.
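A sketch that parses the SQ header, assuming the layout `SQ   SEQUENCE   <n> AA;  <mw> MW;  <crc> CRC64;`, joins the residue lines that follow, and checks that the declared length matches the residues. Two documented field names, `length` and `sequence`, coincide with Prelude functions, so the local mirror hides those Prelude names; code using the real type may need a similar hiding or a qualified import. `parseSQ` and `lengthConsistent` are hypothetical, and the CRC64 value is carried along, not verified.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Prelude hiding (length, sequence)  -- the documented field names shadow these
import qualified Data.Text as T
import Data.Text (Text)
import Text.Read (readMaybe)

-- Local mirror of the documented SQ record.
data SQ = SQ
  { length    :: Int
  , molWeight :: Int
  , crc64     :: Text
  , sequence  :: Text
  } deriving (Eq, Show)

-- Parse the header line and concatenate the residue lines, dropping the spaces
-- that group residues in the flat file.
parseSQ :: Text -> [Text] -> Maybe SQ
parseSQ header residueLines = case T.words header of
    ["SQ", "SEQUENCE", len, "AA;", mw, "MW;", crc, "CRC64;"] ->
      SQ <$> readMaybe (T.unpack len)
         <*> readMaybe (T.unpack mw)
         <*> pure crc
         <*> pure (T.concat (map (T.filter (/= ' ')) residueLines))
    _ -> Nothing

-- The declared length should equal the number of residues actually present.
lengthConsistent :: SQ -> Bool
lengthConsistent sq = length sq == T.length (sequence sq)
```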

Instances

Eq SQ
  (==), (/=) :: SQ -> SQ -> Bool

Ord SQ
  compare :: SQ -> SQ -> Ordering
  (<), (<=), (>), (>=) :: SQ -> SQ -> Bool
  max, min :: SQ -> SQ -> SQ

Show SQ
  showsPrec :: Int -> SQ -> ShowS
  show :: SQ -> String
  showList :: [SQ] -> ShowS

data Record

Full UniProt record in UniProt-KB format.

Constructors

Record 

Fields