r      !"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijk l m n o p q r s t u v w x y z { | } ~  &State for import and export functions ":Infernal bit score. Behaves like a double (deriving Num).  ?Identifies a certain scaffold or chromosome where a hit occurs 0Classification names (taxonomic classification) Strict FASTA data. String name for species. "Numeric species accession number. JEMBL sequence accession based on sequence accession and sequence start to 4 stop. (Should this then be RfamSequenceAccession?) KString identifier of a covariance model or Stockholm multiple alignment as  in 5S_rRNA. CThe numeric identifier of a covarience model or Stockholm multiple  alignment as in RFxxxxx.  !Clan model name "#$Clan accession identifier %&'Simple function to create  from a . "  !"#$%&'"$%&!"# '  "      !"#"#$%&%&'(2Map of model accession numbers to individual CMs. )&Map of model names to individual CMs. *BA datatype representing Infernal covariance models. This is a new @ representation that is incompatible with the one once found in Biobase. N The most important difference is that lookups are mapped onto efficient data  structures, currently PrimitiveArray.  1 Each State< of a covariance model has up to 6 transition scores, hence $ we need s*6 cells for transitions. 2 Each State: of a covariance has up to 16 emission scores, so we have I s*16 cells for emissions, with unused cells set to a really high score. FOn top of these basic structures, we then place additional high-level  constructs.  3 38 are allowed transitions. This can safe a check, if the / transition is encoded with a forbidden score. 4 4 and 6( are local entry and exit strategies. A  4? is a transition score to certain states, all such transitions  are in 5. A 6- is a transition score to a local end state. 2NOTE that trustedCutoff > gathering > noiseCutoff 5TODO as with other projects, we should not use Double's but Score and   Probability newtypes. +,name of model as in tRNA -RFxxxxx identification .lowest score of true member /all scores at or above / score are in the full alignment 0%highest score NOT included as member 1234567()*+,-./01234567*+,-./01234567)(()* +,-./01234567+,-./01234567 89:;<.iteratee-based parsing of human-readable CMs. =>?@IRead covariance models from file. This parser reads one or more CMs from  file. A/Read covariance models from a compressed file. 89:;<=>?@A <=89:;>?@A 89:;9:;<=>?@A B$Generalized accessors for VerboseHit's and TabularHit's. CModel name (like 5S_rRNA). DHTarget name, typically the scaffold or chromosome where the hit occurs. EStart of submodel. FStop of submodel. GStart of substring in target. HStop of substring in target. I)Bit score of the hit of model in target. JIEvalue, expectation of bit score of higher in target sequence of length. KG/C content in target. BCDEFGHIJK BCDEFGHIJK B CDEFGHIJKCDEFGHIJK L+Model identifier and sequence accession to QP entry. M*Model accession and sequence accession to QP entry (and model / accession to all entries for this accession). N$Model identifier to model accession O$Model accession to model identifier PRfam FASTA entry. QR0Rfam accession number RFxxxxx (the xxxxx part). S Rfam identifier (like 5S_rRNA). T1EMBL sequence accession identifier and position. URfam species accession. VSpecies name. W FASTA data 2Since RfamFasta entries are just fasta entries... LMNOPQRSTUVW PQRSTUVWONML LMNOPQRSTUVWQRSTUVWX4Enumeratee for RfamFasta entries from a ByteString. YGCreate a mapping between rfam family accession numbers and rfam family  names. ZECreate a mapping between rfam family names and rfam family accession  numbers. [HProvides a mapping between (Rfam accession, sequence accession) and the  complete QP. \LProvides a mapping between (Rfam name, sequence accession) and the complete  QP. ](Convenience function creating all maps. ^(Convenience function creating all maps. XYZ[\]^XYZ[\]^XYZ[\]^ _Individual sequence scores. 3TODO avgProbability should use Probability newtype `a8sequence name, typically RFxxxxxx;RfamID;embl-accession baligned sequence length ctotal alignment bitscore dstructural score part ef8cmalign results, includes sequence scores if available. #TODO stockholmAlignment, should be  biostockholm (will be set after some # fun iteratee tests). For now, the  holds everything needed to  parse using biostockholm. ghij _`abcdefghij fghij_`abcde _`abcde`abcdefghijghij k!Transforms bytestring to list of gf data. l%Creates the required sequence score. (Convenience function creating all maps. m(Convenience function creating all maps. klmklmklm nSimple Rfam clan data. opresult of the  AC CL00001 line, keeping 1 in this case. qthe  ID tRNA line, keeping tRNA. rall the MB RF00005;, MB RF00023; lines, keeping [5,23]. sHall lines of each clan, without any processing (except being in lines). nopqrsnopqrsnopqrsopqrs t;Import the complete data from an uncompressed source file. u&Transform a bytestring into a list of ons. v.Given a list of bytestrings, create one Clan. DTODO return Maybe, make crash-safe (not really high on the list...) tuvtuvtuv wHFor each species, we store the name and a classification list from most J general (head) to most specific (last). The database comes with the NCBI  taxon identifier (taxid). xyz{|Given a name such as Drosophila Melanogaster , returns d.melanogaster. wxyz{|wxyz{|wxyz{xyz{| }=Provide name-based lookup as the most-common usage scenario. -TODO there are 9 duplicates in the names, let's find them and see what is  going on ~And a map based on taxon id Imports taxonomy data. Given a , create a species entry. MNOTE The taxonomy format is, for each species, a line consisting of: taxid - M tab - species name - tab - semicolon separated list of classification names  - dot - end of line. NConvenience function: given a taxonomy file, produce both maps simultanously. }~}~}~Captures a complete alignment .part of target sequence (start counting at 1) which part of the CM/stk do we align to which part of the CM/stk do we align to the CM for this alignment should be either  or   bit score &number of hits we expect to find with score or higher for targetSequence length ? ? Mscaffold, chromosome, ... (the name of the sequence, not the sequence data!) 9fancy secondary structure annotation using wuss notation +query consensus (upper: highly, lower: weak/no) 8represents where positive and negative scores come from .the target sequence which aligns to the model *any annotations that could be associated (# lines) Generalized accessors. FTransforms a stream into verbose hits. We need to keep a state in the C accumulator to keep track of the current CM, scaffold and strand. Parses one CM query result. $Parses multiple four-line elements. ;Convenience function: read all results into a single list. 7This transformer keeps a 1-1 relationship between each  and E bytestring representation. Useful for merging different streams, if  individual s are to be annotated. Given the current state a and verbose hit h, determine if any state  switches have to be emitted.  Convert a 1 to a string, ready for printing as in the input  file.  GTabular Infernal hits. See Biobase.Infernal.Hit for description of the  individual fields. Generalized accessors. &Transform a stream into tabular hits. HConvenience function to load from file and return a big list of tabular  hits. 0nopqrswxyz{0wxyz{nopqrs  !""#$$%&&'(()**+,,-../01233456789:;<=>?@@ABCDEFGHIJKLMNOPQRSTUVWWXYZ[\]^_`abHGccdefghiijkl m H G n n o p q r G s t u u v w x y z { | } G~GG BiobaseInfernal-0.6.2.0$Biobase.Infernal.VerboseHit.InternalBiobase.Infernal.TypesBiobase.Infernal.CMBiobase.Infernal.CM.ImportBiobase.Infernal.HitBiobase.Infernal.RfamFasta!Biobase.Infernal.RfamFasta.ImportBiobase.Infernal.AlignBiobase.Infernal.Align.ImportBiobase.Infernal.ClanBiobase.Infernal.Clan.ImportBiobase.Infernal.Taxonomy Biobase.Infernal.Taxonomy.ImportBiobase.Infernal.VerboseHit"Biobase.Infernal.VerboseHit.Import"Biobase.Infernal.VerboseHit.ExportBiobase.Infernal.TabularHit"Biobase.Infernal.TabularHit.ImportBiobase.InfernalBiobase.Infernal.CM.ExportAliGoaliCM aliScaffold aliStrand aliAnnotationBitScore unBitScoreScaffold unScaffoldClassificationunClassification StrictSeqDataunStrictSeqData SpeciesName unSpeciesNameSpeciesAccessionunSpeciesAccession EmblAccessionunEmblAccessionModelIdentificationunModelIdentificationModelAccessionunModelAccessionClanIdentificationunClanIdentification ClanAccessionunClanAccessionmkEmblAccessionAC2CMID2CMCMname accession trustedCutoff gathering noiseCutoff transitionemissionpaths localBeginbeginslocalEndnodesNode nodeHeader nodeIndexeneeCM iterNodes isNodeHeaderisStatefromFile fromFileZipHitmodeltarget modelStart modelStop targetStart targetStopbitScoreevalue gcPercentIDAC2RfamFastaACAC2RfamFasta ModelID2AC ModelAC2ID RfamFastamodelAccessionmodelIdentifiersequenceAccessionspeciesAccession speciesName fastaData eneeRfamFasta iModelAC2ID iModelID2ACiACAC2RfamFastaiIDAC2RfamFasta SequenceScore sequenceNamesLength totalBitScorestructureBitScoreavgProbabilityAlignmodelIdentificationsequenceScoresstockholmAlignment eneeAlignClan cAccession cIdentifiercMemberscStringsfromByteStringmkClanSpeciesTaxonomy stAccessionstNamestClassification shortenName iSpeciesMap iTaxIdMap eneeSpecies mkSpeciesStrand VerboseHit vhTargetStart vhTargetStop vhModelStart vhModelStopvhModelvhStrand vhBitScorevhEvaluevhPvalue vhGCpercentvhTargetvhWuss vhConsensus vhScoring vhSequence vhAnnotationeneeVerboseHiteneeByteStringeneeByteStringsshowVerboseHit TabularHitthModelthTarget thTargetStart thTargetStop thModelStart thModelStop thBitScorethEvalue thGCpercenteneeTabularHit thFromFile vhFromFilevhEneeByteStringvhEneeByteStrings tFromFile cFromFilebytestring-0.9.2.0Data.ByteString.Internal ByteStringmkMap$fBioSeqRfamFastamkScorebaseGHC.Num+-$fHitVerboseHitqsnewAcc$fHitTabularHit