Break a list of bytestrings on a predicate
Output (to stderr) progress while evaluating a lazy list. Useful for generating output while (conceptually, at least) in pure code
Strictness warning!! This doesn't *quite* work in all cases. Why?
Data structure for storing hierarchical clusters
Single linkage agglomerative clustering. Cluster elements by slurping a sorted list of pairs with score (i.e. triples :-) Keeps a set of contained elements at each branch's root, so O(n log n), and requires elements to be in Ord. For this to work, the triples must be sorted on score. Earlier scores in the list will make up the lower nodes, so sort descending for similarity, ascending for distance.
A [...] may contain multiple separate matches (typically when an indel causes a frameshift that blastx is unable to bridge).
Each match between a query and a target sequence (or subject) is a [...].
Each query sequence generates a [...].
A [...] is the root of the hierarchy.
The Aux field in the BLAST output includes match information that depends on the BLAST flavor (blastn, blastx, or blastp). This data structure captures those variations.
blastx
blastn
The [...] indicates the direction of the match, i.e. the plain sequence or its reverse complement.
The sequence id, i.e. the first word of the header field.
The BlastFlat data structure contains information about a single match
Convert BlastRecords into BlastFlats (representing a depth-first traversal of the BlastRecord structure.)
breaks p = groupBy (const (not.p))
Evidence codes describe the type of support for an annotation http://www.geneontology.org/GO.evidence.shtml
GO maps GO terms (GO:xxxx for some number xxxx) to biologically meaningful terms. Defined in http://www.geneontology.org/doc/GO.terms_and_ids The format is GO:0000000 [tab] text string [tab] F|P|C
GOA Annotation - or multiple annotations?
Read the goa_uniprot file (warning: this one is huge!)
Read GO term definitions
The vast majority of GOA data is IEA, while the most reliable information is manually curated. Filtering on this is useful to keep data set sizes manageable, too.
A sequence consists of a header, the sequence data itself, and optional quality data.
header and actual sequence
Quality data is a [...] vector, currently implemented as a ByteString.
Basic type for quality data. Range 0..255. Typical Phred output is in the range 6..50, with 20 as the line in the sand separating good from bad.
The basic data type used in [...]s
An offset, index, or length of a [...]
Convert a String to [...]
Convert a [...] to a String
Read the character at the specified position in the sequence.
Return sequence length.
Return sequence label (first word of header)
Return full header.
Return the sequence data.
Return the quality data, or error if none exist. Use hasqual if in doubt.
Check whether the sequence has associated quality data.
Calculate the reverse complement. This is only relevant for the nucleotide alphabet, and it leaves other characters unmodified.
Complement a single character. I.e. identify the nucleotide it can hybridize with. Note that for multiple nucleotides, you usually want the reverse complement (see [...] for that).
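The relationship between complementing a single character and taking the reverse complement can be sketched as follows. This is only an illustration on plain Strings; the names compl' and revcompl' are hypothetical and are not the library's own functions, which work on its sequence type.

```haskell
module Main where

-- Hypothetical helpers for illustration only; the library's own
-- functions operate on its sequence type, not on plain Strings.

-- | Complement a single nucleotide. Characters outside ACGT/acgt
-- (for example N) are returned unchanged.
compl' :: Char -> Char
compl' c = case c of
  'A' -> 'T'
  'T' -> 'A'
  'C' -> 'G'
  'G' -> 'C'
  'a' -> 't'
  't' -> 'a'
  'c' -> 'g'
  'g' -> 'c'
  _   -> c

-- | Reverse the string, then complement every character.
revcompl' :: String -> String
revcompl' = map compl' . reverse

main :: IO ()
main = putStrLn (revcompl' "ACGTN")  -- prints NACGT
```

Complementing without reversing would give the opposite strand read in the wrong direction, which is why the reverse complement is usually what is wanted for anything longer than one nucleotide.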
Translate a nucleotide sequence into the corresponding protein sequence. This works rather blindly, with no attempt to identify ORFs or otherwise QA the result.
Convert a list of amino acids to a sequence in IUPAC format.
Convert a sequence in IUPAC format to a list of amino acids.
Lazily read sequences from a FASTA-formatted file
Write sequences to a FASTA-formatted file. Line length is 60.
Read quality data for sequences from a file.
Write quality data for sequences to a file.
Read sequence and associated quality. Will error if the sequences and qualities do not match one-to-one in sequence.
Write sequence and quality data simultaneously. This may be more laziness-friendly.
Lazily read sequence from handle
Write sequences in FASTA format to a handle.
Convert a list of FASTA-formatted lines into a list of sequences. Blank lines are ignored. Comment lines starting with # are allowed between sequences (and ignored). Lines starting with > initiate a new sequence.
Split lines into blocks starting with [...] characters
Filter out # comments (but not semicolons?)
Parse a (lazy) ByteString as sequences in the 2bit format.
Extract sequences from a file in 2bit format.
Extract sequences in the 2bit format from a handle.
Write sequences to file in the 2bit format.
Write sequences to a handle in the 2bit format.
Parse a .phd file, extracting the contents as a Sequence
Parse .phd contents from a handle
The actual phd parser.
Pack bytestring segments into a single bytestring. Allows the (rest of the) file contents to be GC'ed
This is a struct for containing a set of hashing functions
calculates the hash at a given offset in the sequence
calculate all hashes from a sequence, and their indices
for sorting hashes
Adds a default hashes function to a HashF, when hash is defined.
Contiguous constructs an integer from a contiguous k-word.
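One common way to construct an integer from a contiguous k-word is to pack two bits per nucleotide. The sketch below assumes that encoding and the ordering A=0, C=1, G=2, T=3; both are assumptions made for illustration, and nucVal and kwordKey are hypothetical names, not the library's API.

```haskell
module Main where

import Control.Monad (foldM)

-- | Two-bit value of a nucleotide (assumed ordering A, C, G, T).
-- Anything else, such as N, has no value.
nucVal :: Char -> Maybe Integer
nucVal c = case c of
  'A' -> Just 0
  'C' -> Just 1
  'G' -> Just 2
  'T' -> Just 3
  _   -> Nothing

-- | Pack a contiguous k-word into one integer, two bits per position.
-- Returns Nothing if the word contains a character without a value.
kwordKey :: String -> Maybe Integer
kwordKey = foldM step 0
  where
    step acc c = fmap (\v -> acc * 4 + v) (nucVal c)

main :: IO ()
main = do
  print (kwordKey "ACGT")  -- Just 27
  print (kwordKey "ACNT")  -- Nothing
```

Returning Nothing for words containing N is one possible design choice; it keeps ambiguous positions out of the hash table rather than assigning them an arbitrary key.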
Like [...], but returns the same hash for a word and its reverse complement.
Like rcontig, but ignoring monomers (i.e. arbitrarily long runs of a single nucleotide are treated the same as a single nucleotide).
A Selector consists of a zero element, and a function that chooses a possible Edit operation, and generates an updated result.
A substitution matrix gives scores for replacing a character with another. Typically, it will be symmetric.
An alignment is a sequence of edits.
An Edit is either the insertion, the deletion, or the replacement of a character.
The sequence element type, used in alignments.
Gaps are coded as [...]s, this function removes them, and returns the sequence along with the list of gap positions.
turn an alignment into sequences with [...] representing gaps (for checking, filtering out the [...] characters should return the original sequences, provided [...] isn't part of the sequence alphabet)
True if the Edit is a Repl.
Evaluate an Edit based on SubstMx and gap penalty
Calculate a set of columns containing scores. This represents the columns of the alignment matrix, but will only require linear space for score calculation.
The standard BLOSUM45 matrix.
The standard BLOSUM62 matrix.
The standard BLOSUM80 matrix.
The standard PAM30 matrix.
The standard PAM70 matrix.
Blast defaults, use with gap_open = -5 gap_extend = -3. This should really check for valid nucleotides, and perhaps be more lenient in the case of Ns. Oh well.
Construct a simple matrix from match score/mismatch penalty
Calculate global edit distance (Needleman-Wunsch alignment score)
Scoring/selection function for global alignment
Calculate local edit distance (Smith-Waterman alignment score)
Scoring/selection function for local alignment
Calculate alignments.
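The idea of computing a global (Needleman-Wunsch) score from columns in linear space can be sketched as below. This is a minimal illustration with a single linear gap penalty, not the library's implementation; globalScore and simpleSub are hypothetical names, and the match/mismatch values are arbitrary.

```haskell
module Main where

import Data.List (foldl')

-- | Global alignment score, keeping only one column of the
-- alignment matrix in memory at a time. The gap penalty is linear
-- (assumption: no separate open/extend costs in this sketch).
globalScore :: (Char -> Char -> Int) -> Int -> String -> String -> Int
globalScore sub gap xs ys = last (foldl' nextCol firstCol ys)
  where
    -- Column 0: every prefix of xs aligned against the empty string.
    firstCol = scanl (\acc _ -> acc + gap) 0 xs

    -- Build the column for character y from the previous column.
    nextCol [] _ = []
    nextCol prev@(p0 : rest) y = col
      where
        col = (p0 + gap) : zipWith3 cell xs (zip prev rest) col
        -- diag: replace x with y; left: gap in xs; up: gap in ys.
        cell x (diag, left) up =
          maximum [diag + sub x y, left + gap, up + gap]

-- | Match/mismatch scoring, in the spirit of a simple matrix.
simpleSub :: Char -> Char -> Int
simpleSub a b = if a == b then 1 else -1

main :: IO ()
main = print (globalScore simpleSub (-1) "ACGT" "ACGT")  -- prints 4
```

Only the score is recovered this way; producing the alignment itself needs either the full matrix or a divide-and-conquer scheme on top of the column computation.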
Minus infinity (or an approximation thereof)
Calculate global edit distance (Needleman-Wunsch alignment score)
Calculate local edit distance (Smith-Waterman alignment score)
Generic scoring and selection function for global and local scoring
Calculate global alignment (Needleman-Wunsch)
Calculate local alignment (Smith-Waterman)
Generic scoring and selection for global and local alignment
The selector must take into account the quality of the sequences. On Ins/Del, the average of qualities surrounding the gap is (should be) used
Minus infinity (or an approximation thereof)
Calculate global edit distance (Needleman-Wunsch alignment score)
Calculate local edit distance (Smith-Waterman alignment score)
Generic scoring and selection function for global and local scoring
Calculate global alignment (Needleman-Wunsch)
Calculate local alignment (Smith-Waterman)
Generic scoring and selection for global and local alignment
The Parsec parser type
ACE header lines with parameters. The tokenizer (scanner) should convert input into a list of these, which in turn can be parsed by Parsec
Parse a single token, primitive parser
Test parser p on a list of ACE elements
Add SourcePoses to a stream of ACEs.
Parse a complete ACE file as a set of assemblies.
parse the initial header
parse the contig and quality information (CO, BQ)
Read a list of Ints in the Maybe monad
Given the CO info, get the AFS'es
Parse a list of AFS, followed by actual read, and merge them. afs :: Sequence -> AceParser [Sequence] -- plus some auxiliary info?
parse each read (RD, QA, DS)
Reading an ACE file.
Progressive multiple alignment. Calculate a tree from agglomerative clustering, then align at each branch going bottom up. Returns a list of columns (rows?).
Derive alignments indirectly, i.e. calculate A|C using alignments A|B and B|C.
This is central for Coffee evaluation of alignments, and T-Coffee construction of alignments.
bio-0.3.3.2
Bio.Util.Parsex Bio.Util Bio.Clustering Bio.Alignment.BlastData Bio.Alignment.BlastFlat Bio.Alignment.Blast Bio.Alignment.BlastXML Bio.Sequence.GOA Bio.Sequence.Entropy Bio.Sequence.SeqData Bio.Sequence.Fasta Bio.Sequence.TwoBit Bio.Sequence.Phd Bio.Sequence.HashWord Bio.Alignment.AlignData Bio.Alignment.Matrices Bio.Alignment.SAlign Bio.Alignment.AAlign Bio.Alignment.QAlign Bio.Alignment.ACE Bio.Alignment.Multiple
base Data.Ord Prelude Bio.Sequence
lazyMany splitWhen countIO sequence' Clustered Leaf Branch cluster_sl BlastMatch bits e_val identity q_from q_to h_from h_to aux BlastHit subject slength matches BlastRecord query qlength hits BlastResult blastprogram blastversion blastdate blastreferences database dbsequences dbchars results Aux Frame Strands Strand Minus Plus SeqId BlastFlat flatten parse readXML EvidenceCode NR TAS RCA ND NAS ISS IPI IMP IGI IGC IEP IEA IDA IC GoDef Annotation Ann GoClass Comp Proc Func UniProtAcc GoTerm GO readGOA readGO decomment mkAnn mkGoDef isCurated KWords kwords entropy Amino Xaa Xle Glx Asx STP Val Trp Tyr Thr Ser Pro Phe Met Lys Leu Ile His Gly Glu Gln Cys Asp Asn Arg Ala Sequence Seq QualData Qual SeqData Offset fromStr toStr !?
seqlength seqlabel seqheader seqdata seqqual hasqual revcompl compl translate toIUPAC fromIUPAC readFasta writeFasta readQual writeQual readFastaQual writeFastaQual hWriteFastaQual hReadFasta hWriteFasta hWriteQual mkSeqs countSeqs decode2Bit read2Bit hRead2Bit readPhd hReadPhd Shape HashF HF hash hashes ksort genkeys contigous rcontig compact rcpacked gapped isN n2k n2i' k2n val unval complement Selector SubstMx EditList Edit Repl Del Ins Chr Alignment Gaps Dir Rev Fwd extractGaps insertGaps toStrings isRepl eval columns blosum45 blosum62 blosum80 pam30 pam70 blastn_default simpleMx global_score local_score global_align local_align qualMx Assembly Asm contig fragments reads ptest readACE writeACE progressive indirect lines mylines breaks blocks
GHC.Classes > write2Bit hWrite2Bit mkPhd k2n' GHC.Num * - showalign on g_score l_score minf score_select align_select QSelector AceParser ACE parse1 source ace as ctg readInts asm af rd