"      !"#$%&'()*+,-./0123456789:;<=>?@ABCDE F G H I J K L M N O P Q R S T U V W X Y Z [ \ ] ^ _ ` a b c d e f g h i j k l m n o p q r s t u v w x y z { | } ~        !"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~        !!!!!!!!!!!!""""""""""#########################$$$$$$$$$$$$$$$$$$$$$$$$$$$$%%%%%%&&&&&&&&&&&&&&&&&&&&&&&&&&''''''''''''((( ( ( ( ( ((((((()))))))))))) *!*+Workaround, the current Data.ByteString.Lazy.Char8 contains a bug in  Data.ByteString.Lazy.Char8.lines. ,Break a list of bytestrings on a predicate. :Output (to stderr) progress while evaluating a lazy list. L Useful for generating output while (conceptually, at least) in pure code A lazier version of Control.Monad.sequence in  Control.Monad , needed by  above.     1Data structure for storing hierarchical clusters )Single linkage agglomerative clustering. T Cluster elements by slurping a sorted list of pairs with score (i.e. triples :-) 3 Keeps a set of contained elements at each branch's root, so O(n log n), ' and requires elements to be in Ord. Z For this to work, the triples must be sorted on score. Earlier scores in the list will W make up the lower nodes, so sort descending for similarity, ascending for distance.      A 7 may contain multiple separate matches (typcially when B an indel causes a frameshift that blastx is unable to bridge). >Each match between a query and a target sequence (or subject)  is a .  Each query sequence generates a  "A #" is the root of the hierarchy. ,JThe Aux field in the BLAST output includes match information that depends R on the BLAST flavor (blastn, blastx, or blastp). This data structure captures  those variations. -blastx .blastn /The /B indicates the direction of the match, i.e. the plain sequence or  its reverse complement. 2:The sequence id, i.e. the first word of the header field. % !"#$%&'()*+,-./012%2/10,.-"#$%&'()*+ !%  ! !" #$%&'()*+#$%&'()*+,.--./100123334"Parse BLAST results in XML format "#breaks p = groupBy (const (not.p)) 4445GThe BlastFlat data structure contains information about a single match DSConvert BlastRecords into BlastFlats (representing a depth-first traversal of the  BlastRecord structure.) $%&'()*+,-./0156789:;<=>?@ABCD56789:;<=>?@ABCD$%&'()*+,.-/105 6789:;<=>?@AB6789:;<=>?@ABCD E;Read names encode various information, as per this struct. EFGHIJKLMNOPQRSTUVWEFGHIJKLMNOPQRSTUVWEFGHIJKFGHIJKLMNOPQRSTUVW X>Evidence codes describe the type of support for an annotation   -http://www.geneontology.org/GO.evidence.shtml YNot Recorded ZTraceable Author Statement [.Inferred from Reviewed Computational Analysis \No biological Data available ]Non-traceable Author Statement ^0Inferred from Sequence or Structural Similarity _#Inferred from Physical Interaction `Inferred from Mutant Phenotype a"Inferred from Genetic Interaction bInferred from Genomic Context c!Inferred from Expression Pattern d$Inferred from Electronic Annotation eInferred from Direct Assay fInferred by Curator gRA GOA annotation, containing a UniProt identifier, a GoTerm and an evidence code. i=A UniProt identifier (short string of capitals and numbers). jA GoDef maps a GoTerm to a description and a GoClass. p A GO term is a positive integer rSA list of Go definitions, with pointers to parent nodes. Read from the .obo file. U The user may construct the explicit hierachy by storing these in a Map or similar sXRead the GO hierarchy from the obo file. Note that this is not quite a tree structure. t7Read the goa_uniprot file (warning: this one is huge!) u9Read GO term definitions, from the GO.terms_and_ids file #Parse a GoDef+ from a line in the GO.terms_and_ids file. $ Reading an  Annotation& from a line in the association file. %?Read the evidence code from a ByteString (no error checking!). wJThe vast majority of GOA data is IEA, while the most reliable information L is manually curated. Filtering on this is useful to keep data set sizes  manageable, too. XYZ[\]^_`abcdefghijklmnopqrstuvw pqjkrsughilonmXfedcba`_^]\[ZYtwv Xfedcba`_^]\[ZYYZ[\]^_`abcdefghhijkklonmmnopqqrstuvw zJMost KEGG files that contain associations, have one association per line, R consisting of two items separated by whitespace. This is a generalized reader  function. {'Convert UniProt IDs (up:xxxxxx) to the  UniProtAcc type. |!Convert KO IDs (ko:xxxxx) to the KO data type. }RKEGG uses strings with an identifying prefix for IDs. This helper function checks 2 and removes prefix to construct native values. xyz{|}zxy{|}xyyz{|} !XYZ[\]^_`abcdefghijklmnopqrstuvw~~~ 2For type tagging sequences (protein sequences use  below) VA sequence consists of a header, the sequence data itself, and optional quality data. X The type parameter is a phantom type to separate nucleotide and amino acid sequences header and actual sequence Quality data is a $ vector, currently implemented as a  ByteString. HBasic type for quality data. Range 0..255. Typical Phred output is in N the range 6..50, with 20 as the line in the sand separating good from bad. The basic data type used in s !An offset, index, or length of a  Phantom type functionality =Returns a properly formatted and probably highlighted string H | representation of a sequence. Highlighting is done using ANSI-Escape  | sequences. MA simple function to display a sequence: we generate the sequence string and  | call putStrLn &KSplits a string into parts of size width. The last element can be shorter. Convert a String to   Convert a  to a String >Read the character at the specified position in the sequence. Return sequence length. -Return sequence label (first word of header) Return full header. Return the sequence data. KReturn the quality data, or error if none exist. Use hasqual if in doubt. 8Check whether the sequence has associated quality data. 5Modify the header by appending text, or by replacing 1 all but the sequence label (i.e. first word). @Returns a sequence with all internal storage freshly copied and = with sequence and quality data present as a single chunk. %By freshly copying internal storage,  allows garbage @ collection of the original data source whence the sequence was A read; otherwise, use of just a short sequence name can cause an - entire sequence file buffer to be retained. 1By compacting sequence data into a single chunk,  avoids D linear-time traversal of sequence chunks during random access into  sequence data. "Calculate the reverse complement. 7 This is only relevant for the nucleotide alphabet, 0 and it leaves other characters unmodified. 1Calculate the reverse complent for SeqData only. AComplement a single character. I.e. identify the nucleotide it H can hybridize with. Note that for multiple nucleotides, you usually $ want the reverse complement (see  for that). ?Translate a nucleotide sequence into the corresponding protein J sequence. This works rather blindly, with no attempt to identify ORFs  or otherwise QA the result. =Convert a list of amino acids to a sequence in IUPAC format. =Convert a sequence in IUPAC format to a list of amino acids. 999 2Lazily read sequences from a FASTA-formatted file +Write sequences to a FASTA-formatted file.  Line length is 60. +Read quality data for sequences to a file. ,Write quality data for sequences to a file. 5Read sequence and associated quality. Will error if C the sequences and qualites do not match one-to-one in sequence. .Write sequence and quality data simulatnously ' This may be more laziness-friendly. !Lazily read sequence from handle -Write sequences in FASTA format to a handle. BConvert a list of FASTA-formatted lines into a list of sequences.  Blank lines are ignored.  Comment lines start with #. are allowed between sequences (and ignored).  Lines starting with > initiate a new sequence. '&Split lines into blocks starting with (+, characters  Filter out # comments (but not semicolons?) -Parse one FastQ entry, suitable for using in )+- over  *./ from a file ;Parse a (lazy) ByteString as sequences in the 2bit format. @Marshall from neutral representation to the 2Bit ByteString rep /Read sequences from a file in 2bit format and  | unmarshall/"deserialize into Sequence format. 9Read sequences from a file handle in the 2bit format and  | unmarshall/!deserialze into Sequence format. Marshall/serialize [Sequence]( into 2Bit format and write to a file. Marshall/serialize [Sequence]5 into 2Bit format and write to a file using handle. 9Parse a .phd file, extracting the contents as a Sequence "Parse .phd contents from a handle +The actual phd parser. ,2Pack bytestring segments into a single bytestring 2 Allows the (rest of the) file contents to be GC'ed ;This is a struct for containing a set of hashing functions 6calculates the hash at a given offset in the sequence 8calculate all hashes from a sequence, and their indices for sorting hashes Adds a default hashes function to a HashF, when hash is defined. Contigous constructs an int/eger from a contigous k-word. Like C, but returns the same hash for a word and its reverse complement. Like rcontigK, but ignoring monomers (i.e. arbitrarily long runs of a single nucelotide - are treated the same a single nucleotide. -5This contains the actual flowgram for a single read. "Each Read has a fixed read header  SFF has a 31-byte common header @ Todo: remove items that are derivable (counters, magic, etc) 3 cheader_lenght points to the first read header. H Also, the format is open to having the index anywhere between reads, I we should really keep count and check for each read. In practice, it ' seems to be places after the reads. CThe following two fields are considered part of the header, but as < they are static, they are not part of the data structure : magic :: Word32 -- ^ 0x2e736666, i.e. the string .sff ) version :: Word32 -- ^ 0x00000001 Points to a text(?) section JThe data structure storing the contents of an SFF file (modulo the index) The type of flowgram value  Write an  to the specified file name   Write an - to the specified file name, but go back and ? update the read count. Useful if you want to output a lazy  stream of s.  test serialization by output'$ing the header and first two reads : in an SFF, and the same after a decode + encode cycle.  1Convert a file by decoding it and re-encoding it & This will lose the index (which isn't really necessary) .!Generalized function for padding /%Generalized function to skip padding 0A ReadBlock can'9t be an instance of Binary directly, since it depends on & information from the CommonHeader. 1What the name and type says. 0EFGHIJKLO   0   EFGHIJKLO$      7A Selector consists of a zero element, and a funcition L that chooses a possible Edit operation, and generates an updated result.  KA substitution matrix gives scores for replacing a character with another. Y Typically, it will be symmetric. It is type-tagged with the alphabet - Nuc or Amino. %An alignment is a sequence of edits. /An Edit is either the insertion, the deletion, & or the replacement of a character. /The sequence element type, used in alignments. Gaps are coded as 2+0+s, this function removes them, and returns 6 the sequence along with the list of gap positions. &turn an alignment into sequences with 3+0 representing gaps " (for checking, filtering out the 3+0 characters should return " the original sequences, provided 3+0 isn't part of the sequence  alphabet) True if the Edit is a Repl. 2Evaluate an Edit based on SubstMx and gap penalty -Calculate a set of columns containing scores [ This represents the columns of the alignment matrix, but will only require linear space  for score calculation.   45    :BLOSUM45 matrix, suitable for distantly related sequences  The standard BLOSUM62 matrix. !:BLOSUM80 matrix, suitable for closely related sequences. "The standard PAM30 matrix #The standard PAM70 matrix. $7Blast defaults, use with gap_open = -5 gap_extend = -3 G This should really check for valid nucleotides, and perhaps be more ( lenient in the case of Ns. Oh well. %Construct a simple matrix from match score/mismatch penalty  !"#$% !"#$% !"#$%&BCalculate global edit distance (Needleman-Wunsch alignment score) 6Scoring/(selection function for global alignment '?Calculate local edit distance (Smith-Waterman alignment score) 7Scoring/'selection funciton for local alignmnet (Calculate alignments. &'()')&(&'()8-Minus infinity (or an approximation thereof) *BCalculate global edit distance (Needleman-Wunsch alignment score) +?Calculate local edit distance (Smith-Waterman alignment score) 9DGeneric scoring and selection function for global and local scoring ,.Calculate global alignment (Needleman-Wunsch) -+Calculate local alignmnet (Smith-Waterman) :=Generic scoring and selection for global and local alignment *+,-+-*,*+,- ;AThe selector must take into account the quality of the sequences  on Ins/FDel, the average of qualities surrounding the gap is (should be) used <-Minus infinity (or an approximation thereof) /BCalculate global edit distance (Needleman-Wunsch alignment score) 0?Calculate local edit distance (Smith-Waterman alignment score) 1?Calucalte best overlap score, where gaps at the edges are free K The starting point is like for local score (0 cost for initial indels), V the result is the maximum anywhere in the last column or bottom row of the matrix. =DGeneric scoring and selection function for global and local scoring 2.Calculate global alignment (Needleman-Wunsch) 3+Calculate local alignment (Smith-Waterman)  (can we replace uncurry max'? with fst - a local alignment must always end on a subst, no?) 4?Calucalte best overlap score, where gaps at the edges are free K The starting point is like for local score (0 cost for initial indels), V the result is the maximum anywhere in the last column or bottom row of the matrix. >HVariant that retains indels to retain the entire sequence in the result ?=Generic scoring and selection for global and local alignment ./01234503/214.5./012345 @The Parsec parser type A!ACE header lines with parameters F The tokenizer (scanner) should convert input into a list of these, ) which in turn can be parsed by Parsec B'Parse a single token, primitive parser ;(Test parser p on a list of ACE elements C%Add SourcePoses to a stream of ACEs. D2Parse a complete ACE file as a set of assemblies. Eparse the initial header F2parse the contig and quality information (CO, BQ) G'Read a list of Ints in the Maybe monad HGiven the CO info, get the AFS'es I=Parse a list of AFS, followed by actual read, and merge them ' afs :: Sequence -> AceParser [Sequence] -- plus some auxiliary info? Jparse each read (RD, QA, DS) Y Vector NTI appears to insert solitary RDs, sometimes even without any sequence data!? ( This is not supported at this point. <Reading an ACE file. 6789:;<=<=6789;:6789789:;<=B For benchmarking, fixed lengths HFor testing, variable lengths R-Take time (CPU and wall clock) and report it SPrint a CPUTime difference TShamelessly stolen from FPS U Constrained position generators >?@ABCDEFGHIJKLMNOPQRSTUVWNOLMJKPQHIFGDEBC@A>?RSTUVW>??@AABCCDEEFGGHIIJKKLMMNOOPQRSTUVWX@A nucleotide sequence or location on a nucleotide sequence that 5 lies on a specific strand and has an orientation. ZSequence strand ]Convert the orientation of a X thing based on a  specified Z XYZ[\]Z\[XY]XYYZ\[[\]^Position in a sequence `0-based index of the position aStrand of the position b@Returns a position resulting from sliding the original position C along the sequence by a specified offset. A positive offset will " move the position away from the 5'! end of the forward stand of the B sequence regardless of the strand of the position itself. Thus, 6 slide (revCompl pos) off == revCompl (slide pos off) c@Extract the nucleotide at a specific sequence position. If the E position lies outside the bounds of the sequence, an error results. dAs c0, extract the nucleotide at a specific sequence  position, but return N$ when the position lies outside the  bounds of the sequence. : seqNtPadded sequ pos == (either 'N' id . seqNt sequ) pos eLDisplay a human-friendly, zero-based representation of a sequence position. ^_`abcde^_`abcde^_`a_`abcdef;Contiguous sequence location defined by a span of sequence 8 positions, lying on a specific strand of the sequence. hThe offset of the 5') end of the location, as a 0-based index iThe length of the location jThe strand of the location k>Create a sequence location lying between 0-based starting and  ending offsets. When start < end, the location 7 be on the forward strand, otherwise it will be on the  reverse complement strand. l=Create a sequence location from the sequence position of the C start of the location and the length of the position. The strand A of the location, and the direction it extends from the starting B position, are determined by the strand of the starting position. mAThe bounds of a sequence location. This is a pair consisting of E the lowest and highest sequence offsets covered by the region. The B bounds ignore the strand of the sequence location, and the first ; element of the pair will always be lower than the second. n>Sequence position of the start of the location. This is the 5' B end on the location strand, which will have a higher offset than  o if the location is on the [ strand. o>Sequence position of the end of the location, as described in n. p@Returns a location resulting from sliding the original location C along the sequence by a specified offset. A positive offset will " move the location away from the 5'! end of the forward stand of the B sequence regardless of the strand of the location itself. Thus, 8 slide (revCompl cloc) off == revCompl (slide cloc off) qExtract the nucleotide  for the sequence location. If C any part of the location lies outside the bounds of the sequence,  an error results. rAs q-, extract the nucleotide subsequence for the C location. Any positions in the location lying outside the bounds ! of the sequence are returned as N! rather than producing an error. sBGiven a sequence position and a sequence location relative to the A same sequence, compute a new position representing the original C position relative to the subsequence defined by the location. If > the sequence position lies outside of the sequence location,  Nothing8 is returned; thus, the offset of the new position will  always be in the range [0, length cloc - 1]. t>Given a sequence location and a sequence position within that E location, compute a new position representing the original position @ relative to the outer sequence. If the sequence position lies  outside the location, Nothing is returned. This function inverts s! when the sequence position lies 6 within the position is actually within the location. u?Returns a sequence location produced by extending the original + location on each end, based on a pair of ( 5\' extension, /3'  extension/ ). The 5' extension is applied to the 5' end of the < location on the location strand; if the location is on the  [ strand, the 5'( end will have a higher offset than the  3'9 end and this offset will increase by the amount of the 5'  extension. Similarly, the 3' extension is applied to the 3' end  of the location. vReturns True2 when a sequence position lies within a sequence > location on the same sequence, and occupies the same strand. wReturns True, when two sequence locations overlap at any  position. xLDisplay a human-friendly, zero-based representation of a sequence location. fghijklmnopqrstuvwxfghijklmnostvwqrpuxfghijghijklmnopqrstuvwx y@General (disjoint) sequence region consisting of a concatenated " set of contiguous regions (see gf). {!Returns the length of the region |AThe bounds of a sequence location. This is a pair consisting of E the lowest and highest sequence offsets covered by the region. The B bounds ignore the strand of the sequence location, and the first D element of the pair will always be lower than the second. Even if D the positions in the location do not run monotonically through the I location, the overall lowest and highest sequence offsets are returned. }>Sequence position of the start of the location. This is the 5' B end on the location strand, which will have a higher offset than  ~ if the location is on the [ strand. ~>Sequence position of the end of the location, as described in }. Extract the nucleotide  for the sequence location. If C any part of the location lies outside the bounds of the sequence,  an error results. As -, extract the nucleotide subsequence for the C location. Any positions in the location lying outside the bounds ! of the sequence are returned as N! rather than producing an error. BGiven a sequence position and a sequence location relative to the A same sequence, compute a new position representing the original C position relative to the subsequence defined by the location. If > the sequence position lies outside of the sequence location,  Nothing8 is returned; thus, the offset of the new position will  always be in the range [0, length cloc - 1]. ?When the sequence positions in the location are not monotonic, D there may be multiple possible posInto solutions. That is, if the E same outer sequence position is covered by two different contiguous B blocks of the location, then it would have two possible sequence @ positions relative to the location. In this case, the position  5'/-most in the location orientation is returned. >Given a sequence location and a sequence position within that E location, compute a new position representing the original position @ relative to the outer sequence. If the sequence position lies  outside the location, Nothing is returned. This function inverts ! when the sequence position lies B within the position is actually within the location. Due to the B possibility of redundant location-relative positions for a given  absolute position,  does not necessary invert  ?Returns a sequence location produced by extending the original + location on each end, based on a pair of ( 5\' extension, /3'  extension/+). These add contiguous positions to the 5' and 3' & ends of the original location. The 5' extension is applied to the  5'@ end of the location on the location strand; if the location is  on the [ strand, the 5' end will have a higher offset  than the 3'8 end and this offset will increase by the amount of the  5' extension. Similarly, the 3' extension is applied to the 3'  end of the location. Returns True2 when a sequence position lies within a sequence > location on the same sequence, and occupies the same strand. Returns True, when two sequence locations overlap at any  position. LDisplay a human-friendly, zero-based representation of a sequence location. yz{|}~yz|{}~yzz{|}~ <Data structure allowing efficient lookup of target sequence C locations that overlap a query location. Target locations can be " paired with an arbitrary object.  Create a  / from an association list of target locations. <Insert a new target association into a target location map. BFind the (possibly empty) list of target locations and associated ; objects that contain a sequence position, in the sense of   BFind the (possibly empty) list of target locations and associated ; objects that overlap a sequence location, in the sense of  GRemove a target location and object association from the map, if it is ; present. If it is present multiple times, only the first  occurrence will be deleted. Generalized version of   that removes the first target  location /9 object association that satisfies a predicate function. ! ?Data type for a collection of objects indexed by sequence name CData type for an object associated with a specific, named sequence Sequence name, as in a  9Looks up a sequence by name and applies a function to it =Tests a predicate when two objects are on the same sequence,  returning False% if they are on different sequences. APerforms an action when two objects are on the same sequence and  produces an error otherwise. ALifts a function on an underlying object to look up the sequence $ name in a name-indexed collection. BLifts a function that updates an underlying object to look up the 5 named sequence and update a named-index collection. BLifts a function on underlying objects to look up a sequence in a  name-indexed collection   " AA general location, consisting of spans of sequence positions on  a specific, named sequence. =A location consisting of a contiguous span of positions on a  named sequence. A position on a named sequence -Display a human-friendly representation of a " BTest whether a sequence position lies within a sequence location. @ This requires that the position lie within the location as per  v" and have the same sequence name. -Display a human-friendly representation of a " BTest whether a sequence position lies within a sequence location. @ This requires that the position lie within the location as per  " and have the same sequence name. =Test whether two sequence locations overlap in any position. 1 This requires that the locations overlap as per  and  have the same sequence name. @Extract the subsequence specified by a sequence location from a D sequence database. The sequence name is used to retrieve the full 1 sequence and the subsequence is extracted as by  -Display a human-friendly representation of a "   #:Representation of a single mismatch in a bowtie alignment &Offset of the mismatch site from the 5' end of the query Reference nucleotide Query nucleotide Name of the query sequence 2Strand of the alignment on the reference sequence Name of the reference sequence EZero-based offset of the left-most aligned position in the reference <Query sequence, in the reference forward strand orientation ;Query quality, in the reference forward strand orientation  Mismatches )Returns the length of the query sequence 2Returns the number of mismatches in the alignment ,Parses a line of Bowtie output to produce a ## *Query sequence as given in the query file )Query quality as given in the query file AReturns the sequence position of the start of the query sequence D alignment. This will include the strand of the alignment and will / not be the same as the position computed from # when the 0 alignment is on the reverse complement strand. 6Returns the sequence location covered by the query in C the alignment. This will be a sequence location on the reference ? sequence and may run on the forward or the reverse complement  strand. As #* but without the reference sequence name. 7Returns the sequence location covered by the query, as  #, as a " location. <Returns true when two alignments were derived from the same E sequencing read. As Bowtie writes alignments of query sequences in C their order in the query file, all alignments of a given read are D grouped together and the lists of all alignments for each read can  be gathered with  groupBy sameRead ;Sequence position of a mismatch on the reference sequence. $(Read nt in reference strand orientation -Reference nt in reference strand orientation Offset from reference strand 5'% end in reference strand orientation Quality score of read nt Alignment output from SOAP &Reference strand orientation sequence *Reference strand orientation quality data 71-based index, as output by SOAP, of reference strand 5' end  %9A data structure for efficiently finding target sequence  locations ( SeqLoc.Loc-) that overlap query positions or locations. E Each target location can be associated with an arbitrary additional  value in the lookup map. Empty lookup map.  Creates a %+ from a list of target locations and their  associated objects =Inserts a new target location and associated object into the  location lookup map. BFind the (possibly empty) list of target locations and associated ; objects that contain a sequence position, in the sense of   Loc.isWithin. BFind the (possibly empty) list of target locations and associated ; objects that overlap a sequence location, in the sense of   Loc.overlaps. & '   (               )   1RR*  Progressive multiple alignment. > Calculate a tree from agglomerative clustering, then align G at each branch going bottom up. Returns a list of columns (rows?). !ODerive alignments indirectly, i.e. calculate A|C using alignments A|B and B|C.  This is central for Coffee5 evaluation of alignments, and T-Coffee construction  of alignments.  ! ! !K23456789:;<=>?@@ABCDEFGHIIJKLMMNOPQQRSTUVWXYZ[\]^_`abccNOJKABCDEFGHbd e e f g h i j k l m n o p q r s t u v w x y z { | } ~                       a       !"#$%&'()*+,-./0123456789:;<=>?@A>?@AB>?C@AD"EFGHIJKLMNOPQRSTUVWXYZ[\]^_`gabcdefg]h0ijjklmnopqqrsltuvwxmyz{|}~psvwxyz{|}~p        !!!!!!!!!!!!"""""""~""y"p#########l######s##a########$$$$$k$$$$$$$$$s$l$$$$$$$$$a$$$$%%%%%%&&&&&&&&&l&&&&&&&s&a&&&&&&&&&''''''''''''((a(((((((y((((())))))))))))**   ++-./7++     bio-0.4Bio.GFF3.EscapeBio.Util.ParsexBio.UtilBio.ClusteringBio.Alignment.BlastDataBio.Alignment.BlastBio.Alignment.BlastXMLBio.Alignment.BlastFlatBio.Sequence.SFF_nameBio.Sequence.GeneOntologyBio.Sequence.KEGGBio.Sequence.GOABio.Sequence.EntropyBio.Sequence.SeqDataBio.Sequence.FastaBio.Sequence.FastQBio.Sequence.TwoBitBio.Sequence.PhdBio.Sequence.HashWordBio.Sequence.SFFBio.Alignment.AlignDataBio.Alignment.MatricesBio.Alignment.SAlignBio.Alignment.AAlignBio.Alignment.QAlignBio.Alignment.ACEBio.Util.TestBaseBio.Location.StrandBio.Location.PositionBio.Location.ContigLocationBio.Location.LocationBio.Location.LocMapBio.Location.OnSeqBio.Location.SeqLocationBio.Alignment.BowtieBio.Alignment.SoapBio.Location.SeqLocMapBio.GFF3.FeatureBio.GFF3.FeatureHierBio.GFF3.FeatureHierSequences Bio.GFF3.SGDBio.Alignment.MultiplebaseData.Ord Data.Listbytestring-0.9.1.4Data.ByteString.Lazy.Char8Prelude Bio.SequenceunEscapeByteStringescapeByteString escapeAllBut escapeAllOflazyManylinesmylines splitWhencountIO sequence' ClusteredLeafBranch cluster_sl BlastMatchbitse_validentityq_fromq_toh_fromh_toauxBlastHitsubjectslengthmatches BlastRecordqueryqlengthhits BlastResult blastprogram blastversion blastdateblastreferencesdatabase dbsequencesdbcharsresultsAuxFrameStrandsStrandMinusPlusSeqIdparsereadXML BlastFlatflattenReadNamedatetimeregionx_locy_locdecodeReadNamedecodeLocation decodeDateencodeReadNameencodeLocation encodeRegion encodeDatedivModsdecode36decChencode36b36 EvidenceCodeNRTASRCANDNASISSIPIIMPIGIIGCIEPIEAIDAIC AnnotationAnn UniProtAccGoDefGoClassCompProcFuncGoTermGO GoHierarchyreadOboreadGOA readTerms decomment isCuratedKO genReadKeggdecodeUPdecodeKO removePrefixreadGOKWordskwordsentropyAminoXaaXleGlxAsxSTPValTrpTyrThrSerProPheMetLysLeuIleHisGlyGluGlnCysAspAsnArgAlaUnknownNucSequenceSeqQualDataQualSeqDataOffset castToNuc castToAminoseqToStrputSeqLnfromStrtoStr!? seqlengthseqlabel seqheaderseqdataseqqualhasqual appendHeader setHeader defragSeqrevcompl revcompl'compl translatetoIUPAC fromIUPAC readFasta writeFastareadQual writeQual readFastaQualwriteFastaQualhWriteFastaQual hReadFasta hWriteFasta hWriteQualmkSeqs countSeqs readFastQ hReadFastQ writeFastQ hWriteFastQunparse decode2Bit encode2Bitread2Bit hRead2Bit write2Bit hWrite2BitreadPhdhReadPhdShapeHashFHFhashhashesksortgenkeys contigousrcontigcompactrcpackedgappedisNn2kn2i'k2nvalunval complement ReadBlock read_headerflowgram flow_indexbasesquality ReadHeader name_length num_basesclip_qual_leftclip_qual_rightclip_adapter_leftclip_adapter_right read_name CommonHeader index_offset index_length num_reads key_length flow_length flowgram_fmtflowkeySFFIndexFlowreadSFF sffToSequencewriteSFF writeSFF'testconvertSelectorSubstMxEditListEditReplDelInsChr AlignmentGapsDirRevFwd extractGaps insertGaps toStringsisReplevalcolumnsblosum45blosum62blosum80pam30pam70blastn_defaultsimpleMx global_score local_score global_align local_alignqualMx overlap_score overlap_alignAssemblyAsmcontig fragmentsreadsptestreadACEwriteACEEST_setESetEST_longEL EST_shortESProteinPESTqEqESTEQualityQ NucleotideNTestTfromNfromQshowTintegralRandomR genOffsetgenNonNegOffsetgenPositiveOffsetStrandedrevComplRevComplstrandedPosoffsetstrandslideseqNt seqNtPaddeddisplay ContigLocoffset5length fromStartEnd fromPosLenboundsstartPosendPosseqData seqDataPaddedposIntoposOutofextendisWithinoverlapsLocLocMapfromListinsert lookupWithinlookupOverlapsdeletedeleteBycheckInvariantsOnSeqsOnSeq onSeqNameonSeqObjSeqName withSeqData andSameSeq onSameSeqperSeq perSeqUpdatewithNameAndSeqSeqLoc ContigSeqLocSeqPos displaySeqPoswithinContigSeqLocdisplayContigSeqLocMismatchmmoffsetrefbasereadbaseAlignnamerefname leftoffsetseququal mismatches nmismatch querySequ queryQual refSeqPos refCSeqLocrefCLoc refSeqLocsameReadmismatchSeqPosSoapAlignMismatchSAMreadntrefntqualnt SoapAlignSAnhitpairendrefstart parseMismatchunparseMismatchgroup SeqLocMapemptyFeatureseqidsourceftypestartendscorephase attributesGFFAttrattrTag attrValuesparseWithFasta attrByTagids parentIds contigLoclocseqLoc FeatureHierfeatureslookupIdlookupIdChildrenparentschildrenparentsM childrenMFeatureHierSequences fromLists sequences getSequencefeatureSequencerunGFFrunGFFIOasksGFF chromosomesgenesrRNAs geneSequence geneSeqLoc geneCDSesnoncodingSequencenoncodingSeqLocnoncodingExons sortExonsnamedSLM geneCDS_SLM progressiveindirectbreaksmkGoDefmkAnngetECsplitsblocks GHC.Classes>unfoldrmkPhdk2n'padskipputRB decodeArrayGHC.Num*- showalignong_scorel_scoreminf score_select align_select QSelectoroverlap_align' AceParserACEparse1aceasctgreadIntsasmafrd