Data structure for storing hierarchical clusters.
Single linkage agglomerative clustering.
Cluster elements by slurping a sorted list of pairs with score (i.e. triples :-). Keeps a set of contained elements at each branch's root, so O(n log n), and requires elements to be in Ord. For this to work, the triples must be sorted on score. Earlier scores in the list will make up the lower nodes, so sort descending for similarity, ascending for distance.

Workaround: the current Data.ByteString.Lazy.Char8 contains a bug in Data.ByteString.Lazy.Char8.lines.
Break a list of bytestrings on a predicate.
Output (to stderr) progress while evaluating a lazy list. Useful for generating output while (conceptually, at least) in pure code.
A lazier version of Control.Monad.sequence in Control.Monad, needed by countIO above.

A BlastHit may contain multiple separate matches (typically when an indel causes a frameshift that blastx is unable to bridge).
Each match between a query and a target sequence (or subject) is a BlastHit.
Each query sequence generates a BlastRecord.
A BlastResult is the root of the hierarchy.
The Aux field in the BLAST output includes match information that depends on the BLAST flavor (blastn, blastx, or blastp). This data structure captures those variations, with one constructor for blastx and one for blastn.
The Strand indicates the direction of the match, i.e. the plain sequence or its reverse complement.
The sequence id, i.e. the first word of the header field.

Parse BLAST results in XML format.
breaks p = groupBy (const (not.p))
The BlastFlat data structure contains information about a single match.
Convert BlastRecords into BlastFlats (representing a depth-first traversal of the BlastRecord structure).

Read names encode various information, as per this struct.

Evidence codes describe the type of support for an annotation (http://www.geneontology.org/GO.evidence.shtml):
NR   Not Recorded
TAS  Traceable Author Statement
RCA  Inferred from Reviewed Computational Analysis
ND   No biological Data available
NAS  Non-traceable Author Statement
ISS  Inferred from Sequence or Structural Similarity
IPI  Inferred from Physical Interaction
IMP  Inferred from Mutant Phenotype
IGI  Inferred from Genetic Interaction
IGC  Inferred from Genomic Context
IEP  Inferred from Expression Pattern
IEA  Inferred from Electronic Annotation
IDA  Inferred from Direct Assay
IC   Inferred by Curator

A GOA annotation, containing a UniProt identifier, a GoTerm and an evidence code.
A UniProt identifier (short string of capitals and numbers).
A GoDef maps a GoTerm to a description and a GoClass.
A GO term is a positive integer.
A list of GO definitions, with pointers to parent nodes. Read from the .obo file. The user may construct the explicit hierarchy by storing these in a Map or similar.
Read the GO hierarchy from the obo file. Note that this is not quite a tree structure: a term may have more than one parent.
Read the goa_uniprot file (warning: this one is huge!).
Read GO term definitions from the GO.terms_and_ids file.
Parse a GoDef from a line in the GO.terms_and_ids file.
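The evidence codes listed above map naturally onto a plain enumeration. The following is a minimal sketch, not the library's EvidenceCode type: the constructor names follow the GO codes, `parseEvidence` is a hypothetical checked alternative to the unchecked reader, and treating "everything except IEA" as the curated subset is an assumption made for illustration.

```haskell
-- Hypothetical enumeration mirroring the GO evidence codes listed above.
data EvidenceCode
  = NR | TAS | RCA | ND | NAS | ISS | IPI | IMP | IGI | IGC | IEP | IEA | IDA | IC
  deriving (Eq, Show, Read, Enum, Bounded)

-- Parse a code such as "IEA" via the derived Read instance.
-- Unlike an unchecked reader, unknown codes give Nothing.
parseEvidence :: String -> Maybe EvidenceCode
parseEvidence s = case reads s of
  [(code, "")] -> Just code
  _            -> Nothing

-- Keep (identifier, code) pairs that are not electronic annotations.
-- Equating "not IEA" with "curated" is an assumption of this sketch.
nonElectronic :: [(String, EvidenceCode)] -> [(String, EvidenceCode)]
nonElectronic = filter ((/= IEA) . snd)
```

Deriving Read keeps the parser trivial; a ByteString-based reader would unpack the field first.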
Reading an Annotation from a line in the association file.
Read the evidence code from a ByteString (no error checking!).
The vast majority of GOA data is IEA, while the most reliable information is manually curated. Filtering on this is useful to keep data set sizes manageable, too.

Most KEGG files that contain associations have one association per line, consisting of two items separated by whitespace. This is a generalized reader function.
Convert UniProt IDs (up:xxxxxx) to the UniProtAcc type.
Convert KO IDs (ko:xxxxx) to the KO data type.
KEGG uses strings with an identifying prefix for IDs. This helper function checks and removes the prefix to construct native values.

For type tagging sequences (protein sequences use Amino below).
A sequence consists of a header, the sequence data itself, and optional quality data. The type parameter is a phantom type to separate nucleotide and amino acid sequences.
Header and actual sequence.
Quality data is a vector of Qual values, currently implemented as a ByteString.
Basic type for quality data. Range 0..255. Typical Phred output is in the range 6..50, with 20 as the line in the sand separating good from bad.
The basic data type used in Sequences.
An offset, index, or length of a SeqData.
Phantom type functionality: unchecked conversion between sequence types.
Returns a properly formatted and probably highlighted string representation of a sequence. Highlighting is done using ANSI escape sequences.
A simple function to display a sequence: we generate the sequence string and call putStrLn.
Splits a string into parts of size width. The last element can be shorter.
Convert a String to SeqData.
Convert a SeqData to a String.
Read the character at the specified position in the sequence.
Return sequence length.
Return sequence label (first word of header).
Return full header.
Return the sequence data.
Return the quality data, or error if none exist. Use hasqual if in doubt.
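The two-column KEGG association format and the prefix convention described above can be sketched as follows. This is an illustration under assumptions: the `KO` and `UniProtAcc` newtypes, `stripKeggPrefix`, `parseKO`, `parseUP` and `readPairs` are hypothetical names, errors are reported with `Either` rather than by calling `error`, and the input is a `String` rather than a ByteString.

```haskell
import Data.List (stripPrefix)

-- Illustrative wrappers, not the library's KO/UniProtAcc types.
newtype KO = KO String deriving (Eq, Show)
newtype UniProtAcc = UniProtAcc String deriving (Eq, Show)

-- Check for and strip an identifying prefix such as "ko:" or "up:".
stripKeggPrefix :: String -> String -> Either String String
stripKeggPrefix pre s =
  maybe (Left ("expected prefix " ++ show pre ++ " in " ++ show s)) Right (stripPrefix pre s)

parseKO :: String -> Either String KO
parseKO = fmap KO . stripKeggPrefix "ko:"

parseUP :: String -> Either String UniProtAcc
parseUP = fmap UniProtAcc . stripKeggPrefix "up:"

-- Generalized reader: one association per non-blank line, two
-- whitespace-separated fields, each decoded by its own function.
readPairs :: (String -> Either String a) -> (String -> Either String b)
          -> String -> Either String [(a, b)]
readPairs decA decB = mapM parseLine . filter (not . null . words) . lines
  where
    parseLine l = case words l of
      [x, y] -> (,) <$> decA x <*> decB y
      _      -> Left ("expected two fields: " ++ show l)
```

Passing the two decoders as arguments is what makes the reader generic: the same function handles KO-to-UniProt, gene-to-pathway, or any other two-column file.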
Check whether the sequence has associated quality data.
Modify the header by appending text, or by replacing all but the sequence label (i.e. first word).
Returns a sequence with all internal storage freshly copied and with sequence and quality data present as a single chunk.
By freshly copying internal storage, defragSeq allows garbage collection of the original data source whence the sequence was read; otherwise, use of just a short sequence name can cause an entire sequence file buffer to be retained.
By compacting sequence data into a single chunk, defragSeq avoids linear-time traversal of sequence chunks during random access into sequence data.
Map over sequences, treating them as a sequence of (char, word8) pairs. This will work on sequences without quality, as long as the function doesn't try to examine it. The current implementation is not very efficient.
Calculate the reverse complement. This is only relevant for the nucleotide alphabet, and it leaves other characters unmodified.
Calculate the reverse complement for SeqData only.
Complement a single character, i.e. identify the nucleotide it can hybridize with. Note that for multiple nucleotides, you usually want the reverse complement (see revcompl for that).
Translate a nucleotide sequence into the corresponding protein sequence. This works rather blindly, with no attempt to identify ORFs or otherwise QA the result.
Convert a list of amino acids to a sequence in IUPAC format.
Convert a sequence in IUPAC format to a list of amino acids.
A Show instance for Sequences that resembles the FASTA display format.
Lazily read sequences from a FASTA-formatted file.
Write sequences to a FASTA-formatted file. Line length is 60.
Read quality data for sequences from a file.
Write quality data for sequences to a file.
Read sequence and associated quality. Will error if the sequences and qualities do not match one-to-one in sequence.
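The FASTA conventions described in this documentation (a `>` line starts a record, blank and `#` comment lines are skipped, output is wrapped at 60 characters) fit in a few lines of list processing. This sketch uses a hypothetical `FastaRec` record over `String` instead of the library's `Sequence` type, and reuses the `breaks p = groupBy (const (not.p))` idiom quoted earlier to split lines into records.

```haskell
import Data.Char (isSpace)
import Data.List (groupBy)

-- Hypothetical, simplified FASTA record: header without '>', plus residues.
data FastaRec = FastaRec { header :: String, residues :: String }
  deriving (Eq, Show)

-- Start a new group at every element satisfying p.
breaks :: (a -> Bool) -> [a] -> [[a]]
breaks p = groupBy (const (not . p))

-- Blank lines and '#' comment lines are dropped; each '>' line starts a record.
-- Any lines before the first header are discarded.
parseFasta :: String -> [FastaRec]
parseFasta = map toRec . filter startsRecord . breaks isHeader . filter keep . lines
  where
    keep l = not (all isSpace l) && take 1 l /= "#"
    isHeader l = take 1 l == ">"
    startsRecord (l : _) = isHeader l
    startsRecord []      = False
    toRec (h : body) = FastaRec (drop 1 h) (concat body)
    toRec []         = FastaRec "" ""   -- not reached: empty groups are filtered out

-- Split into chunks of at most w characters (w must be positive);
-- the last chunk may be shorter.
chunksOf :: Int -> String -> [String]
chunksOf w s = case splitAt w s of
  (c, [])   -> [c | not (null c)]
  (c, rest) -> c : chunksOf w rest

-- Render records with sequence lines wrapped at 60 characters.
showFasta :: [FastaRec] -> String
showFasta = unlines . concatMap render
  where render r = ('>' : header r) : chunksOf 60 (residues r)
```

Because `groupBy` compares each element with the first element of the current group, `breaks` opens a new group exactly at each header line, which is all the record splitting needs.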
Write sequence and quality data simultaneously. This may be more laziness-friendly.
Lazily read sequences from a handle.
Write sequences in FASTA format to a handle.
Convert a list of FASTA-formatted lines into a list of sequences. Blank lines are ignored. Comment lines starting with # are allowed between sequences (and ignored). Lines starting with > initiate a new sequence.
Split lines into blocks starting with > characters.
Filter out # comments (but not semicolons?).
Parse one FastQ entry, suitable for using in … over … from a file.
Parse a .phd file, extracting the contents as a Sequence.
Parse .phd contents from a handle.
The actual phd parser.
Pack bytestring segments into a single bytestring. Allows the (rest of the) file contents to be GC'ed.

Parse a (lazy) ByteString as sequences in the 2bit format.
Marshall from neutral representation to the 2Bit ByteString representation.
Read sequences from a file in 2bit format and unmarshall/deserialize into Sequence format.
Read sequences from a file handle in the 2bit format and unmarshall/deserialize into Sequence format.
Marshall/serialize [Sequence] into 2Bit format and write to a file.
Marshall/serialize [Sequence] into 2Bit format and write to a file using a handle.

This is a struct for containing a set of hashing functions.
Calculates the hash at a given offset in the sequence.
Calculate all hashes from a sequence, and their indices.
For sorting hashes.
Adds a default hashes function to a HashF, when hash is defined.
contigous constructs an Int/Integer from a contiguous k-word.
Like contigous, but returns the same hash for a word and its reverse complement.
Like rcontigK, but ignoring monomers (i.e. arbitrarily long runs of a single nucleotide are treated the same as a single nucleotide).

The propensities for forming secondary structures, from Zvelebil and Baum: Understanding Bioinformatics, Chapter 11, citing Chou and Fasman. Today, more complex methods like GOR are recommended instead.
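A rough sketch of k-word hashing in the spirit of `contigous` and `rcontig`: a word over ACGT is read as a base-4 number, and a strand-independent hash takes the smaller of the word's hash and its reverse complement's. The 2-bit base codes, the `Integer` result and every function name here are assumptions for illustration; the library's encoding and its handling of N may differ.

```haskell
import Data.Char (toUpper)
import Data.Maybe (fromMaybe)

-- Assumed 2-bit codes: A=0, C=1, G=2, T=3; anything else is unhashable.
baseCode :: Char -> Maybe Int
baseCode c = case toUpper c of
  'A' -> Just 0
  'C' -> Just 1
  'G' -> Just 2
  'T' -> Just 3
  _   -> Nothing

-- Complement one nucleotide, leaving other characters unchanged.
complementNt :: Char -> Char
complementNt c = fromMaybe c (lookup c (zip "ACGTacgt" "TGCAtgca"))

reverseComplement :: String -> String
reverseComplement = reverse . map complementNt

-- Hash of a contiguous k-word read as a base-4 number;
-- Nothing if the word contains a non-ACGT character such as N.
kmerHash :: String -> Maybe Integer
kmerHash = fmap (foldl (\acc d -> acc * 4 + fromIntegral d) 0) . mapM baseCode

-- Same value for a word and its reverse complement.
canonicalHash :: String -> Maybe Integer
canonicalHash w = min <$> kmerHash w <*> kmerHash (reverseComplement w)

-- All k-word hashes with their 0-based offsets; words containing N are skipped.
allHashes :: Int -> String -> [(Integer, Int)]
allHashes k s =
  [ (h, i)
  | i <- [0 .. length s - k]
  , Just h <- [kmerHash (take k (drop i s))]
  ]
```

Taking the minimum of the two strand hashes is one common way to make hits independent of which strand a read came from; the monomer-collapsing variant mentioned above is not shown.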
This allows us to decode the constant parts of the read header for verifying its correctness.
RSFF wraps an SFF to provide an instance of Binary with some more error checking.
This contains the actual flowgram for a single read.
Each Read has a fixed read header.
SFF has a 31-byte common header. Todo: remove items that are derivable (counters, magic, etc). The common header length field points to the first read header. Also, the format is open to having the index anywhere between reads; we should really keep count and check for each read. In practice, it seems to be placed after the reads.
The following two fields are considered part of the header, but as they are static, they are not part of the data structure: magic :: Word32 -- ^ 0x2e736666, i.e. the string .sff; version :: Word32 -- ^ 0x00000001.
Points to a text(?) section.
The data structure storing the contents of an SFF file (modulo the index).
The type of flowgram value.
Trim a read to a specific sequence position. The current implementation has the unintended side effect of always trimming the flowgram down to a basecalled position.
Trim a read according to clipping information.
Convert a flow position to the corresponding sequence position.
Convert a sequence position to the corresponding flow position.
Write an SFF to the specified file name.
Write an SFF to the specified file name, but go back and update the read count. Useful if you want to output a lazy stream of ReadBlocks. Returns the number of reads written.
Test serialization by outputting the header and first two reads in an SFF, and the same after a decode + encode cycle.
Convert a file by decoding it and re-encoding it. This will lose the index (which isn't really necessary).
Generalized function for padding.
Generalized function to skip padding.
A ReadBlock can't be an instance of Binary directly, since it depends on information from the CommonHeader.
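Two small details of the SFF layout mentioned above can be checked directly. The magic number is simply the four ASCII bytes of ".sff" read big-endian, and the format pads the common header and each read's sections with zero bytes up to an 8-byte boundary. The helper names below are hypothetical; they are not the library's padding functions.

```haskell
import Data.Bits (shiftL, (.|.))
import Data.Char (ord)
import Data.Word (Word32)

-- Zero bytes needed to pad a section of n bytes to a multiple of 8.
paddingFor :: Int -> Int
paddingFor n = (8 - n `mod` 8) `mod` 8

-- Big-endian Word32 from a 4-character ASCII tag, e.g. the SFF magic ".sff".
tagToWord32 :: String -> Word32
tagToWord32 = foldl (\acc c -> (acc `shiftL` 8) .|. fromIntegral (ord c)) 0
```

For example, a 31-byte common header followed by 400 flow characters and a 4-base key occupies 435 bytes and needs 5 bytes of padding.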
Unpack the flow_data field into a list of flow values.
Pack a list of flows into the corresponding binary structure (the flow_data field).
Ensure that the header we're decoding matches our expectations.
Wrapper for ReadBlocks, since they need additional information.

TrimFilters modify the read, typically trimming it for quality.
DiscardFilters determine whether a read is to be retained or discarded.
2.2.1.2 The dots filter discards sequences where the last positive flow is before flow 84, and flows with >5% dots (i.e. three successive noise values) before the last positive flow. (Interpreted as: 5% of called sequence length is Ns?)
2.2.1.3 The mixed filter discards sequences with more than 70% positive flows. Also, discard with 30% noise, 20% middle (0.45..0.75) or <30% positive.
Discard a read if the number of untrimmed flows is less than n (n=186 for Titanium).
2.2.1.4 Signal intensity trim: trim back until < 3% borderline flows (0.5..0.7). Then trim borderline values or dots from the end (use a window).
2.2.1.7 Quality score trimming trims using a 10-base window until a Q20 average is found.
List length as a Double (eliminates many instances of fromIntegral).
Calculate the average of a list.
Translate a number of flows to a position in the sequence, and update clipping data accordingly.
Update clip_qual_right if more severe than the previous value.

A Selector consists of a zero element and a function that chooses a possible Edit operation and generates an updated result.
A substitution matrix gives scores for replacing a character with another. Typically, it will be symmetric. It is type-tagged with the alphabet: Nuc or Amino.
An alignment is a sequence of edits.
An Edit is either the insertion, the deletion, or the replacement of a character.
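One plausible reading of the 2.2.1.7 rule (a 10-base window, trimmed until a Q20 average is found) is to scan from the 3' end and keep bases up to the end of the last window whose mean Phred score reaches 20. The sketch below implements that reading; `trimByWindow`, `qual20Keep` and `phredError` are hypothetical names, and the library's `qual20` filter may differ in detail. A Phred score Q corresponds to an error probability of 10^(-Q/10), so Q20 means roughly one error per hundred bases.

```haskell
-- Scan from the 3' end and return how many bases to keep: the end of the
-- last w-base window whose mean quality reaches the threshold (w >= 1).
trimByWindow :: Int -> Double -> [Int] -> Int
trimByWindow w threshold qs = go (length qs)
  where
    go k
      | k < w = 0
      | mean (take w (drop (k - w) qs)) >= threshold = k
      | otherwise = go (k - 1)
    mean xs = fromIntegral (sum xs) / fromIntegral (length xs)

-- 10-base window, Q20 average.
qual20Keep :: [Int] -> Int
qual20Keep = trimByWindow 10 20

-- Phred score to error probability: p = 10^(-Q/10).
phredError :: Int -> Double
phredError q = 10 ** (negate (fromIntegral q) / 10)
```

With 30 bases at Q30 followed by 10 at Q5, the window ending at base 34 averages exactly 20, so 34 bases are kept.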
The sequence element type, used in alignments.
Gaps are coded as gap characters; this function removes them, and returns the sequence along with the list of gap positions. Note that gaps are positioned relative to the *gapped* sequence (contrast to stmassembler/Cluster.hs).
Turn an alignment into sequences with a gap character representing gaps (for checking: filtering out the gap characters should return the original sequences, provided the gap character isn't part of the sequence alphabet).
True if the Edit is a Repl.
Evaluate an Edit based on SubstMx and gap penalty.
Calculate a set of columns containing scores. This represents the columns of the alignment matrix, but will only require linear space for score calculation.

BLOSUM45 matrix, suitable for distantly related sequences.
The standard BLOSUM62 matrix.
BLOSUM80 matrix, suitable for closely related sequences.
The standard PAM30 matrix.
The standard PAM70 matrix.
Blast defaults; use with gap_open = -5, gap_extend = -3. This should really check for valid nucleotides, and perhaps be more lenient in the case of Ns. Oh well.
Construct a simple matrix from match score/mismatch penalty.

Calculate global edit distance (Needleman-Wunsch alignment score).
Scoring/selection function for global alignment.
Calculate local edit distance (Smith-Waterman alignment score).
Scoring/selection function for local alignment.
Calculate alignments.
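The linear-space column scheme described above for `columns`, `global_score` and `local_score` can be sketched without the library's `SubstMx` and `Selector` types. The code keeps only the previous column of the dynamic-programming matrix and uses a linear gap penalty. It switches between Needleman-Wunsch (global) and Smith-Waterman (local) scoring by flooring cells at 0. The names and the plain `Char -> Char -> Int` scoring function are assumptions for illustration.

```haskell
import Data.List (foldl')

-- Match/mismatch scoring in the spirit of simpleMx (values are parameters).
simpleScore :: Int -> Int -> Char -> Char -> Int
simpleScore match mismatch a b = if a == b then match else mismatch

-- Force every cell so that no thunk keeps an older column alive.
forceCol :: [Int] -> [Int]
forceCol col = foldr seq () col `seq` col

-- Walk the matrix one column (one character of xs) at a time, keeping only
-- the previous column, so O(length ys) scores are live. Returns the last
-- column and the best cell seen. local = True floors cells at 0.
scoreColumns :: Bool -> (Char -> Char -> Int) -> Int -> String -> String -> ([Int], Int)
scoreColumns local sub gap xs ys = foldl' step (c0, maximum c0) xs
  where
    floor0 v = if local then max 0 v else v
    -- Column for the empty prefix of xs: ys aligned against gaps only.
    c0 = forceCol (map floor0 (scanl (+) 0 (map (const gap) ys)))
    step (prev, best) x =
      let col = forceCol (scanl (cell x) (floor0 (head prev + gap)) (zip3 ys prev (tail prev)))
          best' = max best (maximum col)
      in col `seq` best' `seq` (col, best')
    -- up: cell above in the new column; diag and left: previous column.
    cell x up (y, diag, left) = floor0 (maximum [diag + sub x y, left + gap, up + gap])

-- Needleman-Wunsch score: bottom cell of the last column.
globalScore :: (Char -> Char -> Int) -> Int -> String -> String -> Int
globalScore sub gap xs ys = last (fst (scoreColumns False sub gap xs ys))

-- Smith-Waterman score: best cell anywhere in the matrix.
localScore :: (Char -> Char -> Int) -> Int -> String -> String -> Int
localScore sub gap xs ys = snd (scoreColumns True sub gap xs ys)
```

Only the score is recovered this way; producing the alignment itself needs either the full matrix or a divide-and-conquer scheme such as Hirschberg's.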
Minus infinity (or an approximation thereof).
Calculate global edit distance (Needleman-Wunsch alignment score).
Calculate local edit distance (Smith-Waterman alignment score).
Generic scoring and selection function for global and local scoring.
Calculate global alignment (Needleman-Wunsch).
Calculate local alignment (Smith-Waterman).
Generic scoring and selection for global and local alignment.

The selector must take into account the quality of the sequences: on Ins/Del, the average of the qualities surrounding the gap is (should be) used.
Minus infinity (or an approximation thereof).
Calculate global edit distance (Needleman-Wunsch alignment score).
Calculate local edit distance (Smith-Waterman alignment score).
Calculate best overlap score, where gaps at the edges are free. The starting point is like for local score (0 cost for initial indels); the result is the maximum anywhere in the last column or bottom row of the matrix.
Generic scoring and selection function for global and local scoring.
Calculate global alignment (Needleman-Wunsch).
Calculate local alignment (Smith-Waterman). (Can we replace uncurry max' with fst? A local alignment must always end on a subst, no?)
Calculate best overlap score, where gaps at the edges are free. The starting point is like for local score (0 cost for initial indels); the result is the maximum anywhere in the last column or bottom row of the matrix.
Variant that retains indels to retain the entire sequence in the result.
Generic scoring and selection for global and local alignment.

The Parsec parser type.
ACE header lines with parameters. The tokenizer (scanner) should convert input into a list of these, which in turn can be parsed by Parsec.
Parse a single token; primitive parser.
Test parser p on a list of ACE elements.
Add SourcePoses to a stream of ACEs.
Parse a complete ACE file as a set of assemblies.
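The overlap score described above (free gaps at both ends, starting like a local score but reading the best cell in the last column or bottom row) differs from a global score only in three places: its initialisation, the absence of a 0 floor, and where the result is read. A minimal sketch with hypothetical names and a linear gap penalty:

```haskell
import Data.List (foldl')

simpleScore :: Int -> Int -> Char -> Char -> Int
simpleScore match mismatch a b = if a == b then match else mismatch

-- Force every cell so that older columns can be garbage collected.
forceCol :: [Int] -> [Int]
forceCol col = foldr seq () col `seq` col

-- Overlap ("end-gaps-free") score of xs against ys.
overlapScore :: (Char -> Char -> Int) -> Int -> String -> String -> Int
overlapScore sub gap xs ys = max bestBottom (maximum lastCol)
  where
    -- First column and first row are all 0: leading gaps on either side are free.
    c0 = replicate (length ys + 1) 0
    (lastCol, bestBottom) = foldl' step (c0, 0) xs
    step (prev, best) x =
      let col = forceCol (scanl (cell x) 0 (zip3 ys prev (tail prev)))
          best' = max best (last col)   -- bottom row: all of ys consumed
      in col `seq` best' `seq` (col, best')
    -- Unlike a local score, cells are not floored at 0.
    cell x up (y, diag, left) = maximum [diag + sub x y, left + gap, up + gap]
```

Reading the last column frees trailing gaps in ys, and reading the bottom row frees trailing gaps in xs; together with the zero borders, this scores suffix/prefix overlaps in either direction.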
Parse the initial header.
Parse the contig and quality information (CO, BQ).
Read a list of Ints in the Maybe monad.
Given the CO info, get the AFS'es.
Parse a list of AFS, followed by the actual read, and merge them. afs :: Sequence -> AceParser [Sequence] -- plus some auxiliary info?
Parse each read (RD, QA, DS). Vector NTI appears to insert solitary RDs, sometimes even without any sequence data!? This is not supported at this point.
Reading an ACE file.

Data type for a collection of objects indexed by sequence name.
Data type for an object associated with a specific, named sequence.
Sequence name, as in a Sequence.
Looks up a sequence by name and applies a function to it.
Function using sequence data.
Lookup sequence by name.
Object with named sequence.
Tests a predicate when two objects are on the same sequence, returning False if they are on different sequences.
Performs an action when two objects are on the same sequence and produces an error otherwise.
Lifts a function on an underlying object to look up the sequence name in a name-indexed collection.
Lifts a function that updates an underlying object to look up the named sequence and update a name-indexed collection.
Lifts a function on underlying objects to look up a sequence in a name-indexed collection.

A nucleotide sequence or location on a nucleotide sequence that lies on a specific strand and has an orientation.
Sequence strand.
Convert the orientation of a Stranded thing based on a specified Strand.

Position in a sequence.
0-based index of the position.
Strand of the position.
Returns a position resulting from sliding the original position along the sequence by a specified offset. A positive offset will move the position away from the 5' end of the forward strand of the sequence regardless of the strand of the position itself. Thus, slide (revCompl pos) off == revCompl (slide pos off).
Extract the nucleotide at a specific sequence position. If the position lies outside the bounds of the sequence, an error results.
As seqNt, extract the nucleotide at a specific sequence position, but return N when the position lies outside the bounds of the sequence: seqNtPadded sequ pos == (either 'N' id . seqNt sequ) pos.
Display a human-friendly, zero-based representation of a sequence position.

Contiguous sequence location defined by a span of sequence positions, lying on a specific strand of the sequence.
The offset of the 5' end of the location, as a 0-based index.
The length of the location.
The strand of the location.
Create a sequence location lying between 0-based starting and ending offsets. When start < end, the location will be on the forward strand, otherwise it will be on the reverse complement strand.
Create a sequence location from the sequence position of the start of the location and the length of the location. The strand of the location, and the direction it extends from the starting position, are determined by the strand of the starting position.
The bounds of a sequence location. This is a pair consisting of the lowest and highest sequence offsets covered by the region. The bounds ignore the strand of the sequence location, and the first element of the pair will always be lower than the second.
Sequence position of the start of the location. This is the 5' end on the location strand, which will have a higher offset than endPos if the location is on the RevCompl strand.
Sequence position of the end of the location, as described in startPos.
Returns a location resulting from sliding the original location along the sequence by a specified offset. A positive offset will move the location away from the 5' end of the forward strand of the sequence regardless of the strand of the location itself. Thus, slide (revCompl cloc) off == revCompl (slide cloc off).
Extract the nucleotide subsequence for the sequence location. If any part of the location lies outside the bounds of the sequence, an error results.
As seqData, extract the nucleotide subsequence for the location. Any positions in the location lying outside the bounds of the sequence are returned as N rather than producing an error.
Given a sequence position and a sequence location relative to the same sequence, compute a new position representing the original position relative to the subsequence defined by the location. If the sequence position lies outside of the sequence location, Nothing is returned; thus, the offset of the new position will always be in the range [0, length cloc - 1].
Given a sequence location and a sequence position within that location, compute a new position representing the original position relative to the outer sequence. If the sequence position lies outside the location, Nothing is returned. This function inverts posInto when the sequence position lies within the location.
Returns a sequence location produced by extending the original location on each end, based on a pair of (5' extension, 3' extension). The 5' extension is applied to the 5' end of the location on the location strand; if the location is on the RevCompl strand, the 5' end will have a higher offset than the 3' end and this offset will increase by the amount of the 5' extension. Similarly, the 3' extension is applied to the 3' end of the location.
Returns True when a sequence position lies within a sequence location on the same sequence, and occupies the same strand.
Returns True when two sequence locations overlap at any position.
Display a human-friendly, zero-based representation of a sequence location.

General (disjoint) sequence region consisting of a concatenated set of contiguous regions (see ContigLoc).
Returns the length of the region.
The bounds of a sequence location. This is a pair consisting of the lowest and highest sequence offsets covered by the region. The bounds ignore the strand of the sequence location, and the first element of the pair will always be lower than the second. Even if the positions in the location do not run monotonically through the location, the overall lowest and highest sequence offsets are returned.
Sequence position of the start of the location. This is the 5' end on the location strand, which will have a higher offset than endPos if the location is on the RevCompl strand.
Sequence position of the end of the location, as described in startPos.
Extract the nucleotide subsequence for the sequence location. If any part of the location lies outside the bounds of the sequence, an error results.
As seqData, extract the nucleotide subsequence for the location. Any positions in the location lying outside the bounds of the sequence are returned as N rather than producing an error.
Given a sequence position and a sequence location relative to the same sequence, compute a new position representing the original position relative to the subsequence defined by the location. If the sequence position lies outside of the sequence location, Nothing is returned; thus, the offset of the new position will always be in the range [0, length cloc - 1]. When the sequence positions in the location are not monotonic, there may be multiple possible posInto solutions. That is, if the same outer sequence position is covered by two different contiguous blocks of the location, then it would have two possible sequence positions relative to the location. In this case, the position 5'-most in the location orientation is returned.
Given a sequence location and a sequence position within that location, compute a new position representing the original position relative to the outer sequence. If the sequence position lies outside the location, Nothing is returned. This function inverts posInto when the sequence position lies within the location. Due to the possibility of redundant location-relative positions for a given absolute position, posInto does not necessarily invert this function.
Returns a sequence location produced by extending the original location on each end, based on a pair of (5' extension, 3' extension). These add contiguous positions to the 5' and 3' ends of the original location. The 5' extension is applied to the 5' end of the location on the location strand; if the location is on the RevCompl strand, the 5' end will have a higher offset than the 3' end and this offset will increase by the amount of the 5' extension. Similarly, the 3' extension is applied to the 3' end of the location.
(5' extension, 3' extension)
Returns True when a sequence position lies within a sequence location on the same sequence, and occupies the same strand.
Returns True when two sequence locations overlap at any position.
Display a human-friendly, zero-based representation of a sequence location.

A general location, consisting of spans of sequence positions on a specific, named sequence.
A location consisting of a contiguous span of positions on a named sequence.
A position on a named sequence.
Display a human-friendly representation of a …
Test whether a sequence position lies within a sequence location. This requires that the position lie within the location as per isWithin and have the same sequence name.
Display a human-friendly representation of a …
Test whether a sequence position lies within a sequence location. This requires that the position lie within the location as per isWithin and have the same sequence name.
Test whether two sequence locations overlap in any position. This requires that the locations overlap as per overlaps and have the same sequence name.
Extract the subsequence specified by a sequence location from a sequence database. The sequence name is used to retrieve the full sequence and the subsequence is extracted as by seqData.
Display a human-friendly representation of a …

Representation of a single mismatch in a bowtie alignment.
Offset of the mismatch site from the 5' end of the query.
Reference nucleotide.
Query nucleotide.
Name of the query sequence.
Strand of the alignment on the reference sequence.
Name of the reference sequence.
Zero-based offset of the left-most aligned position in the reference.
Query sequence, in the reference forward strand orientation.
Query quality, in the reference forward strand orientation.
Mismatches.
Returns the length of the query sequence.
Returns the number of mismatches in the alignment.
Parses a line of Bowtie output to produce an alignment.
Query sequence as given in the query file.
Query quality as given in the query file.
Returns the sequence position of the start of the query sequence alignment. This will include the strand of the alignment and will not be the same as the position computed from the left-most aligned offset when the alignment is on the reverse complement strand.
Returns the sequence location covered by the query in the alignment. This will be a sequence location on the reference sequence and may run on the forward or the reverse complement strand.
As …, but without the reference sequence name.
Returns the sequence location covered by the query, as …, as a … location.
Returns True when two alignments were derived from the same sequencing read. As Bowtie writes alignments of query sequences in their order in the query file, all alignments of a given read are grouped together, and the lists of all alignments for each read can be gathered with groupBy sameRead.
Sequence position of a mismatch on the reference sequence.
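The strand conventions documented for contiguous locations can be modelled with a small record: construction from start and end offsets, strand-independent bounds, a 5' start position, sliding, and extension. This is a hypothetical sketch, not Bio.Location.ContigLocation. It stores the lowest covered offset, which is an assumption. Its `posInto` reports the relative strand as `Fwd` when the position and location share a strand, which is likewise an assumption.

```haskell
data Strand = Fwd | RevCompl deriving (Eq, Show)

data Pos = Pos { posOffset :: Int, posStrand :: Strand } deriving (Eq, Show)

-- lowOffset: lowest 0-based offset covered (an assumption of this sketch).
data ContigLoc = ContigLoc { lowOffset :: Int, locLength :: Int, locStrand :: Strand }
  deriving (Eq, Show)

-- start < end gives a forward-strand location, otherwise reverse complement.
fromStartEnd :: Int -> Int -> ContigLoc
fromStartEnd start end
  | start < end = ContigLoc start (end - start + 1) Fwd
  | otherwise   = ContigLoc end (start - end + 1) RevCompl

-- Lowest and highest offsets covered, regardless of strand.
bounds :: ContigLoc -> (Int, Int)
bounds (ContigLoc lo len _) = (lo, lo + len - 1)

-- 5' end on the location's own strand (the higher offset on RevCompl).
startPos :: ContigLoc -> Pos
startPos l = Pos (if locStrand l == Fwd then lo else hi) (locStrand l)
  where (lo, hi) = bounds l

-- A positive offset moves towards higher offsets on either strand.
slide :: ContigLoc -> Int -> ContigLoc
slide l d = l { lowOffset = lowOffset l + d }

revComplLoc :: ContigLoc -> ContigLoc
revComplLoc l = l { locStrand = if locStrand l == Fwd then RevCompl else Fwd }

-- Extend by (5' extension, 3' extension), relative to the location's strand.
extend :: (Int, Int) -> ContigLoc -> ContigLoc
extend (e5, e3) (ContigLoc lo len s) = ContigLoc lo' (len + e5 + e3) s
  where lo' = if s == Fwd then lo - e5 else lo - e3

-- Position relative to the location (offset 0 at its 5' end), or Nothing
-- when the position lies outside the location.
posInto :: Pos -> ContigLoc -> Maybe Pos
posInto (Pos off ps) l
  | off < lo || off > hi = Nothing
  | otherwise = Just (Pos rel (if ps == locStrand l then Fwd else RevCompl))
  where
    (lo, hi) = bounds l
    rel = if locStrand l == Fwd then off - lo else hi - off
```

Storing the lowest offset makes `slide` strand-independent, so the documented law slide (revCompl cloc) off == revCompl (slide cloc off) holds by construction.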
Read nt in reference strand orientation.
Reference nt in reference strand orientation.
Offset from reference strand 5' end in reference strand orientation.
Quality score of read nt.
Alignment output from SOAP.
Reference strand orientation sequence.
Reference strand orientation quality data.
1-based index, as output by SOAP, of reference strand 5' end.

Yet another direction data structure.
The BED data type. Note that the specification allows a variable number of fields, with only the first three required. This definition requires all fields to be present.
Range 0..1000.
Available BED files appear not to support this format. RGB is therefore ignored (read and written as '0').
Lists of length blockCount; blockStarts are relative to chromStart.

Data structure allowing efficient lookup of target sequence locations that overlap a query location. Target locations can be paired with an arbitrary object.
Create a location map from an association list of target locations.
Insert a new target association into a target location map.
Find the (possibly empty) list of target locations and associated objects that contain a sequence position, in the sense of isWithin.
Find the (possibly empty) list of target locations and associated objects that overlap a sequence location, in the sense of overlaps.
Remove a target location and object association from the map, if it is present. If it is present multiple times, only the first occurrence will be deleted.
Generalized version of the removal function above that removes the first target location / object association that satisfies a predicate function.

A data structure for efficiently finding target sequence locations (SeqLoc.Loc) that overlap query positions or locations. Each target location can be associated with an arbitrary additional value in the lookup map.
Empty lookup map.
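The location-map descriptions above promise efficient overlap queries without saying how. One common approach is binning: each target interval is stored in every fixed-width bin it touches, and a query only inspects the bins it covers. This is shown here as a hypothetical sketch, not the library's implementation. The bin width of 1000 and all names are assumptions. Intervals are inclusive 0-based offsets on a single sequence, so sequence names and strands are ignored.

```haskell
import qualified Data.IntMap.Strict as IM

-- Inclusive, 0-based interval on one sequence.
data Interval = Interval { ivLo :: Int, ivHi :: Int } deriving (Eq, Show)

binSize :: Int
binSize = 1000   -- assumed bin width; tune to typical feature length

newtype BinMap a = BinMap (IM.IntMap [(Interval, a)])

binsOf :: Interval -> [Int]
binsOf (Interval lo hi) = [lo `div` binSize .. hi `div` binSize]

emptyMap :: BinMap a
emptyMap = BinMap IM.empty

-- Store the target in every bin it touches.
insertLoc :: Interval -> a -> BinMap a -> BinMap a
insertLoc iv x (BinMap m) = BinMap (foldr (\b -> IM.insertWith (++) b [(iv, x)]) m (binsOf iv))

fromListLoc :: [(Interval, a)] -> BinMap a
fromListLoc = foldr (uncurry insertLoc) emptyMap

overlapsIv :: Interval -> Interval -> Bool
overlapsIv (Interval a b) (Interval c d) = a <= d && c <= b

-- Targets overlapping the query. A target spanning several bins is reported
-- only from the first bin it shares with the query, so there are no duplicates.
lookupOverlaps :: Interval -> BinMap a -> [(Interval, a)]
lookupOverlaps q (BinMap m) =
  [ (t, x)
  | b <- binsOf q
  , (t, x) <- IM.findWithDefault [] b m
  , overlapsIv q t
  , b == max (ivLo t `div` binSize) (ivLo q `div` binSize)
  ]
```

Two overlapping intervals always share the bin `max` of their first bins, which is why the duplicate check never drops a genuine hit.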
Creates a lookup map from a list of target locations and their associated objects.
Inserts a new target location and associated object into the location lookup map.
Find the (possibly empty) list of target locations and associated objects that contain a sequence position, in the sense of Loc.isWithin.
Find the (possibly empty) list of target locations and associated objects that overlap a sequence location, in the sense of Loc.overlaps.

Read nucleotide sequences in any format: Fasta, SFF, FastQ, 2bit, PHD...
Read protein sequences in any supported format (i.e. Fasta).
Progressive multiple alignment. Calculate a tree from agglomerative clustering, then align at each branch going bottom up. Returns a list of columns (rows?).
Derive alignments indirectly, i.e. calculate A|C using alignments A|B and B|C. This is central for COFFEE evaluation of alignments, and T-Coffee construction of alignments.
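The progressive alignment above starts from a tree built by agglomerative clustering. The single-linkage description at the start of this documentation consumes score triples sorted so that the closest pairs come first. A minimal sketch of that idea follows. `Dendrogram`, `Leaf`, `Merge` and `singleLinkage` are hypothetical names. Keeping clusters in a plain list makes the sketch quadratic in the worst case, rather than the O(n log n) claimed for the library's version.

```haskell
import Data.List (foldl', partition)
import qualified Data.Set as S

-- Hypothetical dendrogram type (the library's own type is Clustered).
data Dendrogram s a = Leaf a | Merge s (Dendrogram s a) (Dendrogram s a)
  deriving (Eq, Show)

-- Single linkage from (x, y, score) triples already sorted closest-first
-- (descending similarity or ascending distance). Every pair that joins two
-- different clusters adds a node, so earlier scores end up lower in the tree.
-- Each cluster keeps its member set next to its tree.
singleLinkage :: Ord a => [(a, a, s)] -> [Dendrogram s a]
singleLinkage = map snd . foldl' step []
  where
    step cls (x, y, s)
      | x == y || any (\(m, _) -> S.member x m && S.member y m) cls = cls
      | otherwise =
          let (cx, rest1) = takeCluster x cls
              (cy, rest2) = takeCluster y rest1
          in (S.union (fst cx) (fst cy), Merge s (snd cx) (snd cy)) : rest2
    -- Remove the cluster containing e from the list, or start a singleton.
    takeCluster e cls = case partition (S.member e . fst) cls of
      (c : _, rest) -> (c, rest)
      ([], rest)    -> ((S.singleton e, Leaf e), rest)
```

Pairs whose elements are already in the same cluster are skipped, which is exactly what makes the result single linkage: a cluster's merge score is that of its closest cross pair.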
bio-0.4.7

Modules: Bio.GFF3.Escape, Bio.Util.Parsex, Bio.Clustering, Bio.Alignment.PSL, Bio.Util, Bio.Alignment.BlastData, Bio.Alignment.BlastXML, Bio.Alignment.BlastFlat, Bio.Alignment.Blast, Bio.Sequence.SFF_name, Bio.Sequence.GeneOntology, Bio.Sequence.KEGG, Bio.Sequence.GOA, Bio.Sequence.Entropy, Bio.Sequence.SeqData, Bio.Sequence.Fasta, Bio.Sequence.FastQ, Bio.Sequence.Phd, Bio.Sequence.TwoBit, Bio.Sequence.HashWord, Bio.Sequence.AminoProperties, Bio.Sequence.SFF, Bio.Sequence.SFF_filters, Bio.Alignment.AlignData, Bio.Alignment.Matrices, Bio.Alignment.SAlign, Bio.Alignment.AAlign, Bio.Alignment.QAlign, Bio.Alignment.ACE, Bio.Location.OnSeq, Bio.Location.Strand, Bio.Location.Position, Bio.Location.ContigLocation, Bio.Location.Location, Bio.Location.SeqLocation, Bio.Alignment.Bowtie, Bio.Alignment.Soap, Bio.Alignment.BED, Bio.Location.LocMap, Bio.Location.SeqLocMap, Bio.GFF3.Feature, Bio.GFF3.FeatureHier, Bio.GFF3.FeatureHierSequences, Bio.GFF3.SGD, Bio.Sequence, Bio.Alignment.Multiple

unEscapeByteStringescapeByteString escapeAllBut escapeAllOflazyMany ClusteredLeafBranch cluster_slPSLmatchmismatchrepmatchncount qgapcount qgaplength tgapcount tgaplengthstrandqnameqsizeqstartqendtnametsizetstarttend blockcount blocksizesqstartststartsreadPSLwritePSLparsePSL unparsePSL pslHeaderlinesmylines splitWhencountIO sequence' BlastMatchbitse_validentityq_fromq_toh_fromh_toauxBlastHitsubjectslengthmatches BlastRecordqueryqlengthhits BlastResult blastprogram blastversion blastdateblastreferencesdatabase dbsequencesdbcharsresultsAuxFrameStrandsStrandMinusPlusSeqIdreadXML BlastFlatflattenparseReadNamedatetimeregionx_locy_locdecodeReadNamedecodeLocation decodeDateencodeReadNameencodeLocation encodeRegion encodeDatedivModsdecode36decChencode36b36 EvidenceCodeNRTASRCANDNASISSIPIIMPIGIIGCIEPIEAIDAIC AnnotationAnn UniProtAccGoDefGoClassCompProcFuncGoTermGO GoHierarchyreadOboreadGOA readTerms decomment isCuratedKO 
genReadKeggdecodeUPdecodeKO removePrefixreadGOKWordskwordsentropyAminoXaaXleGlxAsxSTPValTrpTyrThrSerProPheMetLysLeuIleHisGlyGluGlnCysAspAsnArgAlaUnknownNucSequenceSeqQualDataQualSeqDataOffsetcastSeq castToNuc castToAminoseqToStrputSeqLnfromStrtoStr!? seqlengthseqlabel seqheaderseqdataseqqualhasqual appendHeader setHeader defragSeqseqmaprevcompl revcompl'compl translatetoIUPAC fromIUPAC readFasta writeFastareadQual writeQual readFastaQualwriteFastaQualhWriteFastaQual hReadFasta hWriteFasta hWriteQualmkSeqs countSeqs readFastQ hReadFastQ writeFastQ hWriteFastQunparsereadPhdhReadPhd decode2Bit encode2Bitread2Bit hRead2Bit write2Bit hWrite2BitShapeHashFHFhashhashesksortgenkeys contigousrcontigcompactrcpackedgappedisNn2kn2i'k2nvalunval complementAAProp aliphaticaromatic hydrophobicpolarsmalltinychargednegativepositive hydropathymasshelixPstrandP ReadBlock read_header flow_data flow_indexbasesquality ReadHeader name_length num_basesclip_qual_leftclip_qual_rightclip_adapter_leftclip_adapter_right read_name CommonHeader index_offset index_length num_reads key_length flow_length flowgram_fmtflowkeySFFIndexFlowreadSFFtrimKey sffToSequence trimFromTotrim flowToBasePos baseToFlowPos recoverSFFwriteSFF writeSFF'testconvert unpackFlows packFlowsflowgram TrimFilter DiscardFilter filter_empty filter_key filter_dots filter_mixed filter_length filter_sigintsigint filter_qual20qual20dlengthavg clipFlowsclipSeqSelectorSubstMxEditListEditReplDelInsChr AlignmentGapsDirRevFwd extractGaps insertGaps showalign toStringsisReplevalcolumnsonblosum45blosum62blosum80pam30pam70blastn_defaultsimpleMx global_score local_score global_align local_alignqualMx overlap_score overlap_alignAssemblyAsmcontig fragmentsreadsptestreadACEwriteACEOnSeqsOnSeq onSeqNameonSeqObjSeqName withSeqData andSameSeq onSameSeqperSeq perSeqUpdatewithNameAndSeqStrandedrevComplRevComplstrandedPosoffsetslideseqNt seqNtPaddeddisplay ContigLocoffset5length fromStartEnd fromPosLenboundsstartPosendPosseqData 
seqDataPaddedposIntoposOutofextendisWithinoverlapsLocSeqLoc ContigSeqLocSeqPos displaySeqPoswithinContigSeqLocdisplayContigSeqLocMismatchmmoffsetrefbasereadbaseAlignnamerefname leftoffsetseququal mismatches nmismatch querySequ queryQual refSeqPos refCSeqLocrefCLoc refSeqLocsameReadmismatchSeqPosSoapAlignMismatchSAMreadntrefntqualnt SoapAlignSAnhitpairendrefstart parseMismatchunparseMismatchgroupBEDchrom chromStartchromEndscore thickStartthickEnditemRGBblockSizeStartreadBEDwriteBEDLocMapfromListinsert lookupWithinlookupOverlapsdeletedeleteBycheckInvariants SeqLocMapemptyFeatureseqidsourceftypestartendphase attributesGFFAttrattrTag attrValuesparseWithFasta attrByTagids parentIds contigLoclocseqLoc FeatureHierfeatureslookupIdlookupIdChildrenparentschildrenparentsM childrenMFeatureHierSequences fromLists sequences getSequencefeatureSequencerunGFFrunGFFIOasksGFF chromosomesgenesrRNAs geneSequence geneSeqLoc geneCDSesnoncodingSequencenoncodingSeqLocnoncodingExons sortExonsnamedSLM geneCDS_SLMreadNucreadProt progressiveindirect escapeWord8 escapeTablebreaksgetFromshowSomexml2briter2rechit2hit hsp2match str_querystr_gt str_score str_refer str_datab str_searchqueriesqhitshmatchesparse_preamble parse_query parse_hit parse_matchmkGoHiergetGomkGoDefmkAnngetECnlognprobslinesplitsbeginHighlight endHighlightshowDNA numberizechunkifyclean highlight splitsSkiptakeSkipdropSkip testPrefixestrans1 trans_tbliupaciupac'$fShowSequencesplitsAtwHeadwFastawQualmkSeqmkQualblocksbase GHC.Classes>go Data.Listunfoldrbytestring-0.9.1.7Data.ByteString.Lazy.Char8mkPhdisSubstr TwoBitData SequenceData SequenceSize SequenceLabelSRLESRBESRdnaSize nBlockCount nBlockStarts nBlockSizesmaskBlockCountmaskBlockStartsmaskBlockSizes packedDna reserved2EntriesEntryHeaderswapversioncountreserved default_magicdefault_versioncheckbswapbytesunbytes swapEntryfromSRtoSRunSRBEunSRLEk2n'oneOforPartialReadHeader_pread_header_lenght _pname_length 
_pnum_bases_pclip_qual_left_pclip_qual_right_clip_adapter_left_pclip_adapter_right _pread_nameRSFF unRecoveredRBImagicversions writeReadspadskipgetRBputRB getSaneHeader decodeSaneH $fBinaryRBIGHC.Num*-columns' genMatrixg_scorel_scoreg_alignl_alignminf score_select align_selectQualMx QSelectoravg2adjustmax'fpoverlap_align' AceParserACEEmptyOtherDSQARDBSAFBQCOASStruwparse1aceace1asblankctgcosdataqdatareadIntsbqasmafbsreadInt'rdsrseqrdqadstokenize tokenize1posIntoContigsposOutofContigsoverlappingContigs parseInt64 qualScale parseIntBStrparseOffsetBStr parseCharBStrpack1defaultZonesizeposZonelocZones keySetLocsdot escapeField escapeSeqidgffFastaDirectiveidTag parentTag idToFeature idToChildren parentsFirst featureEdges featureHier sequenceMap catchIOErrorschromosomeTypecdsTypegeneTypenoncodingExonType exonSeqLoc