Decompresses Gzip or Bgzf and passes everything else on. In reality, it simply decompresses Gzip, and when done, looks for another Gzip stream. Trailing garbage is returned as is; therefore, uncompressed files are passed through. Since there is a small chance of attempting to decompress an uncompressed stream, the original data is returned in case of an error.

Random useful stuff I didn't know where to put.

Calculates the Wilson score interval. If (l,m,h) = wilson c x n, then m is the binomial proportion and (l,h) is its c-confidence interval for x positive examples out of n observations. c is typically something like 0.05.

Try to estimate the complexity of a whole from a sample. Suppose we sampled total things and among those, singles occurred only once. How many different things are there? Let the total number be m. The copy number follows a Poisson distribution with parameter lambda.
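The Wilson interval described above can be sketched as follows. This is an illustrative implementation, not the library's: the normal quantile for the common case c = 0.05 (z ≈ 1.96) is hard-coded to keep the sketch self-contained, and `m` is returned as the plain binomial proportion.

```haskell
-- Sketch of a Wilson score interval: (low, proportion, high) for
-- x successes out of n trials at two-sided confidence level c.
-- Hypothetical signature; only c = 0.05 is supported in this sketch.
wilson :: Double -> Int -> Int -> (Double, Double, Double)
wilson c x n'
    | c /= 0.05 = error "only c = 0.05 in this sketch"
    | otherwise = (mid - off, p, mid + off)
  where
    z   = 1.959964                      -- normal quantile for c = 0.05, hard-coded
    n   = fromIntegral n'
    p   = fromIntegral x / n
    mid = (p + z*z/(2*n)) / (1 + z*z/n)             -- adjusted midpoint
    off = z * sqrt (p*(1-p)/n + z*z/(4*n*n)) / (1 + z*z/n)
```

Unlike the naive normal approximation, this interval stays inside [0,1] even for small n or extreme proportions.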
Let z := e^{lambda}. Then we have:

  P(0)   = e^{-lambda}        = 1/z
  P(1)   = lambda e^{-lambda} = ln z / z
  P(>=1) = 1 - e^{-lambda}    = 1 - 1/z

  singles = m ln z / z
  total   = m (1 - 1/z)
  D       := total / singles = (1 - 1/z) * z / ln z
  f       := z - 1 - D ln z = 0

To get z, we solve by Newton iteration and then substitute to get m:

  df/dz = 1 - D/z
  z'    = z - z (z - 1 - D ln z) / (z - D)
  m     = singles * z / ln z

It converges as long as the initial z is large enough; 10 D (in the line for zz below) appears to work well.

Computes log (exp x + exp y) without leaving the log domain and hence without losing precision.

Computes log (1+x) to a relative precision of 10^-8 even for very small x. Stolen from http://www.johndcook.com/cpp_log_one_plus_x.html

Computes exp x - 1 to a relative precision of 10^-10 even for very small x. Stolen from http://www.johndcook.com/cpp_expm1.html

Computes log (1 - exp x), following Martin Mächler.

Computes log (1 + exp x), following Martin Mächler.

Computes log ( sum_i e^{x_i} ) sensibly. The list must be sorted in descending(!) order.

Computes log ( c e^x + (1-c) e^y ).

Binomial coefficient: n `choose` k == n! / ((n-k)! k!)

Kind-of sigmoid function that maps the reals to the interval [0,1). Good for computing a probability without introducing boundary conditions. Its inverse is also provided.

A Priority Queue that can fall back to external storage.

Note that such a Priority Queue automatically gives rise to an external sorting algorithm: enqueue everything, dequeue until empty.

Whatever is to be stored in this queue needs to be an instance of Binary, because it may need to be moved to external storage on demand. We also need a way to estimate the memory consumption of an enqueued object. When constructing the queue, the maximum amount of RAM to consume is set.
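The Newton iteration derived above can be sketched directly from those formulas (an illustrative function, not the library's; the name and the iteration cap are assumptions):

```haskell
-- Estimate the number m of distinct things from `total` observations,
-- of which `singles` were observed exactly once.  Solves
-- f(z) = z - 1 - D ln z = 0 with D = total/singles by Newton iteration,
-- starting from z0 = 10 D, then recovers m = singles * z / ln z.
estimateComplexity :: Int -> Int -> Maybe Double
estimateComplexity total singles
    | singles <= 0 || total <= singles = Nothing   -- no sensible solution
    | otherwise = Just (m (iter (50 :: Int) (10 * d)))
  where
    d = fromIntegral total / fromIntegral singles
    step z = z - z * (z - 1 - d * log z) / (z - d)   -- z' = z - f/f'
    iter 0 z = z
    iter n z = let z' = step z
               in if abs (z' - z) < 1e-9 * z then z' else iter (n - 1) z'
    m z = fromIntegral singles * z / log z
```

For instance, sampling m = 1000 things at lambda = 1 yields roughly total = 632 and singles = 368, and the estimator recovers a value close to 1000.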
Note that open input streams use memory for buffering, too.

Enqueued objects are kept in an in-memory heap until the memory consumption becomes too high. At that point, the whole heap is sorted and dumped to external storage. If necessary, the file to do so is created and kept open. The newly created stream is added to a heap, so that dequeueing objects amounts to performing a merge sort on multiple external streams. To conserve file descriptors, we concatenate multiple streams into a single file, then use pread(2) on it as appropriate. If too many streams are open (how do we set that limit?), we do exactly that: merge-sort all streams and the in-memory heap into a single new stream. One file is created for each generation of streams, so that merging handles streams of roughly equal length.

XXX Truth be told, this queue isn't backed externally, and ignores all limits. It *is* a Priority Queue, though!

XXX May want to add callbacks for significant events (new file, massive merge, deletion of file?)

XXX Need to track memory consumption of input buffers.

XXX Need a way to decide when too many streams are open. That point is reached when seeking takes about as much time as reading (which depends on buffer size and system characteristics), so that an additional merge pass becomes economical.

XXX These will be useful: unix-bytestring:System.Posix.IO.ByteString.fdPread, temporary:System.IO.Temp.openBinaryTempFile, lz4:Codec.Compression.LZ4

Configuration: memory limit; path to temporary files (a directory will be created); functions to report progress go here.

Creates a priority queue. Note that the priority queue creates files, which will only be cleaned up if deletePQ is called.

Deletes the priority queue and all associated temporary files.

Enqueues an element. This operation may result in the creation of a file or in an enormous merge of already created files.

Removes the minimum element from the queue. If the queue is already empty, nothing happens.
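The spill-to-storage design described above can be modeled with a small pure sketch: a bounded in-memory heap that, on overflow, is sorted and appended to a list of sorted "external streams", with dequeueing merge-sorting everything. This is a toy model under assumed names, not the real implementation (which ignores limits anyway, per the XXX note):

```haskell
import Data.List (insert, sort)

-- Toy model: `limit` bounds the in-memory heap; each spilled heap
-- becomes one sorted "stream" (standing in for a file segment).
data PQ a = PQ Int [a] [[a]]

emptyPQ :: Int -> PQ a
emptyPQ n = PQ n [] []

enqueuePQ :: Ord a => a -> PQ a -> PQ a
enqueuePQ x (PQ n h ss)
    | length h + 1 > n = PQ n [] (sort (x : h) : ss)  -- "dump heap to external storage"
    | otherwise        = PQ n (insert x h) ss         -- keep heap sorted in memory

-- Dequeue by merge-sorting all streams with the in-memory heap.
-- (Toy simplification: the merged result collapses back into memory.)
dequeuePQ :: Ord a => PQ a -> Maybe (a, PQ a)
dequeuePQ (PQ n h ss) = case foldr merge h ss of
    []     -> Nothing
    (x:xs) -> Just (x, PQ n xs [])
  where
    merge [] ys = ys
    merge xs [] = xs
    merge (x:xs) (y:ys) | x <= y    = x : merge xs (y:ys)
                        | otherwise = y : merge (x:xs) ys
```

Enqueueing everything and dequeueing until empty indeed yields an external sorting algorithm, as noted above.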
As a result, it is possible that one or more files become empty and are deleted.

Returns the minimum element of the queue. If the queue is empty, Nothing is returned; else Just the minimum element currently in the queue.

Class of streams which can be filled from an external resource. Typically these are streams which can be read from a file, Handle, or similar resource.

Alas, GHC provides no function to read from an Fd into an allocated buffer. The library function fdRead is not appropriate, as it returns a String already; I'd rather get data into a buffer. Furthermore, fdRead (at least in GHC) allocates a new buffer each time it is called, which is a waste. Yet another problem with fdRead is that it raises an exception on any IOError or even EOF; I'd rather avoid exceptions altogether.

The following fseek procedure throws no exceptions.

Polls whether file descriptors have something to read; returns the list of read-pending descriptors.

An Iteratee exception specified by a String.

The Iteratee needs more data but received EOF.

A seek request within an Iteratee.

Root of iteratee exceptions.

A class for iteratee exceptions. Only inheritants of IterException should be instances of this class.

The enumerator received an exception it could not handle.

Create an enumerator exception from a String.

The iteratee diverged upon receiving EOF.

Root of the Iteratee exception hierarchy. IFException derives from Control.Exception.SomeException; the enumerator exceptions, the iteratee exceptions, and all their inheritants are descendants of it.

Create an enumerator exception from a string.

Convert an iteratee exception into an enumerator exception. Meant to be used within an Enumerator to signify that it could not handle the IterException.

Create an iteratee exception from a string.
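The layered exception hierarchy described above follows the standard extensible-exception pattern from Control.Exception: an intermediate existential wrapper routes `toException`/`fromException` through the parent. A minimal sketch with illustrative names (the library's actual constructors may differ):

```haskell
{-# LANGUAGE ExistentialQuantification #-}
import Control.Exception
import Data.Typeable (cast)

-- Intermediate node of the hierarchy: any iteratee exception.
data SomeIterException = forall e. Exception e => SomeIterException e

instance Show SomeIterException where
    show (SomeIterException e) = show e

instance Exception SomeIterException

-- A leaf below SomeIterException: an exception specified by a String.
newtype IterStringException = IterStringException String deriving Show

instance Exception IterStringException where
    -- route through the parent so handlers for SomeIterException
    -- also catch this leaf
    toException = toException . SomeIterException
    fromException se = do
        SomeIterException e <- fromException se
        cast e
```

A handler for SomeIterException then catches every iteratee exception, while a handler for IterStringException catches only that leaf.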
This convenience function wraps the string-exception constructor and the error-throwing primitive.

A nucleotide base. We only represent A, C, G, T. The contained value is guaranteed to be 0..3.

A nucleotide base in an alignment. Experience says we're dealing with Ns and gaps all the time, so purity be damned, they are included as if they were real bases. To allow these values to be unpacked and incorporated into containers, we choose to represent them the same way as the BAM file format: as a 4-bit-wide field. Gaps are encoded as 0 where they make sense, N is 15. The contained value is guaranteed to be 0..15.

Qualities are stored in deciban, also known as the Phred scale. To represent a probability p, we store -10 * log_10 p. Operations work directly on the "Phred" value, as the name suggests. The same goes for the Ord instance: greater quality means higher "Phred" score, meaning lower error probability.

Common way of using it.

A positive floating point value stored in log domain. We store the natural logarithm (makes computation easier), but allow conversions to the familiar "Phred" scale used for quality values.

Ranges in genomes. We combine a position with a length. In 'Range pos len', pos is always the start of a stretch of length len. Positions therefore move in the opposite direction on the reverse strand. To get the same stretch on the reverse strand, shift r_pos by r_length, then reverse direction (or call reverseRange).

Coordinates in a genome. The position is zero-based, no questions about it. Think of the position as pointing to the crack between two bases: looking forward you see the next base to the right, looking in the reverse direction you see the complement of the first base to the left.

To encode the strand, we (virtually) reverse-complement any sequence and prepend it to the normal one. That way, reversed coordinates have a negative sign and automatically make sense.
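The Phred encoding described above is easy to state in code. A sketch with illustrative names (the library's actual types and conversions may differ):

```haskell
-- Phred-scaled quality: stores -10 * log10 p for error probability p.
-- Deriving Ord directly on the stored value gives the documented
-- behavior: higher Phred score = higher quality = lower error probability.
newtype Qual = Q Int deriving (Eq, Ord, Show)

toQual :: Double -> Qual            -- from error probability
toQual p = Q (round (-10 * logBase 10 p))

fromQual :: Qual -> Double          -- back to error probability
fromQual (Q q) = 10 ** (negate (fromIntegral q) / 10)

-- Converting a natural-log-domain value to the Phred scale:
-- -10 * log10 p = -10/ln 10 * ln p.
phredOfLog :: Double -> Double
phredOfLog lp = (-10 / log 10) * lp
```

So an error probability of 0.001 becomes Q 30, and Q 30 compares greater than Q 20, exactly as the Ord instance above promises.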
Position 0 could either be the beginning of the sequence or the end on the reverse strand... that ambiguity shouldn't really matter.

Fields: sequence (e.g. some chromosome); offset, zero-based.

Sequence identifiers are ASCII strings. Since we tend to store them for a while, we use strict byte strings.

Converts a character into a nucleotide base. The usual codes for A, C, G, T and U are understood, '-' and '*' become gaps, and everything else is an N. (Provided for both nucleotide types.)

Tests if a value is a base. Returns True for everything but gaps.

Tests if a value is a proper base. Returns True for A, C, G, T only.

Tests if a value is a gap. Returns True only for the gap.

Complements a nucleotide. (Provided for both nucleotide types.)

Moves a Position. The position is moved forward according to the strand; negative indexes move backward accordingly.

Moves a Range. This is just shiftPosition lifted.

Reverses a Range to give the same Range on the opposite strand.

Extends a range. The length of the range is simply increased.

Expands a subrange. Applied to range1 and range2, this interprets range1 as a subrange of range2 and computes its absolute coordinates. The sequence name of range1 is ignored.

Wraps a range to a region. This simply normalizes the start position to be in the interval [0,n), which only makes sense if the Range is to be mapped onto a circular genome. This works on both strands, and the strand information is retained.

Finds a file by searching the environment variable BIOHAZARD like a PATH.

Class of things that can be unpacked into byte strings; kind of the opposite of packing.

A strict pair.

Converts bytes into a string. This uses UTF-8, but if there is an error, it pretends it was Latin-1. Evil as this is, it tends to Just Work on files where nobody ever wasted a thought on encodings.

Converts a string into bytes.
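The mirrored-coordinate convention described above (reverse-strand positions are negative) makes range arithmetic pleasantly uniform. A toy sketch under assumed semantics, with the sequence name omitted for brevity:

```haskell
-- Illustrative Range: start position (negative = reverse strand)
-- plus a length; the real type also carries a sequence name.
data Range = Range { r_pos :: Int, r_length :: Int } deriving (Eq, Show)

-- The same physical stretch seen from the opposite strand:
-- "shift r_pos by r_length, then reverse direction", as documented.
reverseRange :: Range -> Range
reverseRange (Range pos len) = Range (negate (pos + len)) len

-- Because reversed coordinates are mirrored, moving forward along
-- either strand is plain addition; a negative shift moves backward.
shiftRange :: Int -> Range -> Range
shiftRange k (Range pos len) = Range (pos + k) len
```

Note that `reverseRange` is its own inverse, which is a quick sanity check on the convention.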
This uses UTF-8.

Monadic iteratees.

Describes the status of a stream of data. A stream is a (continuing) sequence of elements bundled in Chunks. The first variant indicates termination of the stream. 'Chunk a' gives the currently available part of the stream; the stream is not terminated yet. The case of a null Chunk signifies a stream with no currently available data which is still continuing: a stream processor should, informally speaking, "suspend itself" and wait for more data to arrive.

Nullable container class.

NullPoint class: containers that have a null representation, corresponding to Data.Monoid.mempty.

Produce the EOF error message. If the stream was terminated because of an error, keep the error message.

Send EOF to the Iteratee and disregard the unconsumed part of the stream. If the iteratee is in an exception state, that exception is thrown. Iteratees that do not terminate on EOF will throw an EOF exception.

Run an iteratee, returning either the result or the iteratee exception. Note that only internal iteratee exceptions will be returned; exceptions thrown with Control.Exception.throw or Control.Monad.CatchIO.throw will not be returned.

Transform a computation inside an Iteratee.

Lift a computation in the inner monad of an iteratee. A simple use would be to lift a logger iteratee to a monad stack.
  logger :: Iteratee String IO ()
  logger = mapChunksM_ putStrLn

  loggerG :: MonadIO m => Iteratee String m ()
  loggerG = ilift liftIO logger

A more complex example would involve lifting an iteratee to work with interleaved streams; see the example at the stream-merging functions.

Lift a computation in the inner monad of an iteratee, while threading through an accumulator.

Map a function over a stream.

Each enumerator takes an iteratee and returns an iteratee: an Enumerator is an iteratee transformer. The enumerator normally stops when the stream is terminated or when the iteratee moves to the done state, whichever comes first. When to stop is of course up to the enumerator...

Report and propagate an unrecoverable error. Disregard the input first and then propagate the error. This error cannot be handled by an enumerator's exception handler, although it can be cleared by checkErr.

Report and propagate a recoverable error. This error can be handled both by an enumerator's exception handler and by checkErr.

Check if an iteratee produces an error. Returns 'Right a' if it completes without errors, otherwise 'Left SomeException'. checkErr is useful for iteratees that may not terminate, such as Data.Iteratee.head with an empty stream.

The identity iteratee. Doesn't do any processing of input.

Get the stream status of an iteratee.

Skip the rest of the stream.

Seek to a position in the stream.

Map a monadic function over the chunks of the stream and ignore the result. Useful for creating efficient monadic iteratee consumers, e.g.

  logger = mapChunksM_ (liftIO . putStrLn)

These can be efficiently run in parallel with other iteratees via Data.Iteratee.ListLike.zip.

A fold over chunks.

Get the current chunk from the stream.

Get a list of all chunks from the stream.

Utility function for creating enumeratees. Typical usage is demonstrated by the breakE definition:

  breakE :: (Monad m, LL.ListLike s el, NullPoint s)
         => (el -> Bool) -> Enumeratee s s m a
  breakE cpred = eneeCheckIfDone (liftI . step)
    where
      step k (Chunk s)
        | LL.null s = liftI (step k)
        | otherwise = case LL.break cpred s of
            (str', tail')
              | LL.null tail' -> eneeCheckIfDone (liftI . step) . k $ Chunk str'
              | otherwise     -> idone (k $ Chunk str') (Chunk tail')
      step k stream = idone (k stream) stream

The same as eneeCheckIfDonePass, with one extra argument: a handler which is used to process any exceptions in a separate method.

Lifts a monadic action and combines it with a continuation. 'mBind m f' is the same as 'lift m >>= f', but does not require an extra class constraint on the stream type. The same goes for the variant that ignores the action's result.

Lifts an IO action and combines it with a continuation. 'ioBind m f' is the same as 'liftIO m >>= f', but does not require an extra class constraint on the stream type. The same goes for the variant that ignores the action's result.

Convert one stream into another with the supplied mapping function. This function operates on whole chunks at a time, in contrast to mapStream, which operates on single elements.

  unpacker :: Enumeratee B.ByteString [Word8] m a
  unpacker = mapChunks B.unpack

Convert a stream of s to a stream of s' using the supplied function.

Convert one stream into another, not necessarily in lockstep. The transformer mapStream maps one element of the outer stream to one element of the nested stream. The transformer below is more general: it may take several elements of the outer stream to produce one element of the inner stream, or the other way around. The transformation from one stream to the other is specified as an Iteratee s m s'.

The most general stream converter.
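The Stream/Chunk protocol and the enumerator idea running through this section can be modeled with a tiny pure sketch. This is a toy, not the library's actual types (which add an inner monad and richer error handling):

```haskell
-- A stream is either terminated or a chunk of available data.
data Stream s = EOF | Chunk s deriving Show

-- An iteratee is either done (with leftover input) or wants more.
data Iteratee s a = Done a (Stream s)
                  | Cont (Stream s -> Iteratee s a)

-- Send EOF and extract the result; Nothing models an iteratee
-- that diverges on EOF.
run :: Iteratee s a -> Maybe a
run (Done a _) = Just a
run (Cont k)   = case k EOF of
                   Done a _ -> Just a
                   Cont _   -> Nothing

-- The pure 1-chunk enumerator: feed one chunk, return the new iteratee.
enumPure1Chunk :: s -> Iteratee s a -> Iteratee s a
enumPure1Chunk s (Cont k) = k (Chunk s)
enumPure1Chunk _ i        = i

-- The analogue of List.head over list streams (total variant).
headI :: Iteratee [e] (Maybe e)
headI = Cont step
  where step (Chunk (x:xs)) = Done (Just x) (Chunk xs)
        step (Chunk [])     = Cont step      -- empty chunk: suspend and wait
        step EOF            = Done Nothing EOF
```

Note how the empty-chunk case "suspends itself and waits for more data", exactly as the Stream documentation above puts it.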
Given a function to produce iteratee transformers and an initial state, convert the stream using iteratees generated by the function while continually updating the internal state.

Collapse a nested iteratee. The inner iteratee is terminated by EOF; errors are propagated through the result. The stream resumes from the point of the outer iteratee; any remaining input in the inner iteratee will be lost. Differs from the take-style combinators in that the inner iteratee is terminated, and may have a different stream type than the result.

Lift an iteratee inside a monad to an iteratee.

Applies the iteratee to the given stream. This wraps the primitive enumerators, calling the appropriate one based upon the stream state.

The most primitive enumerator: applies the iteratee to the terminated stream. The result is the iteratee in the Done state. It is an error if the iteratee does not terminate on EOF.

Another primitive enumerator: tell the Iteratee the stream terminated with an error.

Combines an Enumeratee from s to s' and an Iteratee that consumes s' into an Iteratee which consumes s.

Combines an Enumerator which produces a stream of s and an Enumeratee which transforms a stream of s to a stream of s' into an Enumerator which produces a stream of s'.

Enumeratee composition: run the second enumeratee within the first. In this example, stream2list is run within 'takeStream 10', which is itself run within 'takeStream 15', resulting in 15 elements being consumed:

  run =<< enumPure1Chunk [1..1000 :: Int]
            (joinI $ (I.takeStream 15 ><> I.takeStream 10) I.stream2list)
  [1,2,3,4,5,6,7,8,9,10]

Enumeratee composition with the arguments flipped.

Combine enumeration over two streams. The merging enumeratee would typically be the result of one of the stream-merging functions (see merge for an example).

The pure 1-chunk enumerator. It passes a given list of elements to the iteratee in one chunk. This enumerator does no IO and is useful for testing of base parsing.

Enumerate chunks from a list.

Checks if an iteratee has finished. This enumerator runs the iteratee, performing any monadic actions. If the result is True, the returned iteratee is done.

Create an enumerator from a callback function.

Create an enumerator from a callback function with an exception handler. The exception handler is called if an iteratee reports an exception.

Check if a stream has received EOF.

Read a stream to the end and return all of its elements as a list. This iteratee returns all data from the stream *strictly*.

Read a stream to the end and return all of its elements as a stream. This iteratee returns all data from the stream *strictly*.

Attempt to read the next element of the stream and return it. Raise a (recoverable) error if the stream is terminated. The analogue of List.head. Because it can raise an error, it shouldn't be used when constructing iteratees for convStream; use tryHead instead.

Similar to the above, except it returns Nothing if the stream is terminated.

Attempt to read the last element of the stream and return it. Raise a (recoverable) error if the stream is terminated. The analogue of List.last.

Given a sequence of characters, attempt to match them against the characters on the stream. Return the count of how many characters matched. The matched characters are removed from the stream. For example, if the stream contains "abd", then (heads "abc") will remove the characters "ab" and return 2.

Look ahead at the next element of the stream without removing it. Return 'Just c' if successful; return Nothing if the stream is terminated by EOF.

Return a chunk of t elements length while consuming d elements from the stream. Useful for creating a 'rolling average' with convStream.

Drop n elements of the stream, if there are that many. The analogue of List.drop.

Skip all elements while the predicate is true. The analogue of List.dropWhile.

Return the total length of the remaining part of the stream. This forces evaluation of the entire stream. The analogue of List.length.

Get the length of the current chunk, or Nothing if the stream has terminated. This function consumes no input.

Take n elements from the current chunk, or the whole chunk if n is greater.

Takes an element predicate and returns the (possibly empty) prefix of the stream. None of the characters in the string satisfy the character predicate. If the stream is not terminated, the first character of the remaining stream satisfies the predicate. N.B. breakE should be used in preference to breakStream: breakStream will retain all data until the predicate is met, which may result in a space leak. The analogue of List.break.

Takes an element predicate and an iteratee, running the iteratee on all elements of the stream until the predicate is met. The following rule relates break to breakE: break pred === joinI (breakE pred stream2stream). breakE should be used in preference to break whenever possible.

Read n elements from a stream and apply the given iteratee to the stream of the read elements. Unless the stream is terminated early, we read exactly n elements, even if the iteratee has accepted fewer. The analogue of List.take.

Read n elements from a stream and apply the given iteratee to the stream of the read elements. If the given iteratee accepted fewer elements, we stop.
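The behavior of `heads` described above (match a given prefix against the stream, return the match count, remove the matched characters) can be modeled purely over lists, with the unread rest of the stream returned alongside the count:

```haskell
-- Pure model of the `heads` iteratee: count how many characters of
-- the pattern match a prefix of the stream, returning the count and
-- the remaining (unmatched) part of the stream.
heads :: Eq a => [a] -> [a] -> (Int, [a])
heads (p:ps) (x:xs)
    | p == x = let (n, rest) = heads ps xs in (n + 1, rest)
heads _ xs   = (0, xs)
```

With the documentation's own example: on a stream containing "abd", `heads "abc"` removes "ab" and returns 2, leaving "d".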
This is the variation of take with early termination of processing of the outer stream once the processing of the inner stream finished early. Iteratees composed with takeUpTo will consume only enough elements to reach a done state. Any remaining data will be available in the outer stream.

  > let iter = do h <- joinI $ takeUpTo 5 I.head
                  t <- stream2list
                  return (h,t)
  > enumPureNChunk [1..10::Int] 3 iter >>= run >>= print
  (1,[2,3,4,5,6,7,8,9,10])
  > enumPureNChunk [1..10::Int] 7 iter >>= run >>= print
  (1,[2,3,4,5,6,7,8,9,10])

In each case, I.head consumes only one element, returning the remaining four elements to the outer stream.

Takes an element predicate and an iteratee, running the iteratee on all elements of the stream while the predicate is met. This is preferred to takeWhile.

Map a function over an Enumeratee. This one is reimplemented and differs from the one in Data.Iteratee.ListLike insofar as it doesn't pass on an EOF received in the input, which is the expected behavior.

Map the stream rigidly. Like the above, but the element type cannot change. This function is necessary for ByteString and similar types that cannot have LooseMap instances, and may be more efficient.

Apply a function to the elements of a stream and concatenate the results into a stream. No giant intermediate list is produced.

Apply a monadic function to the elements of a stream and concatenate the results into a stream. No giant intermediate list is produced.

Creates an enumeratee that passes on only the elements from the stream that satisfy the predicate function. The outer stream is completely consumed. The analogue of List.filter.

Apply a monadic filter predicate to an Enumeratee.

Grouping: 'groupStreamOn proj inner outer' executes 'inner (proj e)', where e is the first input element, to obtain an Iteratee i, then passes elements e to i as long as 'proj e' produces the same result. If 'proj e' changes or the input ends, the pair of 'proj e' and the result of 'run i' is passed to outer. At end of input, the resulting outer is returned.

Grouping: 'groupStreamBy cmp inner outer' executes inner to obtain an Iteratee i, then passes elements e to i as long as 'cmp e0 e', where e0 is some preceding element, is true. Else, the result of 'run i' is passed to outer and the grouping restarts. At end of input, the resulting outer is returned.

mergeStreams offers another way to nest iteratees: as a monad stack. This allows for the possibility of interleaving data from multiple streams.

  -- print each element from a stream of lines
  logger :: MonadIO m => Iteratee [ByteString] m ()
  logger = mapStreamM_ (liftIO . putStrLn . B.unpack)

  -- combine alternating lines from two sources
  -- To see how this was derived, follow the types from
  -- 'ileaveLines logger' and work outwards.
  run =<< enumFile 10 "file1" (joinI $ enumLinesBS $
            ( enumFile 10 "file2" . joinI . enumLinesBS $
                joinI (ileaveLines logger)) >>= run)

  ileaveLines :: (Functor m, Monad m)
              => Enumeratee [ByteString] [ByteString]
                            (Iteratee [ByteString] m) [ByteString]
  ileaveLines = mergeStreams (\l1 l2 ->
      [B.pack "f1:\n\t", l1, B.pack "f2:\n\t", l2])

A version of mergeStreams which operates on chunks instead of elements. mergeByChunks offers more control than mergeStreams: mergeStreams terminates when the first stream terminates, whereas mergeByChunks will continue until both streams are exhausted. It guarantees that both chunks passed to the merge function will have the same number of elements, although that number may vary between calls.

Left-associative fold that is strict in the accumulator. This function should be used in preference to the lazy fold whenever possible. The analogue of List.foldl'.

Enumerate two iteratees over a single stream simultaneously. Compare to List.zip.

Enumerate over two iteratees in parallel as long as the first iteratee is still consuming input. The second iteratee will be terminated with EOF when the first iteratee has completed. An example use is to determine how many elements an iteratee has consumed:

  snd <$> enumWith (dropWhile (<5)) length

Compare to zipStreams.

Enumerate a list of iteratees over a single stream simultaneously and discard the results. This is a different behavior than Prelude's sequence_, which runs the iteratees in the list one after the other. Compare to Prelude.sequence_.

Transform an iteratee into one that keeps track of how much data it consumes.

The pure n-chunk enumerator. It passes a given stream of elements to the iteratee in n-sized chunks.

Map a monadic function over the elements of the stream and ignore the result.

Map a monadic function over an Enumeratee.

Fold a monadic function over an Enumeratee.

Indicates endianness: either the Most Significant Byte is first (big-endian) or the Least Significant Byte is first (little-endian).

Read 3 bytes in an endian manner. If the first bit is set (negative), set the entire first byte so the Int32 will be negative as well.

The enumerator of a POSIX file descriptor. This version enumerates over the entire contents of a file, in order, unless stopped by the iteratee. In particular, seeking is not supported.

A variant of enumFd that catches exceptions raised by the Iteratee.

The enumerator of a POSIX file descriptor: a variation of enumFd that supports RandomIO (seek requests).

Process a file using the given Iteratee.

A version of fileDriverFd that supports seeking.

(These take the buffer size, in number of elements, as an argument.)

The (monadic) enumerator of a file Handle.
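The 3-byte endian read with sign extension described above is compact enough to sketch in full. Names are illustrative; the logic follows the documented rule: if the sign bit of the 24-bit value is set, fill the whole top byte so the Int32 is negative too.

```haskell
import Data.Bits (shiftL, testBit, (.|.))
import Data.Int  (Int32)
import Data.Word (Word8)

-- Big-endian: most significant byte first; little-endian: least first.
data Endian = MSB | LSB

-- Assemble an Int32 from three bytes (given in stream order) and
-- sign-extend through the top byte when the 24-bit value is negative.
readInt24 :: Endian -> Word8 -> Word8 -> Word8 -> Int32
readInt24 LSB b0 b1 b2 = readInt24 MSB b2 b1 b0   -- reverse the byte order
readInt24 MSB b0 b1 b2
    | testBit b0 7 = base .|. (0xFF `shiftL` 24)  -- negative: set the entire first byte
    | otherwise    = base
  where
    base = (fromIntegral b0 `shiftL` 16)
       .|. (fromIntegral b1 `shiftL` 8)
       .|.  fromIntegral b2
```

For example, three 0xFF bytes decode to -1 on either endianness.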
This version enumerates over the entire contents of a file, in order, unless stopped by the iteratee. In particular, seeking is not supported. Data is read into a buffer of the specified size.

An enumerator of a file handle that catches exceptions raised by the Iteratee.

The enumerator of a Handle: a variation of enumHandle that supports RandomIO (seek requests). Data is read into a buffer of the specified size.

Process a file using the given Iteratee. This function wraps enumHandle as a convenience.

A version of fileDriverHandle that supports seeking.

(These take the buffer size, in number of elements per read, as an argument.)

Default buffer size in elements. This was 1024 in Data.Iteratee, which is obviously too small. Since we often want to merge many files, a read should take more time than a seek. This sets the sensible buffer size to somewhat more than one MB.

Process a file using the given Iteratee. This function wraps enumFd as a convenience. A version with a user-specified buffer size (in elements) is also provided.

Process a file using the given Iteratee. This function wraps enumFdRandom as a convenience.

Print lines as they are received. This is the first impure iteratee with non-trivial actions during chunk processing. Only lines ending with a newline are printed; data terminated with EOF is not printed.

Print lines as they are received. All lines are printed, including a line terminated by EOF. If the final line is terminated by EOF without a newline, no newline is printed. This function should be used in preference to printLines when possible, as it is more efficient with long lines.

Convert the stream of characters to a stream of lines, and apply the given iteratee to enumerate the latter. The stream of lines is normally terminated by the empty line. When the stream of characters is terminated, the stream of lines is also terminated. This is the first proper iteratee-enumerator: it is the iteratee of the character stream and the enumerator of the line stream.

Convert the stream of characters to a stream of words, and apply the given iteratee to enumerate the latter. Words are delimited by white space. This is the analogue of List.words.

Would you believe it? The 2bit format stores blocks of Ns in a table at the beginning of a sequence, then packs four bases into a byte. So it is neither possible nor necessary to store Ns in the main sequence, and you would think they aren't stored there, right? And they aren't. Instead, Ts are stored, which the reader has to replace with Ns.

The sensible way to treat these is probably to just say there are two kinds of implied annotation (repeats and large gaps for a typical genome), which can be interpreted in whatever way fits; that's what the two annotation types are for.

TODO: use binary search for the Int->Int mappings on the raw data?

Brings a 2bit file into memory. The file is mmap'ed, so this will not work on streams that are not actual files. It's also unsafe if the file is modified in any way.

Repeat a monadic action n times. Returns the results in reverse(!) order, but doesn't build a huge list of thunks in memory.

Merge blocks of Ns and blocks of Ms into a single list of blocks with masking annotation. Gaps remain. Used internally only.

Extract a subsequence and apply masking. A 2bit file can represent two kinds of masking (hard and soft), where hard masking is usually realized by replacing everything by Ns and soft masking is done by lowercasing. Here, we take a user-supplied function to apply masking.
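The 2bit packing described above (four bases per byte, first base in the most significant two bits, encoded T=0, C=1, A=2, G=3) can be sketched as a one-byte decoder. The function name is illustrative:

```haskell
import Data.Bits (shiftR, (.&.))
import Data.Word (Word8)

-- Unpack one byte of a 2bit sequence into four bases.  Note that
-- what decodes as T may really be an N: per the rant above, Ns are
-- stored as Ts and must be patched from the N-block table.
unpackByte :: Word8 -> String
unpackByte w =
    [ "TCAG" !! fromIntegral ((w `shiftR` s) .&. 3)  -- two bits per base
    | s <- [6, 4, 2, 0] ]                            -- first base in high bits
```

For example, the byte 0x1B (binary 00 01 10 11) decodes to "TCAG", and an all-zero byte decodes to "TTTT", any of which may actually be Ns.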
Works only in forward direction.&Extract a subsequence without masking.pExtract a subsequence with typical masking: soft masking is ignored, hard masked regions are replaced with Ns.xExtract a subsequence with masking for biologists: soft masking is done by lowercasing, hard masking by printing an N.7limits a range to a position within the actual sequenceGSample a piece of random sequence uniformly from the genome. Only pieces that are not hard masked are sampled, soft masking is allowed, but not reported. On a 32bit platform, this will fail for genomes larger than 1G bases. However, if you're running this code on a 32bit platform, you have bigger problems to worry about.Gets a fragment from a 2bit file. The result always has the desired length; if necessary, it is padded with Ns. Be careful about the unconventional encoding: 0..4 == TCAGN!  2bit filedesired lengthRNGposition, sequence, new RNG None !"09:;<= A list of reference sequences.Reference sequence in Bam Bam enumerates the reference sequences and then sorts by index. We need to track that index if we want to reproduce the sorting order.Possible sorting orders from bam header. Thanks to samtools, which doesn't declare sorted files properly, we have to have the stupid  state, too.6Exactly two characters, for the "named" fields in bam.ZTests whether a reference sequence is valid. Returns true unless the the argument equals  invalidRefseq.PThe invalid Refseq. Bam uses this value to encode a missing reference sequence.HThe invalid position. Bam uses this value to encode a missing position.PTests whether a position is valid. Returns true unless the the argument equals  invalidPos.WThe empty list of references. Needed for BAM files that don't really store alignments.Compares two sequence names the way samtools does. samtools sorts by "strnum_cmp": . if both strings start with a digit, parse the initial sequence of digits and compare numerically, if equal, continue behind the numbers . 
else compare the first characters (possibly NUL); if equal, continue behind them; else both strings ended and the shorter one counts as smaller (and that part is stupid).

Normalizes a series of CIGAR operations and encodes them in the way BAM and SAM expect it.

Computes the "distinct bin" according to the BAM binning scheme. If an alignment starts at pos and its CIGAR implies a length of len on the reference, then it goes into bin distinctBin pos len.

A mostly contiguous subset of a sequence, stored as a set of non-overlapping intervals in an IntMap from start position to end position (half-open intervals, naturally).

A subset of a genome. The idea is to map the reference sequence (represented by its number) to a Subsequence.

Runs an Iteratee and collects the input. When it finishes, the result is returned along with *all* input. Effectively allows lookahead. Be careful, this will eat memory if the Iteratee doesn't return speedily.

Collects a string of a given length. Don't use this for long strings.

Repeatedly applies an Iteratee to a value until end of stream. Returns the final value.

Converts a decoding Iteratee into an Enumeratee. The decoder is applied once, the decoded data is returned, and unneeded input remains in the stream.

Composes an Enumerator' with an Enumeratee, giving a new Enumerator'.

Merges two Enumerator's into one. The header provided by the inner Enumerator' is passed to the output iteratee, the header provided by the outer Enumerator' is passed to the merging iteratee. XXX Something about those headers is unsatisfactory... there should be an unobtrusive way to combine headers.

Parallel map of an IO action over the elements of a stream. This Enumeratee applies an IO action to every chunk of the input stream. These IO actions are run asynchronously in a limited parallel way.

Protects the terminal from binary junk.
If i is an Iteratee that might write binary to a handle, then protectTerm i is the same Iteratee, but it will abort if that handle is a terminal device.

A general progress indicator that prints some message after a set number of records have passed through.

A simple progress indicator that prints the number of records.

A simple progress indicator that prints a position every set number of passed records.

A function to convert attoparsec Parsers into Iteratees.

Equivalent to joinI $ takeStream n $ stream2vector, but more efficient.

Reads the whole stream into a vector. Arguments (for the merging combinator): inner enumerator; outer enumerator; merging enumeratee.

A collection of extension fields. The key is actually only two Chars, but that proved impractical. (Hmm... we could introduce a Key type that is a 16 bit int, then give it an instance IsString... practical?)

Bam record in its native encoding along with its virtual address.

A mutable vector that packs two nucleotides into one byte, just like Bam does.

A vector that packs two nucleotides into one byte, just like Bam does.

Internal representation of a BAM record.

Virtual offset for indexing purposes.

Cigar line in BAM coding. Bam encodes an operation and a length into a single integer; we keep those integers in an array.

Extracts the aligned length from a cigar line. This gives the length of an alignment as measured on the reference, which is different from the length on the query or the length of the alignment.

Smart constructor. Makes sure we got at least a full record.

Deletes all occurrences of some extension field.

Blindly inserts an extension field. This can create duplicates (and there is no telling how other tools react to that).

Deletes all occurrences of an extension field, then inserts it with a new value.
This is safer than blind insertion, but also more expensive.

Adjusts a named extension by applying a function.

A simple progress indicator that prints sequence id and position.

We need a simple priority queue. Here's a skew heap (specialized to strict priorities and values).

The things we drag along in the pileup state. Notes: The active queue is a simple stack. We add at the front when we encounter reads, which reverses them. When traversing it, we traverse reads backwards, but since we accumulate the result, it gets reversed back. The new active queue, however, is no longer reversed (as it should be). So after the traversal, we reverse it again. (Yes, it is harder to understand than using a proper deque type, but it is cheaper. There may not be much point in the reversing, though.)

The pileup logic keeps a current coordinate (just two integers) and two running queues: one of active reads that contribute to current genotype calling and one of waiting reads that will contribute at a later point.

Oppan continuation passing style! Not only is the CPS version of the state monad (we have five distinct pieces of state) somewhat faster, we also need CPS to interact with the mechanisms of the Iteratee. It makes implementing peek and bump straightforward.

Simple single population model. One parameter is the fraction of homozygous divergent sites, the other is the fraction of heterozygous variant sites among sites that are not homozygous divergent.

Raw pile. Bases and indels are piled separately on forward and backward strands.

Running pileup results in a series of piles. A pile has the basic statistics of a VarCall, but no GL values and a pristine list of variants instead of a proper call. We emit one base pile with two parts (one for each strand) and one indel pile (the one immediately following) at a time.

Map quality and a list of encountered indel variants.
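The CIGAR coding described above packs an operation and a length into a single integer. A minimal sketch of decoding it and computing the reference-aligned length, assuming the standard BAM encoding (length in the high 28 bits, operation code in the low 4 bits), could look like this; the names here are illustrative, not the module's:

```haskell
import Data.Bits ((.&.), shiftR)

-- One BAM CIGAR entry: length in the high 28 bits, operation in the
-- low 4 bits.  Operations 0..8 = M,I,D,N,S,H,P,=,X per the SAM spec.
cigarLen, cigarOp :: Int -> Int
cigarLen x = x `shiftR` 4
cigarOp  x = x .&. 0xf

-- Length of the alignment as measured on the reference:
-- only M (0), D (2), N (3), = (7) and X (8) consume reference bases.
alignedLength :: [Int] -> Int
alignedLength = sum . map len
  where
    len x | cigarOp x `elem` [0,2,3,7,8] = cigarLen x
          | otherwise                    = 0

main :: IO ()
main = do
  -- 10M 2I 5D 3S as packed integers: reference length = 10 + 5 = 15
  let cig = [10*16+0, 2*16+1, 5*16+2, 3*16+4]
  print (alignedLength cig)  -- prints 15
```

Insertions and soft clips advance only the query, which is why they contribute nothing here.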
The deletion has the reference sequence, if known; an insertion has the inserted sequence with damage information.

Map quality and a list of encountered bases, with damage information and reference base if known.

Genotype likelihood values. A variant call consists of a position, some measure of qualities, genotype likelihood values, and a representation of variants. A note about the GL values: VCF would normalize them so that the smallest one becomes zero. We do not do that here, since we might want to compare raw values for a model test. We also store them in a form that makes arithmetics easier. Normalization is appropriate when converting to VCF.

If GL is given, we follow the same order used in VCF: "the ordering of genotypes for the likelihoods is given by: F(j/k) = (k*(k+1)/2)+j. In other words, for biallelic sites the ordering is: AA,AB,BB; for triallelic sites the ordering is: AA,AB,BB,AC,BC,CC, etc."

Statistics about a genotype call. Probably only useful for filtering (so not very useful), but we keep them because it's easy to track them.

Represents our knowledge about a certain base, which consists of the base itself (A,C,G,T, encoded as 0..3; no Ns), the quality score (anything that isn't A,C,G,T becomes A with quality 0), and a substitution matrix representing post-mortem but pre-sequencing substitutions.

Unfortunately, none of this can be rolled into something more simple, because damage and sequencing error behave so differently.

Damage information is polymorphic. We might run with a simple version (a matrix) for calling, but we need more (a matrix and a mutable matrix, I think) for estimation.

Fields: reference base from MD field; called base; quality of called base; damage information; damage information; more chunks; number of bases to wait due to a deletion; map quality.

Genotype Calling: like Samtools(?), but for aDNA. The goal for this module is to call haploid and diploid single nucleotide variants the best way we can, including support for aDNA.
Indel calling is out of scope; we only do it "on the side".

The cleanest way to call genotypes under all circumstances is probably the Dindel approach: define candidate haplotypes, align each read to each haplotype, then call the likely haplotypes with a quality derived from the quality scores. This approach neatly integrates indel calling with ancient DNA and makes a separate indel realigner redundant. However, it's rather expensive in that it requires inclusion of an aligner, and we'd need an aligner that is compatible with the chosen error model, which might be hard.

Here we'll take a short cut: we do not really call indels. Instead, these variants are collected and are assigned an affine score. This works best if indels are 'left-aligned' first. In theory, one indel variant could be another indel variant with a sequencing error; we ignore that possibility for the most part. Once indels are taken care of, SNVs are treated separately as independent columns of the pileup.

Regarding the error model, there's a choice between samtools and the naive model everybody else (GATK, Rasmus Nielsen, etc.) uses. Naive is easy to marry to aDNA; samtools is (probably) better. Either way, we introduce a number of parameters (eta and kappa for samtools; lambda, delta, delta_ss for Johnson). Running a maximum likelihood fit for those may be valuable. It would be cool if we could do that without rerunning the complete genotype caller, but it's not a priority.

So, outline of the genotype caller: We read BAM (minimally filtering; general filtering is somebody else's problem, but we might want to split by read group). We will scan each read's CIGAR line in concert with the sequence and effective quality. Effective quality is the lowest available quality score of QUAL, MAPQ, and BQ.
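The VCF genotype-likelihood ordering quoted earlier, F(j/k) = k*(k+1)/2 + j, can be sketched as follows; glIndex and glOrder are illustrative names, not part of this module:

```haskell
-- Index of genotype j/k (alleles numbered from 0, j <= k) in the flat
-- GL vector, following the VCF convention F(j/k) = k*(k+1)/2 + j.
glIndex :: Int -> Int -> Int
glIndex j k = k * (k + 1) `div` 2 + j

-- Enumerates genotypes in GL order for n alleles; for two alleles this
-- yields the familiar AA, AB, BB ordering.
glOrder :: Int -> [(Int, Int)]
glOrder n = [ (j, k) | k <- [0 .. n-1], j <- [0 .. k] ]

main :: IO ()
main = do
  print (map (uncurry glIndex) (glOrder 3))  -- indices come out 0..5
  print (glOrder 2)                          -- biallelic: AA, AB, BB
```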
For aDNA calling, the base is transformed into four likelihoods based on the aDNA substitution matrix.

So, either way, we need something like "pileup", where indel variants are collected as they are (any length), while matches are piled up.

Regarding output, we certainly don't want to write VCF or BCF. (No VCF because it's ugly, no BCF because the tool support is non-existent.) It will definitely be something binary. For the GL values, small floating point formats may make sense: half-precision floating point's representable range would be 6.1E-5 to 6.5E+5; a 0.4.4 minifloat from Bio.Util goes from 0 to 63488.

The primitive pieces for genotype calling: a position, a base represented as four likelihoods, an inserted sequence, and the length of a deleted sequence. The logic is that we look at a base followed by some indel, and all those indels are combined into a single insertion and a single deletion.

Skip to position (at start or after an N operation).

Observed deletion and insertion between two bases.

Nothing anymore.

Decomposes a BAM record into chunks suitable for piling up. We pick apart the CIGAR and MD fields, and combine them with sequence and quality as appropriate. Clipped bases are removed or skipped as appropriate. We also apply a substitution matrix to each base, which must be supplied along with the read.

Computes posterior genotype probabilities from likelihoods under a given model.

The pileup enumeratee takes BAM records, decomposes them, interleaves the pieces appropriately, and generates piles. The output will contain at most one base pile and one indel pile for each position; piles are sorted by position.

This top level driver receives BAM records. Unaligned reads and duplicates are skipped (but not those merely failing quality checks). Processing stops when the first read with an invalid br_rname is encountered or at end of file.

Sends one piece of output downstream. You are not expected to understand how this works, but inlining it plugged an annoying memory leak.

The actual pileup algorithm.
If actives contains something, continue here. Else find the coordinate to continue from, which is the minimum of the next waiting coordinate and the next coordinate in input; if found, continue there, else we're all done.

Feeds input as long as it starts at the current position.

Checks the waiting queue. If there is anything waiting for the current position, moves it to the active queue.

Separately scans the two active queues and makes one pile from each. Also sees what's next: some entries contribute to two separate piles, some are pushed back to the waiting queue, exhausted ones are removed, and everything else is added to two fresh active queues.

Argh, this business with the CIGAR operations is a mess; it gets worse when combined with MD. Okay, we will support CIGAR (no "=" and "X" operations) and MD. If we have MD on input, we generate it on output, too. And in between, we break everything into very small operations. (Yes, the two terminating constructors are a weird hack.)

Removes duplicates from an aligned, sorted BAM stream.

The incoming stream must be sorted by coordinate, and we check for violations of that assumption. We cannot assume that length was taken into account when sorting (samtools doesn't do so), so duplicates may be separated by reads that start at the same position but have different length or different strand.

We are looking at three different kinds of reads: paired reads, true single ended reads, and merged or trimmed reads. They are somewhat different, but here's the situation if we wanted to treat them separately. These conditions define a set of duplicates:

Merged or trimmed: We compare the leftmost coordinates and the aligned length. If the library prep is strand-preserving, we also compare the strand.

Paired: We compare both left-most coordinates (b_pos and b_mpos). If the library prep is strand-preserving, only first-mates can be duplicates of first-mates. Else a first-mate can be the duplicate of a second-mate. There may be pairs with one unmapped mate.
This is not a problem as they get assigned synthetic coordinates and will be handled smoothly.

True singles: We compare only the leftmost coordinate. It does not matter if the library prep is strand-preserving, the strand always matters.

Across these classes, we can see more duplicates:

Merged/trimmed and paired: these can be duplicates if the merging failed for the pair. We would need to compare the outer coordinates of the merged reads to the two 5' coordinates of the pair. However, since we don't have access to the mate, we cannot actually do anything right here. This case should be solved externally by merging those pairs that overlap in coordinate space.

Single and paired: in the single case, we only have one coordinate to compare. This will inevitably lead to trouble, as we could find that the single might be the duplicate of two pairs, but those two pairs are definitely not duplicates of each other. We solve it by removing the single read(s).

Single and merged/trimmed: same trouble as in the single+paired case. We remove the single to solve it.

In principle, we might want to allow some wiggle room in the coordinates. So far, this has not been implemented. It adds the complication that groups of separated reads can turn into a set of duplicates because of the appearance of a new read. Needs some thinking about... or maybe it's not too important.

Once a set of duplicates is collected, we perform a majority vote on the correct CIGAR line. Of all those reads that agree on this CIGAR line, a consensus is called, quality scores are adjusted and clamped to a maximum, the MD field is updated and the XP field is assigned the number of reads in the original cluster. The new MAPQ becomes the RMSQ of the map qualities of all reads.

Treatment of Read Groups: We generalize by providing a "label" function; only reads that have the same label are considered duplicates of each other. The typical label function would extract read groups, libraries or samples.
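The RMSQ rule above (root-mean-square of the map qualities, rounded) can be sketched like this; rmsq is an illustrative name, not the module's:

```haskell
-- Root-mean-square of a list of map qualities, rounded to the nearest
-- integer, as used for the MAPQ of a duplicate consensus.
rmsq :: [Int] -> Int
rmsq [] = 0
rmsq qs = round (sqrt (fromIntegral (sum (map (^2) qs))
                       / fromIntegral (length qs) :: Double))

main :: IO ()
main = print (rmsq [30, 40, 50])  -- sqrt(5000/3) rounds to 41
```

The RMS weighs high map qualities more heavily than a plain average would, which is arguably the right bias for a consensus read.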
Workhorse for duplicate removal.

Unmapped fragments should not be considered to be duplicates of mapped fragments. The unmapped flag can serve for that: while there are two classes of unmapped reads (those that are not mapped and those that are mapped to an invalid position), the two sets will always have different coordinates. (Unfortunately, correct duplicate removal now relies on correct unmapped and mate-unmapped flags, and we don't get them from unmodified BWA. So correct operation requires patched BWA or a run of bam-fixpair.)

Other definitions (e.g. lack of CIGAR) don't work, because that information won't be available for the mate. This would amount to making the unmapped flag part of the coordinate, but samtools is not going to take it into account when sorting. Instead, both flags become part of the mate-pos grouping criterion.

First mates should (probably) not be considered duplicates of second mates. This is unconditionally true for libraries with A/B-style adapters (definitely 454, probably Mathias' ds protocol) and the ss protocol; it is not true for fork adapter protocols (vanilla Illumina protocol). So it has to be an option, which would ideally be derived from header information.

This code ignores read groups, but it will do a majority vote on the RG field and call consensi for the index sequences. If you believe that duplicates across read groups are impossible, you must call it with an appropriately filtered stream.

Half-aligned pairs (meaning one known coordinate, while the validity of the alignments is immaterial) are rather complicated:

Given that only one coordinate is known (5' of the aligned mate), we want to treat them like true singles. But the unaligned mate should be kept if possible, though it should not contribute to a consensus sequence. We assume nothing about the unaligned mate, not even that it shouldn't be aligned, never mind the fact that it couldn't be.
(The difference is in the finite abilities of real world aligners, naturally.)

Therefore, aligned reads with unaligned mates go to the same potential duplicate set as true singletons. If at least one pair exists that might be a duplicate of those, all singletons and half-aligned mates are discarded. Else a consensus is computed and replaces the aligned mates.

The unaligned mates end up in the same place in a BAM stream as the aligned mates (therefore we see them and can treat them locally). We cannot call a consensus, since these molecules may well have different length, so we select one. It doesn't really matter which one is selected, and since we're treating both mates at the same time, it doesn't even need to be reproducible without local information. This is made to be the mate of the consensus.

Merging information about true singles, merged singles, half-aligned pairs, and actually aligned pairs.

We collected aligned reads with unaligned mates together with aligned true singles (singles). We collected the unaligned mates, which necessarily have the exact same alignment coordinates, separately (unaligned). If we don't find a matching true pair (that case is already handled smoothly), we keep the highest quality unaligned mate, pair it with the consensus of the aligned mates and aligned singletons, and give it the lexically smallest name of the half-aligned pairs.

Merging of half-aligned reads. The first argument is a map of unaligned reads (their mates are aligned to the current position), the second is a list of reads that are aligned (their mates are not aligned).

So, suppose we're looking at a read that was passed through. We need to emit it along with its mate, which may be hidden inside a list. (Alternatively, we could force it to single, but that fails if we're passing everything along somehow.)

Suppose we're looking at a consensus. We could pair it with some mate (which we'd need to duplicate), or we could turn it into a singleton.
Duplication is ugly, so in this case, we force it to singleton.

Normalizes a read's alignment to fall into the canonical region of [0..l]. Takes the name of the reference sequence and its length. Returns Left x if the coordinate decreased so the result is out of order now, Right x if the coordinate is unchanged.

Wraps a read to be fully contained in the canonical interval [0..l]. If the read overhangs, it is duplicated and both copies are suitably masked. A piece with a changed coordinate that is now out of order is returned as Left x; if the order is fine, it is returned as Right x.

Splits a record into two at some position. The position is counted in terms of the reference (therefore, deletions count, insertions don't). The parts that would be skipped if we were splitting lists are replaced by soft masks.

Creates an MD field from an extended CIGAR and places it in a record. We build it piecemeal (in go), call out to addNum, addRep, addDel to make sure the operations are not generated in a degenerate manner, and finally check if we're even supposed to create an MD field.

Trims from the 3' end of a sequence. trim_3' p b trims the 3' end of the sequence in b at the earliest position such that p evaluates to true on every suffix that was trimmed off. Note that the 3' end may be the beginning of the sequence if it happens to be stored in reverse-complemented form. Also note that trimming from the 3' end may not make sense for reads that were constructed by merging paired end data (but we cannot take care of that here). Further note that trimming may break dependent information, notably the "mate" information of the mate and many optional fields.

Trim predicate to get rid of low quality sequence. trim_low_quality q ns qs evaluates to true if all qualities in qs are smaller (i.e. worse) than q.

Overlap-merging of read pairs.
We shall compute the likelihood for every possible overlap, then select the most likely one (unless it looks completely random), compute a quality from the second best merge, then merge and clamp the quality accordingly. (We could try looking for chimaera after completing the merge, if only we knew which ones to expect?)

Two reads go in, with two adapter lists. We return nothing if all merges looked mostly random. Else we return the two original reads, appropriately flagged, *and* the merged version, also flagged. All reads contain the computed qualities (in YM and YN), which we also return.

The merging automatically limits quality scores some of the time. We additionally impose a hard limit of 63 to avoid difficulties representing the result, and even that is ridiculous. Sane people would further limit the returned quality! (In practice, map quality later imposes a limit anyway, so no worries...)

Trimming for a single read: we need one adapter only (the one coming after the read), here provided as a list of options, and then we merge with an empty second read. Results in up to two reads (the original, possibly flagged, and the trimmed one, definitely flagged) and two qualities.

For merging, we don't need the complete adapters (length around 70!), only a sufficient prefix. Taking only the more-or-less constant part (length around 30), there aren't all that many different adapters in the world.
To deal with pretty much every library, we only need the following forward adapters, which will be the default (defined here in the direction they would be sequenced in): Genomic R2, Multiplex R2, Graft P7.

Like the above, these are the few adapters needed for the reverse read (defined in the direction they would be sequenced in as part of the second read): Genomic R1, CL 72.

Computes the overlap score for two reads (with qualities) assuming an insert length.

One BGZF block: virtual offset and contents. Could also be a block of an uncompressed file, if we want to support indexing of uncompressed BAM or some silliness like that.

Decompresses a plain file. What's actually happening is that the offset in the input stream is tracked and added to the Bytes, giving Blocks. This results in the same interface as decompressing actual Bgzf.

Decompresses a BGZF stream into a stream of Blocks.

Decompresses a BGZF stream into a stream of Blocks, n-fold parallel.

Decodes a BGZF block header and returns the block size if successful.

Tests whether a stream is in BGZF format. Does not consume any input.

Tests whether a stream is in GZip format. Also returns True on a Bgzf stream, which is technically a special case of GZip.

Maximum block size for Bgzf: 64k with some room for headers and uncompressible stuff.

The EOF marker for BGZF files. This is just an empty string compressed as BGZF. Appended to BAM files to indicate their end.

Decompresses a collection of strings into a single BGZF block. Ideally, we receive one decode chunk from a BGZF file, decompress it, and return it, in the process attaching the virtual address. But we might actually get more than one chunk, depending on the internals of the Iteratees used.
If so, we concatenate them; the first gets to assign the address.

Now allocate space for uncompressed data, decompress the chunks we got, compute the crc for each and check it, finally convert to a Block and emit. We could probably get away with unsafePerformIO'ing everything in here, but then again, we only do this when we're writing output anyway. Hence, run in IO.

Compresses a collection of strings into a single BGZF block. Okay, performance was lacking... let's do it again, in a more direct style. We build our block manually. First check if the compressed data is going to fit; if not, that's a bug. Then alloc a buffer, fill it with a dummy header, alloc a ZStream, and compress the pieces we were handed one at a time. Calculate CRC32, finalize the header, construct a byte string, return it. We could probably get away with unsafePerformIO'ing everything in here, but then again, we only do this when we're writing output anyway. Hence, run in IO.

Gets the current virtual offset. The virtual address in a BGZF stream contains the offset of the current block in the upper 48 bits and the current offset into that block in the lower 16 bits. This scheme is compatible with the way BAM files are indexed.

Runs an Iteratee for Bytes when decompressing BGZF. Adds internal bookkeeping.

Compresses a stream of Bytes into a stream of BGZF blocks, in parallel.

Breaks a stream into chunks suitable to be compressed individually. Each chunk on output is represented as a list of strings; each list must be reversed and concatenated to be compressed.

Like the above, with sensible defaults.

Things we are able to encode. Taking inspiration from binary-serialise-cbor, we define these as a lazy list-like thing and consume it in an interpreter.

We manage a large buffer (multiple megabytes), of which we fill an initial portion.
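The virtual-offset scheme described above (compressed-block offset in the upper 48 bits, within-block offset in the lower 16) can be sketched as follows; the names are illustrative, not the library's:

```haskell
import Data.Bits ((.&.), (.|.), shiftL, shiftR)
import Data.Word (Word64)

-- Pack a compressed-block file offset and an offset within the
-- decompressed block into one BGZF virtual offset.
virtOffset :: Word64 -> Word64 -> Word64
virtOffset block inblock = (block `shiftL` 16) .|. (inblock .&. 0xffff)

-- Split a virtual offset back into its two components.
splitVirt :: Word64 -> (Word64, Word64)
splitVirt v = (v `shiftR` 16, v .&. 0xffff)

main :: IO ()
main = print (splitVirt (virtOffset 123456 789))  -- (123456,789)
```

Because a BGZF block decompresses to at most 64 KiB, sixteen bits always suffice for the within-block part, which is what makes BAM indexing on these offsets work.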
We remember the size, the used part, and two marks where we later fill in sizes for the length prefixed BAM or BCF records. We move the buffer down when we yield a piece downstream, and when we run out of space, we simply move to a new buffer. Garbage collection should take care of the rest. Unused marks must be set to (maxBound::Int) so they don't interfere with flushing.

Creates a buffer.

Creates a new buffer, copying the content from an old one, with higher capacity.

Expands a chain of tokens into a buffer, sending finished pieces downstream as soon as possible.

Printers for BAM. We employ an Iteratee interface, and we strive to keep BAM records in their encoded form. This is most compact and often faster, since it saves the time for repeated decoding and encoding, if that's not strictly needed.

Writes in SAM format to stdout. This is useful for piping to other tools (say, AWK scripts) or for debugging. No convenience function to send SAM to a file exists, because that's a stupid idea.

Encodes BAM records straight into a dynamic buffer, then BGZF's it. Should be fairly direct and perform well.

Writes BAM encoded stuff to a file. XXX This should(!) write indexes on the side; a simple block index for MapReduce style slicing, a standard BAM index or a name index would be possible. When writing to a file, this makes even more sense than when writing to a Handle.

Writes BAM encoded stuff to stdout. This sends uncompressed BAM to stdout.
Useful for piping to other tools.

Writes BAM encoded stuff to a Handle. We generate BAM with dynamic blocks, then stream them out to the file. XXX This could write indexes on the side; a simple block index for MapReduce style slicing, a standard BAM index or a name index would be possible.

Tunes the compression algorithm but does not affect correctness.

Default strategy.

Use the filtered compression strategy for data produced by a filter (or predictor). Filtered data consists mostly of small values with a somewhat random distribution. In this case, the compression algorithm is tuned to compress them better. The effect of this strategy is to force more Huffman coding and less string matching; it is somewhat intermediate between the default and the Huffman-only strategy.

Use the Huffman-only compression strategy to force Huffman encoding only (no string matching).

The memory level specifies how much memory should be allocated for the internal state. It is a tradeoff between memory usage, speed and compression: using more memory allows faster and better compression. The memory used for the internal state, excluding the window, is 512 bytes times 2 to the power of the memory level. The total amount of memory used depends on the memory level and the window bits.

Default memory level, set to 8.

Use a small amount of memory (equivalent to memory level 1). It is slow and reduces the compression ratio.

Maximum memory level for optimal compression speed (equivalent to memory level 9). The internal state is 256 KiB.

A specific level. It has to be between 1 and 9.

This specifies the size of the compression window. Larger values result in better compression at the expense of higher memory usage. The compression window size is 2 to the power of the value of the window bits. The total memory used depends on the window bits and the memory level.

The size of the window in bits. It has to be between 8 (which corresponds to a 256 B window) and 15 (which corresponds to a 32 KiB window).

The default window size.

Specifies the compression method. 'Deflate' is so far the only method supported.

The compression level specifies the tradeoff between speed and compression.

Default compression level, set to 6.

No compression, just a block copy.

The fastest compression method (at the expense of compression ratio).

The best compression method (however, the slowest).

Compression level set by a number from 1 to 9.

Specifies the format for compression and decompression.

The gzip format is widely used; it has a header with a checksum and some optional metadata about the compressed file. It is intended primarily for compressing individual files but is also used for network protocols such as HTTP. The format is described in RFC 1952, http://www.ietf.org/rfc/rfc1952.txt.

The zlib format uses a minimal header with a checksum but no other metadata. It is designed for use in network protocols. The format is described in RFC 1950, http://www.ietf.org/rfc/rfc1950.txt.

The 'raw' format is just the DEFLATE compressed data stream without any additional headers. The format is described in RFC 1951, http://www.ietf.org/rfc/rfc1951.txt.

Format for decompressing a gzip or zlib stream.

Set of parameters for decompression, with sane defaults provided.

Window size: it has to be at least the size the stream was compressed with. The default is the maximum window size; please do not touch it unless you know what you are doing.

The size of the output buffer, that is, the size of the chunks that will be emitted to the inner iteratee (except the last chunk).

Set of parameters for compression, with sane defaults provided.

Denotes the flush that can be sent to the stream.

All pending output is flushed and all input that is available is sent to the inner Iteratee.

Flushes all pending output and resets the compression state.
It allows restarting from this point if the compressed data was damaged, but it can seriously affect the compression rate. It may only be used during compression.

If the iteratee is compressing, it requests a stop when the next block is emitted. At the beginning, it skips the header if and only if it exists.

Denotes an error in compression or decompression.

Decompression requires a user-supplied dictionary (not supported).

Buffer error---denotes a library error.

File error.

State of the stream is inconsistent.

Input data corrupted.

Not enough memory.

Version error.

Unexpected or unknown error---please report it as a bug.

Incorrect state---denotes an error in the library.

Denotes an error in a user-supplied parameter.

An incorrect compression level was chosen.

An incorrect number of window bits was chosen.

An incorrect memory level was chosen.

Compress the input and send it to the inner iteratee.

Decompress the input and send it to the inner iteratee. If there is data after the end of the zlib stream, it is left unprocessed.

Inflate if the Gzip format is recognized, otherwise pass through.

Enumerate a synchronising flush. It causes all pending output to be flushed, and all available input is sent to the inner Iteratee.

Enumerate a full flush. It flushes all pending output and resets the compression. This allows restarting from this point if the compressed data was corrupted, but it can affect the compression rate. It may only be used during compression.

Enumerate a block flush. If the enumerator is compressing, it allows finishing the current block. If the enumerator is decompressing, it forces a stop at the next block boundary.

Arguments: format of input; parameters of compression.

Parsers for BAM and SAM. We employ an Iteratee interface, and we strive to support everything possible in BAM. The implementation of nucleotides is somewhat lacking: the "=" symbol is not understood.

TONOTDO: reader for gzipped, bzipped, or bgzf'ed SAM.
Storing SAM is a bad idea, so why would anyone ever want to compress, much less index it?

Checks if a file contains BAM in any of the common forms, then decompresses it appropriately. If the stream doesn't contain BAM at all, it is instead decoded as SAM. Since SAM is next to impossible to recognize reliably, we don't even try. Any old junk is decoded as SAM and will fail later.

Iteratee-style parser for SAM files, designed to be compatible with the BAM parsers. Parses plain uncompressed SAM, nothing else. Since it is supposed to work the same way as the BAM parser, it requires the presence of the SQ header lines. These are stripped from the header text and turned into the symbol table.

Parser for SAM that doesn't look for a header. It has the advantage that it doesn't stall on a pipe that never delivers data, and the disadvantage that it never reads the header and therefore needs a list of allowed RNAMEs.

Tests if a data stream is a Bam file. Recognizes plain Bam, gzipped Bam and bgzf'd Bam. If a file is recognized as Bam, a decoder (a suitable Enumeratee) for it is returned. This uses lookahead internally, so it shouldn't consume anything from the stream.

Checks if a file contains BAM in any of the common forms, then decompresses it appropriately. We support plain BAM, Bgzf'd BAM, and Gzip'ed BAM.

The recommendation for these functions is to use decodeAnyBam (or decodeAnyBamFile) for any code that can handle BamRaw input, but decodeAnyBamOrSam (or decodeAnyBamOrSamFile) for code that needs BamRec. That way, SAM is supported automatically, and seeking will be supported if possible.

Decode a BAM stream into raw entries. Note that the entries can be unpacked using decodeBamEntry. Also note that this is an Enumeratee in spirit; only the BamMeta and Refs need to get passed separately.

Decode only those reads that fall into one of several regions. Strategy: we will scan the file mostly linearly, but only those regions that are actually needed. We filter the decoded stuff so that it actually overlaps our regions.

From the binning index, we get a list of segments per requested region. Using the checkpoints, we prune them: if we have a checkpoint to the left of the beginning of the interesting region, we can move the start of each segment forward to the checkpoint. If that makes a segment empty, it can be dropped.

The resulting segment lists are merged, then traversed. We seek to the beginning of the earliest segment and start decoding. Once the virtual file position leaves the segment or the alignment position moves past the end of the requested region, we move to the next. Moving is a seek if it spans a sufficiently large gap or points backwards, else we just keep going.

A segment has a start and an end offset, and an "end coordinate" from the originating region.

Checkpoints: each checkpoint is a position with the virtual offset where the first alignment crossing the position is found.
In BAI, we get this from the ioffset vector; in CSI we get it from the loffset field: "Given a region [beg,end), we only need to visit chunks whose end file offset is larger than the ioffset of the 16kB window containing beg." (Sounds like a marginal gain, though.)

Mapping from bin number to vector of clusters.

Full index, unifying BAI and CSI style. In both cases, we have the binning scheme; the parameters are fixed in BAI, but variable in CSI. Checkpoints are created from the linear index in BAI or from the loffset field in CSI.

Minshift parameter from CSI.

Depth parameter from CSI.

Best guess at where the unaligned records start.

Room for stuff (needed for tabix).

Records for the binning index, where each bin has a list of segments belonging to it.

Known checkpoints of the form (pos,off), where off is the virtual offset of the first record crossing pos.

Merges two lists of segments. Lists must be sorted; the merge sort merges overlapping segments into one.

Reads any index we can find for a file. If the file name has a .bai or .csi extension, we read it. Else we look for the index by adding such an extension and by replacing the extension with these two, and finally in the file itself. The first file that exists and can actually be parsed is used.

Read an index in BAI or CSI format, recognized automatically. Note that TBI is supposed to be compressed using bgzip; it must be decompressed before being passed to this function.

Reads a Tabix index. Note that tabix indices are compressed; this is taken care of.

Reads the list of segments from an index file and makes sure it is sorted.

Seeks to a given sequence in a Bam file and enumerates only those records aligning to that reference. We use the first checkpoint available for the sequence. This requires an appropriate index, and the file must have been opened in such a way as to allow seeking.
Enumerates over the BamRaw records of the correct sequence only; doesn't enumerate at all if the sequence isn't found.

Seeks to the part of a Bam file that contains unaligned reads and enumerates those. Sort of the dual to the previous function. We use the best guess at where the unaligned stuff starts. If no such guess is available, we decode everything.

Enumerates one segment. Seeks to the start offset, unless reading over the skipped part looks cheaper. Enumerates until we either cross the end offset or the max position.

Subsample randomly from a BAM file. If an index exists, this produces an infinite stream taken from random locations in the file.

XXX It would be cool if we could subsample from multiple BAM files. It's a bit annoying to code: we'd probably read the indices up front, estimate how many reads we'd find in each file, then open them recursively to form a monad stack where the merging function has to select randomly where to read from. Hm.

A quality filter is simply a transformation on BamRecs. By convention, quality filters should set flagFailsQC; a further step can then remove the failed reads. Filtering of individual reads tends to result in mate pairs with inconsistent flags, which in turn will result in lone mates and all sorts of trouble with programs that expect non-broken BAM files. It is therefore recommended to use pairFilter with suitable predicates to do the post-processing.

Quality filters adapted from the old pipeline.

A filter/transformation applied to pairs of reads. We supply a predicate to be applied to single reads and one to be applied to pairs; the latter can get incomplete pairs, too, if mates have been separated or filtered asymmetrically.

Simple complexity filter aka "Nancy Filter".
A read is considered not sufficiently complex if the most common base accounts for more than the cutoff fraction of all non-N bases.

Filter on order-zero empirical entropy. Entropy per base must be greater than the cutoff.

Filter on average quality. Reads without a quality string pass.

Filter on minimum quality. In qualityMinimum n q, a read passes if it has no more than n bases with quality less than q. Reads without a quality string pass.

Convert quality scores from the old Illumina scale (different formula and offset 64 in FastQ).

Convert quality scores from the new Illumina scale (standard formula but offset 64 in FastQ).

Parser for FastA/FastQ, Iteratee style, based on Data.Attoparsec, and written such that it is compatible with the BAM modules. This gives import of FastA/FastQ while respecting some local conventions.

Reader for DNA (not protein) sequences in FastA and FastQ. We read everything vaguely looking like FastA or FastQ, then shoehorn it into a BAM record. We strive to extract information following more or less established conventions from the header, but we won't support everything under the sun. The recognized syntactical warts are converted into appropriate flags and removed. Only the canonical variant of FastQ is supported (qualities stored as raw bytes with base 33).

Supported additional conventions:

A name suffix of /1 or /2 is turned into the first-mate or second-mate flag and the read is flagged as paired. Same for name prefixes of F_ or R_, respectively. A name prefix of M_ flags the sequence as unpaired and merged. A name prefix of T_ flags the sequence as unpaired and trimmed. A name prefix of C_, either before or after any of the other prefixes, is turned into the extra flag XP:i:-1 (result of duplicate removal with unknown duplicate count). A collection of tags separated from the name by an octothorpe is removed and put into the fields XI and XJ as text.

If the first word of the description has at least four colon-separated subfields, the first is used to flag first/second mate, the second is the "QC failed" flag, and the fourth is the index sequence.

Everything before the first sequence header is ignored. Headers can start with > or @; we treat both equally. The first word of the header becomes the read name, the remainder of the header is ignored. The sequence can be split across multiple lines; whitespace, dashes and dots are ignored, IUPAC-IUB ambiguity codes are accepted as bases, anything else causes an error. The sequence ends at a line that is either a header or starts with +; in the latter case, that line is ignored and must be followed by quality scores. There must be exactly as many Q-scores as there are bases, followed immediately by a header or end-of-file. Whitespace is ignored.

Same as the above parser, but a custom function can be applied to the description string (the part of the header after the sequence name), which can modify the parsed record. Note that the quality field can end up empty.

Fixes abuse of the flags valued 0x800 and 0x1000. We used them for low quality and low complexity, but they have since been redefined. If set, we clear them and store them into the ZD field. Also fixes abuse of the combination of the paired, 1st-mate and 2nd-mate flags used to indicate merging or trimming. These are canonicalized and stored into the FF field. This function is unsafe on BAM files of unclear origin!

Fixes typical inconsistencies produced by Bwa: sometimes 'mate unmapped' should be set, and we can see it because we match the mate's coordinates. Sometimes 'properly paired' should not be set, because one mate is unmapped.
This function is generally safe, but needs to be called only on the output of affected (older?) versions of Bwa.

Removes syntactic warts from old read names or the read names used in FastQ files.

Mode argument for myersAlign, determines where free gaps are allowed: align globally, without gaps at either end; align so that the second sequence is a prefix of the first; align so that the first sequence is a prefix of the second.

Align two strings. myersAlign maxd seqA mode seqB tries to align seqA to seqB, which will work as long as no more than maxd gaps or mismatches are incurred. The mode argument determines if either of the sequences is allowed to have an overhanging tail.

The result is the triple of the actual distance (gaps + mismatches) and the two padded sequences. These sequences are the original sequences with dashes inserted for gaps.

The algorithm is the O(nd) algorithm by Myers, implemented in C. A gap and a mismatch score the same. The strings are supposed to code for DNA; the code understands IUPAC-IUB ambiguity codes. Two characters match iff there is at least one nucleotide both can code for. Note that N is a wildcard, while X matches nothing.

Nicely print an alignment. An alignment is simply a list of strings with inserted gaps to make them align. We split them into manageable chunks, stack them vertically and add a line showing asterisks in every column where all aligned strings agree. The result is almost the Clustal format.

Collected "traditional" statistics: base composition near the 5' end and near the 3' end. Each consists of five vectors of counts of A, C, G, T, and everything else.
One accumulator begins context bases to the left of the 5' end, the other ends context bases to the right of the 3' end.

Substitutions. Counted from the reconstructed alignment, once around the 5' end and once around the 3' end, for a total of 2*4*4 different substitutions. Positions where the query has a gap are skipped.

Substitutions at CpG motifs. Also counted from the reconstructed alignment; a CpG site is simply the sequence CG in the reference. Gaps may confuse that definition, so that CpHpG still counts as CpG, because the H is gapped. That might actually be desirable.

Conditional substitutions. The 5' and 3' ends count as damaged if the very last position has a C-to-T substitution. With that in mind, there are variants of the substitution counts that only include reads where the 5' end is damaged, where the 3' end is damaged, and where both ends are damaged, respectively.

XXX This got kind of ugly. We'll see where this goes...

Parameters for the universal damage model.

We assume the correct model is either no damage, or single-strand damage, or double-strand damage. Each of them comes with a probability. It turns out that blending them into one is simply accomplished by multiplying these probabilities onto the deamination probabilities.

For single-stranded library prep, only one kind of damage occurs: C-to-T deamination with some frequency in single-stranded parts, with the overhang length distributed exponentially with one rate parameter at the 5' end and another at the 3' end. (Without UDG treatment, those will be equal. With UDG, those are much smaller and in fact don't literally represent overhangs.)

For double-stranded library prep, we get C->T damage at the 5' end and G->A at the 3' end at one rate, and both in the interior at another rate. Everything is symmetric, and therefore the orientation of the aligned read doesn't matter either. Both overhangs follow a distribution with a single parameter.

A damage model is a function that gives substitution matrices for each position in a read.
The damage model can depend on whether the alignment is reversed, the length of the read and the position. (In practice, we should probably memoize precomputed damage models somehow.)

Things specific to ancient DNA, e.g. damage models.

For aDNA, we need a substitution probability. We have three options: use an empirically determined PSSM, use an arithmetically defined PSSM based on the Johnson model, or use a context-sensitive PSSM based on the Johnson model and an alignment. Using Dindel, actual substitutions relative to a called haplotype would be taken into account. Since we're not going to do that, taking alignments into account is difficult, somewhat approximate, and therefore not worth the hassle.

We represent substitution matrices by a dedicated type. Internally, this is a vector of packed vectors. Conveniently, each of the packed vectors represents all transitions into the given nucleotide.

Convenience function to access a substitution matrix that has a mnemonic reading.

Adds the two matrices of a mutable substitution model (one for each strand) appropriately, normalizes the result (to make probabilities from pseudo-counts), and freezes that into one immutable matrix. We add a single count everywhere to avoid getting NaNs from bizarre data.

A damage model for undamaged DNA. The likelihoods follow directly from the quality score. This needs elaboration to see what to do with ambiguity codes (even though those haven't actually been observed in the wild).

Generic substitution matrix; has C->T and G->A deamination as parameters. Setting p or q to 0 as appropriate makes this apply to the single-stranded or undamaged case.

Enumeratee (almost) that computes some statistics from plain BAM (no MD field needed) and a 2bit file. The alignment is also reconstructed and passed downstream. The result of any downstream processing is available in a field of the result.

Get the reference sequence including both contexts once.
If this includes invalid sequence (negative coordinate), pad suitably.

Accumulate counts for the valid parts around 5' and 3' ends as appropriate from flags and config.

Combine the part that was aligned to (so, no context) with the read to reconstruct the alignment.

Arguments are the table of reference names, the 2bit file with the reference, the amount of context outside the alignment desired, and the amount of context inside desired.

For merged fragments, we cut the read in the middle, so the 5' and 3' plots stay clean of each other's influence. First-mate and second-mate fragments count completely towards the appropriate end.

Enumeratee (almost) that computes some statistics from plain BAM with a valid MD field. The alignment is also reconstructed and passed downstream. The result of any downstream processing is available in a field of the result.

Reconstruct the alignment from CIGAR, SEQ, and MD. Filter the alignment to get the reference sequence, accumulate it. Accumulate everything over the alignment. The argument is the amount of context inside desired.

For merged fragments, we cut the read in the middle, so the 5' and 3' plots stay clean of each other's influence. First-mate and second-mate fragments count completely towards the appropriate end.

Common logic for statistics. The function get_ref_and_aln reconstructs reference sequence and alignment from a Bam record. It is expected to construct the alignment with respect to the forward strand of the reference; we reverse-complement it if necessary.

Reconstructs the alignment from reference, query, and cigar. Only positions where the query is not gapped are produced.

Reconstructs the alignment from query, cigar, and md. Only positions where the query is not gapped are produced.

Number of mismatches allowed by BWA. bwa_cal_maxdiff thresh len returns the number of mismatches bwa aln -n thresh would allow in a read of length len.
For reference, here is the code from BWA that computes it (we assume err = 0.02, just like BWA):

    int bwa_cal_maxdiff(int l, double err, double thres)
    {
        double elambda = exp(-l * err);
        double sum, y = 1.0;
        int k, x = 1;
        for (k = 1, sum = elambda; k < 1000; ++k) {
            y *= l * err;
            x *= k;
            sum += elambda * y / x;
            if (1.0 - sum < thres) return k;
        }
        return 2;
    }
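The same computation transcribes directly into Haskell. This is a minimal sketch of BWA's formula, not part of the library's API; the name bwaCalMaxdiff is ours. It accumulates Poisson terms for a mean of len*err until the tail probability drops below the threshold:

```haskell
-- Sketch of BWA's maxdiff computation, transcribed from the C code above.
-- `bwaCalMaxdiff err thres len` mirrors `bwa_cal_maxdiff(len, err, thres)`.
bwaCalMaxdiff :: Double -> Double -> Int -> Int
bwaCalMaxdiff err thres len = go 1 elambda 1.0 1.0
  where
    lerr    = fromIntegral len * err
    elambda = exp (negate lerr)            -- P(0 errors) under Poisson(len*err)
    go k acc y x
      | k >= 1000          = 2             -- same fallback as the C code
      | 1.0 - acc' < thres = k
      | otherwise          = go (k + 1) acc' y' x'
      where
        y'   = y * lerr                    -- (len*err)^k
        x'   = x * fromIntegral k          -- k!
        acc' = acc + elambda * y' / x'     -- add the Poisson term P(k)

main :: IO ()
main = print (map (bwaCalMaxdiff 0.02 0.04) [37, 38, 100])
```

With err = 0.02 and the bwa aln default threshold of 0.04, this reproduces the familiar behaviour that slightly longer reads are allowed one more mismatch.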
HuffmanOnly MemoryLevelDefaultMemoryLevelMinMemoryLevelMaxMemoryLevel WindowBitsDefaultWindowBitsMethodDeflatedCompressionLevelDefaultCompression NoCompression BestSpeedBestCompressionFormatGZipZlibRaw GZipOrZlibDecompressParamsdecompressWindowBitsdecompressBufferSizedecompressDictionary compressLevelcompressMethodcompressWindowBitscompressMemoryLevelcompressStrategycompressBufferSizecompressDictionary ZLibExceptionNeedDictionary BufferError StreamError DataError MemoryError VersionError UnexpectedIncorrectStateZLibParamsExceptionIncorrectCompressionLevelIncorrectWindowBitsIncorrectMemoryLeveldefaultCompressParamsdefaultDecompressParams enumDeflate enumInflateenumInflateAny enumSyncFlush enumFullFlushenumBlockFlush$fExceptionZLibException$fExceptionZLibParamsException$fShowZLibException$fShowZLibParamsException$fExceptionZlibFlush$fShowZlibFlush$fEqZLibParamsException$fEqZLibException $fEqZlibFlush $fEqFormat BamEnumerateeBamrawEnumeratee isBamOrSamdecodeAnyBamOrSamdecodeAnyBamOrSamFile decodeSam decodeSam'isBam isPlainBam isBgzfBam isGzipBam decodeAnyBamdecodeAnyBamFileconcatDefaultInputs concatInputsmergeDefaultInputs mergeInputscombineCoordinates combineNames decodeBam getBamRawBamIndexminshiftdepth unaln_off extensions refseq_binsrefseq_ckpoints readBamIndex readBaiIndex readTabix eneeBamRefseqeneeBamUnaligned eneeBamSubseqeneeBamRegions subsampleBam$fShowBamIndex $fShowSegment$fShowTabFormat $fShowTabMeta QualFilter filterPairs complexSimplecomplexEntropyqualityAveragequalityMinimumqualityFromOldIlluminaqualityFromNewIllumina parseFastqparseFastqCassava parseFastq'fixupFlagAbuse fixupBwaFlags removeWartsModeGlobally HasPrefixIsPrefix myersAlign showAligned $fEnumMode AlignmentALN a_sequencea_fragment_typeNPairFragTypeCompleteLeadingTrailingSubstitutionStatsCompositionStatsDmgStats basecompo5 basecompo3substs5substs3 substs5d5 substs3d5 substs5d3 substs3d3 substs5dd substs3dd substs5cpg substs3cpg stats_moreGenDamageParameters UnknownDamage OldDamage 
NewDamageNewDamageParametersNDP dp_gc_fracdp_mudp_nu dp_alpha5dp_beta5dp_alphadp_beta dp_alpha3dp_beta3DamageParametersDP ssd_sigma ssd_delta ssd_lambda ssd_kappa dsd_sigma dsd_delta dsd_lambdaSubst:-> DamageModelMMat44DMat44Dbangnudge scalarMatcomplMat freezeMatsnoDamage univDamage empDamage addFragTypedamagePatternsIter2BitdamagePatternsIterMDdamagePatternsIterbwa_cal_maxdiff$fMonoidDmgStats$fFromJSONMat44D$fToJSONMat44D $fShowMat44D$fGenericMat44D $fEqSubst $fOrdSubst $fIxSubst $fShowSubst$fReadDamageParameters$fShowDamageParameters$fGenericDamageParameters$fReadNewDamageParameters$fShowNewDamageParameters$fGenericNewDamageParameters$fShowGenDamageParameters$fGenericGenDamageParameters$fReadGenDamageParameters$fShowDmgStats$fShowFragType $fEqFragTypeSkewHeap singletonunioninsertgetMindropMinGHC.PtrPtrTIMEVALFDSETc_selectcLSeekcReadfd2fds bitSize_FDSETfds2mfdunFdifExceptionToExceptionifExceptionFromExceptionenumExceptionToExceptionenumExceptionFromExceptionghc-prim GHC.ClassesOrdGHC.Num-Control.Category. 
GHC.TypesTrue V_Nucleotide MV_NucleotideD:R:VectorNucleotide0 V_NucleotidesMV_NucleotidesD:R:VectorNucleotides0V_QualMV_QualD:R:VectorQual0V_Prob'MV_Prob'D:R:VectorProb'0GHC.BaseString Data.StringIsString++GHC.PrimseqGHC.Listfilterzip System.IOprint Data.Tuplefstsnd otherwiseassert GHC.MagiclazyGHC.IO.Exception assertErrorinlinemapGHC.Exts groupWith$GHC.Real fromIntegral realToFracguardData.Typeable.InternalmkAppTy Data.DynamictoDynGHC.EnumBoundedminBoundmaxBoundEnumenumFrom enumFromThenenumFromThenTo enumFromTofromEnumtoEnumsuccpredEq==/= GHC.FloatFloatingpiexplogsqrt**logBasesincostanasinacosatansinhcoshtanhasinhacoshatanh Fractional fromRational/recipIntegral toIntegerquotremdivmodquotRemdivModMonadfail>>=>>return Data.DataDatagfoldlgunfoldtoConstr dataTypeOf dataCast1 dataCast2gmapTgmapQlgmapQrgmapQgmapQigmapMgmapMpgmapMoFunctorfmap<$Num*+negate fromIntegerabssignum>=minmax><<=compareGHC.ReadReadreadList readsPrecreadPrec readListPrecReal toRational RealFloat floatRadix floatDigits floatRange decodeFloat encodeFloatexponent significand scaleFloatisNaN isInfiniteisDenormalizedisNegativeZeroisIEEEatan2RealFracproperFractiontruncateroundceilingfloorGHC.ShowShow showsPrecshowshowListGHC.ArrIxindexrangeinRange rangeSizeTypeableControl.Monad.FixMonadFixmfix fromString Applicativepure<*>*><* Data.FoldableFoldablefoldlfoldl'foldl1foldrfoldr'foldr1foldfoldMapnulllengthmaximumminimumelemsumproductData.Traversable TraversablesequencemapMtraverse sequenceA GHC.GenericsGenericMonoidmemptymappendmconcatBoolFalseDoubleGHC.IntInt8Int16Int32Int64 integer-gmpGHC.Integer.TypeIntegerMaybeNothingJustOrderingLTEQGTRatioRational RealWorldIOWordWord16Word64 Data.EitherEitherLeftRightTypeRepTyConnot Data.Functor<$> Unsafe.Coerce unsafeCoerce Data.Monoid<>GHC.IO.Handle.TypesHandleGHC.STST GHC.Conc.SyncthrowToforkOnWithUnmaskforkIOWithUnmaskforkOnControl.ConcurrentforkOSThreadId GHC.UnicodeisSpaceisAlphaisDigit Text.Readread Alternativeemptymany<|>some MonadPlusmzeromplusuntangle 
ioException heapOverflow stackOverflowallocationLimitExceededblockedIndefinitelyOnSTMblockedIndefinitelyOnMVarunsupportedOperationsortWith Text.PrintfhPrintfprintf Data.Fixed showFixedmod'divMod'div'FixedMkFixed HasResolution resolutionE0UniE1DeciE2CentiE3MilliE6MicroE9NanoE12Pico Data.Complexphase magnitudepolarcismkPolar conjugateimagPartrealPartComplex:+ tyconModule tyconUQname isNorepType mkNoRepType mkCharConstr mkRealConstrmkIntegralConstr mkCharType mkFloatType mkIntTypemaxConstrIndex constrIndex indexConstr isAlgType readConstr showConstr constrFixity constrFieldsdataTypeConstrsmkConstr mkDataType repConstr constrRep constrType dataTypeRep dataTypeName fromConstrM fromConstrB fromConstrDataTypeConstrDataRepIntRepFloatRepAlgRepCharRepNoRep ConstrRep AlgConstr IntConstr FloatConstr CharConstrConIndexFixityPrefixInfix Data.Version makeVersion parseVersion showVersionVersion versionBranch versionTagsSystem.EnvironmentgetEnvironment withProgNamewithArgsunsetEnvsetEnv lookupEnvgetEnv getProgNamegetArgs!System.Environment.ExecutablePathgetExecutablePathSystem.TimeouttimeoutSystem.Mem.StableName eqStableNamehashStableNamemakeStableName StableName System.Mem performGCperformMajorGCperformMinorGC System.Exitdie exitSuccess exitFailureexitWith Data.Unique hashUnique newUniqueUnique Data.STRef modifySTRef' modifySTRef Data.RatioapproxRationalData.Bifunctor BifunctorfirstsecondbimapthreadWaitWriteSTMthreadWaitReadSTMthreadWaitWritethreadWaitReadrunInUnboundThreadrunInBoundThreadisCurrentThreadBoundforkOSWithUnmask forkFinallyrtsSupportsBoundThreadsControl.Concurrent.QSemN signalQSemN waitQSemNnewQSemNQSemNControl.Concurrent.QSem signalQSemwaitQSemnewQSemQSemControl.Concurrent.ChanwriteList2ChangetChanContents isEmptyChan unGetChandupChanreadChan writeChannewChanChan Debug.Trace traceMarkerIO traceMarker traceEventIO traceEvent traceStack traceShowtraceIdtrace putTraceMsgtraceIOmfilter<$!>unless replicateM_ replicateMfoldM_foldM zipWithM_zipWithM 
mapAndUnzipMforever<=<>=>filterMfoldMapDefault fmapDefault mapAccumR mapAccumLforMforControl.Applicativeoptional WrappedMonad WrapMonad unwrapMonad WrappedArrow WrapArrow unwrapArrowZipList getZipList Control.ArrowleftApp^<<<<^>>^^>>returnAArrowarr***&&&Kleisli runKleisli ArrowZero zeroArrow ArrowPlus<+> ArrowChoice|||+++leftright ArrowApplyapp ArrowMonad ArrowLooploop>>><<<CategoryidhPrintreadIOreadLn appendFile writeFilereadFileinteract getContentsgetLinegetCharputStrLnputStrputChar GHC.IO.HandlehCloseGHC.IO.Handle.Text hPutStrLnhPutStr GHC.Conc.IO registerDelay threadDelay closeFdWithioManagerCapabilitiesChangedensureIOManagerIsRunningGHC.Conc.Signal runHandlers setHandlerSignal HandlerFunControl.Concurrent.MVar mkWeakMVaraddMVarFinalizermodifyMVarMaskedmodifyMVarMasked_ modifyMVar modifyMVar_withMVarMaskedwithMVarswapMVarSystem.IO.Unsafe unsafeFixIOallowInterruptcatchesHandlerSystem.IO.Error catchIOErrorannotateIOError modifyIOErrorioeSetFileName ioeSetHandleioeSetLocationioeSetErrorStringioeSetErrorTypeioeGetFileName ioeGetHandleioeGetLocationioeGetErrorStringioeGetErrorTypeisUserErrorTypeisPermissionErrorTypeisIllegalOperationErrorTypeisEOFErrorTypeisFullErrorTypeisAlreadyInUseErrorTypeisDoesNotExistErrorTypeisAlreadyExistsErrorType userErrorTypepermissionErrorTypeillegalOperationErrorType eofErrorType fullErrorTypealreadyInUseErrorTypedoesNotExistErrorTypealreadyExistsErrorType isUserErrorisPermissionErrorisIllegalOperation isEOFError isFullErrorisAlreadyInUseErrorisDoesNotExistErrorisAlreadyExistsError mkIOError tryIOErrorControl.Exception.BasebracketOnErrorbracket_finallybracket onExceptiontryJusttry mapException handleJusthandle catchJustcatchPatternMatchFail RecSelError RecConError RecUpdError NoMethodError TypeErrorNonTerminationNestedAtomically readMaybe readEitherreadsCDevCInoCModeCOffCPidCSsizeCGidCNlinkCUidCCcCSpeedCTcflagCRLim LinkCountUserIDGroupID ByteCount ClockTick EpochTimeDeviceIDFileIDFileMode 
ProcessIDProcessGroupIDLimitgetUncaughtExceptionHandlersetUncaughtExceptionHandler reportErrorreportStackOverflow writeTVarreadTVar readTVarIO newTVarIOnewTVaralwaysalwaysSucceedscatchSTMthrowSTMorElseretry atomically unsafeIOToSTMmkWeakThreadIdthreadCapability threadStatus runSparksparpseq labelThreadyield myThreadId killThread childHandler numSparksgetNumProcessorssetNumCapabilitiesgetNumCapabilitiesnumCapabilitiesforkIOdisableAllocationLimitenableAllocationLimitgetAllocationCountersetAllocationCounter BlockReason BlockedOnMVarBlockedOnBlackHoleBlockedOnException BlockedOnSTMBlockedOnForeignCallBlockedOnOther ThreadStatus ThreadRunningThreadFinished ThreadBlocked ThreadDiedSTMTVarioErrorasyncExceptionFromExceptionasyncExceptionToExceptionBlockedIndefinitelyOnMVarBlockedIndefinitelyOnSTMDeadlockAllocationLimitExceededAssertionFailedSomeAsyncExceptionAsyncException StackOverflow HeapOverflow ThreadKilled UserInterruptArrayExceptionIndexOutOfBoundsUndefinedElementExitCode ExitSuccess ExitFailure IOErrorType AlreadyExists NoSuchThing ResourceBusyResourceExhaustedIllegalOperationPermissionDenied UserErrorUnsatisfiedConstraints SystemError ProtocolError OtherErrorInvalidArgumentInappropriateType HardwareFaultUnsupportedOperation TimeExpiredResourceVanished InterruptedData.Functor.ConstConstgetConstForeign.StorableStorablefindnotElem minimumBy maximumByallanyorand concatMapconcatmsumasum sequence_ sequenceA_forM_mapM_for_ traverse_foldlMfoldrMDualgetDualEndoappEndoAllgetAllAnygetAnySumgetSumProduct getProductFirstgetFirstLastgetLastAltgetAlt Data.IORefatomicWriteIORefatomicModifyIORef'atomicModifyIORef modifyIORef' modifyIORef mkWeakIORef GHC.IORef writeIORef readIORefnewIORefIORefGHC.IOevaluateuninterruptibleMaskuninterruptibleMask_maskmask_getMaskingState interruptiblethrowIOstToIOFilePath MaskingStateUnmaskedMaskedInterruptibleMaskedUninterruptible userError IOExceptionIOError ioe_handleioe_type ioe_locationioe_description ioe_errno ioe_filename 
dynTypeRepdynAppdynApply fromDynamicfromDynDynamic ErrorCallErrorCallWithLocationArithExceptionOverflow UnderflowLossOfPrecision DivideByZeroDenormalRatioZeroDenominator Data.Typeablegcast2gcast1gcasteqTcast rnfTypeRep showsTypeReptypeOf7typeOf6typeOf5typeOf4typeOf3typeOf2typeOf1typeOftypeRep typeRepArgs typeRepTyCon funResultTy splitTyConAppmkFunTy mkTyConApptypeRepFingerprintrnfTyConmkTyCon3tyConFingerprint tyConString tyConName tyConModule tyConPackage Typeable1 Typeable2 Typeable3 Typeable4 Typeable5 Typeable6 Typeable7NumericshowOctshowHex showIntAtBase showGFloatAlt showFFloatAlt showGFloat showFFloat showEFloatshowInt readSigned readFloatreadHexreadDecreadOctreadInt byteSwap64 byteSwap32 byteSwap16isRightisLeftpartitionEithersrightsleftseitherData.Type.Equality:~:Refl Data.ProxyProxy Data.OldListunwordswordsunlineslinesunfoldrsortBysort permutations subsequencestailsinitsgroupBygroupdeleteFirstsByunzip7unzip6unzip5unzip4zipWith7zipWith6zipWith5zipWith4zip7zip6zip5zip4genericReplicate genericIndexgenericSplitAt genericDrop genericTake genericLengthinsertBy partition transpose intercalate intersperse intersectBy intersectunionBy\\deleteBydeletenubBynub isInfixOf isSuffixOf isPrefixOf findIndices findIndex elemIndices elemIndex stripPrefix dropWhileEndData.Ord comparingDown Data.Char isSeparatorisNumberisMarkisLetter digitToInt lexDigits readLitChar lexLitCharlex readParenText.ParserCombinators.ReadPrec readS_to_Prec readPrec_to_S readP_to_Prec readPrec_to_PReadPrecText.ParserCombinators.ReadP readS_to_P readP_to_SReadSReadPfromRat floatToDigits showFloat Data.BitstoIntegralSizedpopCountDefaulttestBitDefault bitDefaultBitsshiftRshiftL.&..|.xor complementshiftrotatezeroBitsbitsetBitclearBit complementBittestBit bitSizeMaybebitSizeisSigned unsafeShiftL unsafeShiftRrotateLrotateRpopCount FiniteBits finiteBitSizecountLeadingZeroscountTrailingZeros Data.FunctiononfixvoidtoTitletoUppertoLowerisLowerisUpperisPrint isControl isAlphaNumisSymbol isPunctuation isHexDigit 
isOctDigit isAsciiUpper isAsciiLowerisLatin1isAsciigeneralCategoryGeneralCategoryControlUppercaseLetterLowercaseLetterTitlecaseLetterModifierLetter OtherLetterNonSpacingMarkSpacingCombiningMark EnclosingMark DecimalNumber LetterNumber OtherNumberConnectorPunctuationDashPunctuationOpenPunctuationClosePunctuation InitialQuote FinalQuoteOtherPunctuation MathSymbolCurrencySymbolModifierSymbol OtherSymbolSpace LineSeparatorParagraphSeparator Surrogate PrivateUse NotAssigned GHC.STRef writeSTRef readSTRefnewSTRefSTRefrunSTfixSTlcmgcd^^^oddeven showSigned denominator numerator%GHC.Charchr intToDigit showLitChar showParen showStringshowCharshowsShowSunzip3unzipzipWith3zipWithzip3!!lookupreversebreakspansplitAtdroptake dropWhile takeWhilecycle replicaterepeatiteratescanr1scanrscanl'scanl1scanlfoldl1'initlasttailhead Data.MaybemapMaybe catMaybes listToMaybe maybeToList fromMaybefromJust isNothingisJustmaybeswapuncurrycurrysubtractGHC.MVar isEmptyMVar tryReadMVar tryPutMVar tryTakeMVarputMVarreadMVartakeMVarnewMVar newEmptyMVarMVar GHC.IO.UnsafeunsafeInterleaveIOunsafeDupablePerformIOunsafePerformIOasTypeOfuntil$!flipconstordapliftM5liftM4liftM3liftM2liftMwhen=<<liftA3liftA2liftA<**>GHC.Err undefinederrorWithoutStackTraceerror SomeException&&||+base-prelude-1.0.1.1-50PByGWQp6O3J1SglYyZmP BasePreludesortOnunconsisSubsequenceOf&$> traceShowMtraceM traceShowIdbool unix-2.7.2.0System.Posix.IO createFileopenFdSystem.Posix.Files getPathVar setFileSizetouchSymbolicLink touchFilesetSymbolicLinkTimesHiRessetFileTimesHiRes setFileTimessetSymbolicLinkOwnerAndGroupsetOwnerAndGrouprenamereadSymbolicLinkcreateSymbolicLink removeLink createLink createDevicecreateNamedPipegetSymbolicLinkStatus getFileStatus fileExist fileAccess setFileModeSystem.Posix.Files.Common getFdPathVar setFdSizesetFdOwnerAndGrouptouchFdsetFdTimesHiRes getFdStatusisSocketisSymbolicLink isDirectory isRegularFile isNamedPipeisCharacterDevice 
isBlockDevicestatusChangeTimeHiResmodificationTimeHiResaccessTimeHiResstatusChangeTimemodificationTime accessTimefileSizespecialDeviceID fileGroup fileOwner linkCountfileModefileIDdeviceIDsetFileCreationMask setFdMode socketModesymbolicLinkMode directoryModeregularFileMode namedPipeModecharacterSpecialModeblockSpecialMode fileTypeModesintersectFileModesunionFileModes accessModes otherModes groupModes ownerModes stdFileModesetGroupIDMode setUserIDModeotherExecuteModeotherWriteMode otherReadModegroupExecuteModegroupWriteMode groupReadModeownerExecuteModeownerWriteMode ownerReadMode nullFileMode FileStatusPathVar FileSizeBits LinkLimitInputLineLimitInputQueueLimit FileNameLimit PathNameLimitPipeBufferLimitSymbolicLinkLimitSetOwnerAndGroupIsRestrictedFileNamesAreNotTruncated VDisableCharAsyncIOAvailablePrioIOAvailableSyncIOAvailableSystem.Posix.IO.Common fdWriteBuffdWrite fdReadBuffdRead waitToSetLocksetLockgetLockfdSeek setFdOption queryFdOption handleToFd fdToHandlecloseFddefaultFileFlagsstdError stdOutputstdInputdupTodup createPipeOpenModeReadOnly WriteOnly ReadWrite OpenFileFlagsappend exclusivenocttynonBlocktruncFdOption AppendOnWrite CloseOnExecNonBlockingReadSynchronousWrites LockRequestReadLock WriteLockUnlockFileLock bindIterateeNotAnException excDivergent endianReadNword16'word16word32'word32word64'word64makefdCallback enumFile'makeHandleCallback terminatorsrepM mkBlockIndexbad_seqbinary-0.8.3.0Data.Binary.Get.InternalGet&vector-0.11.0.0-6uB77qGCxR6GPLxI2sqsX3Data.Vector.Generic.BaseVectorMVector_Nucs_halfmemsetunpackExtensionsdo_rmdup merge_singles merge_halvesRepresentative Consensus split_ecigMdFailMPosDecision fromDecisioncollapse originals b_mate_posb_totally_alignedaccumMap do_collapse minViewBy mk_new_md' consensusdo_cheap_collapse replaceXPoplusmask_allget_bgzf_header decompress1 compress1 bgzfBlocksZStreamc_crc32 c_inflateEnd c_deflateEnd c_inflate 
c_deflatec_inflateInit2_c_deflateInit2_get_bgzf_blockz_checkc_deflateInit2c_inflateInit2qual_loop_asc_rev qual_loop_asc qual_loopnuc_loop_asc_rev nuc_loop_ascnuc_loopcompressChunk'encodeSamEntry pushBamRaw pushBamRec ZlibFlush SyncFlush FullFlush EnumerateeSFlushing FinishingInvalidFullOutEmptyInInitialinflateSetDictionarydeflateSetDictionary inflateEnd deflateEnddeflateinflate inflateInit2_ deflateInit2_ fromFlush withZStream fromMethodfromCompressionLevelfromWindowBitsfromMemoryLevelfromCompressionStrategy fromErrno convParamwithByteString mkByteString putOutBuffer putInBuffer pullOutBuffer pullInBuffereneeErr insertOutfillswapOutdoRunflushfinish deflateInit2 inflateInit2deflate'inflate' mkCompress mkDecompress isEmptyBam decodeSamLoop parseSamRecSegmentCkpointsBins~~getSegmentArrayeneeBamSegment TabFormat SamFormat VcfFormat ZeroBasedTabMetaformatcol_seqcol_begcol_end comment_char skip_linesnamesTabIndexSegments segmentLists rgnToSegmentsbinList getIntervalsgetIndexArraysreduceMloopMlookupLEcountskipJunk makeRecord myers_diff genSubstMat aln_from_ref aln_from_md revcom_both