Îõ³h*  #  ‚Æ                    	  
                                               !  "  #  $  %  &  '  (  )  *  +  ,  -  .  /  0  1  2  3  4  5  6  7  8  9  :  ;  <  =  >  ?  @  A  B  C  D  E  1.0         Safe-Inferred "#%&  i twobitreaderÛ This is a (piece of a) reference sequence.  It consists of
 stretches with uniform masking.The offset is stored as a  FÔ.  This is done because on a 32 bit
 platform, every bit counts.  This limits the genome to approximately
 four gigabases, which would be a file of about one gigabyte.  That's
 just about enough to work with the human genome.  On a 64 bit
 platform, the file format itself imposes a limit of four gigabytes,
 or about 16 gigabases in total.èIf length is zero, the piece is empty and the mask, pointer, and
 offset fields may not be valid.  If length is positive, ptr+offset
 points at the first base of the piece.  If length is negative,
 ptr+offset points just past the end of the piece, ptr+offset+length
 points to the first base of the piece, and the sequence in meant to
 be reverse complemented.In a   &, length must not be negative.  In a
 TwoBitSequence' Bidirectional%, length can be positive or negative. twobitreaderÎ 2bit supports two kinds of masking, typically rendered as lowercase
 letters (MaskSoft
) and Ns (MaskHard).  They can overlap
 (MaskBothÒ ), and even the hard masking has underlying sequence
 (which is normally ignored). twobitreaderÝ Lazily generated sequence in forward direction; the argument is the offset of the first base. twobitreaderÊLazily generated sequence in reverse direction; the argument is the offset of the first base to the
 right of the beginning.  (The first base generated is the complement of the base found at (offset-1). twobitreaderÖFinds a named scaffold in the reference.  If it doesn't find the
 exact name, it will try to compensate for the crazy naming
 differences between NCBI and UCSC.  This doesn't work in general, but
 is good enough in the common case.  In particular, "1" maps to "chr1"
 and back, "GL000192.1" to "chr1_gl000192_random" and back, and "chrM"
 to MT 
 and back. twobitreader´Brings a 2bit file into memory.  The file is mmap'ed, so it will
 not work on streams that are not actual files.  It's also unsafe if
 the file is concurrently modified in any way.G twobitreaderParses a 2bit file.  The FilePathø  argument is only used in error
 messages, what is really parsed is the memory block, typically from
 mmapping the file.1The workhorse in here is the construction of the   and
  Ö functions.  When called, they first run a binary search
 on the mask lists, then produce a list of blocks with uniform
 masking.  Both parts of the algorithm are fast and directly use the
 on-disk data structures.µIn theory, there could be 2bit files in big endian format out there.
 We nominally support them, but since I've never seen one in the wild,
 this may well fail in a spectacular way.! twobitreader–Unpacks a reference sequence into a (very long) list of bytes.
 Each byte contains the nucleotide in bits 0 and 1 with valjues 0..3
 corresponding to TCAG Á , and the soft and hard mask bits in bits 2
 and 3, respectively." twobitreaderö Unpacks a reference sequence into a (very long) list of ASCII
 characters.  Hard masked nucleotides become the letter N, others
 become TCAG .# twobitreaderçUnpacks a reference sequence into a list of ASCII characters,
 interpreting masking in the customary way.  Specifically, hard
 masking produces Ns, soft masking produces lower case letters, and
 dual masking produces lower case Ns.H twobitreaderì Reads a 32 bit word from an address, which doesn't need to be
 aligned.  The byte order used is unspecified.I twobitreader6Equivalent to peekUnalnWord32 followed by a byte swap.  twobitreaderhow is it masked? twobitreader0primitive bases in 2bit encoding:  [0..3] = TCAG twobitreaderoffset in bases(!) twobitreaderlength in bases$	
 !"# $	
 !"#            Safe-Inferred "#%&  `	J twobitreaderô A way to accumulate bytes.  If the accumulated bytes will hang
 around in memory, this has much lower overhead than Builder.  If it
 has short lifetime, Builder is much more convenient.K twobitreaderŸA cDNA or mRNA or transcript (these are all synonymous), with some
 metainformation collected from the annotation.  Whatever the input
 was called, we call it cdna in the transciptome.? twobitreaderÉExtracts the reference from a VCF.  This assumes the presence of at
 least one record per site.  The VCF must be sorted by position.  When
 writing out, we try to match the order of the contigs as listed in
 the header.  Unlisted contigs follow at the end with their order
 preserved; contigs without data are not written at all.L twobitreader!Appends bytes to a collection of  M in such a way that the
  MÞ  keep doubling in size.  This ensures O(n) time and space
 complexity and fairly low overhead.N twobitreaderƒCollects stretches of Ns by looking at one character at a time.  In
 reality, anything that isn't one of "ACGT" is treated as an N.O twobitreaderï Collects stretches of masked dna by looking at one letter at a
 time.  Anything lowercase is considered masked.P twobitreaderß Collects bases in 2bit format.  It accumulates 4 bases in one word,
 then collects bytes in an  J.  From the 2bit spec:ºpackedDna - the DNA packed to two bits per base, represented as
             so: T - 00, C - 01, A - 10, G - 11. The first base is
             in the most significant 2-bit byte; the last base is
             in the least significant 2 bits. For example, the
             sequence TCAG is represented as 00011011.@ twobitreader½Parses annotations in GFF format.  We want to turn an annotation
 and a 2bit file into a FastA of the transcriptome (one sequence per
 annotated transcript), that looks like the stuff Lior Pachter feeds
 into Kallisto.  Annotations come in two dialects of GFF, either GFF3
 or GTF.  We autodetect and understand both.Q twobitreaderâParses the random stuff in GFF into a hash table.  Returns 'Just
 (Left _)' if the file uses assignment style ("foo=bar;"), returns
 'Just (Right _)' if the file uses statement style ("foo "bar";"),
 otherwise returns Nothing.R  twobitreaderlength twobitreaderlist of N stretches twobitreaderlist of mask stretches twobitreaderaccumulated bases23456789:<>;@=?23456789:<>;@=?  Ó             	  
  
                                                                  !   "   #   $   %   &   '   (   )   *   +   ,   -   .   /   0   1   2   3   4  5  5  6   7   8   9   :   ;   <   =   >   ?   @   A   B   C   D   E   F   G HIJ   K   L   M  N  O   P  Q   R   S   T   U   V× 'twobitreader-1.0-2XkAlGA7Nho5GM6BESlxI8
Bio.TwoBitBio.TwoBit.TooltwobitreaderTwoBitSequenceBidirectionalUnidrectionalTwoBitSequence'SomeSeqRefEndMaskingTwoBitChromosomeTBCtbc_rawtbc_name	tbc_indextbc_dna_offsettbc_dna_sizetbc_fwd_seqtbc_rev_seq
TwoBitFileTBFtbf_rawtbf_sizetbf_path
tbf_chroms
tbf_chrmaptbf_chrnames	findChrom
openTwoBitisSoftMaskedisHardMasked
noneMasked
softMasked
hardMasked
bothMaskedunpackRSRawunpackRSunpackRSMasked$fExceptionTwoBitError$fBoundedMasking$fEnumMasking$fMonoidMasking$fSemigroupMasking$fReadMasking$fShowMasking$fShowTwoBitSequence'$fEqMasking$fOrdMasking$fShowBlock	$fEqBlock
$fOrdBlock$fShowTwoBitErrorEncodeProgressEncoded
ep_seqnameep_positionep_hardmaskedep_softmaskedep_enclengthep_tail
formatCdna
buildFasta
twoBitToFa
faToTwoBitvcfToTwoBit	parseAnno$fExceptionGffError$fShowGffError$fShowGffErrorDetail
$fShowCdna$fShowRangeghc-prim	GHC.TypesWordparseTwoBitpeekUnalnWord32peekUnalnWord32SwapAccuCdnagrowBytes
collect_Ns
collect_mscollect_bases
parseStuff
encode_seq