Bio.Phylogeny.PhyBin.CoreTypes

`NumTaxa`: How many taxa should we expect in the incoming dataset?
`Variable`: Explicitly ignore this setting in favor of comparing all trees (even if some are missing taxa). This only works with certain modes.
`Unknown`: In the future we may pick a behavior automatically. For now, this one is usually an error.
`Expected`: Supplied by the user. Committed.
`WhichRFMode`: Supported modes for computing the Robinson-Foulds (RF) distance.
`PhyBinConfig`: Because the driver has so many configuration options, we pack them into a record.
`branch_collapse_thresh`: Branches shorter than this length are collapsed.
`bootstrap_collapse_thresh`: Bootstrap values below this threshold cause the intermediate node to be collapsed.
`FullTree`: A common type of tree that contains the standard decorator and also a table for restoring the human-readable node names.
`StandardDecor`: The standard decoration includes everything in `DefDecor` plus some extra cached data: (1) the branch length from the parent to this node; (2) bootstrap values for the node; (3) subtree weights for future use (defined as the number of LEAVES, not counting intermediate nodes); (4) sorted lists of labels for symmetry breaking.
`AnnotatedTree`: Additionally includes some scratch data that is used by the binning algorithm.
`DefDecor`: The barebones default decorator for `NewickTree`s contains BOOTSTRAP and BRANCHLENGTH. The bootstrap values, if present, range over [0..100].
`LabelTable`: Maps labels back onto meaningful names.
`Label`: Labels are inexpensive unique integers. The table is necessary for converting them back.
`NewickTree`: Even though the Newick format allows interior node labels, we ignore them here (they are not commonly used). Note that these trees are rooted. The `normalize` function ensures that a single, canonical rooted representation is chosen.
`displayDefaultTree`: Display a tree WITH the bootstrap values and branch lengths. This prints in Newick format.
`displayStrippedTree`: The same, except with no bootstrap values or branch lengths. Any tree annotations are ignored.
`default_phybin_config`: The default phybin configuration.
`treeSize`: How many nodes (leaves and interior) are contained in a `NewickTree`?
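The difference between `treeSize` (all nodes) and a leaf-only count shows up on a small example. This is a minimal sketch that assumes a simplified stand-in for `NewickTree`. In it, `NTLeaf` carries an integer label and `NTInterior` carries a list of children. The real phybin type may differ, and `treeSizeSketch`, `numLeavesSketch` and `example` are hypothetical names.

```haskell
-- Hypothetical stand-in for phybin's NewickTree: a rooted rose tree whose
-- nodes carry a decoration of type a; leaves also carry an integer Label.
type Label = Int

data NewickTree a
  = NTLeaf a Label
  | NTInterior a [NewickTree a]
  deriving (Show, Eq)

-- Count every node, leaves and interior nodes alike (cf. treeSize).
treeSizeSketch :: NewickTree a -> Int
treeSizeSketch (NTLeaf _ _) = 1
treeSizeSketch (NTInterior _ kids) = 1 + sum (map treeSizeSketch kids)

-- For contrast, count only the leaves, i.e. the taxa.
numLeavesSketch :: NewickTree a -> Int
numLeavesSketch (NTLeaf _ _) = 1
numLeavesSketch (NTInterior _ kids) = sum (map numLeavesSketch kids)

-- The tree ((0,1),2): two interior nodes and three leaves.
example :: NewickTree ()
example = NTInterior () [NTInterior () [NTLeaf () 0, NTLeaf () 1], NTLeaf () 2]
```

Here `treeSizeSketch example` is 5 and `numLeavesSketch example` is 3.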
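The sketch below shows how integer `Label`s, a `LabelTable` and the two display modes fit together by rendering a simplified tree in Newick syntax. Several details are assumptions:

- The decoration is a pair of (optional bootstrap, branch length).
- The bootstrap is written as the interior-node label after the closing parenthesis, a common Newick convention.
- `showNewick` and `showStripped` are hypothetical names.

phybin's actual output formatting may differ.

```haskell
import qualified Data.Map as Map
import Data.List (intercalate)

type Label = Int
type LabelTable = Map.Map Label String

-- Hypothetical decoration mirroring DefDecor: an optional bootstrap value
-- in [0..100] and the branch length from the parent.
type DefDecor = (Maybe Int, Double)

data NewickTree a
  = NTLeaf a Label
  | NTInterior a [NewickTree a]

-- Look a label up in the table; unknown labels print as ?<n>.
nameOf :: LabelTable -> Label -> String
nameOf tbl lab = Map.findWithDefault ("?" ++ show lab) lab tbl

-- Render WITH bootstraps and branch lengths (cf. displayDefaultTree).
showNewick :: LabelTable -> NewickTree DefDecor -> String
showNewick tbl t = go t ++ ";"
  where
    go (NTLeaf (_, len) lab) = nameOf tbl lab ++ ":" ++ show len
    go (NTInterior (boot, len) kids) =
      "(" ++ intercalate "," (map go kids) ++ ")"
        ++ maybe "" show boot ++ ":" ++ show len

-- Render topology and names only, ignoring all annotations
-- (cf. displayStrippedTree).
showStripped :: LabelTable -> NewickTree a -> String
showStripped tbl t = go t ++ ";"
  where
    go (NTLeaf _ lab) = nameOf tbl lab
    go (NTInterior _ kids) = "(" ++ intercalate "," (map go kids) ++ ")"
```

Take a table mapping 0, 1, 2 to A, B, C and a tree whose inner clade has bootstrap 95 and branch length 0.5, with leaf branch lengths 0.1, 0.2, 0.3. `showNewick` renders it as `((A:0.1,B:0.2)95:0.5,C:0.3):0.0;` and `showStripped` as `((A,B),C);`.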
`numLeaves`: This counts only leaf nodes, which should include all taxa.
`avg_branchlen`: Average branch length across all branches in all trees.
`get_bootstraps`: Retrieve all the bootstrap values actually present in a tree.
`map_labels`: Apply a function to all the *labels* (leaf names) in a tree.
`all_labels`: Return all the labels contained in the tree.
`foldIsomorphicTrees`: This function allows one to collapse multiple trees while looking only at the horizontal slice of all the annotations *at a given position* in the tree. The trees must be isomorphic in both shape and name labels; otherwise it is an error to apply this function.

Bio.Phylogeny.PhyBin.Parser

Hack: we store the names in the leaves.
`parseNewick`: Parse a bytestring into a `NewickTree` with branch lengths. The first argument is the file from which the data came and is used only for better error messages. If the single bytestring contains more than one tree, a number is appended to the tree names.
`parseNewickFiles`: Parse a list of trees, starting with an empty map of labels and accumulating a final map.
`parseNewicks`: A version which takes in-memory trees as ByteStrings.
This is used to splice metadata into the data structure after the fact.
`newick_parser`: This parser ASSUMES that whitespace has been prefiltered from the input.
Parse a normal, decimal number.
Parse a number in scientific notation.
Names are a mess: it seems they sometimes contain all kinds of garbage. Thus we are very permissive and allow anything that we do not specifically need to reserve.

Bio.Phylogeny.PhyBin.PreProcessor

`pruneTreeLeaves`: Prune the leaves of the tree to only those leaves in the provided set. If ALL leaves are pruned, this function returns nothing.
`collapseBranches`: Removes branches that do not meet a predicate, leaving a shallower, bushier tree. This does NOT change the set of leaves (taxa); it only removes interior nodes.
`collapseBranches pred collapser tree` uses `pred` to test the metadata and decide whether the intermediate node below the branch must be collapsed. If it must, it uses `collapser` to reduce all the metadata for the collapsed branches into a single piece of metadata.
`collapseBranchLenThresh`: A common configuration. Collapse branches based on a length threshold.
`collapseBranchBootStrapThresh`: A common configuration. Collapse branches based on bootstrap values.

Bio.Phylogeny.PhyBin.RFDistance

`DenseLabelSet`: Dense sets of taxa, aka bipartitions or BiPs. We assume that taxa labels have been mapped onto a dense, contiguous range of integers [0,N).
Rule: bipartitions are really two disjoint sets. But given the parent set (the union of the two partitions, i.e. all taxa), a bipartition can be represented by just *one* subset. For consistency we must choose WHICH subset, and the rule is to always choose the SMALLER one. Thus a `DenseLabelSet` should always be at most half the size of the total number of taxa.
A set containing more than half of the taxa can be normalized by flipping, i.e. taking the taxa that are NOT in that set.
`dispBip`: Print a bipartition in a pretty form.
`invertDense`: Assuming the total taxa are 0..N-1, invert membership.
`naiveDistMatrix`: Returns a triangular distance matrix encoded as a vector, and also returns the set-of-BiPs representation for each tree. This uses a naive method, directly computing the distance between each pair of trees. This method is TOLERANT of differences in the label/taxa sets between two trees: it simply prunes to the intersection before doing the distance comparison. Other scoring methods may be added in the future (for example, penalizing missing taxa).
The number of bipartitions implied by a tree is one per EDGE in the tree. Thus each interior node carries a list of BiPs the same length as its list of children.
`allBips`: Get all non-singleton BiPs implied by a tree.
`hashRF`: This version slices the problem a different way. A single pass over the trees populates the table of bipartitions.
Then the table can be processed (locally) to produce (non-localized) increments to a distance matrix.
`filterCompatible`: Which trees in a set are compatible with a consensus?
`compatibleWith`: `compatibleWith consensus tree` asks whether a tree is compatible with a consensus. This is more efficient if partially applied and then used repeatedly. Note that tree compatibility is not the same as an exact match: it is like (<=) rather than (==). The star topology is consistent with all trees, because it induces the empty set of bipartitions.
Consensus between two trees, which may even have different label maps.
`consensusTree`: Take only the bipartitions that are agreed on by all trees.
`bipsToTree`: Convert from bipartitions BACK to a single tree.

Bio.Phylogeny.PhyBin.Visualize

When we first convert to a graph representation, there is a lot of information hanging off of each node.
`dotDendrogram`: Convert to a dotGraph. Some code is duplicated with `dotNewickTree`.
`dendrogramToGraph`: Create a graph using `TreeName`s for node labels and edit distance for edge labels.
The plot looks nicer when the names aren't bloated with repeated text. This replaces all tree names with potentially shorter names by removing their common prefix, and returns how many characters were dropped.
`viewNewickTree`: Open a GUI window to display a tree.
Fork a thread that then runs graphviz. The returned channel carries a single message to signal completion of the subprocess.
`dotToPDF`: Convert a .dot file to .pdf.
`dotNewickTree`: Convert a `NewickTree` to a graphviz Dot graph representation.
Some arbitrarily chosen colors from the X11 set.
`dotNewickTree_debug`: This version shows the ordered/rooted structure of the normalized tree.
Common prefix of a list of lists.

Bio.Phylogeny.PhyBin.Binning

`BinResults`: Index the results of binning by topology-only stripped trees that have had their decorations removed.
`StrippedTree`: Ignore metadata (but keep weights) for the purpose of binning.
`OneCluster`: When binning, the members of a `OneCluster` are isomorphic trees.
When clustering based on Robinson-Foulds distance, they are merely similar trees.
`normalizeFT`: A version lifted to operate over full trees.
`normalize`: This is the routine that transforms a tree into normal form. It relies HEAVILY on lazy evaluation.
`binthem`: The binning function. Takes labeled trees and classifies labels into equivalence classes.
This version accepts trees that are already normalized.
`anonymize_annotated`: For binning. Remove branch lengths and labels but leave weights.
`annotateWLabLists`: Add the metadata that is used for binning.
`deAnnotate`: Take the extra annotations away. Inverse of `annotateWLabLists`.

Bio.Phylogeny.PhyBin.Util

`acquireTreeFiles`: Expand out directories to find all the tree files.
`safePrintDendro`: Step carefully in case of cycles (argh).

Bio.Phylogeny.PhyBin

A dendrogram PLUS consensus trees at the intermediate points.
`driver`: Driver to put all the pieces together (parse, normalize, bin).
Convert a flat list of clusters into a map from individual trees to clusters.
Map each tree NAME onto the one-based index (in sorted order) of the cluster it comes from.
Turn a hierarchical clustering into a flat clustering.
`retrieveHighlights`: Parse extra trees in addition to the main inputs (for --highlight).
`matchAnyHighlight`: Create a predicate that tests trees for consistency with the set of --highlight (consensus) trees. Note that tree consistency is not the same as an exact match: it is like (<=) rather than (==). All trees are consistent with the star topology.
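The normalize-then-bin idea behind `normalize` and `binthem` can be sketched on a stripped, topology-only tree. This uses a simpler technique than the one described above. Instead of cached sorted label lists and lazy evaluation, it recursively sorts children with a derived `Ord` instance. It also only canonicalizes the child order of an already-rooted tree and does not choose a root. `Topo`, `normalizeTopo` and `binTrees` are hypothetical names, not phybin's API.

```haskell
import qualified Data.Map as Map
import Data.List (sort)

-- Hypothetical topology-only tree (cf. StrippedTree): decorations removed,
-- leaves keep their integer labels.
data Topo = Leaf Int | Node [Topo]
  deriving (Eq, Ord, Show)

-- Canonical child order: normalize the children first, then sort them with
-- the derived Ord instance, so isomorphic rooted trees become equal.
normalizeTopo :: Topo -> Topo
normalizeTopo l@(Leaf _) = l
normalizeTopo (Node kids) = Node (sort (map normalizeTopo kids))

-- Bin named trees by their normal form; each bin's members are isomorphic.
-- flip (++) keeps the names in each bin in input order.
binTrees :: [(String, Topo)] -> Map.Map Topo [String]
binTrees named =
  Map.fromListWith (flip (++)) [ (normalizeTopo t, [name]) | (name, t) <- named ]
```

For example, `Node [Node [Leaf 0, Leaf 1], Leaf 2]` and `Node [Leaf 2, Node [Leaf 1, Leaf 0]]` land in the same bin, while `Node [Node [Leaf 0, Leaf 2], Leaf 1]` gets its own.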
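Two ideas can be sketched together here. The first is the set of bipartition rules from the RFDistance notes: keep the smaller side, flip a majority set, keep only non-singleton BiPs. The second is the (<=)-style consistency test described for `compatibleWith` and `matchAnyHighlight`. The sketch makes these assumptions:

- It uses `Data.Set` rather than phybin's dense label sets.
- Both trees share the taxa 0..n-1.
- The tie-breaking rule for splits of exactly half the taxa is invented here.

`Topo`, `leavesOf`, `normBip`, `allBipsSketch` and `compatibleWithSketch` are hypothetical names.

```haskell
import qualified Data.Set as Set

-- Hypothetical topology-only rooted tree with integer taxa labels 0..n-1.
data Topo = Leaf Int | Node [Topo]

-- A bipartition, stored as just one of its two sides.
type Bip = Set.Set Int

-- All taxa at or below a node.
leavesOf :: Topo -> Bip
leavesOf (Leaf l) = Set.singleton l
leavesOf (Node kids) = Set.unions (map leavesOf kids)

-- Keep the smaller side, flipping against {0..n-1} when the set holds more
-- than half the taxa. ASSUMPTION: exact ties keep the side without taxon 0;
-- the notes above do not say how ties are broken.
normBip :: Int -> Bip -> Bip
normBip n s
  | 2 * k > n || (2 * k == n && Set.member 0 s) =
      Set.fromList [0 .. n - 1] `Set.difference` s
  | otherwise = s
  where
    k = Set.size s

-- One split per edge below the root, normalized; singletons are dropped.
allBipsSketch :: Topo -> Set.Set Bip
allBipsSketch t =
  Set.fromList [ nb | c <- clades t, let nb = normBip n c, Set.size nb > 1 ]
  where
    n = Set.size (leavesOf t)
    clades (Leaf _) = []
    clades (Node kids) = concatMap below kids
    below kid = leavesOf kid : clades kid

-- Like (<=): every split of the consensus must also occur in the tree.
-- The consensus splits are computed once per partial application.
compatibleWithSketch :: Topo -> Topo -> Bool
compatibleWithSketch consensus = \tree -> cbips `Set.isSubsetOf` allBipsSketch tree
  where
    cbips = allBipsSketch consensus
```

Take `star = Node [Leaf 0, Leaf 1, Leaf 2, Leaf 3]`. `allBipsSketch star` is empty, so `compatibleWithSketch star` accepts every 4-taxon tree, matching the star-topology remark.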
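Turning a hierarchical clustering into a flat one, then numbering the clusters for the tree-name map, can be sketched as a threshold cut of a binary dendrogram. The text says only "in sorted order", so several details here are assumptions:

- the `Dendro` type;
- the rule that a subtree merged at or below the threshold forms one cluster;
- numbering clusters largest first.

`flattenAt` and `nameToCluster` are hypothetical names.

```haskell
import qualified Data.Map as Map
import Data.List (sortOn)
import Data.Ord (Down (..))

-- Hypothetical binary dendrogram: leaves hold tree names, branches record
-- the distance at which their two subtrees were merged.
data Dendro = DLeaf String | DBranch Double Dendro Dendro

members :: Dendro -> [String]
members (DLeaf x) = [x]
members (DBranch _ l r) = members l ++ members r

-- Cut at a distance threshold: the first subtree (walking down from the
-- root) merged at or below the threshold becomes one flat cluster.
-- Assumes merge distances never decrease toward the root.
flattenAt :: Double -> Dendro -> [[String]]
flattenAt _ (DLeaf x) = [[x]]
flattenAt th d@(DBranch h l r)
  | h <= th = [members d]
  | otherwise = flattenAt th l ++ flattenAt th r

-- Map each tree name to the one-based index of its cluster. ASSUMPTION:
-- clusters are numbered largest first; sortOn is stable, so ties keep
-- their left-to-right order.
nameToCluster :: [[String]] -> Map.Map String Int
nameToCluster clusters =
  Map.fromList
    [ (name, ix) | (ix, cl) <- zip [1 ..] (sortOn (Down . length) clusters), name <- cl ]
```

Cutting at a threshold above the root's merge distance yields a single cluster containing every tree.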

Package phybin-0.3. Identifiers by module:

Bio.Phylogeny.PhyBin.CoreTypes: ClustMode, ClusterThem, linkage, BinThem, NumTaxa, Variable, Unknown, Expected, WhichRFMode, TolerantNaive, HashRF, PhyBinConfig, PBC, verbose, num_taxa, name_hack, output_dir, inputs, do_graph, do_draw, clust_mode, highlights, show_trees_in_dendro, show_interior_consensus, rfmode, preprune_labels, print_rfmatrix, dist_thresh, branch_collapse_thresh, bootstrap_collapse_thresh, TreeName, FullTree, treename, labelTable, nwtree, HasBranchLen, getBranchLen, StandardDecor, branchLen, bootStrap, subtreeWeight, sortedLabels, AnnotatedTree, DefDecor, LabelTable, Label, NewickTree, NTInterior, NTLeaf, displayDefaultTree, displayStrippedTree, liftFT, default_phybin_config, treeSize, numLeaves, get_dec, set_dec, get_children, avg_branchlen, get_bootstraps, map_labels, all_labels, foldIsomorphicTrees
Bio.Phylogeny.PhyBin.Parser: parseNewick, parseNewickFiles, parseNewicks, newick_parser, unitTests
Bio.Phylogeny.PhyBin.PreProcessor: pruneTreeLeaves, collapseBranches, collapseBranchLenThresh, collapseBranchBootStrapThresh
Bio.Phylogeny.PhyBin.RFDistance: DistanceMatrix, DenseLabelSet, markLabel, mkEmptyDense, mkSingleDense, denseUnions, bipSize, denseDiff, dispBip, invertDense, naiveDistMatrix, foldBips, allBips, hashRF, printDistMat, filterCompatible, compatibleWith, consensusTree, bipsToTree
Bio.Phylogeny.PhyBin.Visualize: dotDendrogram, dendrogramToGraph, viewNewickTree, dotToPDF, dotNewickTree, dotNewickTree_debug
Bio.Phylogeny.PhyBin.Binning: BinResults, StrippedTree, OneCluster, clustMembers, normalizeFT, normalize, binthem, anonymize_annotated, annotateWLabLists, deAnnotate, get_weight
Bio.Phylogeny.PhyBin.Util: is_regular_file, acquireTreeFiles, safePrintDendro, sanityCheck
Bio.Phylogeny.PhyBin: driver, retrieveHighlights, matchAnyHighlight

Instances: Pretty StandardDecor, HasBranchLen (,), HasBranchLen StandardDecor, Pretty Map, Pretty FullTree, Pretty NewickTree, Functor FullTree, Foldable FullTree, Foldable NewickTree, Functor NewickTree