phybin-0.3: Utility for clustering phylogenetic trees in Newick format based on Robinson-Foulds distance.

Tree and tree decoration types

data NewickTree a

Even though the Newick format allows it, here we ignore interior node labels. (They are not commonly used.)

Note that these trees are rooted. The normalize function ensures that a single, canonical rooted representation is chosen.


NTLeaf a !Label 
NTInterior a [NewickTree a] 
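
As a rough illustration, the two constructors above can be mirrored locally as follows. This is a sketch reconstructed from this page rather than a copy of the phybin sources; `exampleTree` and the use of `()` as a decoration are assumptions made for the example.

```haskell
-- Local mirror of the constructors listed above (an assumption
-- reconstructed from this page, not copied from the phybin sources).
type Label = Int

data NewickTree a
  = NTLeaf a !Label              -- decoration plus a strict integer label
  | NTInterior a [NewickTree a]  -- decoration plus any number of children
  deriving (Show, Eq)

-- The tree ((0,1),2): a root with one interior child and one leaf.
-- The unit decoration () stands in for "no annotation" (hypothetical choice).
exampleTree :: NewickTree ()
exampleTree =
  NTInterior ()
    [ NTInterior () [NTLeaf () 0, NTLeaf () 1]
    , NTLeaf () 2
    ]
```

Interior nodes carry only a decoration and a child list, which matches the note above that interior node labels are ignored.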


type DefDecor = (Maybe Int, BranchLen)

The bare-bones default decoration for a NewickTree holds a bootstrap value and a branch length. Bootstrap values, if present, lie in the range [0..100].

data StandardDecor

The standard decoration includes everything in DefDecor plus some extra cached data:

  1. branch length from parent to this node
  2. bootstrap values for the node
  3. subtree weights for future use (defined as number of LEAVES, not counting intermediate nodes)
  4. sorted lists of labels for symmetry breaking




branchLen :: BranchLen
bootStrap :: Maybe Int
subtreeWeight :: Int
sortedLabels :: [Label]
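
To make the "cached data" idea concrete, here is a minimal sketch of how such a record could be filled in bottom-up. The record mirrors the four fields listed above; the constructor name, the `BranchLen = Double` synonym, and the helpers `leafDecor` and `interiorDecor` are assumptions for illustration and may not match how phybin builds its annotations.

```haskell
import Data.List (sort)

type Label     = Int
type BranchLen = Double  -- assumption: this page does not define BranchLen

-- Mirror of the fields listed above (constructor name is an assumption).
data StandardDecor = StandardDecor
  { branchLen     :: BranchLen
  , bootStrap     :: Maybe Int
  , subtreeWeight :: Int
  , sortedLabels  :: [Label]
  } deriving (Show, Eq)

-- Hypothetical helper: a leaf counts as one leaf and knows only its own label.
leafDecor :: BranchLen -> Label -> StandardDecor
leafDecor len lab = StandardDecor
  { branchLen = len, bootStrap = Nothing, subtreeWeight = 1, sortedLabels = [lab] }

-- Hypothetical helper: an interior node caches the leaf count and the
-- sorted labels of everything beneath it.
interiorDecor :: BranchLen -> Maybe Int -> [StandardDecor] -> StandardDecor
interiorDecor len boot kids = StandardDecor
  { branchLen     = len
  , bootStrap     = boot
  , subtreeWeight = sum (map subtreeWeight kids)
  , sortedLabels  = sort (concatMap sortedLabels kids)
  }
```

Caching the weight and sorted labels at each node means later comparisons between subtrees need not re-traverse them.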

type AnnotatedTree = NewickTree StandardDecor

Additionally includes some scratch data that is used by the binning algorithm.

data FullTree a

A common type of tree contains the standard decorator and also a table for restoring the human-readable node names.




data ClustMode




linkage :: Linkage

data NumTaxa

How many taxa should we expect in the incoming dataset?


Expected Int

Supplied by the user. Committed.


In the future we may pick a behavior automatically. For now, this setting is usually an error.


Explicitly ignore this setting in favor of comparing all trees (even if some are missing taxa). This only works with certain modes.

Tree operations

displayDefaultTree :: FullTree DefDecor -> Doc

Display a tree WITH the bootstrap and branch lengths. This prints in NEWICK format.

displayStrippedTree :: FullTree a -> Doc

The same, except with no bootstrap values or branch lengths. Any tree annotations are ignored.
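
The stripped output can be pictured with a small String-based renderer. This is only a sketch: the real functions take a FullTree and return a Doc, whereas the hypothetical `renderStripped` below takes a label table and a tree directly and returns a plain String.

```haskell
import qualified Data.Map as M
import Data.List (intercalate)

type Label      = Int
type LabelTable = M.Map Label String

data NewickTree a
  = NTLeaf a !Label
  | NTInterior a [NewickTree a]

-- Hypothetical renderer: emits names and parentheses only, ignoring every
-- decoration. Labels missing from the table fall back to their number.
renderStripped :: LabelTable -> NewickTree a -> String
renderStripped table tree = go tree ++ ";"
  where
    go (NTLeaf _ l)        = M.findWithDefault (show l) l table
    go (NTInterior _ kids) = "(" ++ intercalate "," (map go kids) ++ ")"
```

With a table mapping 0, 1, 2 to "A", "B", "C", the tree ((0,1),2) renders as `((A,B),C);`.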

treeSize :: NewickTree a -> Int

How many nodes (leaves and interior) are contained in a NewickTree?

numLeaves :: NewickTree a -> Int

This counts only leaf nodes, which should include all taxa.
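
The difference between the two counts is easy to see in a local sketch. The definitions below are re-implementations written from the descriptions above, over a local copy of the tree type; they are not the library's code.

```haskell
type Label = Int

data NewickTree a
  = NTLeaf a !Label
  | NTInterior a [NewickTree a]

-- Counts every node: each leaf and each interior node contributes one.
treeSize :: NewickTree a -> Int
treeSize (NTLeaf _ _)        = 1
treeSize (NTInterior _ kids) = 1 + sum (map treeSize kids)

-- Counts leaves only: interior nodes contribute nothing themselves.
numLeaves :: NewickTree a -> Int
numLeaves (NTLeaf _ _)        = 1
numLeaves (NTInterior _ kids) = sum (map numLeaves kids)
```

For the tree ((0,1),2), this sketch gives a size of 5 (three leaves plus two interior nodes) and a leaf count of 3.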

map_labels :: (Label -> Label) -> NewickTree a -> NewickTree a

Apply a function to all the *labels* (leaf names) in a tree.

all_labels :: NewickTree t -> [Label]

Return all the labels contained in the tree.
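
Both label operations are plain structural recursions. The sketch below re-implements them over a local copy of the tree type; the left-to-right order of the returned labels is an assumption of this sketch, since the page does not state an order.

```haskell
type Label = Int

data NewickTree a
  = NTLeaf a !Label
  | NTInterior a [NewickTree a]

-- Rewrites every leaf label, leaving shape and decorations untouched.
map_labels :: (Label -> Label) -> NewickTree a -> NewickTree a
map_labels f (NTLeaf d l)        = NTLeaf d (f l)
map_labels f (NTInterior d kids) = NTInterior d (map (map_labels f) kids)

-- Collects leaf labels left to right (ordering is an assumption here).
all_labels :: NewickTree t -> [Label]
all_labels (NTLeaf _ l)        = [l]
all_labels (NTInterior _ kids) = concatMap all_labels kids
```

For example, `all_labels (map_labels (+10) t)` on the tree ((0,1),2) yields `[10,11,12]` in this sketch.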

foldIsomorphicTrees :: ([a] -> b) -> [NewickTree a] -> NewickTree b

This function collapses multiple trees into one while looking only at the horizontal slice of all the annotations *at a given position* in the tree.

The trees must be isomorphic in both shape and leaf labels; applying this function to non-isomorphic trees is an error.
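
One way to picture the "horizontal slice" is to walk all the trees in lockstep and hand the decorations found at each position to the combining function. The following is a sketch under that reading, not the library's implementation; the name `foldIso`, the helper `decor`, and the error messages are hypothetical.

```haskell
import Data.List (transpose)

type Label = Int

data NewickTree a
  = NTLeaf a !Label
  | NTInterior a [NewickTree a]
  deriving (Show, Eq)

-- Hypothetical accessor for the decoration stored at a node.
decor :: NewickTree a -> a
decor (NTLeaf d _)     = d
decor (NTInterior d _) = d

-- Lockstep fold: at every position, combine the decorations of all trees.
foldIso :: ([a] -> b) -> [NewickTree a] -> NewickTree b
foldIso _ [] = error "foldIso: empty list of trees"
foldIso f trees@(NTLeaf _ lab : _)
  | all sameLeaf trees = NTLeaf (f (map decor trees)) lab
  | otherwise          = error "foldIso: trees are not isomorphic"
  where
    sameLeaf (NTLeaf _ l) = l == lab
    sameLeaf _            = False
foldIso f trees@(NTInterior _ ks0 : _)
  | all sameArity trees =
      NTInterior (f (map decor trees))
                 -- transpose groups the i-th child of every tree together
                 (map (foldIso f) (transpose (map kidsOf trees)))
  | otherwise = error "foldIso: trees are not isomorphic"
  where
    sameArity (NTInterior _ ks) = length ks == length ks0
    sameArity _                 = False
    kidsOf (NTInterior _ ks) = ks
    kidsOf _                 = []
```

Calling `foldIso sum` on two identically shaped trees of Int decorations produces a single tree whose decoration at each node is the sum of the inputs at that node.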

Utilities specific to StandardDecor:

avg_branchlen :: HasBranchLen a => [NewickTree a] -> Double

Average branch length across all branches in all trees.

get_bootstraps :: NewickTree StandardDecor -> [Int]

Retrieve all the bootstrap values actually present in a tree.
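
"Actually present" suggests that nodes without a bootstrap value are simply skipped. A minimal sketch of that behavior follows; the cut-down `Decor` type is a hypothetical stand-in for StandardDecor that keeps only the one field needed here, and the traversal order is an assumption.

```haskell
import Data.Maybe (maybeToList)

type Label = Int

data NewickTree a
  = NTLeaf a !Label
  | NTInterior a [NewickTree a]

-- Hypothetical stand-in for StandardDecor with only the bootstrap field.
newtype Decor = Decor { bootStrap :: Maybe Int }

-- Keeps the Just values and drops the Nothings, parent before children.
get_bootstraps :: NewickTree Decor -> [Int]
get_bootstraps (NTLeaf d _) = maybeToList (bootStrap d)
get_bootstraps (NTInterior d kids) =
  maybeToList (bootStrap d) ++ concatMap get_bootstraps kids
```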

Command line config options

data PhyBinConfig

Due to the number of configuration options for the driver, we pack them into a record.




verbose :: Bool
num_taxa :: NumTaxa
name_hack :: String -> String
output_dir :: String
inputs :: [String]
do_graph :: Bool
do_draw :: Bool
clust_mode :: ClustMode
highlights :: [FilePath]
show_trees_in_dendro :: Bool
show_interior_consensus :: Bool
rfmode :: WhichRFMode
preprune_labels :: Maybe [String]
print_rfmatrix :: Bool
dist_thresh :: Maybe Int
branch_collapse_thresh :: Maybe Double

Branches shorter than this length are collapsed.

bootstrap_collapse_thresh :: Maybe Int

Bootstrap values below this threshold cause the intermediate node to be collapsed.

default_phybin_config :: PhyBinConfig

The default phybin configuration.

data WhichRFMode

Supported modes for computing RFDistance.



General helpers

type Label = Int

Labels are inexpensive unique integers. The table is necessary for converting them back.

type LabelTable = Map Label String

Map labels back onto meaningful names.
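
The round trip between names and integer labels might look like the sketch below. The two synonyms match the ones above; `buildTable` and `labelName` are hypothetical helpers for this example and are not part of the documented API.

```haskell
import qualified Data.Map as M

type Label      = Int
type LabelTable = M.Map Label String

-- Hypothetical helper: number the taxon names consecutively from 0.
buildTable :: [String] -> LabelTable
buildTable names = M.fromList (zip [0 ..] names)

-- Hypothetical helper: recover a name, with a visible placeholder
-- for labels that are not in the table.
labelName :: LabelTable -> Label -> String
labelName table l = M.findWithDefault ("<unknown " ++ show l ++ ">") l table
```

Trees then only need to store and compare small integers, while the table is consulted when output is printed.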

Experimenting with abstracting decoration operations

class HasBranchLen a where


getBranchLen :: a -> BranchLen
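
The point of such a class is that a function can read branch lengths without knowing the concrete decoration type. The sketch below shows this with a local copy of the class and an averaging function; `BranchLen = Double`, the `LenOnly` decoration, and the names `branchLens` and `avgBranchLen` are assumptions, as is the choice to count the root's stored length and to return 0 for empty input.

```haskell
type Label     = Int
type BranchLen = Double  -- assumption: this page does not define BranchLen

data NewickTree a
  = NTLeaf a !Label
  | NTInterior a [NewickTree a]

class HasBranchLen a where
  getBranchLen :: a -> BranchLen

-- Hypothetical minimal decoration carrying nothing but a branch length.
newtype LenOnly = LenOnly BranchLen

instance HasBranchLen LenOnly where
  getBranchLen (LenOnly x) = x

-- Every branch length stored in a tree, root included (an assumption).
branchLens :: HasBranchLen a => NewickTree a -> [BranchLen]
branchLens (NTLeaf d _)        = [getBranchLen d]
branchLens (NTInterior d kids) = getBranchLen d : concatMap branchLens kids

-- Mean over all nodes of all trees; 0 for empty input to avoid NaN.
avgBranchLen :: HasBranchLen a => [NewickTree a] -> Double
avgBranchLen trees
  | null lens = 0
  | otherwise = sum lens / fromIntegral (length lens)
  where
    lens = concatMap branchLens trees
```

Any other decoration type can reuse `avgBranchLen` by providing its own `HasBranchLen` instance.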