Safe Haskell	None

Bio.Phylogeny.PhyBin.RFDistance

Contents

Types
Bipartition (Bip) utilities
ADT for dense sets
Methods for computing distance matrices
Output

Synopsis

Types

type DenseLabelSet = IntSet Source

Dense sets of taxa, aka Bipartitions or BiPs We assume that taxa labels have been mapped onto a dense, contiguous range of integers [0,N).

Rule: Bipartitions are really two disjoint sets. But as long as the parent set (the union of the partitions, aka all taxa) then a bipartition can be represented just by *one* subset. Yet we must choose WHICH subset for consistency. We use the rule that we always choose the SMALLER. Thus the DenseLabelSet should always be half the size or less, compared to the total number of taxa.

A set that is more than a majority of the taxa can be normalized by flipping, i.e. taking the taxa that are NOT in that set.

type DistanceMatrix = Vector (Vector Int)Source

Bipartition (Bip) utilities

allBips :: NewickTree a -> Set DenseLabelSet Source

Get all non-singleton BiPs implied by a tree.

foldBips :: Monoid m => (DenseLabelSet -> m) -> NewickTree a -> mSource

dispBip :: LabelTable -> DenseLabelSet -> String Source

Print a BiPartition in a pretty form

consensusTree :: Int -> [NewickTree a] -> NewickTree ()Source

Take only the bipartitions that are agreed on by all trees.

bipsToTree :: Int -> Set DenseLabelSet -> NewickTree ()Source

Convert from bipartitions BACK to a single tree.

filterCompatible :: NewickTree a -> [NewickTree b] -> [NewickTree b]Source

Which of a set of trees are compatible with a consensus?

compatibleWith :: NewickTree a -> NewickTree b -> Bool Source

`compatibleWith consensus tree` -- Is a tree compatible with a consensus? This is more efficient if partially applied then used repeatedly.

Note, tree compatibility is not the same as an exact match. It's like (<=) rather than (==). The star topology is consistent with the all trees, because it induces the empty set of bipartitions.

ADT for dense sets

mkSingleDense :: Int -> Label -> DenseLabelSet Source

mkEmptyDense :: Int -> DenseLabelSet Source

bipSize :: DenseLabelSet -> Int Source

denseUnions :: Int -> [DenseLabelSet] -> DenseLabelSet Source

denseDiff :: IntSet -> IntSet -> IntSet Source

invertDense :: Int -> DenseLabelSet -> DenseLabelSet Source

Assume that total taxa are 0..N-1, invert membership:

markLabel :: Label -> DenseLabelSet -> DenseLabelSet Source

Methods for computing distance matrices

naiveDistMatrix :: [NewickTree DefDecor] -> (DistanceMatrix, Vector (Set DenseLabelSet))Source

Returns a triangular distance matrix encoded as a vector. Also return the set-of-BIPs representation for each tree.

This uses a naive method, directly computing the pairwise distance between each pair of trees.

This method is TOLERANT of differences in the laba/taxa sets between two trees. It simply prunes to the intersection before doing the distance comparison. Other scoring methods may be added in the future. (For example, penalizing for missing taxa.)

hashRF :: Int -> [NewickTree a] -> DistanceMatrix Source

This version slices the problem a different way. A single pass over the trees populates the table of bipartitions. Then the table can be processed (locally) to produce (non-localized) increments to a distance matrix.

Output

printDistMat :: Handle -> Vector (Vector Int) -> IO ()Source