- type DenseLabelSet = IntSet
- type DistanceMatrix = Vector (Vector Int)
- allBips :: NewickTree a -> Set DenseLabelSet
- foldBips :: Monoid m => (DenseLabelSet -> m) -> NewickTree a -> m
- dispBip :: LabelTable -> DenseLabelSet -> String
- consensusTree :: Int -> [NewickTree a] -> NewickTree ()
- bipsToTree :: Int -> Set DenseLabelSet -> NewickTree ()
- filterCompatible :: NewickTree a -> [NewickTree b] -> [NewickTree b]
- compatibleWith :: NewickTree a -> NewickTree b -> Bool
- mkSingleDense :: Int -> Label -> DenseLabelSet
- mkEmptyDense :: Int -> DenseLabelSet
- bipSize :: DenseLabelSet -> Int
- denseUnions :: Int -> [DenseLabelSet] -> DenseLabelSet
- denseDiff :: IntSet -> IntSet -> IntSet
- invertDense :: Int -> DenseLabelSet -> DenseLabelSet
- markLabel :: Label -> DenseLabelSet -> DenseLabelSet
- naiveDistMatrix :: [NewickTree DefDecor] -> (DistanceMatrix, Vector (Set DenseLabelSet))
- hashRF :: Int -> [NewickTree a] -> DistanceMatrix
- printDistMat :: Handle -> Vector (Vector Int) -> IO ()
Dense sets of taxa, aka Bipartitions or BiPs We assume that taxa labels have been mapped onto a dense, contiguous range of integers [0,N).
Rule: Bipartitions are really two disjoint sets. But as long as the parent set (the union of the partitions, aka all taxa) then a bipartition can be represented just by *one* subset. Yet we must choose WHICH subset for consistency. We use the rule that we always choose the SMALLER. Thus the DenseLabelSet should always be half the size or less, compared to the total number of taxa.
A set that is more than a majority of the taxa can be normalized by flipping, i.e. taking the taxa that are NOT in that set.
Bipartition (Bip) utilities
Take only the bipartitions that are agreed on by all trees.
Convert from bipartitions BACK to a single tree.
Which of a set of trees are compatible with a consensus?
`compatibleWith consensus tree` -- Is a tree compatible with a consensus? This is more efficient if partially applied then used repeatedly.
Note, tree compatibility is not the same as an exact match. It's like (<=) rather than (==). The star topology is consistent with the all trees, because it induces the empty set of bipartitions.
ADT for dense sets
Assume that total taxa are 0..N-1, invert membership:
Methods for computing distance matrices
Returns a triangular distance matrix encoded as a vector. Also return the set-of-BIPs representation for each tree.
This uses a naive method, directly computing the pairwise distance between each pair of trees.
This method is TOLERANT of differences in the laba/taxa sets between two trees. It simply prunes to the intersection before doing the distance comparison. Other scoring methods may be added in the future. (For example, penalizing for missing taxa.)
This version slices the problem a different way. A single pass over the trees populates the table of bipartitions. Then the table can be processed (locally) to produce (non-localized) increments to a distance matrix.