hierarchical-clustering-0.3.0.1: Algorithms for single, average/UPGMA and complete linkage clustering.

Data.Clustering.Hierarchical

Synopsis

# Dendrogram data type

data Dendrogram d a Source

Data structure for storing hierarchical clusters. The distance between clusters is stored on the branches. Distances between leafs are the distances between the elements on those leafs, while distances between branches are defined by the linkage used (see `Linkage`).

Constructors

 Leaf a The leaf contains the item `a` itself. Branch d (Dendrogram d a) (Dendrogram d a) Each branch connects two clusters/dendrograms that are `d` distance apart.

Instances

 Functor (Dendrogram d) Does not recalculate the distances! Foldable (Dendrogram d) Traversable (Dendrogram d) (Eq d, Eq a) => Eq (Dendrogram d a) (Ord d, Ord a) => Ord (Dendrogram d a) (Show d, Show a) => Show (Dendrogram d a)

elements :: Dendrogram d a -> [a]Source

List of elements in a dendrogram.

cutAt :: Ord d => Dendrogram d a -> d -> [Dendrogram d a]Source

`dendro `cutAt` threshold` cuts the dendrogram `dendro` at all branches which have distances strictly greater than `threshold`.

For example, suppose we have

``` dendro = Branch 0.8
(Branch 0.5
(Branch 0.2
(Leaf 'A')
(Leaf 'B'))
(Leaf 'C'))
(Leaf 'D')
```

Then:

``` dendro `cutAt` 0.9 == dendro `cutAt` 0.8 == [dendro] -- no changes
dendro `cutAt` 0.7 == dendro `cutAt` 0.5 == [Branch 0.5 (Branch 0.2 (Leaf 'A') (Leaf 'B')) (Leaf 'C'), Leaf 'D']
dendro `cutAt` 0.4 == dendro `cutAt` 0.2 == [Branch 0.2 (Leaf 'A') (Leaf 'B'), Leaf 'C', Leaf 'D']
dendro `cutAt` 0.1 == [Leaf 'A', Leaf 'B', Leaf 'C', Leaf 'D'] -- no branches at all
```

# Linkage data type

The linkage type determines how the distance between clusters will be calculated. These are the linkage types currently available on this library.

Constructors

 SingleLinkage The distance between two clusters `a` and `b` is the minimum distance between an element of `a` and an element of `b`. CompleteLinkage The distance between two clusters `a` and `b` is the maximum distance between an element of `a` and an element of `b`. UPGMA Unweighted Pair Group Method with Arithmetic mean, also called "average linkage". The distance between two clusters `a` and `b` is the arithmetic average between the distances of all elements in `a` to all elements in `b`. FakeAverageLinkage This method is usually wrongly called "average linkage". The distance between cluster `a = a1 U a2` (that is, cluster `a` was formed by the linkage of clusters `a1` and `a2`) and an old cluster `b` is `(d(a1,b) + d(a2,b)) / 2`. So when clustering two elements to create a cluster, this method is the same as UPGMA. However, in general when joining two clusters this method assigns equal weights to `a1` and `a2`, while UPGMA assigns weights proportional to the number of elements in each cluster. See, for example: http://www.cs.tau.ac.il/~rshamir/algmb/00/scribe00/html/lec08/node21.html, which defines the real UPGMA and gives the equation to calculate the distance between an old and a new cluster. http://github.com/JadeFerret/ai4r/blob/master/lib/ai4r/clusterers/average_linkage.rb, code for "average linkage" on ai4r library implementing what we call here `FakeAverageLinkage` and not UPGMA.

Instances

# Generic clustering function

Arguments

 :: (Ord d, Fractional d) => Linkage Linkage type to be used. -> [a] Items to be clustered. -> (a -> a -> d) Distance function between items. -> Dendrogram d a Complete dendrogram.

O(n^3) Calculates a complete, rooted dendrogram for a list of items and a linkage type. If your distance type has an `Ord` instance but not a `Fractional` one, then please use specific functions `singleLinkage` or `completeLinkage` that have less restrictive types.

# Functions for specific linkages

singleLinkage :: Ord d => [a] -> (a -> a -> d) -> Dendrogram d aSource

O(n^3) Like `dendrogram`, but specialized to single linkage (see `SingleLinkage`) which does not require `Fractional`.

completeLinkage :: Ord d => [a] -> (a -> a -> d) -> Dendrogram d aSource

O(n^3) Like `dendrogram`, but specialized to complete linkage (see `CompleteLinkage`) which does not require `Fractional`.

upgma :: (Fractional d, Ord d) => [a] -> (a -> a -> d) -> Dendrogram d aSource

O(n^3) Like `dendrogram`, but specialized to `UPGMA`.

fakeAverageLinkage :: (Fractional d, Ord d) => [a] -> (a -> a -> d) -> Dendrogram d aSource

O(n^3) Like `dendrogram`, but specialized to fake average linkage (see `FakeAverageLinkage`).