hierarchical-clustering-0.1: Algorithms for single, average/UPGMA and complete linkage clustering.

Data.Clustering.Hierarchical

Documentation

data Dendrogram d a

Data structure for storing hierarchical clusters.

Constructors

Leaf a

The leaf contains the item a itself.

Branch d (Dendrogram d a) (Dendrogram d a)

Each branch connects two clusters/dendrograms that are d distance apart.
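As a minimal sketch of how the distance stored in each Branch can be used, the snippet below mirrors the type locally (an assumption, so the example is self-contained) and defines a hypothetical helper, cutAtThreshold, that is not claimed to be part of this package. It assumes branch distances do not increase from the root towards the leaves.

```haskell
-- Local mirror of the documented type (assumption: same shape as the library's).
data Dendrogram d a
  = Leaf a
  | Branch d (Dendrogram d a) (Dendrogram d a)
  deriving (Eq, Show)

-- Hypothetical helper, not part of the documented API: split a dendrogram
-- into the sub-dendrograms whose root distance is at most the threshold.
cutAtThreshold :: Ord d => d -> Dendrogram d a -> [Dendrogram d a]
cutAtThreshold _ leaf@(Leaf _) = [leaf]
cutAtThreshold t node@(Branch d l r)
  | d <= t    = [node]
  | otherwise = cutAtThreshold t l ++ cutAtThreshold t r

-- "a" and "b" are 1.0 apart; that pair is 4.0 away from "c".
example :: Dendrogram Double String
example = Branch 4.0 (Branch 1.0 (Leaf "a") (Leaf "b")) (Leaf "c")
```

Cutting example at 2.0 yields two clusters: the branch joining "a" and "b", and the leaf "c".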

Instances

Functor (Dendrogram d)

fmap does not recalculate the distances stored in the branches!

Foldable (Dendrogram d) 
Traversable (Dendrogram d) 
(Eq d, Eq a) => Eq (Dendrogram d a) 
(Ord d, Ord a) => Ord (Dendrogram d a) 
(Show d, Show a) => Show (Dendrogram d a) 
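
The behaviour of these instances can be sketched with a local mirror of the type that derives them (an assumption: the library's own instances may be written by hand, but the note on Functor suggests the same behaviour). The names example, scaled and items are illustrative only.

```haskell
{-# LANGUAGE DeriveFunctor, DeriveFoldable, DeriveTraversable #-}
import Data.Foldable (toList)

-- Local mirror of the documented type (assumption: same shape as the library's).
data Dendrogram d a
  = Leaf a
  | Branch d (Dendrogram d a) (Dendrogram d a)
  deriving (Eq, Ord, Show, Functor, Foldable, Traversable)

-- Hand-built example: 1 and 2 merge at distance 1, then 5 joins at distance 3.
example :: Dendrogram Double Int
example = Branch 3 (Branch 1 (Leaf 1) (Leaf 2)) (Leaf 5)

-- fmap rewrites the leaves only; the branch distances 3 and 1 are kept as-is,
-- even though they no longer match the distances between the new items.
scaled :: Dendrogram Double Int
scaled = fmap (* 10) example

-- Foldable visits the leaves from left to right.
items :: [Int]
items = toList example
```

Here scaled still carries the distances 3 and 1 although its leaves are now 10, 20 and 50, which is why a dendrogram should be rebuilt if the mapped items need up-to-date distances.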

data Linkage

The linkage type determines how the distance between clusters will be calculated.

Constructors

SingleLinkage

The distance between two clusters a and b is the minimum distance between an element of a and an element of b.

CompleteLinkage

The distance between two clusters a and b is the maximum distance between an element of a and an element of b.

UPGMA

Unweighted Pair Group Method with Arithmetic mean, also called "average linkage". The distance between two clusters a and b is the arithmetic mean of the distances between every element of a and every element of b.

FakeAverageLinkage

This method is often incorrectly called "average linkage". The distance between a cluster a = a1 ∪ a2 (that is, cluster a was formed by linking clusters a1 and a2) and an older cluster b is (d(a1,b) + d(a2,b)) / 2. When two single elements are joined to create a cluster, this method gives the same result as UPGMA. In general, however, when joining two clusters this method assigns equal weights to a1 and a2, while UPGMA assigns weights proportional to the number of elements in each cluster. See, for example:

completeDendrogram :: (Fractional d, Ord d) => Linkage -> [a] -> (a -> a -> d) -> Dendrogram d a

O(n^2). Calculates a complete, rooted dendrogram from a list of items and a distance function.
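
To illustrate how the four linkage rules fit a function of this shape, here is a naive sketch. It is not the library's implementation and does not achieve O(n^2): it recomputes cluster distances from scratch at every step. The types are local mirrors of the documented ones, and clusterDist, elems and naiveDendrogram are hypothetical names introduced for this example. Raising an error on an empty list is also an assumption.

```haskell
import Data.List (minimumBy)
import Data.Ord (comparing)

-- Local mirrors of the documented types (assumption: same shape as the library's).
data Dendrogram d a = Leaf a | Branch d (Dendrogram d a) (Dendrogram d a)
  deriving (Eq, Show)

data Linkage = SingleLinkage | CompleteLinkage | UPGMA | FakeAverageLinkage
  deriving (Eq, Show)

-- All items stored in a dendrogram, from left to right.
elems :: Dendrogram d a -> [a]
elems (Leaf x)       = [x]
elems (Branch _ l r) = elems l ++ elems r

-- Distance between two clusters under each linkage rule (hypothetical helper).
clusterDist :: (Fractional d, Ord d)
            => Linkage -> (a -> a -> d) -> Dendrogram d a -> Dendrogram d a -> d
clusterDist linkage dist a b = case linkage of
    SingleLinkage      -> minimum pairs
    CompleteLinkage    -> maximum pairs
    UPGMA              -> sum pairs / fromIntegral (length pairs)
    FakeAverageLinkage -> fake a b
  where
    -- Every element-to-element distance between the two clusters.
    pairs = [dist x y | x <- elems a, y <- elems b]
    -- (d(a1,b) + d(a2,b)) / 2, unfolded along the merge history of each side.
    fake (Branch _ a1 a2) y = (fake a1 y + fake a2 y) / 2
    fake x (Branch _ b1 b2) = (fake x b1 + fake x b2) / 2
    fake (Leaf x) (Leaf y)  = dist x y

-- Naive agglomeration: repeatedly merge the closest pair of clusters.
-- For illustration only; far slower than the documented O(n^2).
naiveDendrogram :: (Fractional d, Ord d)
                => Linkage -> [a] -> (a -> a -> d) -> Dendrogram d a
naiveDendrogram _ [] _ = error "naiveDendrogram: empty input"
naiveDendrogram linkage items dist = go (map Leaf items)
  where
    go [c] = c
    go cs  =
      let n = length cs
          candidates = [ (clusterDist linkage dist (cs !! i) (cs !! j), i, j)
                       | i <- [0 .. n - 2], j <- [i + 1 .. n - 1] ]
          -- Closest pair; on ties the first candidate is kept.
          (best, bi, bj) = minimumBy (comparing (\(x, _, _) -> x)) candidates
          rest = [ c | (k, c) <- zip [0 ..] cs, k /= bi, k /= bj ]
      in go (Branch best (cs !! bi) (cs !! bj) : rest)
```

For the points 0, 1, 3 and 10 with absolute difference as the distance function, every linkage first joins 0 and 1, then adds 3, then adds 10. The distance at the root is 7 for SingleLinkage, 10 for CompleteLinkage, 26/3 for UPGMA and 8.25 for FakeAverageLinkage. The last two differ because UPGMA weights the three-element cluster by its size, while FakeAverageLinkage weights its two halves equally.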