hierarchical-clustering-0.4.2: Fast algorithms for single, average/UPGMA and complete linkage clustering.

Data.Clustering.Hierarchical

# Dendrogram data type

data Dendrogram a

Data structure for storing hierarchical clusters. The distance between clusters is stored on the branches. Distances between leaves are the distances between the elements on those leaves, while distances between branches are defined by the linkage used (see Linkage).

Constructors

 Leaf a
     The leaf contains the item a itself.
 Branch !Distance (Dendrogram a) (Dendrogram a)
     Each branch connects two clusters/dendrograms that are d distance apart.

Instances

 Functor Dendrogram
     Does not recalculate the distances!
 Foldable Dendrogram
 Traversable Dendrogram
 Eq a => Eq (Dendrogram a)
 Ord a => Ord (Dendrogram a)
 Show a => Show (Dendrogram a)
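The note on the Functor instance can be illustrated with a minimal sketch. It does not import the library: the Dendrogram type below is a local stand-in that mirrors the documented constructors, and its derived instances are an assumption about how the real instances behave. The names example and scaled are hypothetical.

```haskell
{-# LANGUAGE DeriveFunctor, DeriveFoldable, DeriveTraversable #-}
import Data.Foldable (toList)

type Distance = Double

-- Local stand-in mirroring the documented constructors (an assumption,
-- not the library's own definition).
data Dendrogram a
  = Leaf a
  | Branch !Distance (Dendrogram a) (Dendrogram a)
  deriving (Eq, Ord, Show, Functor, Foldable, Traversable)

-- Items are numbers here so that rescaling them would, in principle,
-- change the distances between them.
example :: Dendrogram Double
example = Branch 3.0 (Branch 1.0 (Leaf 1.0) (Leaf 2.0)) (Leaf 5.0)

-- fmap rewrites the items only; the stored branch distances
-- (3.0 and 1.0) are carried over unchanged.
scaled :: Dendrogram Double
scaled = fmap (* 10) example

main :: IO ()
main = do
  print scaled
  -- Foldable visits the items from left to right.
  print (toList scaled)
```

After mapping, the stored distances no longer describe the new items, so a dendrogram whose items have been transformed may need to be rebuilt if the distances matter.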

type Distance = Double

A distance is simply a synonym of Double for efficiency.

elements :: Dendrogram a -> [a]

List of elements in a dendrogram.
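One way such a function could be written is sketched below. This is not the library's source: elementsSketch is a hypothetical name, the Dendrogram type is a local stand-in for the documented one, and the left-to-right ordering of the result is an assumption.

```haskell
type Distance = Double

-- Local stand-in mirroring the documented constructors.
data Dendrogram a
  = Leaf a
  | Branch !Distance (Dendrogram a) (Dendrogram a)

-- Collect the items from left to right. The accumulator avoids
-- repeated list appends on deep, unbalanced dendrograms.
elementsSketch :: Dendrogram a -> [a]
elementsSketch = go []
  where
    go acc (Leaf x)       = x : acc
    go acc (Branch _ l r) = go (go acc r) l

main :: IO ()
main =
  putStrLn (elementsSketch
    (Branch 0.8 (Branch 0.5 (Branch 0.2 (Leaf 'A') (Leaf 'B')) (Leaf 'C')) (Leaf 'D')))
```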

cutAt :: Dendrogram a -> Distance -> [Dendrogram a]

dendro `cutAt` threshold cuts the dendrogram dendro at all branches which have distances strictly greater than threshold.

For example, suppose we have

dendro = Branch 0.8
           (Branch 0.5
             (Branch 0.2
               (Leaf 'A')
               (Leaf 'B'))
             (Leaf 'C'))
           (Leaf 'D')

Then:

dendro `cutAt` 0.9 == dendro `cutAt` 0.8 == [dendro] -- no changes
dendro `cutAt` 0.7 == dendro `cutAt` 0.5 == [Branch 0.5 (Branch 0.2 (Leaf 'A') (Leaf 'B')) (Leaf 'C'), Leaf 'D']
dendro `cutAt` 0.4 == dendro `cutAt` 0.2 == [Branch 0.2 (Leaf 'A') (Leaf 'B'), Leaf 'C', Leaf 'D']
dendro `cutAt` 0.1 == [Leaf 'A', Leaf 'B', Leaf 'C', Leaf 'D'] -- no branches at all
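The behaviour described above can be reproduced with a short sketch. It is written against a local stand-in for the Dendrogram type, and cutAtSketch is a hypothetical name; it follows the documented semantics but is not claimed to be the library's implementation.

```haskell
type Distance = Double

-- Local stand-in mirroring the documented constructors.
data Dendrogram a
  = Leaf a
  | Branch !Distance (Dendrogram a) (Dendrogram a)
  deriving (Eq, Show)

-- Keep a subtree whole when its root distance is at most the
-- threshold; otherwise split it and continue into both children.
cutAtSketch :: Dendrogram a -> Distance -> [Dendrogram a]
cutAtSketch dendro threshold = go [] dendro
  where
    go acc x@(Leaf _) = x : acc
    go acc x@(Branch d l r)
      | d <= threshold = x : acc
      | otherwise      = go (go acc r) l

dendro :: Dendrogram Char
dendro = Branch 0.8
           (Branch 0.5
             (Branch 0.2 (Leaf 'A') (Leaf 'B'))
             (Leaf 'C'))
           (Leaf 'D')

main :: IO ()
main = mapM_ (print . cutAtSketch dendro) [0.9, 0.7, 0.4, 0.1]
```

Because the comparison is d <= threshold, a branch whose distance equals the threshold is kept, which matches the "strictly greater than" wording above.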

The linkage type determines how the distance between clusters is calculated. These are the linkage types currently available in this library.

Constructors

Instances

# Clustering function

Arguments

 :: Linkage
     Linkage type to be used.
 -> [a]
     Items to be clustered.
 -> (a -> a -> Distance)
     Distance function between items.
 -> Dendrogram a
     Complete dendrogram.

Calculates a complete, rooted dendrogram for a list of items and a linkage type. The following are the time and space complexities for each linkage: