clustering-0.2.0: High performance clustering algorithms

Copyright(c) 2015 Kai Zhang
LicenseMIT
Maintainerkai@kzhang.org
Stabilityexperimental
Portabilityportable
Safe HaskellNone
LanguageHaskell2010

AI.Clustering.Hierarchical

Contents

Description

High performance agglomerative hierarchical clustering library. Example:

>>> :set -XOverloadedLists
>>> import qualified Data.Vector as V
>>> let points = [[2, 3, 4], [2, 1, 2], [2, 1, 6], [2, 4, 6], [5, 1, 2]] :: V.Vector (V.Vector Double)
>>> let dendro = hclust Average points euclidean
>>> print dendro
Branch 5 4.463747440868191 (Branch 3 2.914213562373095 (Leaf (fromList [2.0,1.0,6.0]))
(Branch 2 2.23606797749979 (Leaf (fromList [2.0,3.0,4.0])) (Leaf (fromList [2.0,4.0,6.0]))))
(Branch 2 3.0 (Leaf (fromList [2.0,1.0,2.0])) (Leaf (fromList [5.0,1.0,2.0])))
>>> putStr $ drawDendrogram $ fmap show dendro
h: 4.4637
|
+- h: 2.9142
|  |
|  +- fromList [2.0,1.0,6.0]
|  |
|  `- h: 2.2361
|     |
|     +- fromList [2.0,3.0,4.0]
|     |
|     `- fromList [2.0,4.0,6.0]
|
`- h: 3.0000
   |
   +- fromList [2.0,1.0,2.0]
   |
   `- fromList [5.0,1.0,2.0]

Synopsis

Documentation

data Dendrogram a Source

Constructors

Leaf !a 
Branch !Size !Distance !(Dendrogram a) !(Dendrogram a) 

Instances

size :: Dendrogram a -> Int Source

O(1) Return the size of a dendrogram

data Linkage Source

Different hierarchical clustering schemes.

Constructors

Single

O(n^2) Single linkage, $d(A,B) = min_{a in A, b in B} d(a,b)$.

Complete

O(n^2) Complete linkage, $d(A,B) = max_{a in A, b in B} d(a,b)$.

Average

O(n^2) Average linkage or UPGMA, $d(A,B) = frac{sum_{a in A}sum_{b in B}d(a,b)}{|A||B|}$.

Weighted

O(n^2) Weighted linkage.

Ward

O(n^2) Ward's method.

Centroid

O(n^3) Centroid linkage, not implemented.

Median

O(n^3) Median linkage, not implemented.

hclust :: Vector v a => Linkage -> v a -> DistFn a -> Dendrogram a Source

Perform hierarchical clustering.

cutAt :: Dendrogram a -> Distance -> [Dendrogram a] Source

Cut a dendrogram at given height.

flatten :: Dendrogram a -> [a] Source

Return the elements of a dendrogram in pre-order.

drawDendrogram :: Dendrogram String -> String Source

2-dimensional drawing of a dendrogram

Distance functions

euclidean :: Vector v Double => DistFn (v Double) Source

Compute euclidean distance between two points.

hamming :: (Vector v a, Vector v Bool, Eq a) => DistFn (v a) Source

Hamming distance.

References

Müllner D (2011). Modern Hierarchical, Agglomerative Clustering Algorithms. ArXiv:1109.2378 [stat.ML]. http://arxiv.org/abs/1109.2378