- data Dendrogram d a
- = Leaf a
- | Branch d (Dendrogram d a) (Dendrogram d a)
- elements :: Dendrogram d a -> [a]
- cutAt :: Ord d => Dendrogram d a -> d -> [Dendrogram d a]
- data Linkage
- dendrogram :: (Ord d, Fractional d) => Linkage -> [a] -> (a -> a -> d) -> Dendrogram d a
- singleLinkage :: Ord d => [a] -> (a -> a -> d) -> Dendrogram d a
- completeLinkage :: Ord d => [a] -> (a -> a -> d) -> Dendrogram d a
- upgma :: (Fractional d, Ord d) => [a] -> (a -> a -> d) -> Dendrogram d a
- fakeAverageLinkage :: (Fractional d, Ord d) => [a] -> (a -> a -> d) -> Dendrogram d a
Dendrogram data type
data Dendrogram d a Source
Data structure for storing hierarchical clusters. The
distance between clusters is stored on the branches.
Distances between leafs are the distances between the elements
on those leafs, while distances between branches are defined
by the linkage used (see Linkage
).
Leaf a | The leaf contains the item |
Branch d (Dendrogram d a) (Dendrogram d a) | Each branch connects two clusters/dendrograms that are
|
Functor (Dendrogram d) | Does not recalculate the distances! |
Foldable (Dendrogram d) | |
Traversable (Dendrogram d) | |
(Eq d, Eq a) => Eq (Dendrogram d a) | |
(Ord d, Ord a) => Ord (Dendrogram d a) | |
(Show d, Show a) => Show (Dendrogram d a) |
elements :: Dendrogram d a -> [a]Source
List of elements in a dendrogram.
cutAt :: Ord d => Dendrogram d a -> d -> [Dendrogram d a]Source
dendro `cutAt` threshold
cuts the dendrogram dendro
at
all branches which have distances strictly greater than
threshold
.
For example, suppose we have
dendro = Branch 0.8 (Branch 0.5 (Branch 0.2 (Leaf 'A') (Leaf 'B')) (Leaf 'C')) (Leaf 'D')
Then:
dendro `cutAt` 0.9 == dendro `cutAt` 0.8 == [dendro] -- no changes dendro `cutAt` 0.7 == dendro `cutAt` 0.5 == [Branch 0.5 (Branch 0.2 (Leaf 'A') (Leaf 'B')) (Leaf 'C'), Leaf 'D'] dendro `cutAt` 0.4 == dendro `cutAt` 0.2 == [Branch 0.2 (Leaf 'A') (Leaf 'B'), Leaf 'C', Leaf 'D'] dendro `cutAt` 0.1 == [Leaf 'A', Leaf 'B', Leaf 'C', Leaf 'D'] -- no branches at all
Linkage data type
The linkage type determines how the distance between clusters will be calculated. These are the linkage types currently available on this library.
SingleLinkage | The distance between two clusters |
CompleteLinkage | The distance between two clusters |
UPGMA | Unweighted Pair Group Method with Arithmetic mean, also
called "average linkage". The distance between two
clusters |
FakeAverageLinkage | This method is usually wrongly called "average linkage".
The distance between cluster
|
Generic clustering function
:: (Ord d, Fractional d) | |
=> Linkage | Linkage type to be used. |
-> [a] | Items to be clustered. |
-> (a -> a -> d) | Distance function between items. |
-> Dendrogram d a | Complete dendrogram. |
O(n^3) Calculates a complete, rooted dendrogram for a list
of items and a linkage type. If your distance type has an
Ord
instance but not a Fractional
one, then please use
specific functions singleLinkage
or completeLinkage
that
have less restrictive types.
Functions for specific linkages
singleLinkage :: Ord d => [a] -> (a -> a -> d) -> Dendrogram d aSource
O(n^3) Like dendrogram
, but specialized to single
linkage (see SingleLinkage
) which does not require
Fractional
.
completeLinkage :: Ord d => [a] -> (a -> a -> d) -> Dendrogram d aSource
O(n^3) Like dendrogram
, but specialized to complete
linkage (see CompleteLinkage
) which does not require
Fractional
.
upgma :: (Fractional d, Ord d) => [a] -> (a -> a -> d) -> Dendrogram d aSource
O(n^3) Like dendrogram
, but specialized to UPGMA
.
fakeAverageLinkage :: (Fractional d, Ord d) => [a] -> (a -> a -> d) -> Dendrogram d aSource
O(n^3) Like dendrogram
, but specialized to fake average
linkage (see FakeAverageLinkage
).