| Portability | portable |
|---|---|
| Stability | experimental |
| Maintainer | amy@nualeargais.ie |
| Safe Haskell | Safe-Inferred |
Data.Datamining.Clustering.SOM
Contents
Description
A Kohonen Self-organising Map (SOM). A SOM maps input patterns onto a regular grid (usually two-dimensional) where each node in the grid is a model of the input data, and does so using a method which ensures that any topological relationships within the input data are also represented in the grid. This implementation supports the use of non-numeric patterns.
In layman's terms, a SOM can be useful when you want to discover the underlying structure of some data. A tutorial is available at https://github.com/mhwombat/som/wiki
References:
- Kohonen, T. (1982). Self-organized formation of topologically correct feature maps. Biological Cybernetics, 43 (1), 59–69.
- class Pattern p v | p -> v where
- difference :: p -> p -> v
- makeSimilar :: p -> v -> p -> p
- train :: (Ord v, Pattern p v, Grid g s k) => (Int -> v) -> GridMap g k p -> p -> GridMap g k p
- trainBatch :: (Ord v, Grid g s k, Pattern p v) => (Int -> v) -> GridMap g k p -> [p] -> GridMap g k p
- classify :: (Ord v, Pattern p v) => GridMap g k p -> p -> k
- classifyAndTrain :: (Eq k, Ord v, Pattern p v, Grid g s k) => (Int -> v) -> GridMap g k p -> p -> (k, GridMap g k p)
- diffs :: Pattern p v => GridMap g k p -> p -> GridMap g k v
- differences :: Pattern p v => p -> GridMap g k p -> GridMap g k v
- diffAndTrain :: (Eq k, Ord v, Pattern p v, Grid g s k) => (Int -> v) -> GridMap g k p -> p -> (GridMap g k v, GridMap g k p)
- normalise :: Floating a => [a] -> NormalisedVector a
- data NormalisedVector a
- scale :: Fractional a => [(a, a)] -> [a] -> ScaledVector a
- data ScaledVector a
- adjustVector :: (Num a, Ord a, Eq a) => [a] -> a -> [a] -> [a]
- euclideanDistanceSquared :: Num a => [a] -> [a] -> a
- gaussian :: Double -> Double -> Int -> Double
Documentation
class Pattern p v | p -> v where
A pattern to be learned or classified by a self-organising map.
Methods
difference :: p -> p -> v
Compares two patterns and returns a non-negative number
representing how different the patterns are. A result of 0
indicates that the patterns are identical.
makeSimilar :: p -> v -> p -> p
makeSimilar target amount pattern returns a modified copy of
pattern that is more similar to target than pattern is. The
magnitude of the adjustment is controlled by the amount
parameter, which should be a number between 0 and 1. Larger
values for amount permit greater adjustments. If amount=1,
the result should be identical to the target. If amount=0,
the result should be the unmodified pattern.
Instances
| (Fractional a, Ord a, Eq a) => Pattern (ScaledVector a) a | |
| (Floating a, Fractional a, Ord a, Eq a) => Pattern (NormalisedVector a) a |
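To add support for a new pattern type, implement both methods. Below is a minimal, self-contained sketch: the Pattern class is reproduced from above so the example compiles on its own, and the Brightness type is a hypothetical one-dimensional pattern invented for illustration.

```haskell
{-# LANGUAGE MultiParamTypeClasses, FunctionalDependencies #-}

-- The Pattern class as defined in this module, reproduced here
-- so the example is self-contained.
class Pattern p v | p -> v where
  difference  :: p -> p -> v
  makeSimilar :: p -> v -> p -> p

-- A hypothetical one-dimensional pattern: a brightness level.
newtype Brightness = Brightness Double deriving (Show, Eq)

instance Pattern Brightness Double where
  -- Absolute difference: non-negative, and 0 for identical patterns.
  difference (Brightness a) (Brightness b) = abs (a - b)
  -- Linear interpolation: amount=0 leaves the pattern unchanged,
  -- amount=1 moves it all the way to the target.
  makeSimilar (Brightness t) amount (Brightness x) =
    Brightness (x + amount * (t - x))
```

Any implementation along these lines satisfies the contract: difference is zero exactly for identical patterns, and makeSimilar interpolates between the original and the target.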
Using the SOM
train :: (Ord v, Pattern p v, Grid g s k) => (Int -> v) -> GridMap g k p -> p -> GridMap g k p
If f d is a function that returns the learning rate to apply to a
node based on its distance d from the node that best matches the
input pattern, then train f c pattern returns a modified copy
of the classifier c that has partially learned the pattern.
trainBatch :: (Ord v, Grid g s k, Pattern p v) => (Int -> v) -> GridMap g k p -> [p] -> GridMap g k p
Same as train, but applied to multiple patterns.
classify :: (Ord v, Pattern p v) => GridMap g k p -> p -> k
classify c pattern returns the position of the node in c
whose pattern best matches the input pattern.
classifyAndTrain :: (Eq k, Ord v, Pattern p v, Grid g s k) => (Int -> v) -> GridMap g k p -> p -> (k, GridMap g k p)
If f is a function that returns the learning rate to apply to a
node based on its distance from the node that best matches the
target, then classifyAndTrain f c target returns a tuple
containing the position of the node in c whose pattern best
matches the input target, and a modified copy of the classifier
c that has partially learned the target.
Invoking classifyAndTrain f c p may be faster than invoking
(classify c p, train f c p), but they should give identical
results.
diffs :: Pattern p v => GridMap g k p -> p -> GridMap g k v
diffs c pattern returns the positions of all nodes in c, paired
with the difference between pattern and the node's pattern.
differences :: Pattern p v => p -> GridMap g k p -> GridMap g k v
diffAndTrain :: (Eq k, Ord v, Pattern p v, Grid g s k) => (Int -> v) -> GridMap g k p -> p -> (GridMap g k v, GridMap g k p)
If f is a function that returns the learning rate to apply to a
node based on its distance from the node that best matches the
target, then diffAndTrain f c target returns a tuple containing:
1. The positions of all nodes in c, paired with the difference
between target and the node's pattern.
2. A modified copy of the classifier c that has partially
learned the target.
Invoking diffAndTrain f c p may be faster than invoking
(differences p c, train f c p), but they should give identical
results.
Numeric vectors as patterns
Normalised vectors
normalise :: Floating a => [a] -> NormalisedVector a
Normalises a vector.
data NormalisedVector a
A vector that has been normalised, i.e., the magnitude of the vector = 1.
Instances
| Show a => Show (NormalisedVector a) | |
| (Floating a, Fractional a, Ord a, Eq a) => Pattern (NormalisedVector a) a |
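The underlying computation can be sketched as follows. This is an illustrative plain-list version, assuming the Euclidean norm is the magnitude referred to above; the library's normalise wraps its result in a NormalisedVector rather than returning a plain list.

```haskell
-- A sketch of vector normalisation, assuming the Euclidean norm:
-- divide every element by the vector's magnitude, so the result
-- has magnitude 1.
normaliseList :: Floating a => [a] -> [a]
normaliseList xs = map (/ magnitude) xs
  where magnitude = sqrt (sum (map (^ 2) xs))
```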
Scaled vectors
scale :: Fractional a => [(a, a)] -> [a] -> ScaledVector a
Given a vector qs of pairs of numbers, where each pair represents
the maximum and minimum value expected at that position in xs,
scale qs xs scales the vector xs element by element, mapping the
maximum value expected at that position to one, and the minimum
value to zero.
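The scaling described above can be sketched on plain lists as follows. This illustrative version assumes each pair in qs is (maximum, minimum), matching the order in the description; the library's scale wraps its result in a ScaledVector.

```haskell
-- A sketch of element-by-element scaling. Each pair in qs is
-- assumed to be (maximum, minimum) for that position; the maximum
-- maps to 1 and the minimum to 0.
scaleList :: Fractional a => [(a, a)] -> [a] -> [a]
scaleList qs xs = zipWith f qs xs
  where f (hi, lo) x = (x - lo) / (hi - lo)
```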
data ScaledVector a
A vector that has been scaled so that all elements in the vector
are between zero and one. To scale a set of vectors, use
scaleAll. Alternatively, if you can identify a maximum and
minimum value for each element in a vector, you can scale
individual vectors using scale.
Instances
| Show a => Show (ScaledVector a) | |
| (Fractional a, Ord a, Eq a) => Pattern (ScaledVector a) a |
Useful functions
If you wish to use a SOM with raw numeric vectors, compile with
-fno-warn-orphans and add the following to your code:

instance (Floating a, Fractional a, Ord a, Eq a) => Pattern [a] a where
  difference = euclideanDistanceSquared
  makeSimilar = adjustVector
adjustVector :: (Num a, Ord a, Eq a) => [a] -> a -> [a] -> [a]
adjustVector target amount vector adjusts vector to move it
closer to target. The magnitude of the adjustment is controlled
by amount, which should be a number between 0 and 1. Larger
values of amount permit more adjustment. If amount=1, the result
will be identical to the target. If amount=0, the result will be
the unmodified vector.
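An implementation consistent with this description can be sketched as simple element-wise linear interpolation towards the target (a sketch, not necessarily the library's exact definition):

```haskell
-- A sketch of adjustVector: each element is moved linearly
-- towards the corresponding element of the target, by a fraction
-- given by amount.
adjustVector :: (Num a, Ord a, Eq a) => [a] -> a -> [a] -> [a]
adjustVector ts amount = zipWith adjust ts
  where adjust t x = x + amount * (t - x)
```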
euclideanDistanceSquared :: Num a => [a] -> [a] -> a
Calculates the square of the Euclidean distance between two vectors.
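This can be sketched as the sum of squared element-wise differences. Skipping the final square root keeps the result in the Num class and is cheaper, while preserving the ordering needed to find the best matching node.

```haskell
-- A sketch of the squared Euclidean distance between two vectors:
-- the sum of the squares of the element-wise differences.
euclideanDistanceSquared :: Num a => [a] -> [a] -> a
euclideanDistanceSquared xs ys = sum (map (^ 2) (zipWith (-) xs ys))
```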
gaussian :: Double -> Double -> Int -> Double
Calculates c*e^(-d^2/(2*w^2)).
This form of the Gaussian function is useful as a learning rate
function. In gaussian c w d, c specifies the highest learning
rate, which will be applied to the SOM node that best matches the
input pattern. The learning rate applied to other nodes decreases
with their distance d from the best matching node. The value w
controls the 'width' of the Gaussian: higher values of w cause
the learning rate to fall off more slowly with distance.
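A direct transcription of the formula above (a sketch consistent with the stated type and formula):

```haskell
-- A sketch of the Gaussian learning-rate function:
-- gaussian c w d = c * e^(-d^2 / (2*w^2)), where d is the grid
-- distance from the best matching node.
gaussian :: Double -> Double -> Int -> Double
gaussian c w d = c * exp (-(d' * d') / (2 * w * w))
  where d' = fromIntegral d
```

A typical use is to partially apply the first two arguments, e.g. train (gaussian 0.5 2) c pattern, so that the SOM receives a function of grid distance only; the example values 0.5 and 2 are arbitrary.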