Portability | portable |
---|---|

Stability | experimental |

Maintainer | amy@nualeargais.ie |

Safe Haskell | Safe-Inferred |

A Kohonen Self-organising Map (SOM). A SOM maps input patterns onto a regular grid (usually two-dimensional) where each node in the grid is a model of the input data, and does so using a method which ensures that any topological relationships within the input data are also represented in the grid. This implementation supports the use of non-numeric patterns.

In layman's terms, a SOM can be useful when you you want to discover the underlying structure of some data. A tutorial is available at https://github.com/mhwombat/som/wiki

References:

- Kohonen, T. (1982). Self-organized formation of topologically correct feature maps. Biological Cybernetics, 43 (1), 59–69.

- class Pattern p where
- type Metric p
- difference :: p -> p -> Metric p
- makeSimilar :: p -> Metric p -> p -> p

- train :: (Ord m, GridMap gm p, GridMap gm m, GridMap gm (Int, p), GridMap gm (m, p), Grid (gm p), Pattern p, Metric p ~ m, Index (BaseGrid gm p) ~ Index (gm p), BaseGrid gm m ~ BaseGrid gm p) => gm p -> (Int -> m) -> p -> gm p
- trainBatch :: (Ord m, GridMap gm p, GridMap gm m, GridMap gm (Int, p), GridMap gm (m, p), Grid (gm p), Pattern p, Metric p ~ m, Index (BaseGrid gm p) ~ Index (gm p), BaseGrid gm m ~ BaseGrid gm p) => gm p -> (Int -> m) -> [p] -> gm p
- classify :: (GridMap gm p, Pattern p, GridMap gm m, Metric p ~ m, Ord m, k ~ Index (BaseGrid gm p), BaseGrid gm m ~ BaseGrid gm p) => gm p -> p -> k
- classifyAndTrain :: (Ord m, GridMap gm p, GridMap gm m, GridMap gm (Int, p), GridMap gm (m, p), Grid (gm p), Pattern p, Metric p ~ m, Index (BaseGrid gm p) ~ Index (gm p), BaseGrid gm m ~ BaseGrid gm p) => gm p -> (Int -> m) -> p -> (Index (gm p), gm p)
- diff :: (GridMap gm p, Pattern p, GridMap gm m, Metric p ~ m, BaseGrid gm p ~ BaseGrid gm m) => gm p -> p -> gm m
- diffAndTrain :: (Ord m, GridMap gm p, GridMap gm m, GridMap gm (Int, p), GridMap gm (m, p), Grid (gm p), Pattern p, Metric p ~ m, Index (BaseGrid gm p) ~ Index (gm p), BaseGrid gm m ~ BaseGrid gm p) => gm p -> (Int -> m) -> p -> (gm m, gm p)
- normalise :: Floating a => [a] -> NormalisedVector a
- data NormalisedVector a
- scale :: Fractional a => [(a, a)] -> [a] -> ScaledVector a
- data ScaledVector a
- adjustVector :: (Num a, Ord a, Eq a) => [a] -> a -> [a] -> [a]
- euclideanDistanceSquared :: Num a => [a] -> [a] -> a
- gaussian :: Double -> Double -> Int -> Double

# Documentation

A pattern to be learned or classified by a self-organising map.

difference :: p -> p -> Metric pSource

Compares two patterns and returns a *non-negative* number
representing how different the patterns are. A result of `0`

indicates that the patterns are identical.

makeSimilar :: p -> Metric p -> p -> pSource

returns a modified copy of
`makeSimilar`

target amount pattern`pattern`

that is more similar to `target`

than `pattern`

is. The
magnitude of the adjustment is controlled by the `amount`

parameter, which should be a number between 0 and 1. Larger
values for `amount`

permit greater adjustments. If `amount`

=1,
the result should be identical to the `target`

. If `amount`

=0,
the result should be the unmodified `pattern`

.

(Fractional a, Ord a, Eq a) => Pattern (ScaledVector a) | |

(Floating a, Fractional a, Ord a, Eq a) => Pattern (NormalisedVector a) |

# Using the SOM

train :: (Ord m, GridMap gm p, GridMap gm m, GridMap gm (Int, p), GridMap gm (m, p), Grid (gm p), Pattern p, Metric p ~ m, Index (BaseGrid gm p) ~ Index (gm p), BaseGrid gm m ~ BaseGrid gm p) => gm p -> (Int -> m) -> p -> gm pSource

If `f d`

is a function that returns the learning rate to apply to a
node based on its distance `d`

from the node that best matches the
input pattern, then

returns a modified copy
of the classifier `train`

c f pattern`c`

that has partially learned the `target`

.

trainBatch :: (Ord m, GridMap gm p, GridMap gm m, GridMap gm (Int, p), GridMap gm (m, p), Grid (gm p), Pattern p, Metric p ~ m, Index (BaseGrid gm p) ~ Index (gm p), BaseGrid gm m ~ BaseGrid gm p) => gm p -> (Int -> m) -> [p] -> gm pSource

Same as `train`

, but applied to multiple patterns.

classify :: (GridMap gm p, Pattern p, GridMap gm m, Metric p ~ m, Ord m, k ~ Index (BaseGrid gm p), BaseGrid gm m ~ BaseGrid gm p) => gm p -> p -> kSource

`classify c pattern`

returns the position of the node in `c`

whose pattern best matches the input `pattern`

.

classifyAndTrain :: (Ord m, GridMap gm p, GridMap gm m, GridMap gm (Int, p), GridMap gm (m, p), Grid (gm p), Pattern p, Metric p ~ m, Index (BaseGrid gm p) ~ Index (gm p), BaseGrid gm m ~ BaseGrid gm p) => gm p -> (Int -> m) -> p -> (Index (gm p), gm p)Source

If `f`

is a function that returns the learning rate to apply to a
node based on its distance from the node that best matches the
`target`

, then

returns a tuple
containing the position of the node in `classifyAndTrain`

c f target`c`

whose pattern best
matches the input `target`

, and a modified copy of the classifier
`c`

that has partially learned the `target`

.
Invoking `classifyAndTrain c f p`

may be faster than invoking
`(p `

, but they should give identical
results.
`classify`

c, train c f p)

diff :: (GridMap gm p, Pattern p, GridMap gm m, Metric p ~ m, BaseGrid gm p ~ BaseGrid gm m) => gm p -> p -> gm mSource

returns the positions of all nodes in
`diff`

c pattern`c`

, paired with the difference between `pattern`

and the node's
pattern.

diffAndTrain :: (Ord m, GridMap gm p, GridMap gm m, GridMap gm (Int, p), GridMap gm (m, p), Grid (gm p), Pattern p, Metric p ~ m, Index (BaseGrid gm p) ~ Index (gm p), BaseGrid gm m ~ BaseGrid gm p) => gm p -> (Int -> m) -> p -> (gm m, gm p)Source

If `f`

is a function that returns the learning rate to apply to a
node based on its distance from the node that best matches the
`target`

, then

returns a tuple
containing:
1. The positions of all nodes in `diffAndTrain`

c f target`c`

, paired with the difference
between `pattern`

and the node's pattern
2. A modified copy of the classifier `c`

that has partially
learned the `target`

.
Invoking `diffAndTrain c f p`

may be faster than invoking
`(p `

, but they should give identical
results.
`diff`

c, train c f p)

# Numeric vectors as patterns

## Normalised vectors

normalise :: Floating a => [a] -> NormalisedVector aSource

Normalises a vector

data NormalisedVector a Source

A vector that has been normalised, i.e., the magnitude of the vector = 1.

Show a => Show (NormalisedVector a) | |

(Floating a, Fractional a, Ord a, Eq a) => Pattern (NormalisedVector a) |

## Scaled vectors

scale :: Fractional a => [(a, a)] -> [a] -> ScaledVector aSource

Given a vector `qs`

of pairs of numbers, where each pair represents
the maximum and minimum value to be expected at each position in
`xs`

,

scales the vector `scale`

qs xs`xs`

element by element,
mapping the maximum value expected at that position to one, and the
minimum value to zero.

data ScaledVector a Source

A vector that has been scaled so that all elements in the vector
are between zero and one. To scale a set of vectors, use

. Alternatively, if you can identify a maximum and
minimum value for each element in a vector, you can scale
individual vectors using `scaleAll`

.
`scale`

Show a => Show (ScaledVector a) | |

(Fractional a, Ord a, Eq a) => Pattern (ScaledVector a) |

## Useful functions

If you wish to use a SOM with raw numeric vectors, use `no-warn-orphans`

and add the following to your code:

instance (Floating a, Fractional a, Ord a, Eq a) ⇒ Pattern [a] a where difference = euclideanDistanceSquared makeSimilar = adjustVector

adjustVector :: (Num a, Ord a, Eq a) => [a] -> a -> [a] -> [a]Source

adjusts `adjustVector`

target amount vector`vector`

to move it
closer to `target`

. The amount of adjustment is controlled by the
learning rate `r`

, which is a number between 0 and 1. Larger values
of `r`

permit more adjustment. If `r`

=1, the result will be
identical to the `target`

. If `amount`

=0, the result will be the
unmodified `pattern`

.

euclideanDistanceSquared :: Num a => [a] -> [a] -> aSource

Calculates the square of the Euclidean distance between two vectors.

gaussian :: Double -> Double -> Int -> DoubleSource

Calculates `c`

.
This form of the Gaussian function is useful as a learning rate
function. In *e*^(-d^2/2w^2)

, `gaussian`

c w d`c`

specifies the highest learning
rate, which will be applied to the SOM node that best matches the
input pattern. The learning rate applied to other nodes will be
applied based on their distance `d`

from the best matching node.
The value `w`

controls the 'width' of the Gaussian. Higher values
of `w`

cause the learning rate to fall off more slowly with
distance.