Portability | portable |
---|---|
Stability | experimental |
Maintainer | amy@nualeargais.ie |
Safe Haskell | Safe-Inferred |
A Kohonen Self-organising Map (SOM). A SOM maps input patterns onto a regular grid (usually two-dimensional) where each node in the grid is a model of the input data, and does so using a method which ensures that any topological relationships within the input data are also represented in the grid. This implementation supports the use of non-numeric patterns.
In layman's terms, a SOM can be useful when you want to discover the underlying structure of some data. A tutorial is available at https://github.com/mhwombat/som/wiki
References:
- Kohonen, T. (1982). Self-organized formation of topologically correct feature maps. Biological Cybernetics, 43 (1), 59–69.
NOTE: Version 3.0 changed the order of parameters for many functions. This makes it easier for the user to write mapping and folding operations.
- class Pattern p where
- type Metric p
- difference :: p -> p -> Metric p
- makeSimilar :: p -> Metric p -> p -> p
- train :: (Ord m, GridMap gm p, GridMap gm m, GridMap gm (Int, p), GridMap gm (m, p), Grid (gm p), Pattern p, Metric p ~ m, Index (BaseGrid gm p) ~ Index (gm p), BaseGrid gm m ~ BaseGrid gm p) => gm p -> (Int -> m) -> p -> gm p
- trainBatch :: (Ord m, GridMap gm p, GridMap gm m, GridMap gm (Int, p), GridMap gm (m, p), Grid (gm p), Pattern p, Metric p ~ m, Index (BaseGrid gm p) ~ Index (gm p), BaseGrid gm m ~ BaseGrid gm p) => gm p -> (Int -> m) -> [p] -> gm p
- classify :: (GridMap gm p, Pattern p, GridMap gm m, Metric p ~ m, Ord m, k ~ Index (BaseGrid gm p), BaseGrid gm m ~ BaseGrid gm p) => gm p -> p -> k
- classifyAndTrain :: (Ord m, GridMap gm p, GridMap gm m, GridMap gm (Int, p), GridMap gm (m, p), Grid (gm p), Pattern p, Metric p ~ m, Index (BaseGrid gm p) ~ Index (gm p), BaseGrid gm m ~ BaseGrid gm p) => gm p -> (Int -> m) -> p -> (Index (gm p), gm p)
- diff :: (GridMap gm p, Pattern p, GridMap gm m, Metric p ~ m, BaseGrid gm p ~ BaseGrid gm m) => gm p -> p -> gm m
- diffAndTrain :: (Ord m, GridMap gm p, GridMap gm m, GridMap gm (Int, p), GridMap gm (m, p), Grid (gm p), Pattern p, Metric p ~ m, Index (BaseGrid gm p) ~ Index (gm p), BaseGrid gm m ~ BaseGrid gm p) => gm p -> (Int -> m) -> p -> (gm m, gm p)
- normalise :: Floating a => [a] -> NormalisedVector a
- data NormalisedVector a
- scale :: Fractional a => [(a, a)] -> [a] -> ScaledVector a
- data ScaledVector a
- adjustVector :: (Num a, Ord a, Eq a) => [a] -> a -> [a] -> [a]
- euclideanDistanceSquared :: Num a => [a] -> [a] -> a
- gaussian :: Double -> Double -> Int -> Double
Patterns
A pattern to be learned or classified by a self-organising map.
difference :: p -> p -> Metric p
Compares two patterns and returns a non-negative number
representing how different the patterns are. A result of 0
indicates that the patterns are identical.
makeSimilar :: p -> Metric p -> p -> p
makeSimilar target amount pattern returns a modified copy of pattern that is more similar to target than pattern is. The magnitude of the adjustment is controlled by the amount parameter, which should be a number between 0 and 1. Larger values for amount permit greater adjustments. If amount=1, the result should be identical to the target. If amount=0, the result should be the unmodified pattern.
Instances:
(Fractional a, Ord a, Eq a) => Pattern (ScaledVector a)
(Floating a, Fractional a, Ord a, Eq a) => Pattern (NormalisedVector a)
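As a rough illustration of the contract that difference and makeSimilar are expected to satisfy, the sketch below declares a local stand-in class with the same shape as Pattern (so that it compiles without the library) and gives an instance for a non-list type. The Brightness type and the linear-interpolation rule are assumptions made purely for illustration; they are not part of the library.

```haskell
{-# LANGUAGE TypeFamilies #-}

-- Stand-in with the same shape as the Pattern class, declared locally
-- so that this sketch compiles on its own.
class Pattern p where
  type Metric p
  difference  :: p -> p -> Metric p
  makeSimilar :: p -> Metric p -> p -> p

-- Hypothetical pattern type: a single brightness reading.
newtype Brightness = Brightness Double deriving (Eq, Show)

instance Pattern Brightness where
  type Metric Brightness = Double
  -- Non-negative; 0 means the two patterns are identical.
  difference (Brightness a) (Brightness b) = abs (a - b)
  -- Move the pattern towards the target by the fraction `amount`:
  -- amount = 0 leaves it unchanged, amount = 1 yields the target.
  makeSimilar (Brightness t) amount (Brightness x) =
    Brightness (x + amount * (t - x))
```

In this sketch, makeSimilar (Brightness 1.0) 0 (Brightness 0.5) leaves the pattern at 0.5, and an amount of 1 moves it all the way to the target.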
Using the SOM
train :: (Ord m, GridMap gm p, GridMap gm m, GridMap gm (Int, p), GridMap gm (m, p), Grid (gm p), Pattern p, Metric p ~ m, Index (BaseGrid gm p) ~ Index (gm p), BaseGrid gm m ~ BaseGrid gm p) => gm p -> (Int -> m) -> p -> gm p
If f d is a function that returns the learning rate to apply to a node based on its distance d from the node that best matches the input pattern, then train c f pattern returns a modified copy of the classifier c that has partially learned the pattern.
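The training step described above can be pictured with a small self-contained sketch. It is not the library's implementation: a plain list of Double models stands in for the grid (so the distance between two nodes is the difference of their indices), and the names Row, bestMatch and trainSketch are hypothetical.

```haskell
import Data.List (minimumBy)
import Data.Ord (comparing)

-- Hypothetical stand-in for a classifier: node models laid out in a row,
-- so the grid distance between nodes i and j is |i - j|.
type Row = [Double]

-- Index of the node whose model is closest to the input.
-- Assumes the row is non-empty.
bestMatch :: Row -> Double -> Int
bestMatch c input =
  fst (minimumBy (comparing snd) (zip [0 ..] (map (\m -> abs (m - input)) c)))

-- One training step: each node moves towards the input by the learning
-- rate that f assigns to its distance from the best matching node.
trainSketch :: Row -> (Int -> Double) -> Double -> Row
trainSketch c f input = zipWith update [0 ..] c
  where
    bmu = bestMatch c input
    update i m = m + f (abs (i - bmu)) * (input - m)
```

With the row [0, 4, 8], the input 10, and a rate function returning 0.5 at distance 0, 0.25 at distance 1 and 0 beyond that, the best match is the last node and the result is [0, 5.5, 9].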
trainBatch :: (Ord m, GridMap gm p, GridMap gm m, GridMap gm (Int, p), GridMap gm (m, p), Grid (gm p), Pattern p, Metric p ~ m, Index (BaseGrid gm p) ~ Index (gm p), BaseGrid gm m ~ BaseGrid gm p) => gm p -> (Int -> m) -> [p] -> gm p
Same as train, but applied to multiple patterns.
classify :: (GridMap gm p, Pattern p, GridMap gm m, Metric p ~ m, Ord m, k ~ Index (BaseGrid gm p), BaseGrid gm m ~ BaseGrid gm p) => gm p -> p -> k
classify c pattern returns the position of the node in c whose pattern best matches the input pattern.
classifyAndTrain :: (Ord m, GridMap gm p, GridMap gm m, GridMap gm (Int, p), GridMap gm (m, p), Grid (gm p), Pattern p, Metric p ~ m, Index (BaseGrid gm p) ~ Index (gm p), BaseGrid gm m ~ BaseGrid gm p) => gm p -> (Int -> m) -> p -> (Index (gm p), gm p)
If f is a function that returns the learning rate to apply to a node based on its distance from the node that best matches the target, then classifyAndTrain c f target returns a tuple containing the position of the node in c whose pattern best matches the input target, and a modified copy of the classifier c that has partially learned the target.
Invoking classifyAndTrain c f p may be faster than invoking (classify c p, train c f p), but they should give identical results.
diff :: (GridMap gm p, Pattern p, GridMap gm m, Metric p ~ m, BaseGrid gm p ~ BaseGrid gm m) => gm p -> p -> gm m
diff c pattern returns the positions of all nodes in c, paired with the difference between pattern and the node's pattern.
diffAndTrain :: (Ord m, GridMap gm p, GridMap gm m, GridMap gm (Int, p), GridMap gm (m, p), Grid (gm p), Pattern p, Metric p ~ m, Index (BaseGrid gm p) ~ Index (gm p), BaseGrid gm m ~ BaseGrid gm p) => gm p -> (Int -> m) -> p -> (gm m, gm p)
If f is a function that returns the learning rate to apply to a node based on its distance from the node that best matches the target, then diffAndTrain c f target returns a tuple containing:
1. The positions of all nodes in c, paired with the difference between target and the node's pattern
2. A modified copy of the classifier c that has partially learned the target.
Invoking diffAndTrain c f p may be faster than invoking (diff c p, train c f p), but they should give identical results.
Numeric vectors as patterns
Normalised vectors
normalise :: Floating a => [a] -> NormalisedVector a
Normalises a vector, i.e., scales it so that its magnitude is 1.
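To make the idea concrete, here is a minimal sketch of normalisation on plain lists. It assumes that "magnitude" means the Euclidean norm and that the input is not the zero vector; normaliseSketch is a hypothetical name, and unlike normalise it returns a bare list rather than a NormalisedVector.

```haskell
-- Hypothetical helper: divide each element by the Euclidean norm.
-- Assumes the input is not the zero vector.
normaliseSketch :: Floating a => [a] -> [a]
normaliseSketch xs = map (/ norm) xs
  where
    norm = sqrt (sum (map (\x -> x * x) xs))
```

For example, the vector [3, 4] has norm 5, so the sketch maps it to approximately [0.6, 0.8].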
data NormalisedVector a
A vector that has been normalised, i.e., the magnitude of the vector = 1.
Instances:
Show a => Show (NormalisedVector a)
(Floating a, Fractional a, Ord a, Eq a) => Pattern (NormalisedVector a)
Scaled vectors
scale :: Fractional a => [(a, a)] -> [a] -> ScaledVector a
Given a vector qs of pairs of numbers, where each pair represents the maximum and minimum value to be expected at each position in xs, scale qs xs scales the vector xs element by element, mapping the maximum value expected at that position to one, and the minimum value to zero.
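The element-by-element mapping can be sketched on plain lists as follows. The description does not state which component of each pair holds the minimum, so this sketch assumes the order (minimum, maximum); it also assumes the two bounds differ. scaleSketch is a hypothetical name and returns a bare list rather than a ScaledVector.

```haskell
-- Hypothetical helper. Assumes each pair is (minimum, maximum)
-- and that the two bounds are not equal.
scaleSketch :: Fractional a => [(a, a)] -> [a] -> [a]
scaleSketch = zipWith scaleValue
  where
    -- Maps lo to 0 and hi to 1, linearly in between.
    scaleValue (lo, hi) x = (x - lo) / (hi - lo)
```

With bounds [(0, 10), (-1, 1)], the vector [5, 0] sits halfway between the bounds at both positions, so the sketch yields [0.5, 0.5].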
data ScaledVector a
A vector that has been scaled so that all elements in the vector are between zero and one. To scale a set of vectors, use scaleAll. Alternatively, if you can identify a maximum and minimum value for each element in a vector, you can scale individual vectors using scale.
Instances:
Show a => Show (ScaledVector a)
(Fractional a, Ord a, Eq a) => Pattern (ScaledVector a)
Useful functions
If you wish to use a SOM with raw numeric vectors, compile with -fno-warn-orphans, enable the TypeFamilies extension, and add the following to your code:
    instance (Floating a, Fractional a, Ord a, Eq a) => Pattern [a] where
      type Metric [a] = a
      difference = euclideanDistanceSquared
      makeSimilar = adjustVector
adjustVector :: (Num a, Ord a, Eq a) => [a] -> a -> [a] -> [a]
adjustVector target amount vector adjusts vector to move it closer to target. The size of the adjustment is controlled by the learning rate amount, which is a number between 0 and 1. Larger values of amount permit more adjustment. If amount=1, the result will be identical to the target. If amount=0, the result will be the unmodified vector.
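One simple rule that satisfies this description is linear interpolation between the vector and the target. The sketch below is a hypothetical re-implementation under that assumption (adjustSketch is not a library name), and it does not validate that the rate lies between 0 and 1.

```haskell
-- Hypothetical re-implementation by linear interpolation.
-- If the lists differ in length, the result is truncated to the shorter.
adjustSketch :: Num a => [a] -> a -> [a] -> [a]
adjustSketch target amount vector =
  zipWith (\t v -> v + amount * (t - v)) target vector
```

For instance, moving [0, 0] halfway towards [1, 2] gives [0.5, 1.0]; a rate of 0 returns the vector unchanged and a rate of 1 returns the target.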
euclideanDistanceSquared :: Num a => [a] -> [a] -> a
Calculates the square of the Euclidean distance between two vectors.
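As a worked illustration, the squared distance is the sum of the squared element-wise differences. Because no square root is taken, a Num constraint is enough, and comparing squared distances ranks candidates in the same order as comparing the distances themselves. The name distSqSketch is hypothetical.

```haskell
-- Hypothetical re-implementation: sum of squared element-wise differences.
-- If the lists differ in length, the extra elements are ignored.
distSqSketch :: Num a => [a] -> [a] -> a
distSqSketch xs ys = sum (zipWith (\x y -> (x - y) * (x - y)) xs ys)
```

For the vectors [0, 0] and [3, 4] the differences are 3 and 4, so the result is 9 + 16 = 25.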
gaussian :: Double -> Double -> Int -> Double
Calculates c * e^(-d^2 / (2w^2)).
This form of the Gaussian function is useful as a learning rate function. In gaussian c w d, c specifies the highest learning rate, which will be applied to the SOM node that best matches the input pattern. The learning rate applied to each other node is determined by its distance d from the best matching node. The value w controls the 'width' of the Gaussian. Higher values of w cause the learning rate to fall off more slowly with distance.
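The formula can be written out directly. The sketch below is a hypothetical re-implementation (gaussianSketch is not a library name) that follows the same argument order of peak rate c, width w and distance d.

```haskell
-- Hypothetical re-implementation of c * e^(-d^2 / (2w^2)).
-- Assumes w is non-zero.
gaussianSketch :: Double -> Double -> Int -> Double
gaussianSketch c w d = c * exp (negate (d' * d') / (2 * w * w))
  where
    d' = fromIntegral d
```

Partially applying the first two arguments, as in gaussianSketch 0.5 2, leaves a function of type Int -> Double, which has the shape of the learning rate function that train expects when the metric type is Double. At distance 0 it returns the peak rate 0.5, and the rate shrinks as the distance grows.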