| Portability | portable |
|---|---|
| Stability | experimental |
| Maintainer | amy@nualeargais.ie |
| Safe Haskell | Safe-Inferred |
A Kohonen Self-organising Map (SOM). A SOM maps input patterns onto a regular grid (usually two-dimensional) where each node in the grid is a model of the input data, and does so using a method which ensures that any topological relationships within the input data are also represented in the grid. This implementation supports the use of non-numeric patterns.
In layman's terms, a SOM can be useful when you want to discover the underlying structure of some data. A tutorial is available at https://github.com/mhwombat/som/wiki.
References:
- Kohonen, T. (1982). Self-organized formation of topologically correct feature maps. Biological Cybernetics, 43 (1), 59–69.
- class Pattern p v | p -> v where
  - difference :: p -> p -> v
  - makeSimilar :: p -> v -> p -> p
- train :: (Ord v, Pattern p v, Grid g s k) => (Int -> v) -> GridMap g k p -> p -> GridMap g k p
- trainBatch :: (Ord v, Grid g s k, Pattern p v) => (Int -> v) -> GridMap g k p -> [p] -> GridMap g k p
- classify :: (Ord v, Pattern p v) => GridMap g k p -> p -> k
- classifyAndTrain :: (Eq k, Ord v, Pattern p v, Grid g s k) => (Int -> v) -> GridMap g k p -> p -> (k, GridMap g k p)
- differences :: Pattern p v => p -> GridMap g k p -> GridMap g k v
- normalise :: Floating a => [a] -> NormalisedVector a
- data NormalisedVector a
- scale :: Fractional a => [(a, a)] -> [a] -> ScaledVector a
- data ScaledVector a
- adjustVector :: (Num a, Ord a, Eq a) => [a] -> a -> [a] -> [a]
- euclideanDistanceSquared :: Num a => [a] -> [a] -> a
- gaussian :: Double -> Double -> Int -> Double
Documentation
class Pattern p v | p -> v where
A pattern to be learned or classified by a self-organising map.
difference :: p -> p -> v

Compares two patterns and returns a non-negative number representing how different the patterns are. A result of 0 indicates that the patterns are identical.
makeSimilar :: p -> v -> p -> p

`makeSimilar target amount pattern` returns a modified copy of `pattern` that is more similar to `target` than `pattern` is. The magnitude of the adjustment is controlled by the `amount` parameter, which should be a number between 0 and 1. Larger values for `amount` permit greater adjustments. If `amount` = 1, the result should be identical to `target`. If `amount` = 0, the result should be the unmodified `pattern`.
Instances:

- (Fractional a, Ord a, Eq a) => Pattern (ScaledVector a) a
- (Floating a, Fractional a, Ord a, Eq a) => Pattern (NormalisedVector a) a
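The contract above can be illustrated with a minimal, self-contained sketch. The class is reproduced from this module; the instance for plain `Double`s is purely hypothetical and not part of the library:

```haskell
{-# LANGUAGE MultiParamTypeClasses, FunctionalDependencies #-}

-- Minimal sketch: the Pattern class reproduced from this module,
-- plus a hypothetical instance for plain Doubles.
class Pattern p v | p -> v where
  difference  :: p -> p -> v
  makeSimilar :: p -> v -> p -> p

instance Pattern Double Double where
  -- Non-negative; 0 means the patterns are identical.
  difference a b = abs (a - b)
  -- Move p a fraction 'amount' of the way towards target:
  -- amount = 1 yields target, amount = 0 leaves p unchanged.
  makeSimilar target amount p = p + amount * (target - p)
```

For example, `difference 3 5` is `2.0`, and `makeSimilar 10 0.5 0` is `5.0`.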
Using the SOM
train :: (Ord v, Pattern p v, Grid g s k) => (Int -> v) -> GridMap g k p -> p -> GridMap g k p

If `f d` is a function that returns the learning rate to apply to a node based on its distance `d` from the node that best matches the input pattern, then `train f c pattern` returns a modified copy of the classifier `c` that has partially learned `pattern`.
trainBatch :: (Ord v, Grid g s k, Pattern p v) => (Int -> v) -> GridMap g k p -> [p] -> GridMap g k p

Same as `train`, but applied to multiple patterns.
classify :: (Ord v, Pattern p v) => GridMap g k p -> p -> k

`classify c pattern` returns the position of the node in `c` whose pattern best matches the input `pattern`.
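A sketch of the same idea over a plain association list (hypothetical, not the library API): the winner is simply the key whose pattern minimises the difference.

```haskell
import Data.List (minimumBy)
import Data.Ord (comparing)

-- Hypothetical analogue of classify over (key, pattern) pairs,
-- using absolute difference on Doubles as the difference measure.
classifySketch :: [(Int, Double)] -> Double -> Int
classifySketch nodes target =
  fst (minimumBy (comparing (\(_, p) -> abs (p - target))) nodes)
```

For example, `classifySketch [(0, 1.0), (1, 5.0), (2, 9.0)] 6.0` is `1`.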
classifyAndTrain :: (Eq k, Ord v, Pattern p v, Grid g s k) => (Int -> v) -> GridMap g k p -> p -> (k, GridMap g k p)

If `f` is a function that returns the learning rate to apply to a node based on its distance from the node that best matches the `target`, then `classifyAndTrain f c target` returns a tuple containing the position of the node in `c` whose pattern best matches the input `target`, and a modified copy of the classifier `c` that has partially learned the `target`.
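Conceptually this is classification followed by training in a single pass. A hypothetical list-based sketch (index distance standing in for grid distance):

```haskell
import Data.List (minimumBy)
import Data.Ord (comparing)

-- Hypothetical analogue of classifyAndTrain: return the winning
-- index together with the updated node list.
classifyAndTrainSketch
  :: (Int -> Double) -> [Double] -> Double -> (Int, [Double])
classifyAndTrainSketch f nodes target = (bmu, zipWith adjust [0 ..] nodes)
  where
    bmu = fst (minimumBy (comparing snd)
                 (zip [0 :: Int ..] (map (\n -> abs (n - target)) nodes)))
    adjust i n = n + f (abs (i - bmu)) * (target - n)
```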
differences :: Pattern p v => p -> GridMap g k p -> GridMap g k v

`` pattern `differences` c `` returns the positions of all nodes in `c`, paired with the difference between `pattern` and the node's pattern.
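A list-based sketch of the idea (hypothetical): pair each key with the difference between the input and that node's pattern.

```haskell
-- Hypothetical analogue of differences over an association list,
-- using absolute difference on Doubles as the difference measure.
differencesSketch :: Double -> [(Int, Double)] -> [(Int, Double)]
differencesSketch pat = map (\(k, p) -> (k, abs (p - pat)))
```

For example, `differencesSketch 3 [(0, 1.0), (1, 5.0)]` is `[(0, 2.0), (1, 2.0)]`.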
Numeric vectors as patterns
Normalised vectors
normalise :: Floating a => [a] -> NormalisedVector a

Normalises a vector.

data NormalisedVector a

A vector that has been normalised, i.e., the magnitude of the vector = 1.

Instances:

- Show a => Show (NormalisedVector a)
- (Floating a, Fractional a, Ord a, Eq a) => Pattern (NormalisedVector a) a
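The arithmetic here is presumably the usual Euclidean normalisation; a self-contained sketch on raw lists (not the library's `NormalisedVector`):

```haskell
-- Sketch of normalisation: divide every element by the Euclidean
-- magnitude, so the resulting vector has magnitude 1.
normaliseSketch :: Floating a => [a] -> [a]
normaliseSketch xs = map (/ m) xs
  where m = sqrt (sum (map (^ 2) xs))
```

For example, `normaliseSketch [3, 4]` is `[0.6, 0.8]`.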
Scaled vectors
scale :: Fractional a => [(a, a)] -> [a] -> ScaledVector a

Given a vector `qs` of pairs of numbers, where each pair represents the maximum and minimum value to be expected at each position in `xs`, `scale qs xs` scales the vector `xs` element by element, mapping the maximum value expected at that position to one, and the minimum value to zero.
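Element by element, this is a straightforward affine map. A sketch on raw lists, where the ordering of each pair as (minimum, maximum) is an assumption of the sketch:

```haskell
-- Sketch of element-wise scaling: map a value expected to lie in
-- [lo, hi] onto [0, 1]. (Pair order (lo, hi) is an assumption of
-- this sketch.)
scaleSketch :: Fractional a => [(a, a)] -> [a] -> [a]
scaleSketch qs xs = zipWith f qs xs
  where f (lo, hi) x = (x - lo) / (hi - lo)
```

For example, `scaleSketch [(0, 10), (0, 4)] [5, 1]` is `[0.5, 0.25]`.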
data ScaledVector a

A vector that has been scaled so that all elements in the vector are between zero and one. To scale a set of vectors, use `scaleAll`. Alternatively, if you can identify a maximum and minimum value for each element in a vector, you can scale individual vectors using `scale`.

Instances:

- Show a => Show (ScaledVector a)
- (Fractional a, Ord a, Eq a) => Pattern (ScaledVector a) a
Useful functions
If you wish to use a SOM with raw numeric vectors, use `-fno-warn-orphans` and add the following to your code:

    instance (Floating a, Fractional a, Ord a, Eq a) => Pattern [a] a where
      difference = euclideanDistanceSquared
      makeSimilar = adjustVector
adjustVector :: (Num a, Ord a, Eq a) => [a] -> a -> [a] -> [a]

`adjustVector target amount vector` adjusts `vector` to move it closer to `target`. The magnitude of the adjustment is controlled by the learning rate `amount`, which is a number between 0 and 1. Larger values of `amount` permit more adjustment. If `amount` = 1, the result will be identical to `target`. If `amount` = 0, the result will be the unmodified `vector`.
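The documented behaviour amounts to element-wise linear interpolation between the vector and the target; a sketch:

```haskell
-- Sketch of adjustVector's documented behaviour: move each element
-- of the vector a fraction 'amount' of the way towards the target.
adjustVectorSketch :: Num a => [a] -> a -> [a] -> [a]
adjustVectorSketch target amount = zipWith f target
  where f t x = x + amount * (t - x)
```

For example, `adjustVectorSketch [1, 1] 0.5 [0, 0]` is `[0.5, 0.5]`.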
euclideanDistanceSquared :: Num a => [a] -> [a] -> a
Calculates the square of the Euclidean distance between two vectors.
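For reference, a one-line sketch of the computation:

```haskell
-- Sketch: sum of squared element-wise differences (no square root,
-- which is why it is the *squared* Euclidean distance).
euclideanDistanceSquaredSketch :: Num a => [a] -> [a] -> a
euclideanDistanceSquaredSketch xs ys =
  sum (map (^ 2) (zipWith (-) xs ys))
```

For example, `euclideanDistanceSquaredSketch [0, 0] [3, 4]` is `25`.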
gaussian :: Double -> Double -> Int -> Double

Calculates `c * e^(-d^2 / (2 * w^2))`. This form of the Gaussian function is useful as a learning rate function. In `gaussian c w d`, `c` specifies the highest learning rate, which will be applied to the SOM node that best matches the input pattern. The learning rate applied to other nodes depends on their distance `d` from the best matching node. The value `w` controls the 'width' of the Gaussian: higher values of `w` cause the learning rate to fall off more slowly with distance.
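A sketch of the formula as given:

```haskell
-- Sketch of the documented formula: c * e^(-d^2 / (2 * w^2)).
-- At d = 0 the result is exactly c, the peak learning rate.
gaussianSketch :: Double -> Double -> Int -> Double
gaussianSketch c w d = c * exp (negate (d' * d') / (2 * w * w))
  where d' = fromIntegral d
```

For example, `gaussianSketch 1 1 0` is `1.0`, and the rate decays towards zero as `d` grows.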