Copyright	(c) Amy de Buitléir 2012-2015
License	BSD-style
Maintainer	amy@nualeargais.ie
Stability	experimental
Portability	portable
Safe Haskell	Safe
Language	Haskell98

Data.Datamining.Clustering.SOM

Contents

Construction
Deconstruction
Learning functions
Advanced control

Description

A Kohonen Self-organising Map (SOM). A SOM maps input patterns onto a regular grid (usually two-dimensional) where each node in the grid is a model of the input data, and does so using a method which ensures that any topological relationships within the input data are also represented in the grid. This implementation supports the use of non-numeric patterns.

In layman's terms, a SOM can be useful when you want to discover the underlying structure of some data. A tutorial is available at https://github.com/mhwombat/som/wiki.

NOTES:

Version 5.0 fixed a bug in the decayingGaussian function. If you use defaultSOM (which uses this function), your SOM should now learn more quickly.
The gaussian function has been removed because it is not as useful for SOMs as I originally thought. It was originally designed to be used as a factor in a learning function. However, in most cases the user will want to introduce a time decay into the exponent, rather than simply multiply by a factor.

References:

Kohonen, T. (1982). Self-organized formation of topologically correct feature maps. Biological Cybernetics, 43 (1), 59–69.

Construction

data SOM t d gm x k p

A Self-Organising Map (SOM).

Although SOM implements GridMap, most users will only need the interface provided by Data.Datamining.Clustering.Classifier. If you choose to use the GridMap functions, please note:

The functions adjust and adjustWithKey do not increment the counter. You can do so manually with incrementCounter.
The functions map and mapWithKey are not implemented (they just return an error). It would be problematic to implement them because the input SOM and the output SOM would have to have the same Metric type.

Constructors

SOM

Fields

gridMap :: gm p: Maps patterns to tiles in a regular grid. In the context of a SOM, the tiles are called "nodes".
learningRate :: t -> d -> x: A function which determines how quickly the SOM learns. For example, if the function is f, then f t d returns the learning rate for a node. The parameter t indicates how many patterns (or pattern batches) have previously been presented to the classifier. Typically this is used to make the learning rate decay over time. The parameter d is the grid distance from the node being updated to the BMU (Best Matching Unit). The output is the learning rate for that node (the amount by which the node's model should be updated to match the target). The learning rate should be between zero and one.
difference :: p -> p -> x: A function which compares two patterns and returns a non-negative number representing how different the patterns are. A result of 0 indicates that the patterns are identical.
makeSimilar :: p -> x -> p -> p: A function which updates models. If this function is f, then f target amount pattern returns a modified copy of pattern that is more similar to target than pattern is. The magnitude of the adjustment is controlled by the amount parameter, which should be a number between 0 and 1. Larger values for amount permit greater adjustments. If amount=1, the result should be identical to the target. If amount=0, the result should be the unmodified pattern.
counter :: t: A counter used as a "time" parameter. If you create the SOM with a counter value of 0, and don't directly modify it, then the counter will represent the number of patterns that this SOM has classified.
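
To make the contracts for difference and makeSimilar concrete, here is a minimal sketch for patterns that are plain Double values. The names absDiff and moveTowards are hypothetical and are not part of this module; they only illustrate functions that satisfy the properties described above.

```haskell
-- Hypothetical helpers (not part of the library) for Double patterns.

-- | A candidate for 'difference': non-negative, and 0 for identical patterns.
absDiff :: Double -> Double -> Double
absDiff a b = abs (a - b)

-- | A candidate for 'makeSimilar': moves pat towards target by the
-- fraction amount, which should lie between 0 and 1.
-- amount = 0 returns pat unchanged; amount = 1 returns target.
moveTowards :: Double -> Double -> Double -> Double
moveTowards target amount pat = pat + amount * (target - pat)
```

For example, moveTowards 10 0.5 4 evaluates to 7.0, which lies halfway between the pattern and the target.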

Instances

(GridMap gm p, (~) * k (Index (BaseGrid gm p)), Grid (gm p), GridMap gm x, (~) * k (Index (gm p)), (~) * k (Index (BaseGrid gm x)), Num t, Ord x, Num x, Num d) => Classifier (SOM t d gm) x k p
Foldable gm => Foldable (SOM t d gm x k)
(Foldable gm, GridMap gm p, Grid (BaseGrid gm p)) => GridMap (SOM t d gm x k) p
Generic (SOM t d gm x k p)
Grid (gm p) => Grid (SOM t d gm x k p)
type BaseGrid (SOM t d gm x k) p = BaseGrid gm p
type Rep (SOM t d gm x k p)
type Index (SOM t d gm x k p) = Index (gm p)
type Direction (SOM t d gm x k p) = Direction (gm p)

Deconstruction

toGridMap :: GridMap gm p => SOM t d gm x k p -> gm p

Extracts the grid and current models from the SOM. A synonym for gridMap.

Learning functions

decayingGaussian :: Floating x => x -> x -> x -> x -> x -> x -> x -> x

A typical learning function for classifiers. decayingGaussian r0 rf w0 wf tf returns a bell curve-shaped function of the time t and the grid distance d. At time zero, the maximum learning rate (applied to the BMU) is r0, and the neighbourhood width is w0. Over time the bell curve shrinks and the learning rate tapers off, until at time tf, the maximum learning rate (applied to the BMU) is rf, and the neighbourhood width is wf. Normally the parameters should be chosen such that:

0 < rf << r0 < 1
0 < wf << w0
0 < tf

where << means "is much smaller than" (not the Haskell << operator!).
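
One plausible way to obtain the behaviour described above is to interpolate both the peak rate and the neighbourhood width exponentially between their start and end values. The sketch below assumes that form; the library's actual formula may differ, and the name decayingGaussianSketch is hypothetical.

```haskell
-- A sketch only: assumes exponential decay of the peak rate and the width.
-- Arguments: r0 rf w0 wf tf, then the time t and the grid distance d.
decayingGaussianSketch
  :: Floating x => x -> x -> x -> x -> x -> x -> x -> x
decayingGaussianSketch r0 rf w0 wf tf t d = r * exp (-(d * d) / (2 * w * w))
  where
    a = t / tf                 -- fraction of the schedule elapsed
    r = r0 * ((rf / r0) ** a)  -- peak rate: r0 at t = 0, rf at t = tf
    w = w0 * ((wf / w0) ** a)  -- width: w0 at t = 0, wf at t = tf
```

Partially applying the first five arguments, for instance decayingGaussianSketch 0.5 0.01 3 0.1 100, gives a function of t and d with the same shape as the learningRate field.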

stepFunction :: (Num d, Fractional x, Eq d) => x -> t -> d -> x

A learning function that only updates the BMU and has a constant learning rate.
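
A function with this behaviour can be sketched as follows, assuming that "only updates the BMU" means a rate of zero at every non-zero grid distance. The name stepSketch is hypothetical and the library's definition may differ in detail.

```haskell
-- A sketch only: rate r at the BMU (grid distance 0), zero everywhere else.
-- The time argument is ignored, so the rate never decays.
stepSketch :: (Num d, Fractional x, Eq d) => x -> t -> d -> x
stepSketch r _ d = if d == 0 then r else 0
```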

constantFunction :: x -> t -> d -> x

A learning function that updates all nodes with the same, constant learning rate. This can be useful for testing.
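
As an illustration, a function of this type can simply ignore both the time and the grid distance. The name constantSketch is hypothetical.

```haskell
-- A sketch only: the same rate r for every node at every time.
constantSketch :: x -> t -> d -> x
constantSketch r _ _ = r

-- The rate seen by nodes at grid distances 0, 1 and 2 at time 5:
-- all three entries are 0.2.
ratesAtTimeFive :: [Double]
ratesAtTimeFive = map (constantSketch 0.2 (5 :: Int)) [0, 1, 2 :: Int]
```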

Advanced control

trainNeighbourhood :: (Grid (gm p), GridMap gm p, Index (BaseGrid gm p) ~ Index (gm p), Num t, Num x, Num d) => SOM t d gm x k p -> Index (gm p) -> p -> SOM t d gm x k p

Trains the specified node and the neighbourhood around it to better match a target. Most users should use train, which automatically determines the BMU and trains it and its neighbourhood.
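
The difference between training around a given node and first locating the BMU can be sketched without the library, using a plain list as a one-dimensional grid so that the grid distance between nodes i and j is abs (i - j). Everything below (the names trainNeighbourhoodSketch and bmuIndex, the list representation, and the Double patterns) is an assumption made for illustration and is not this module's implementation.

```haskell
import Data.List (minimumBy)
import Data.Ord (comparing)

-- | Move every node towards target, scaled by the learning rate for its
-- grid distance from the chosen node. A sketch on a 1-D list grid.
trainNeighbourhoodSketch
  :: (Int -> Int -> Double)  -- learning rate: time -> grid distance -> rate
  -> Int                     -- current counter ("time")
  -> [Double]                -- node models
  -> Int                     -- index of the node to train around
  -> Double                  -- target pattern
  -> [Double]
trainNeighbourhoodSketch rate t models centre target =
  [ m + rate t (abs (i - centre)) * (target - m)
  | (i, m) <- zip [0 ..] models ]

-- | Index of the node whose model is closest to the target (the BMU).
-- Assumes a non-empty list of models.
bmuIndex :: [Double] -> Double -> Int
bmuIndex models target =
  fst (minimumBy (comparing (\(_, m) -> abs (m - target))) (zip [0 ..] models))
```

In this sketch, a train-like step would call bmuIndex first and pass the result to trainNeighbourhoodSketch, whereas the advanced-control variant lets the caller choose the centre node directly.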