Copyright	(c) Amy de Buitléir 2012-2015
License	BSD-style
Maintainer	amy@nualeargais.ie
Stability	experimental
Portability	portable
Safe Haskell	Safe
Language	Haskell98

Data.Datamining.Clustering.SGM

Contents

Construction
Deconstruction
Learning and classification

Description

A Self-generating Model (SGM). An SGM maps input patterns onto a set, where each element in the set is a model of the input data. An SGM is like a Kohonen Self-organising Map (SOM), except:

Instead of a grid, it uses a simple set of unconnected models. Since the models are unconnected, only the model that best matches the input is ever updated. This makes it faster; however, topological relationships within the input data are not preserved.
New models are created on the fly when no existing model is similar enough to an input pattern. If the SGM is at capacity (and allowDeletion is set), the least useful model is deleted to make room.
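The deletion rule above can be sketched as follows. This is a minimal illustration, assuming models are kept in a Data.Map from node ID to (model, match count), as the type of the toMap field suggests. leastUseful and deleteLeastUseful are hypothetical names, not part of the library.

```haskell
import qualified Data.Map as M
import Data.List (minimumBy)
import Data.Ord (comparing)

-- | Hypothetical helper (not part of the library): given a map from node ID
-- to (model, match count), return the ID of the least frequently matched
-- node. Ties go to the smallest ID, because M.toList yields keys in
-- ascending order and minimumBy keeps the first minimum it finds.
leastUseful :: Ord t => M.Map k (p, t) -> Maybe k
leastUseful m
  | M.null m  = Nothing
  | otherwise = Just (fst (minimumBy (comparing (snd . snd)) (M.toList m)))

-- | Hypothetical helper: remove the least useful node, if there is one.
deleteLeastUseful :: (Ord k, Ord t) => M.Map k (p, t) -> M.Map k (p, t)
deleteLeastUseful m = maybe m (`M.delete` m) (leastUseful m)
```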

This implementation supports the use of non-numeric patterns.
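To illustrate what non-numeric patterns might look like, here is a hypothetical difference function and makeSimilar function for equal-length strings. Both are assumptions written for this example, not functions provided by the library; any pair of functions meeting the contracts described for the difference and makeSimilar fields should play the same role.

```haskell
-- | Hypothetical difference function for equal-length strings: the fraction
-- of positions at which the characters differ (0 means identical). Extra
-- characters in the longer string are ignored, so callers should ensure
-- both strings have the same length.
stringDiff :: String -> String -> Double
stringDiff a b
  | n == 0    = 0
  | otherwise = fromIntegral mismatches / fromIntegral n
  where
    n          = min (length a) (length b)
    mismatches = length (filter id (zipWith (/=) a b))

-- | Hypothetical makeSimilar for equal-length strings: copy a fraction
-- `amount` of the mismatched characters from the target into the pattern,
-- leftmost mismatches first. amount = 0 leaves the pattern unchanged and
-- amount = 1 yields the target.
stringMakeSimilar :: String -> Double -> String -> String
stringMakeSimilar target amount pattern = go budget (zip target pattern)
  where
    mismatches = length (filter (uncurry (/=)) (zip target pattern))
    budget     = round (amount * fromIntegral mismatches) :: Int
    go _ [] = []
    go j ((t, p) : rest)
      | t /= p && j > 0 = t : go (j - 1) rest
      | otherwise       = p : go j rest
```

For example, stringMakeSimilar "abcd" 0.5 "wxyz" copies two of the four mismatched characters and gives "abyz".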

In layman's terms, an SGM can be useful when you want to build a set of models from some data. A tutorial is available at https://github.com/mhwombat/som/wiki.

References:

de Buitléir, A., Russell, M. and Daly, M. (2012). Wains: A pattern-seeking artificial life species. Artificial Life, 18(4), 399–423.
Kohonen, T. (1982). Self-organized formation of topologically correct feature maps. Biological Cybernetics, 43(1), 59–69.


Construction

data SGM t x k p

A Self-generating Model (SGM). t is the type of the counter. x is the type of the learning rate and the difference metric. k is the type of the model indices. p is the type of the input patterns and models.

Constructors

SGM

Fields

toMap :: Map k (p, t): Maps each node ID to that node's model and match count.
learningRate :: t -> x: A function which determines the learning rate for a node. The input parameter indicates how many patterns (or pattern batches) have previously been presented to the classifier. Typically this is used to make the learning rate decay over time. The output is the learning rate for that node (the amount by which the node's model should be updated to match the target). The learning rate should be between zero and one.
maxSize :: Int: The maximum number of models this SGM can hold.
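diffThreshold :: x: The threshold that triggers creation of a new model: when the difference between an input pattern and its best-matching model exceeds this value, a new model is created.
allowDeletion :: Bool: Whether existing models may be deleted to make room for new ones. If so, the least useful (least frequently matched) models are deleted first.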
difference :: p -> p -> x: A function which compares two patterns and returns a non-negative number representing how different the patterns are. A result of 0 indicates that the patterns are identical.
makeSimilar :: p -> x -> p -> p: A function which updates models. For example, if this function is f, then f target amount pattern returns a modified copy of pattern that is more similar to target than pattern is. The magnitude of the adjustment is controlled by the amount parameter, which should be a number between 0 and 1. Larger values for amount permit greater adjustments. If amount=1, the result should be identical to the target. If amount=0, the result should be the unmodified pattern.
nextIndex :: k: Index for the next node to add to the SGM.
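For numeric data, a difference/makeSimilar pair satisfying the contracts above could look like the following sketch, which treats patterns as lists of Double. The representation and the names vecDiff and vecMakeSimilar are assumptions made for illustration.

```haskell
-- | Hypothetical difference for numeric vectors: Euclidean distance.
-- Returns 0 for identical vectors and a non-negative value otherwise.
vecDiff :: [Double] -> [Double] -> Double
vecDiff a b = sqrt (sum (zipWith (\x y -> (x - y) ^ (2 :: Int)) a b))

-- | Hypothetical makeSimilar: move each component of the pattern the fraction
-- `amount` of the way towards the target. amount = 0 returns the pattern and
-- amount = 1 returns the target, as the makeSimilar contract requires.
vecMakeSimilar :: [Double] -> Double -> [Double] -> [Double]
vecMakeSimilar target amount = zipWith step target
  where
    step t p = (1 - amount) * p + amount * t
```

Writing the interpolation as (1 - amount) * p + amount * t, rather than p + amount * (t - p), makes the amount = 1 case return the target exactly in floating point.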

Instances

Generic (SGM t x k p)
(NFData t, NFData x, NFData k, NFData p) => NFData (SGM t x k p)
type Rep (SGM t x k p)

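makeSGM :: Bounded k => (t -> x) -> Int -> x -> Bool -> (p -> p -> x) -> (p -> x -> p -> p) -> SGM t x k p

Creates a new SGM that does not (yet) contain any models. The arguments supply, in order, the learningRate, maxSize, diffThreshold, allowDeletion, difference and makeSimilar fields described above (their types match those fields one for one).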

Deconstruction

time :: Num t => SGM t x k p -> t

The current "time" (number of times the SGM has been trained).

isEmpty :: SGM t x k p -> Bool

Returns true if the SGM has no models, false otherwise.

numModels :: SGM t x k p -> Int

Returns the number of models the SGM currently contains.

modelMap :: SGM t x k p -> Map k p

Returns a map from node ID to model.

counterMap :: SGM t x k p -> Map k t

Returns a map from node ID to counter (number of times the node's model has been the closest match to an input pattern).
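Each of the deconstruction functions above can be read off a map of type Map k (p, t), the type of the toMap field. The sketch below shows one way to do that; the ModelTable alias and the *Of names are hypothetical, and recovering time as the sum of the match counters is an assumption of this sketch rather than a documented fact.

```haskell
import qualified Data.Map as M

-- | Hypothetical alias standing in for the toMap field's type:
-- node ID -> (model, match count).
type ModelTable k p t = M.Map k (p, t)

-- | Models only, keyed by node ID (cf. modelMap).
modelMapOf :: ModelTable k p t -> M.Map k p
modelMapOf = M.map fst

-- | Match counts only, keyed by node ID (cf. counterMap).
counterMapOf :: ModelTable k p t -> M.Map k t
counterMapOf = M.map snd

-- | Assumed reading of time: the total of all match counters. This undercounts
-- training steps if nodes with non-zero counts have been deleted.
timeOf :: Num t => ModelTable k p t -> t
timeOf = sum . M.elems . counterMapOf

-- | cf. isEmpty.
isEmptyOf :: ModelTable k p t -> Bool
isEmptyOf = M.null

-- | cf. numModels.
numModelsOf :: ModelTable k p t -> Int
numModelsOf = M.size
```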

Learning and classification

exponential :: (Floating a, Integral t) => a -> a -> t -> a

A typical learning function for classifiers. exponential r0 d t returns the learning rate at time t. When t = 0, the learning rate is r0. Over time the learning rate decays exponentially; the decay rate is d. Normally the parameters are chosen such that:

0 < r0 < 1
0 < d
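Read literally, the description above gives r(t) = r0 * exp(-d * t): r0 at t = 0, shrinking by a factor of exp(-d) at each step. A sketch of that formula follows; exponentialSketch is a hypothetical name, chosen to avoid implying that this is the library's own definition.

```haskell
-- | Exponentially decaying learning rate, written from the prose description:
-- r0 at t = 0, shrinking by a factor of exp (-d) with each step.
exponentialSketch :: (Floating a, Integral t) => a -> a -> t -> a
exponentialSketch r0 d t = r0 * exp (negate d * fromIntegral t)
```

Partially applied, for example exponentialSketch 0.1 0.0001, it has the shape t -> x expected by the learningRate field.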

classify :: (Num t, Ord t, Num x, Ord x, Enum k, Ord k) => SGM t x k p -> p -> (k, x, [(k, x)])

classify s p identifies the model in s that most closely matches the pattern p. It does not make any changes to the classifier. It returns the ID of the node with the best matching model, the difference between the best matching model and the pattern, and a list pairing each node ID with the difference between the input and that node's model. This list is sorted in decreasing order of similarity (that is, in increasing order of difference).
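The classification step can be sketched over a plain map from node IDs to models. classifySketch is a hypothetical name; it returns Nothing for an empty map, which is an assumption about how that case should be handled, not a statement about the library.

```haskell
import qualified Data.Map as M
import Data.List (sortOn)

-- | A sketch of classification over a map of node ID -> model. Returns the
-- best-matching ID, its difference from the pattern, and every
-- (ID, difference) pair sorted from most to least similar.
classifySketch :: Ord x => (p -> p -> x) -> M.Map k p -> p -> Maybe (k, x, [(k, x)])
classifySketch diff models pat =
  case sortOn snd [ (k, diff pat m) | (k, m) <- M.toList models ] of
    []                          -> Nothing
    ranked@((bestK, bestX) : _) -> Just (bestK, bestX, ranked)
```

Because sortOn is stable and M.toList yields keys in ascending order, ties are resolved in favour of the smaller node ID.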

trainAndClassify :: (Num t, Ord t, Num x, Ord x, Enum k, Ord k) => SGM t x k p -> p -> (k, x, [(k, x)], SGM t x k p)

trainAndClassify s p identifies the model in s that most closely matches p, and updates it to be a somewhat better match. If no existing model is similar enough to p (as determined by diffThreshold), it creates a new node and model. Returns the ID of the node with the best matching model, the difference between the best matching model and the pattern, the differences between the input and each model in the SGM, and the updated SGM.

train :: (Num t, Ord t, Num x, Ord x, Enum k, Ord k) => SGM t x k p -> p -> SGM t x k p

train s p identifies the model in s that most closely matches p, and updates it to be a somewhat better match. If no existing model is similar enough to p (as determined by diffThreshold), it creates a new node and model.
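The training behaviour described for train and trainAndClassify can be sketched as a single step over a simplified record. Everything here is hypothetical: ToySGM and its fields stand in for the real SGM type, time is taken to be the sum of the match counts, new models start with a count of 1, a difference equal to the threshold still counts as a match, and a full SGM that may not delete models falls back to updating its best match. The library may make different choices on each of these points.

```haskell
import qualified Data.Map as M
import Data.List (minimumBy, sortOn)
import Data.Ord (comparing)

-- | Hypothetical, simplified stand-in for the SGM record; the field names
-- are invented for this sketch and are not the library's.
data ToySGM x p = ToySGM
  { tsModels    :: M.Map Int (p, Int)  -- node ID -> (model, match count)
  , tsRate      :: Int -> x            -- learning rate as a function of time
  , tsMaxSize   :: Int                 -- capacity
  , tsThreshold :: x                   -- difference that triggers a new model
  , tsAllowDel  :: Bool                -- may models be deleted when full?
  , tsDiff      :: p -> p -> x         -- difference metric
  , tsSimilar   :: p -> x -> p -> p    -- makeSimilar target amount pattern
  , tsNextIndex :: Int                 -- ID for the next new node
  }

-- | Time as the sum of match counts (an assumption of this sketch).
tsTime :: ToySGM x p -> Int
tsTime = sum . map snd . M.elems . tsModels

-- | One training step: nudge the best match if it is close enough, otherwise
-- add the pattern as a new model, first deleting the least-matched model if
-- the SGM is full and deletion is allowed.
trainSketch :: Ord x => ToySGM x p -> p -> ToySGM x p
trainSketch s p
  | Just (k, d) <- best, d <= tsThreshold s = update k
  | M.size ms < tsMaxSize s                 = addModel s
  | tsAllowDel s && not (M.null ms)         = addModel (dropLeastUseful s)
  | Just (k, _) <- best                     = update k  -- assumed fallback
  | otherwise                               = s
  where
    ms   = tsModels s
    best = case sortOn snd [ (k, tsDiff s p m) | (k, (m, _)) <- M.toList ms ] of
             []      -> Nothing
             (b : _) -> Just b
    -- Move model k towards p by the current learning rate; count the match.
    update k = s { tsModels = M.adjust nudge k ms }
      where
        nudge (m, c) = (tsSimilar s p (tsRate s (tsTime s)) m, c + 1)
    -- The new model is the pattern itself, with the pattern counted once.
    addModel t = t { tsModels    = M.insert (tsNextIndex t) (p, 1) (tsModels t)
                   , tsNextIndex = tsNextIndex t + 1 }
    dropLeastUseful t =
      let (k, _) = minimumBy (comparing (snd . snd)) (M.toList (tsModels t))
      in t { tsModels = M.delete k (tsModels t) }
```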

trainBatch :: (Num t, Ord t, Num x, Ord x, Enum k, Ord k) => SGM t x k p -> [p] -> SGM t x k p

For each pattern p in ps, trainBatch s ps identifies the model in s that most closely matches p, and updates it to be a somewhat better match.
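If trainBatch behaves like presenting the patterns to train one at a time, in list order (an assumption; the description does not say so explicitly), it amounts to a strict left fold. trainBatchWith is a hypothetical helper that takes the single-pattern step as an argument, so that trainBatchWith train s ps would play the role of trainBatch s ps.

```haskell
import Data.List (foldl')

-- | Hypothetical helper: turn a single-pattern training step (such as train)
-- into a batch step by presenting the patterns one at a time, left to right.
-- foldl' forces each intermediate state to weak head normal form only.
trainBatchWith :: (s -> p -> s) -> s -> [p] -> s
trainBatchWith = foldl'
```

Because foldl' only evaluates each intermediate state to weak head normal form, a record such as SGM may still accumulate thunks in its fields; the NFData instance listed above could be used (for example via Control.DeepSeq.force) to evaluate it fully.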