Module Data.Datamining.Clustering.Gsom.Input
Portability: portable. Stability: experimental. Maintainer: gnn.github@gmail.com

Bounds:
  The bounds of a list of inputs. Having the tuple (a,b) at index i in
  bounds means that the value at index i of each of the input vectors
  from the inputs which were used to calculate bounds is from the
  interval [a,b].

Input:
  Input vectors are represented as lists of Doubles.

normalize:
  Normalizes input vectors. normalize inputs takes the given list of
  input vectors inputs and returns a list of input vectors where each
  component is in [0,1]. If you want to unnormalize the input vectors,
  use bounds and unnormalize.

bounds:
  Calculates the bounds of the input vector components.

unnormalize:
  Unnormalizes the given input vectors inputs, assuming that their
  bounds previously were bounds.

dimension:
  Calculating the dimension of a given set of inputs just means finding
  the length of the longest input vector.

distance:
  distance i1 i2 calculates the euclidean distance between i1 and i2.
  If i1 and i2 have different lengths, excess values are ignored.

(.*):
  Multiplication of an input vector with a scalar.

(<+>), (<->):
  Addition and subtraction of vectors.

padZip:
  Zips two lists, but instead of truncating the longer one to the
  length of the shorter one, the shorter one is padded with elements
  from the suffix of the longer one which exceeds the length of the
  shorter one.

force:
  Forces a whole list. If it wasn't for this function, bounds would
  blow the stack, because only the head of the bounds would be fully
  evaluated while the tail would consist of huge thunks of min and max
  applications.

Module Data.Datamining.Clustering.Gsom.Coordinates
Portability: portable. Stability: experimental. Maintainer: gnn.github@gmail.com

directions:
  The list of supported directions. Since we are only dealing with
  hexagonal lattices, there are only six possible directions.

neighbour:
  neighbour location direction calculates the coordinates of the
  neighbour of the node with location location in direction direction.

neighbourCoordinates:
  neighbourCoordinates point calculates the list of coordinates which
  are directly adjacent to point.

Module Data.Datamining.Clustering.Gsom.Node
Portability: non-portable (requires STM). Stability: experimental. Maintainer: gnn.github@gmail.com

Neighbourhood:
  The type of neighbourhoods.
  Wherever a neighbourhood of a node is needed, this type should be
  used. A Neighbourhood consists of a list of pairs of nodes and their
  discrete grid distance from the source of the neighbourhood. The
  source node is the only one with distance 0, while immediate
  neighbours get distance one, and so on.

Neighbours:
  A node's neighbours are stored in fields of type Neighbours.

Node:
  The type of nodes of a gsom. Nodes are either Leafs, or they are
  actual nodes with a few associated values and a list of neighbouring
  nodes.

location:
  Used to uniquely identify nodes. This is also the actual location of
  the node if the lattice it belongs to is being laid out in the two
  dimensional plane, and it is used to store the node in the map
  comprising the lattice.

quantizationError:
  The quantization error the node has accumulated so far.

weights:
  The node's weight vector. This is the center of the voronoi cell the
  node represents.

neighbours:
  The node's neighbours. They're either Leafs, signalling neighbours of
  boundary nodes, or Nodes.

node:
  node id weights neighbours creates a node with the specified
  parameters and an initial quantization error of 0.

update:
  update input learning_rate kernel neighbour updates the weights of
  the neighbour according to the formula:

    weight -> weight + learning_rate * (kernel d) * (input - weight)

updateError:
  updateError node input updates the quantizationError of node. The new
  error is just the old error plus the distance of the node's weight
  vector from input.

propagate:
  propagate node propagates the accumulated error of the given node to
  its neighbours.

isLeaf:
  isLeaf node returns True if the given node is a Leaf and False
  otherwise.

isNode:
  isNode node returns True if the given node is a Node and False
  otherwise.

neighbourhood:
  Calculates the neighbourhood of the given size of the given node. A
  neighbourhood size of 0 means that only the given node will be an
  element of the returned set, while a size of one will return the
  given node and its immediate neighbours, and so on. It's not very
  efficient, so you shouldn't try big neighbourhood sizes. The returned
  neighbourhood always includes node.
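The vector operations from the Input module and the update rule documented above translate directly into Haskell. The sketch below is an illustrative reimplementation based only on the documented formulas, not the package's actual source; the helper name updateWeights and the Int-valued grid distance are assumptions made for the example.

```haskell
-- Illustrative sketch of the documented vector operations and the
-- neighbour update rule; not the package's actual implementation.
type Input = [Double]

-- Euclidean distance; excess components of the longer vector are ignored.
distance :: Input -> Input -> Double
distance i1 i2 = sqrt . sum . map (^ 2) $ zipWith (-) i1 i2

-- Scalar multiplication of an input vector.
(.*) :: Double -> Input -> Input
c .* v = map (c *) v

-- Componentwise addition and subtraction.
(<+>), (<->) :: Input -> Input -> Input
(<+>) = zipWith (+)
(<->) = zipWith (-)

-- The documented rule:
--   weight -> weight + learning_rate * (kernel d) * (input - weight)
updateWeights :: Input -> Double -> (Int -> Double) -> Int -> Input -> Input
updateWeights input lr kernel d ws =
  ws <+> ((lr * kernel d) .* (input <-> ws))
```

With a kernel of const 1 and learning rate 0.5, a weight vector moves halfway towards the input, which is the expected behaviour of the rule.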
unwrappedNeighbours:
  unwrappedNeighbours node returns the list of neighbours of the given
  node. Note that neighbours is unwrapped, i.e. the returned list has
  type Nodes, not TVar Nodes.

invert:
  Inverts a direction, i.e. given an index i representing a direction,
  it calculates the index representing the backwards direction. The
  rectangular lattice structure is hardcoded as the number 4. For
  general structures with n neighbours we should have the formula:

    invert i = (i + n/2) `mod` n

left:
  left direction returns the index which points to the left of
  direction.

right:
  right direction returns the index which points to the right of
  direction.

checkBounds:
  This should probably belong in input.hs.

newWeight:
  When a new node is spawned, we need to calculate its new weight
  vector. If the new node is spawned from parent p in direction d, and
  p has a neighbour n in the direction d' opposite to d, then the new
  weight vector nw is calculated according to the formula:

    nw = 2 * (weights p) - (weights n)

  In all other cases there exists exactly one neighbour of the new
  node. Let this neighbour be called n, and let d' be the direction in
  which we have to go to reach this neighbour from the new node. Let s
  then be the child of the new node's parent p in direction d'. The new
  weights are then calculated according to the formula:

    nw = p + n - s

modify:
  Used to modify the fields of a node. modify node field f modifies
  node by using field to select the appropriate value, applying f to
  the value and storing the result in the field.

Module Data.Datamining.Clustering.Gsom.Lattice
Portability: non-portable (requires STM). Stability: experimental. Maintainer: gnn.github@gmail.com

Lattice:
  The lattice type. Since global access to nodes is needed, they're
  stored in a Data.Map indexed by their coordinates.

newRandom:
  newRandom g dimension creates a new minimal lattice where weights are
  randomly initialized with values between 0 and 1 using the random
  number generator g, and with the weight vectors having the specified
  dimension.
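The invert and newWeight formulas above can be sketched as plain functions. This is illustration only: the names newWeightInLine and newWeightCorner, and the generalisation of invert over the neighbour count n, are assumptions, not the package's API.

```haskell
-- Illustrative sketch of the documented direction and weight formulas.
type Input = [Double]

-- invert i = (i + n/2) `mod` n, generalised over the number of
-- neighbour directions n (the package hardcodes this structure).
invert :: Int -> Int -> Int
invert n i = (i + n `div` 2) `mod` n

-- Case 1: the parent p has a neighbour n opposite the spawn direction:
--   nw = 2 * (weights p) - (weights n)
newWeightInLine :: Input -> Input -> Input
newWeightInLine p n = zipWith (\pc nc -> 2 * pc - nc) p n

-- Case 2: the new node has exactly one neighbour n, with s the child
-- of the parent p in direction d':  nw = p + n - s
newWeightCorner :: Input -> Input -> Input -> Input
newWeightCorner p n s = zipWith3 (\pc nc sc -> pc + nc - sc) p n s
```

For a hexagonal lattice (n = 6), invert maps direction 0 to 3, matching the intuition that the backwards direction lies half a turn away.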
newCentered:
  newCentered dimension creates a new minimal lattice where weights are
  initialized with all components having the value 0.5, and with the
  weight vectors having length dimension.

nodes:
  Returns the nodes stored in lattice.

bmu:
  bmu input lattice returns the best matching unit, i.e. the node with
  minimal distance to the given input vector.

grow:
  grow lattice node will create new neighbours for every Leaf neighbour
  of the given node and add the created nodes to lattice. It will
  return the list of spawned nodes and the new lattice containing every
  node created in the process of spawning.

spawn:
  Used to spawn only a particular node. Returns the new lattice
  containing the spawned node. spawn lattice parent direction will
  create a new node as a neighbour of parent at index direction, making
  parent the neighbour of the new node at index invert direction, with
  the new node having an initial quantization error of 0.

vent:
  vent lattice node growthThreshold will check the accumulated error of
  the node against the given growthThreshold and will do nothing if the
  error value is below the growth threshold. Otherwise it will either
  spawn new nodes or it will propagate the accumulated error value to
  its neighbours, depending on whether the node is a boundary node or
  not. If new nodes are spawned, they will be added to lattice and
  returned as the second component of the resulting pair.

new:
  Generates a new Lattice given the supply of weights, with each node
  having a weight vector of the given dimension.

Module Data.Datamining.Clustering.Gsom.Phase
Portability: non-portable (requires STM). Stability: experimental. Maintainer: gnn.github@gmail.com

Config:
  Parameters which have to be passed around between the functions used
  to execute a phase: the growth threshold, the learning rate (a
  function of the step), the kernel function, the growing flag, the
  radius (a function of the step), the current step number and the
  wrapped lattice.

inverseAge:
  The inverse time learning rate reduction function.
  Given an initial learning rate of lr, a maximum number of steps of
  steps, and the current step number being step, the formula is:

    inverseAge lr step steps = lr * steps / (steps + 100 * step)

linear:
  The linear learning rate reduction function. If you supply it with
  the initial learning rate lr, it uses the following formula, where
  step is the current step the phase is in and steps is the overall
  number of steps the phase will take:

    linear lr step steps = lr * (1 - step/steps)

gaussian:
  Let s be the neighbourhood size currently in effect and d be the
  grid-distance of the current node to the winning node; then this
  kernel calculates a factor to control weight adaption with the
  following formula:

    gaussian s d = exp(-d^2/(2*s^2))

bubble:
  The bubble kernel is essentially the identity, i.e. it has no effect.

Phase:
  This datatype encapsulates all the parameters needed to be known to
  run one phase of the GSOM algorithm.

passes:
  The number of passes which are to be made over the input vectors.
  Since each step of the phase consumes one input vector, the overall
  number of steps the phase will have will be:

    steps = passes * length inputs

neighbourhoodSize:
  The initial size of the neighbourhood affected by weight adaption.
  This will decrease linearly while the phase is executed.

learningRate:
  The function used to calculate the learning rate in each of the
  phase's steps. During each step, learningRate currentStep maxSteps is
  fed the number (starting from zero) of the current step as the first
  argument and the total number of steps the phase will have as the
  second argument, to calculate the learning rate which will be in
  effect for this step.

kernel:
  The kernel function. It is used in conjunction with the learning rate
  to adjust weight adaption. kernel currentNeighbourhoodSize
  gridDistance should take the neighbourhood size which is in effect
  during the current step and a node's grid distance from the winning
  node. The neighbourhood size will be a real number due to the linear
  decrease.
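The learning-rate and kernel formulas documented above can be written down directly. A minimal sketch, not the package's source; note the negated exponent in the gaussian, which is what makes the factor decay as the grid distance grows:

```haskell
-- Sketches of the documented learning-rate reduction functions.
linear :: Double -> Double -> Double -> Double
linear lr step steps = lr * (1 - step / steps)

inverseAge :: Double -> Double -> Double -> Double
inverseAge lr step steps = lr * steps / (steps + 100 * step)

-- Gaussian kernel: decays with grid distance d for neighbourhood
-- size s, so distant neighbours are adapted less.
gaussian :: Double -> Double -> Double
gaussian s d = exp (negate (d ^ 2) / (2 * s ^ 2))

-- The bubble kernel is constant and thus has no effect.
bubble :: Double -> Double -> Double
bubble _ _ = 1
```

At step 0 both reduction functions return the initial learning rate, and linear falls to 0 at the last step, matching the formulas above.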
grow:
  The grow flag determines whether this is a growing phase or not. If
  this is False, then no new nodes will be grown.

spreadFactor:
  The spread factor is used to calculate the growth threshold according
  to the formula:

    GT = -sqrt(d) * ln(spreadFactor)

  where d is the input dimension.

kernelFunction:
  Returns the kernel function associated with the given kernel.

adaption:
  Returns the learning rate adaption function associated with the given
  type of learning rate.

phase:
  phase parameters inputs will update the given lattice by executing
  one phase of the GSOM algorithm with the given inputs and parameters.

consume:
  consume config i consumes the input i.

run:
  Since a complete run of the GSOM algorithm means running a number of
  Phases, this is usually the main function used. run phases lattice
  inputs runs the GSOM algorithm by running the phases in the order
  specified, each time making passes over inputs and using the produced
  Lattice as an argument to the next phase. The initial Lattice,
  lattice, may be constructed with the newRandom and the newCentered
  functions.

Bubble:
  The simplest kernel, which essentially does nothing. It always
  evaluates to 1, thus having no effect on updating weights.

Gaussian:
  A gaussian kernel. The neighbourhood size s currently in effect
  controls the bell width, while the distance d of the current node is
  used as the actual kernel argument. Thus the formula is:

    gaussian s d = exp(-d^2/(2*s^2))

Linear:
  The linear learning rate reduction function (see linear above).

InverseAge:
  The inverse time learning rate reduction function (see inverseAge
  above).

defaultFirst:
  The default first phase is the only growing phase.
  It makes 5 passes over the input, uses an initial learning rate of
  0.1 and a starting neighbourhood size of 3. The spreadFactor is set
  to 0.1.

defaultSecond:
  The default for the second phase is a smoothing phase making 50
  passes over the input vectors with a learning rate of 0.05 and an
  initial neighbourhood size of 2. Since there is no node growth, the
  spreadFactor is ignored and thus set to 0.

defaultThird:
  The default for the third and last phase is a smoothing phase making
  50 passes over the input vectors with a learning rate of 0.01 and an
  initial neighbourhood size of 1. Since there is no node growth, the
  spreadFactor is ignored and thus set to 0.

defaults:
  This is the list of the three default phases of the GSOM algorithm.

growthThreshold:
  Calculates the growth threshold as explained in the documentation for
  spreadFactor.

Module Data.Datamining.Clustering.Gsom.Parallel
Portability: non-portable (requires STM). Stability: experimental. Maintainer: gnn -dot- code -at- gmail -dot- com

Config:
  The immutable configuration shared by every worker thread.

Table:
  A shared table used for bookkeeping purposes. It stores the bmu nodes
  and the corresponding points so that they can be changed safely in
  between transactions, and retrieved later.

phase:
  phase n parameters inputs will update the given lattice by executing
  one phase of the GSOM algorithm with the given inputs and parameters,
  using n threads.

pass:
  pass n config inputs lattice will make one pass over the given
  inputs, spawning n threads, adding the missing fields to config and
  modifying the wrapped lattice.

  Given n action, the worker spawner spawns n worker threads doing
  action and returns a TVar containing an integer which maintains a
  count of how many of the spawned threads are still alive.

work:
  The worker action. work queue lattice table repeatedly takes a point
  from the queue and acts on it, modifying lattice and using table for
  storing and retrieving bmus.

consume:
  consume q l t i consumes the input i, and then goes back to work.

run:
  Since a complete run of the GSOM algorithm means running a number of
  Phases, this is usually the main function used.
  run n phases lattice inputs runs the GSOM algorithm by running the
  phases in the order specified, each time making passes over inputs
  and using the produced Lattice as an argument to the next phase. The
  phases are run using n worker threads. The initial Lattice, lattice,
  may be constructed with the newRandom and the newCentered functions.

Module Data.Datamining.Clustering.Gsom
Portability: non-portable (requires STM). Stability: experimental. Maintainer: gnn.github@gmail.com

Clustering:
  The final clustering which is the result of the GSOM algorithm is a
  Data.Map mapping Coordinates to Clusters.

Cluster:
  The clusters generated by GSOM basically consist of three things:

center:
  the vector which best represents all the vectors belonging to this
  cluster.

contents:
  The indices of the input vectors belonging to this cluster. That
  means a cluster is always relative to a set of Inputs.

coordinates:
  the coordinates of this cluster.

clustering:
  Computes a clustering induced by the given lattice. clustering
  lattice uses the locations of the nodes stored in lattice to generate
  Clusters, and returns the Clustering storing these Clusters. Each non
  leaf node n in lattice corresponds to one Cluster c with coordinates
  c = location n and with center c equal to the weight vector of n.
  Each generated Cluster's contents are empty. Use the cluster function
  with a set of inputs to obtain a clustering where each Cluster's
  contents is a list of the indices of the input points belonging to
  this Cluster.

cluster:
  cluster inputs clustering clusters the given inputs according to the
  centers of the clusters in clustering. That means for each input i
  from inputs, the index of i is added to the contents of the cluster
  center to which i has minimal distance. TODO: Implement tiebreaker.

nearestCluster:
  nearestCluster input clustering returns the cluster which has the
  center with the smallest distance to input.

renderScript:
  renderScript c path expects to be given a Clustering c having 2
  dimensional centers and will call error if that's not the case. On
  success it will save a python script to path.py. If this python
  script is run, it will in turn save a PDF image to path.pdf. The
  image will contain the graph induced by c, with each node (cluster
  center) positioned according to the cluster's center (weight vector).
  The python script will depend on the networkx and matplotlib python
  packages being installed. I know that this is relatively clunky, but
  since I haven't found a better way of creating an image of a graph
  with known node positions, this is the way I chose to go.

dumpInputs:
  Dumps the given input vectors to a string which can be fed to
  gnuplot. Just write the string to a file and enter plot "file" in
  gnuplot.

checkMin:
  Again computes the best matching unit, only this time it is pure.

Package: hsgsom-0.2.0. Modules: Data.Datamining.Clustering.Gsom.Input,
Data.Datamining.Clustering.Gsom.Coordinates,
Data.Datamining.Clustering.Gsom.Node,
Data.Datamining.Clustering.Gsom.Lattice,
Data.Datamining.Clustering.Gsom.Phase,
Data.Datamining.Clustering.Gsom.Parallel,
Data.Datamining.Clustering.Gsom.
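The behaviour documented for cluster and nearestCluster can be sketched with plain data types. This is a self-contained illustration only: the concrete field types and the (Int, Int) coordinates are assumptions made for the sketch, not the package's actual representation, and no tiebreaker is implemented, matching the TODO above.

```haskell
import Data.List (minimumBy)
import Data.Ord (comparing)
import qualified Data.Map as Map

type Input = [Double]
type Coordinates = (Int, Int)

-- Minimal stand-in for the documented Cluster type.
data Cluster = Cluster
  { center      :: Input   -- best representative vector
  , contents    :: [Int]   -- indices of inputs in this cluster
  , coordinates :: Coordinates
  } deriving (Eq, Show)

type Clustering = Map.Map Coordinates Cluster

euclidean :: Input -> Input -> Double
euclidean a b = sqrt . sum . map (^ 2) $ zipWith (-) a b

-- The cluster whose center has the smallest distance to the input.
nearestCluster :: Input -> Clustering -> Cluster
nearestCluster input =
  minimumBy (comparing (euclidean input . center)) . Map.elems

-- Add each input's index to the contents of its nearest cluster.
cluster :: [Input] -> Clustering -> Clustering
cluster inputs clustering = foldl assign clustering (zip [0 ..] inputs)
  where
    assign m (i, x) =
      let c = coordinates (nearestCluster x m)
      in Map.adjust (\cl -> cl { contents = i : contents cl }) c m
```

A clustering produced by the documented clustering function would start with empty contents; running cluster over a set of inputs then fills each cluster's contents with the indices of its nearest inputs.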