Graphalyze-0.8.0.0: Graph-Theoretic Analysis library.Source codeContentsIndex
Data.Graph.Analysis.Algorithms.Clustering
MaintainerIvan.Miljenovic@gmail.com
Contents
Clustering Algorithms
Non-deterministic algorithms
Spatial Algorithms
Graph Collapsing
Description
Clustering and grouping algorithms that are graph-invariant and require no user intervention.
Synopsis
chineseWhispers :: (RandomGen g, Eq a, Eq b, DynGraph gr) => g -> gr a b -> gr (GenCluster a) b
relativeNeighbourhood :: (DynGraph gr, Eq a, Ord b) => Bool -> gr a b -> gr (GenCluster a) b
type CNodes a = [a]
collapseGraph :: (DynGraph gr, Eq b) => gr a b -> gr (CNodes a) b
collapseGraphBy :: DynGraph gr => [gr (CNodes a) b -> [NGroup]] -> gr a b -> gr (CNodes a) b
collapseGraphBy' :: DynGraph gr => [gr a b -> [(NGroup, a)]] -> gr a b -> gr a b
trivialCollapse :: Graph gr => gr (CNodes a) b -> Bool
Clustering Algorithms
Non-deterministic algorithms

The Chinese Whispers Algorithm. This is an adaptation of the algorithm described in:

Biemann, C. (2006): Chinese Whispers - an Efficient Graph Clustering Algorithm and its Application to Natural Language Processing Problems. Proceedings of the HLT-NAACL-06 Workshops on Textgraphs-06, New York, USA http://wortschatz.uni-leipzig.de/~cbiemann/pub/2006/BiemannTextGraph06.pdf

The adaptations to this algorithm are as follows:

  • Ignore any edge weightings that may exist, as we can't depend on them (also, we want the algorithm to be dependent solely upon the structure of the graph, not what it contains).
  • Explicitly shuffle the node order for each iteration.

Simplistically, the way it works is this:

1. Every node is assigned into its own unique cluster.

2. Sort the nodes into some random order. Each node joins the most popular cluster in its neighbourhood (where popularity is defined as the sum of the node weightings in that cluster).

3. Repeat step 2. until a fixed point is reached.

Note that this algorithm is non-deterministic, and that for some graphs no fixed point may be reached (and the algorithm may oscillate between a few different graph clusterings).

Chinese Whispers is O(number of edges).

chineseWhispers :: (RandomGen g, Eq a, Eq b, DynGraph gr) => g -> gr a b -> gr (GenCluster a) bSource
The actual Chinese Whispers algorithm.
Spatial Algorithms

This implements the algorithm called CLUSTER, from the paper:

Bandyopadhyay, S. (2003): An automatic shape independent clustering technique. Pattern Recognition, vol. 37, pp. 33-45.

Simplistically, it defines clusters as groups of nodes that are spatially located closer to each other than to nodes in other clusters. It utilises the concept of a /Relative Neighbour Graph/ [RNG] to determine the spatial structure of a set of two-dimensional data points.

The adaptations to this algorithm are as follows:

  • Due to the limitations of the BKTree data structure, we utilise a fuzzy distance function defined as the ceiling of the standard Euclidian distance.
  • We utilise toPosGraph to get the spatial locations. As such, these locations may not be optimal, especially for smaller graphs.
  • The actual algorithm is applied to each connected component of the graph. The actual paper is unclear what to do in this scenario, but Graphviz may locate nodes from separate components together, despite them not being related.

The algorithm is renamed relativeNeighbourhood. Experimentally, it seems to work better with larger graphs (i.e. more nodes), since then Graphviz makes the apparent clusters more obvious. The actual algorithm is O(n^2), where n is the number of Nodes in the graph.

relativeNeighbourhood :: (DynGraph gr, Eq a, Ord b) => Bool -> gr a b -> gr (GenCluster a) bSource
The renamed CLUSTER algorithm. Attempts to cluster a graph by using the spatial locations used by Graphviz.
Graph Collapsing

Collapse the parts of a graph down to try and show a compressed overview of the whole graph.

It may be possible to extend this to a clustering algorithm by collapsing low density regions into high density regions.

type CNodes a = [a]Source
A collapsed node contains a list of nodes that it represents.
collapseGraph :: (DynGraph gr, Eq b) => gr a b -> gr (CNodes a) bSource
Collapse the cliques, cycles and chains in the graph down. Note that this doesn't work too well on undirected graphs, since every pair of nodes forms a K_2 subgraph.
collapseGraphBy :: DynGraph gr => [gr (CNodes a) b -> [NGroup]] -> gr a b -> gr (CNodes a) bSource
Use the given functions to determine which nodes to collapse.
collapseGraphBy' :: DynGraph gr => [gr a b -> [(NGroup, a)]] -> gr a b -> gr a bSource
Use the given functions to determine which nodes to collapse, with a new label to represent the collapsed nodes.
trivialCollapse :: Graph gr => gr (CNodes a) b -> BoolSource
Return True if the collapsed graph is either a singleton node or else isomorphic to the original graph (i.e. not collapsed at all).
Produced by Haddock version 2.4.2