-- Hoogle documentation, generated by Haddock -- See Hoogle, http://www.haskell.org/hoogle/ -- | Quickly detect clusters and holes in data. -- -- Persistence is a topological data analysis library motivated by -- flexibility when it comes to the type of data being analyzed. If you -- have data that comes with a meaningful function into something of the -- Ord typeclass, you can use Persistence to detect clusters and holes in -- the data. You can also use the library to analyze the topology of -- directed/undirected weighted/unweighted graphs, and compare topologies -- of different data sets. @package Persistence @version 1.1 -- | This module provides functions for constructing neighborhood graphs, -- clique complexes, Vietoris-Rips complexes, as well as the computation -- of Betti numbers. -- -- An important thing to know about this module is the difference between -- "fast" and "light" functions. Fast functions encode the metric in a 2D -- vector, so that distances don't need to be computed over and over -- again and can instead be accessed in constant time. Unfortunately, -- this takes O(n^2) memory so I would strongly recomend against using it -- for larger data sets; this is why "light" functions exist. -- -- A neighborhood graph is a graph where an edge exists between two -- vertices if and only if the vertices fall within a given distance of -- each other. Graphs here are more of a helper data type for -- constructing filtrations, which is why they have a somewhat odd -- representation. Not to worry though, I've supplied a way of encoding -- graphs of a more generic representation. -- -- The clique complex of a graph is a simplicial complex whose simplices -- are the complete subgraphs of the given graph. The Vietoris-Rips -- complex is the clique complex of the neighborhood graph. -- -- Betti numbers measure the number of n-dimensional holes in a given -- simplicial complex. The 0th Betti number is the number of connected -- components, or clusters; the 1st Betti number is the number of loops -- or tunnels; the 2nd Betti number measures voids or hollow volumes; and -- if you have high-dimensional data, you might have higher Betti numbers -- representing the number of high-dimensional holes. module SimplicialComplex -- | The first component of the pair is the number of vertices. The second -- component is a vector whose entries are vectors of simplices of the -- same dimension. Index 0 of the vecor corresponds to dimension 1 -- because there is no need to store individual vertices. A simplex is -- represented as a pair: the vector of its vertices (represent by their -- index in the original data set), and the vector of the indices of the -- faces in the next lowest dimension. Edges are the exception to the -- last part - they do not store reference to their faces because it -- would be redundant with their vertices. type SimplicialComplex = (Int, Vector (Vector (Vector Int, Vector Int))) -- | This represents the (symmetric) adjacency matrix of some weighted -- undirected graph. The type a is whatever distance is in your -- data analysis regime. The reason graphs are represented like this is -- because their main function is too speed up the construction of -- simplicial complexes and filtrations. If the clique complex of this -- graph were to be constructed, only the adjacencies would matter. But -- if a filtration is constructed all the distances will be required over -- and over again - this allows them to be accessed in constant time. type Graph a = Vector (Vector (a, Bool)) -- | Show all the information in a simplicial complex. sc2String :: SimplicialComplex -> String -- | Get the dimension of the highest dimensional simplex (constant time). getDim :: SimplicialComplex -> Int -- | Takes the number of vertices and each edge paired with its weight to -- output the graph encoded as a 2D vector. If only you have an -- unweighted graph, you can still encode your graph by simply letting -- the type a be `()`. If you have a weighted graph but there -- isn't a distance between every vertex, you can use the -- Extended type (essentially the same as Maybe) from the -- Persistence module which is already an instance of Ord. encodeWeightedGraph :: Int -> (Int -> Int -> (a, Bool)) -> Graph a -- | The first argument is a scale, the second is a metric, and the third -- is the data. This function records the distance between every element -- of the data and whether or not it is smaller than the given scale. makeNbrhdGraph :: (Ord a, Eq b) => a -> (b -> b -> a) -> [b] -> Graph a -- | Makes a simplicial complex where the simplices are the complete -- subgraphs (cliques) of the given graph. Mainly a helper function for -- makeVRComplexFast, but it might be useful if you happen to have a -- graph you want to analyze. This utilizes any available processors in -- parallel because the construction is quite expensive. makeCliqueComplex :: Graph a -> SimplicialComplex -- | Constructs the Vietoris-Rips complex given a scale, metric, and data -- set. Also uses O(n^2) memory (where n is the number of data points) -- for a graph storing all the distances between data points. This -- utilizes any available processors in parallel because the construction -- is quite expensive. makeVRComplexFast :: (Ord a, Eq b) => a -> (b -> b -> a) -> [b] -> (SimplicialComplex, Graph a) -- | Constructs the Vietoris-Rips complex given a scale, metric, and data -- set. This utilizes any available processors in parallel because the -- construction is quite expensive. makeVRComplexLight :: (Ord a, Eq b) => a -> (b -> b -> a) -> [b] -> SimplicialComplex -- | The nth index of the output corresponds to the nth Betti number. The -- zeroth Betti number is the number of connected components, the 1st is -- the number of tunnels/punctures, the 2nd is the number of hollow -- volumes, and so on for higher-dimensional holes. bettiNumbers :: SimplicialComplex -> [Int] -- | Calculates all of the Betti numbers in parallel. bettiNumbersPar :: SimplicialComplex -> [Int] -- | This module contains functions for constructing filtrations and -- computing persistent homology, as well as a few utility functions for -- working with filtrations. -- -- A filtration is a finite sequence of simplicial complexes where each -- complex is a subset of the next. This means that a filtration can be -- thought of as a single simplicial complex where each of the simplices -- is labeled with a "filtration index" that represents the index in the -- sequence where that simplex enters the filtration. -- -- One way to create a filtration, given a simplicial complex, a metric -- for the vertices, and a list of distances, is to loop through the -- distances from greatest to least: create a simplicial complex each -- iteration which excludes simplices that contain pairs of vertices -- which are further than the current distance apart. This method will -- produce a filtration of Vietoris-Rips complexes - each filtration -- index will correspond to a VR complex whose scale is the corresponding -- distance. -- -- An essential thing to note about the way this library is set up is the -- distinction between "fast" and "light" functions. Light functions call -- the metric every time distance between two points is required, which -- is a lot. Fast functions store the distances between points and access -- them in constant time, BUT this means they use O(n^2) memory with -- respect to the number of data points, so it's a really bad idea to use -- this optimization on substantially large data. -- -- IMPORTANT NOTE: This library currently only handles filtrations where -- all the vertices have filtration index 0. Since I haven't thought of -- any way filtrations that don't respect this property could come up in -- applications, I have chosen to exclude them in favor of less memory -- usage and a slight speedup. -- -- Persistent homology is the main event of topological data analysis. It -- allows one to identify clusters, holes, and voids that persist in the -- data throughout many scales. The output of the persistence algorithm -- is a barcode diagram, which currently have a somewhat crude -- representation in this library. A single barcode represents the -- filtration index where a feature appears and the index where it -- disappears (if it does). Thus, short barcodes are typically -- interpretted as noise and long barcodes are interpretted as actual -- features. -- -- After you've got the persistent homology of a data set, you might want -- to compare it with that of a different data set. That's why this -- release includes two versions of "bottleneck distance," one works only -- if the number of features in each data set is the same and the other -- works regardless. module Persistence -- | The first component is simply the number of vertices (all vertices are -- assumed to have filtration index 0). The second component is a vector -- with an entry for each dimension of simplices, starting at dimension 1 -- for edges. Each simplex is represented as a triple: its filtration -- index, the indices of its vertices in the original data, and the -- indices of its faces in the next lowest dimension. Edges do not have -- reference to their faces, as it would be redundant with their -- vertices. All simplices are sorted according to filtration index upon -- construction of the filtration. type Filtration = (Int, Vector (Vector (Int, Vector Int, Vector Int))) -- | (i, Just j) is a feature that appears at filtration index i and -- disappears at index j, (i, Nothing) begins at i and doesn't disappear. type BarCode = (Int, Maybe Int) -- | Type for representing inifinite bottleneck distance. data Extended a -- | Shows all the information in a simplex. sim2String :: (Int, Vector Int, Vector Int) -> String -- | Shows all the information in a filtration. filtr2String :: Filtration -> String -- | Gets the simplicial complex specified by the filtration index. This is -- O(n) with respect to the number of simplices. getComplex :: Int -> Filtration -> SimplicialComplex -- | The first argument is a list of dimensions, the second argument is a -- list of filtration indices. The function returns the number of -- simplices in the filtration whose dimension and index exist in the -- respective lists. getNumSimplices :: [Int] -> [Int] -> Filtration -> Int -- | Return the dimension of the highest dimensional simplex in the -- filtration (constant time). getDimension :: Filtration -> Int -- | Given a list of scales, a simplicial complex, and a weighted graph -- (see SimplicialComplex) which encodes a metric on the vertices, this -- function creates a filtration out of a simplicial complex by removing -- simplices that contain edges that are too long for each scale in the -- list. This is really a helper function to be called by -- makeVRFiltrationFast, but I decided to expose it in case you have a -- simplicial complex and weighted graph lying around. The scales MUST be -- in decreasing order for this function. filterByWeightsFast :: Ord a => [a] -> (SimplicialComplex, Graph a) -> Filtration -- | Given a list of scales, a metric, and a data set, this function -- constructs a filtration of the Vietoris-Rips complexes associated with -- the scales. The scales MUST be in decreasing order. Note that this a -- fast function, meaning it uses O(n^2) memory to quickly access -- distances where n is the number of data points. makeVRFiltrationFast :: (Ord a, Eq b) => [a] -> (b -> b -> a) -> [b] -> Filtration -- | The same as filterbyWeightsFast except it uses far less memory at the -- cost of speed. Note that the scales must be in decreasing order. filterByWeightsLight :: Ord a => [a] -> (b -> b -> a) -> [b] -> SimplicialComplex -> Filtration -- | Given a list of scales in decreasing order, a metric, and a data set, -- this constructs the filtration of Vietoris-Rips complexes -- corresponding to the scales. makeVRFiltrationLight :: (Ord a, Eq b) => [a] -> (b -> b -> a) -> [b] -> Filtration -- | Each index in the list is a list of barcodes whose dimension -- corresponds to the index. That is, the first list will represent -- clusters, the second list will represent tunnels or punctures, the -- third will represent hollow volumes, and the nth index list will -- represent n-dimensional holes in the data. persistentHomology :: Filtration -> [[BarCode]] -- | Return the maximum of minimum distances bewteen the bar codes. It's -- important to note that the function isn't "unsafe" in the sense that -- it will throw an exception, it will just give you a distance -- regardless of whether or not there is the same number of barcodes is -- in each list. bottelNeckDistance :: [BarCode] -> [BarCode] -> Extended Double -- | Get's all the bottle neck distances; a good way to determine the -- similarity of the topology of two filtrations. bottelNeckDistances :: [[BarCode]] -> [[BarCode]] -> [Extended Double] -- | If the number of barcodes is the same, return the maximum of minimum -- distances bewteen the bar codes. Otherwise return nothing. safeBottelNeckDistance :: [BarCode] -> [BarCode] -> Maybe (Extended Double) -- | Safely get all the bottle neck distances; a good way to determine the -- similarity of the topology of two filtrations. safeBottelNeckDistances :: [[BarCode]] -> [[BarCode]] -> [Maybe (Extended Double)] instance GHC.Classes.Eq a => GHC.Classes.Eq (Persistence.Extended a) instance (GHC.Classes.Ord a, GHC.Classes.Eq a) => GHC.Classes.Ord (Persistence.Extended a) -- | This module implements algorithms for admissible Hasse diagrams. A -- Hasse diagram is admissible if it is stratified and oriented. A -- diagram is stratified if all the vertices can be arranged in rows such -- that all the sources of each vertex are in the next highest row and -- all the targets are in the next lowest row. A diagram is oriented if -- every vertex has a linear ordering on its targets. -- -- A node in the diagram is represented as a tuple: the indices of the -- level 0 nodes in the diagram that are reachable from this node, the -- indices of targets in the next lowest level, and the indices of the -- sources in the next highest level. The entire diagram is simply an -- array where each entry is an array representing a particular level; -- index 0 represents level 0, etc. -- -- Any directed graph can be encoded as an admissible Hasse diagram with -- 2 levels. The edges are level 1 and the vertices are level 0. The -- ordering on the targets of a node representing an edge is simply the -- terminal vertex first and the initial vertex second. This may be -- counterintuitive, but its helpful to interpret an arrow between two -- vertices as the "<" operator. This induces a linear ordering on the -- vertices of any acyclic complete subgraph - which is what the nodes in -- the Hasse diagram of the directed clique complex represent. -- -- Any oriented simplicial complex can also be encoded as an admissible -- Hasse diagram. A node is a simplex, the targets are the faces of the -- simplex, and the sources are simplices of which the given simplex is a -- face. -- -- The main feature of this module is an algorithm which takes the Hasse -- diagram of a directed graph and generates the Hasse diagram of the -- directed flag complex - the simplicial complex whose simplices are -- acyclic complete subgraphs of the given graph. Here acyclic refers to -- a directed graph without any sequence of arrows whose heads and tails -- match up an which has the same start and end vertex. -- -- The idea is that, if your directed graph represents any kind of -- information flow, "sub-modules" in the network that simply take input, -- process it using its nodes, and then output it without spinning the -- information around at all. These "sub-modules" are the directed -- cliques/flags which I've been referring to as acyclic complete -- subgraphs up to this point. Constructing a simplicial complex out of -- them will allow you to both simplify the 1-dimensional topology of the -- network and possibly detect higher-dimensional topological features. module HasseDiagram -- | Type representing a node in a Hasse diagram. Hasse diagrams are being -- used to represent simplicial complexes so each node represents a -- simplex Contents of the tuple in order: Vector of references to -- vertices of the underlying directed graph, vector of references to the -- simplices faes in the next lowest level of the Hasse diagram, vector -- of references to "parent" simplices (simplices who have this simplex -- as a face) in the next highest level of the Hasse diagram. type Node = (Vector Int, Vector Int, Vector Int) -- | Type representing an admissible Hasse diagram. Each entry in the -- vector represents a level in the Hasse diagram. type HasseDiagram = Vector (Vector Node) -- | Simple printing function for Hasse diagrams. hsd2String :: HasseDiagram -> String -- | Given the number of vertices in a directed graph, and pairs -- representing the direction of each edge (initial, terminal), construct -- a Hasse diagram representing the graph. encodeDirectedGraph :: Int -> [(Int, Int)] -> HasseDiagram -- | Given a Hasse diagram representing a directed graph, construct the -- diagram representing the directed clique/flag complex of the graph. directedFlagComplex :: HasseDiagram -> HasseDiagram -- | Convert a Hasse diagram to a simplicial complex. toSimplicialComplex :: HasseDiagram -> SimplicialComplex