-- Hoogle documentation, generated by Haddock -- See Hoogle, http://www.haskell.org/hoogle/ -- | Quickly detect clusters and holes in data. -- -- Persistence is a topological data analysis library motivated by -- flexibility when it comes to the type of data being analyzed. If you -- have data that comes with a meaningful function into something of the -- Ord typeclass, you can use Persistence to detect clusters and holes in -- the data. You can also use the library to analyze undirected graphs, -- and interesting features for directed graphs will be added soon. @package Persistence @version 1.0 -- | This module provides functions for constructing neighborhood graphs, -- clique complexes, Vietoris-Rips complexes, as well as the computation -- of Betti numbers. -- -- An important thing to know about this module is the difference between -- "fast" and "light" functions. Fast functions encode the metric in a 2D -- vector, so that distances don't need to be computed over and over -- again and can instead be accessed in constant time. Unfortunately, -- this takes O(n^2) memory so I would strongly recomend against using it -- for larger data sets; this is why "light" functions exist. -- -- A neighborhood graph is a graph where an edge exists between two -- vertices if and only if the vertices fall within a given distance of -- each other. Graphs here are more of a helper data type for -- constructing filtrations, which is why they have a somewhat odd -- representation. -- -- The clique complex of a graph is a simplicial complex whose simplices -- are the complete subgraphs of the given graph. The Vietoris-Rips -- complex is the clique complex of the neighborhood graph. -- -- Betti numbers measure the number of n-dimensional holes in a given -- simplicial complex. The 0th Betti number is the number of connected -- components, or clusters; the 1st Betti number is the number of loops -- or tunnels; then 2nd Betti number measures voids or hollow volumes; -- and if you have high-dimensional data, you might have higher Betti -- numbers representing the number of high-dimensional holes. module SimplicialComplex -- | The first component of the pair is the number of vertices. The second -- component is a vector whose entries are vectors of simplices of the -- same dimension. The dimension starts at 1 because there is no need to -- store individual vertices. A simplex is represented as a pair: the -- vector of its vertices in the original data set from which the complex -- was constructed, and the vector of the indices of the faces in the -- next lowest dimension. Edges are the exception - they do not store -- reference to their faces because it would be redundant with their -- vertices. type SimplicialComplex = (Int, Vector (Vector (Vector Int, Vector Int))) -- | This represents the (symmetric) adjacency matrix of some undirected -- graph. The type a is whatever distance is in your data -- analysis regime. The reason graphs are represented like this is -- because their main function is too speed up the construction of -- simplicial complexes and filtrations. If the clique complex of this -- graph were to be constructed, only the adjacencies would matter. But -- if a filtration is constructed all the distances will be required over -- and over again - this allows them to be accessed in constant time. type Graph a = Vector (Vector (a, Bool)) -- | Show all the information in a simplicial complex. sc2String :: SimplicialComplex -> String -- | Get the dimension of the highest dimensional simplex (constant time). getDim :: SimplicialComplex -> Int -- | The first argument is a scale, the second is a metric, and the third -- is the data. This function records the distance between every element -- of the data and whether or not it is smaller than the given scale. makeNbrhdGraph :: (Ord a, Eq b) => a -> (b -> b -> a) -> [b] -> Graph a -- | Makes a simplicial complex where the simplices are the complete -- subgraphs (cliques) of the given graph. Mainly a helper function for -- makeVRComplexFast, but it might be useful if you happen to have a -- graph you want to analyze. This utilizes any available processors in -- parallel because the construction is quite expensive. makeCliqueComplex :: Graph a -> SimplicialComplex -- | Constructs the Vietoris-Rips complex given a scale, metric, and data -- set. Also uses O(n^2) memory (where n is the number of data points) -- for a graph storing all the distances between data points. This -- utilizes any available processors in parallel because the construction -- is quite expensive. makeVRComplexFast :: (Ord a, Eq b) => a -> (b -> b -> a) -> [b] -> (SimplicialComplex, Graph a) -- | Constructs the Vietoris-Rips complex given a scale, metric, and data -- set. This utilizes any available processors in parallel because the -- construction is quite expensive. makeVRComplexLight :: (Ord a, Eq b) => a -> (b -> b -> a) -> [b] -> SimplicialComplex -- | The nth index of the output corresponds to the nth Betti number. The -- zeroth Betti number is the number of connected components, the 1st is -- the number of tunnels/punctures, the 2nd is the number of hollow -- volumes, and so on for higher-dimensional holes. bettiNumbers :: SimplicialComplex -> [Int] -- | Calculates all of the Betti numbers in parallel. bettiNumbersPar :: SimplicialComplex -> [Int] -- | This module contains functions for constructing filtrations and -- computing persistent homology, as well as a few utility functions for -- working with filtrations. -- -- A filtration is a finite sequence of simplicial complexes where each -- complex is a subset of the next. This means that a filtration can be -- thought of as a single simplicial complex where each of the simplices -- is labeled with a "filtration index" that represents the index in the -- sequence where that simplex enters the filtration. -- -- One way to create a filtration, given a simplicial complex, a metric -- for the vertices, and a list of distances, is to loop through the -- distances from greatest to least: create a simplicial complex each -- iteration which excludes simplices that contain pairs of vertices -- which are further than the current distance apart. This method will -- produce a filtration of Vietoris-Rips complexes - each filtration -- index will correspond to a VR complex whose scale is the corresponding -- distance. -- -- An essential thing to note about the way this library is set up is the -- distinction between "fast" and "light" functions. Light functions call -- the metric every time distance between two points is required, which -- is a lot. Fast functions store the distances between points and access -- them in constant time, BUT this means they use O(n^2) memory with -- respect to the number of data points, so it's a really bad idea to use -- this optimization on substantially large data. -- -- IMPORTANT NOTE: This library currently only handles filtrations where -- all the vertices have filtration index 0. Since I haven't thought of -- any way filtrations that don't respect this property could come up in -- applications, I have chosen to exclude them in favor of less memory -- usage and a slight speedup. -- -- Persistent homology is the main event of topological data analysis. It -- allows one to identify clusters, holes, and voids that persist in the -- data throughout many scales. The output of the persistence algorithm -- is a barcode diagram, which currently have a somewhat crude -- representation in this library. A single barcode represents the -- filtration index where a feature appears and the index where it -- disappears (if it does). Thus, short barcodes are typically -- interpretted as noise and long barcodes are interpretted as actual -- features. module Persistence -- | The first component is simply the number of vertices (all vertices are -- assumed to have filtration index 0). The second component is a vector -- with an entry for each dimension of simplices, starting at dimension 1 -- for edges. Each simplex is represented as a triple: its filtration -- index, the indices of its vertices in the original data, and the -- indices of its faces in the next lowest dimension. Edges do not have -- reference to their faces, as it would be redundant with their -- vertices. All simplices are sorted according to filtration index upon -- construction of the filtration. type Filtration = (Int, Vector (Vector (Int, Vector Int, Vector Int))) -- | (i, Just j) is a feature that appears at filtration index i and -- disappears at index j, (i, Nothing) begins at i and doesn't disappear. type BarCode = (Int, Maybe Int) -- | Shows all the information in a simplex. sim2String :: (Int, Vector Int, Vector Int) -> String -- | Shows all the information in a filtration. filtr2String :: Filtration -> String -- | Gets the simplicial complex specified by the filtration index. This is -- O(n) with respect to the number of simplices. getComplex :: Int -> Filtration -> SimplicialComplex -- | The first argument is a list of dimensions, the second argument is a -- list of filtration indices. The function returns the number of -- simplices in the filtration whose dimension and index exist in the -- respective lists. getNumSimplices :: [Int] -> [Int] -> Filtration -> Int -- | Return the dimension of the highest dimensional simplex in the -- filtration (constant time). getDimension :: Filtration -> Int -- | Given a list of scales, a simplicial complex, and a weighted graph -- (see SimplicialComplex) which encodes a metric on the vertices, this -- function creates a filtration out of a simplicial complex by removing -- simplices that contain edges that are too long for each scale in the -- list. Really, this is a helper function to be called by -- makeVRFiltrationFast, but I decided to expose it in case you have a -- simplicial complex and weighted graph lying around. The scales MUST be -- in decreasing order for this function. filterByWeightsFast :: Ord a => [a] -> (SimplicialComplex, Graph a) -> Filtration -- | Given a list of scales, a metric, and a data set, this function -- constructs a filtration of the Vietoris-Rips complexes associated with -- the scales. The scales MUST be in decreasing order. Note that this a -- fast function, meaning it uses O(n^2) memory to quickly access -- distances where n is the number of data points. makeVRFiltrationFast :: (Ord a, Eq b) => [a] -> (b -> b -> a) -> [b] -> Filtration -- | The same as filterbyWeightsFast except it uses far less memory at the -- cost of speed. Note that the scales must be in decreasing order. filterByWeightsLight :: Ord a => [a] -> (b -> b -> a) -> [b] -> SimplicialComplex -> Filtration -- | Given a list of scales in decreasing order, a metric, and a data set, -- this constructs the filtration of Vietoris-Rips complexes -- corresponding to the scales. makeVRFiltrationLight :: (Ord a, Eq b) => [a] -> (b -> b -> a) -> [b] -> Filtration -- | Each index in the list is a list of barcodes whose dimension -- corresponds to the index. That is, the first list will represent -- clusters, the second list will represent tunnels or punctures, the -- third will represent hollow volumes, and the nth index list will -- represent n-dimensional holes in the data. persistentHomology :: Filtration -> [[BarCode]]