-- Hoogle documentation, generated by Haddock -- See Hoogle, http://www.haskell.org/hoogle/ -- | A versatile library for topological data analysis. -- -- A topological data analysis library motivated by flexibility when it -- comes to the type of data being analyzed. If your data comes with a -- meaningful binary function into an ordered set, you can use -- Persistence to analyze your data. The library also provides functions -- for analyzing directed/undirected, weighted/unweighted graphs. See the -- README for resources on learning about topological data anlysis. @package Persistence @version 2.0.2 -- | This module provides functions for constructing neighborhood graphs, -- clique complexes, and Vietoris-Rips complexes, as well as the -- computation of Betti numbers. -- -- A simplicial complex is like a generalization of a graph, where you -- can have any number of n-dimensional simplices (line-segments, -- triangles, tetrahedrons), glued along their boundaries so that the -- intersection of any two simplices is a simplicial complex. These -- objects are fundamental to topological data analysis. -- -- An important thing to note about this module, and the library in -- general, is the difference between "fast" and "light" functions. Fast -- functions encode the metric in a 2D vector, so that distances don't -- need to be computed over and over again and can instead be accessed in -- constant time. Unfortunately, this takes O(n^2) memory so I would -- strongly recomend against using it for larger data sets unless you -- have a lot of RAM. Light functions do not use this speed -- optimization. -- -- The neighborhood graph of point cloud data is a graph where an edge -- exists between two data points if and only if the vertices fall within -- a given distance of each other. Graphs here are more of a helper data -- type for constructing filtrations, which is why they have a somewhat -- odd representation. Not to worry, though, I've supplied a way of -- encoding graphs of a more generic representation. -- -- The clique complex of a graph is a simplicial complex whose simplices -- are the complete subgraphs of the given graph. The Vietoris-Rips -- complex is the clique complex of the neighborhood graph. -- -- Betti numbers measure the number of n-dimensional holes in a given -- simplicial complex. The 0th Betti number is the number of connected -- components, or clusters; the 1st Betti number is the number of loops -- or tunnels; the 2nd Betti number measures voids or hollow volumes; and -- if you have high-dimensional data, you might have higher Betti numbers -- representing the number of high-dimensional holes. module Persistence.SimplicialComplex -- | This is type synonym exists to make other synonyms more concise. A -- simplex is represented as a pair: the vector of its vertices (their -- index in the original data set), and the vector of the indices of the -- faces in the next lowest dimension of the simplicial complex. -- 1-simplices are the exception: they do not store reference to their -- faces because it would be redundant with their vertices. type Simplex = (Vector Int, Vector Int) -- | The first component of the pair is the number of vertices. The second -- component is a vector whose entries are vectors of simplices of the -- same dimension. Index 0 of the vecor corresponds to dimension 1 -- because there is no need to store individual vertices. type SimplicialComplex = (Int, Vector (Vector Simplex)) -- | This represents the (symmetric) adjacency matrix of some weighted, -- undirected graph. The type a is whatever distance is in your data -- analysis procedure. The reason graphs are represented like this is -- because their main purpose is too speed up the construction of -- simplicial complexes and filtrations. If the clique complex of this -- graph were to be constructed, only the adjacencies would matter. But -- if a filtration is constructed all the distances will be required over -- and over again - this allows them to be accessed in constant time. type Graph a = Vector (Vector (a, Bool)) -- | Show all the information in a simplicial complex. sc2String :: SimplicialComplex -> String -- | Get the dimension of the highest dimensional simplex (constant time). getDim :: SimplicialComplex -> Int -- | Takes the number of vertices and each edge paired with its weight to -- output the graph encoded as a 2D vector. If only you have an -- unweighted graph, you can still encode your graph by simply letting -- the type a be (). If you have a weighted graph but there isn't a -- distance between every vertex, you can use the Extended type -- (essentially the same as Maybe) from the Filtration module which is -- already an instance of Ord. encodeWeightedGraph :: Int -> (Int -> Int -> (a, Bool)) -> Graph a -- | Given a weighted graph, construct a 1-dimensional simplicial complex -- in the canonical way. Betti numbers of this simplicial complex can be -- used to count cycles and connected components. wGraph2sc :: Graph a -> SimplicialComplex -- | Index into the adjacency matrix of a graph, flipping the indices if -- necessary. indexGraph :: Graph a -> (Int, Int) -> (a, Bool) -- | The first argument is a scale, the second is a metric, and the third -- is the data. This function records the distance between every element -- of the data and whether or not it is smaller than the given scale. makeNbrhdGraph :: (Ord a, Eq b) => a -> (b -> b -> a) -> Either (Vector b) [b] -> Graph a -- | Parallel version of the above. makeNbrhdGraphPar :: (Ord a, Eq b) => a -> (b -> b -> a) -> Either (Vector b) [b] -> Graph a -- | Makes a simplicial complex where the simplices are the complete -- subgraphs (cliques) of the given graph. Mainly a helper function for -- makeRipsComplexFast, but it might be useful if you happen to have a -- graph you want to analyze. I highly recomend using the parallel -- version, as this process is very expensive. makeCliqueComplex :: Graph a -> SimplicialComplex -- | Parallel version of the above. makeCliqueComplexPar :: Graph a -> SimplicialComplex -- | Constructs the Vietoris-Rips complex given a scale, metric, and data -- set. Also uses O(n^2) memory (where n is the number of data points) -- for a graph storing all the distances between data points. makeRipsComplexFast :: (Ord a, Eq b) => a -> (b -> b -> a) -> Either (Vector b) [b] -> (SimplicialComplex, Graph a) -- | Parallel version of the above. makeRipsComplexFastPar :: (Ord a, Eq b) => a -> (b -> b -> a) -> Either (Vector b) [b] -> (SimplicialComplex, Graph a) -- | Constructs the Vietoris-Rips complex given a scale, metric, and data -- set. makeRipsComplexLight :: (Ord a, Eq b) => a -> (b -> b -> a) -> Either (Vector b) [b] -> SimplicialComplex -- | Parallel version of the above. makeRipsComplexLightPar :: (Ord a, Eq b) => a -> (b -> b -> a) -> Either (Vector b) [b] -> SimplicialComplex -- | The nth index of the output corresponds to the nth Betti number. -- -- The zeroth Betti number is the number of connected components, the 1st -- is the number of tunnels/ punctures, the 2nd is the number of hollow -- volumes, and so on for higher-dimensional holes. bettiNumbers :: SimplicialComplex -> [Int] -- | Calculates all of the Betti numbers in parallel. bettiNumbersPar :: SimplicialComplex -> [Int] -- | This module allows one to do computations involving directed graphs. -- Namely, it allows you to convert a directed graph (presented in a -- generic way) into a simplicial complex and provides functions for -- constructing the "directed clique complex," see below. -- -- This module uses algorithms for admissible Hasse diagrams. A Hasse -- diagram is admissible if it is stratified and oriented. A diagram is -- stratified if all the vertices can be arranged in rows such that all -- the sources of each vertex are in the next highest row and all the -- targets are in the next lowest row. A diagram is oriented if every -- vertex has a linear ordering on its targets. -- -- A node in the diagram is represented as a tuple: the indices of the -- level 0 nodes in the diagram that are reachable from this node, the -- indices of targets in the next lowest level, and the indices of the -- sources in the next highest level. The entire diagram is simply an -- array of arrays representing each particular level; index 0 represents -- level 0, etc. -- -- Any directed graph can be encoded as an admissible Hasse diagram with -- 2 levels. The edges are level 1 and the vertices are level 0. The -- ordering on the targets of a node representing an edge is simply the -- terminal vertex first and the initial vertex second. This may be -- counterintuitive, but its helpful to interpret an arrow between two -- vertices as the "<" operator. This induces a linear ordering on the -- vertices of any acyclic complete subgraph - which is what the nodes in -- the Hasse diagram of the directed clique complex represent. -- -- Any oriented simplicial complex can also be encoded as an admissible -- Hasse diagram. A node is a simplex, the targets are the faces of the -- simplex, and the sources are simplices of which the given simplex is a -- face. -- -- The main feature of this module is an algorithm which takes the Hasse -- diagram of a directed graph and generates the Hasse diagram of the -- directed flag complex - the simplicial complex whose simplices are -- acyclic complete subgraphs of the given graph. Here acyclic refers to -- a directed graph without any sequence of arrows whose heads and tails -- match up and which has the same start and end vertex. -- -- The idea is that, if your directed graph represents any kind of -- information flow, "sub-modules" in the network are groups of nodes -- that simply take input, process it, and then output it without -- spinning the information around at all. These "sub-modules" are the -- directed cliques/flags which I've been referring to as acyclic -- complete subgraphs up to this point. Constructing a simplicial complex -- out of them will allow you to both simplify the 1-dimensional topology -- of the network and possibly detect higher-dimensional topological -- features. -- -- The algorithm for constructing the directed clique complex comes from -- this paper by Markram et al: -- https://www.frontiersin.org/articles/10.3389/fncom.2017.00048/full. module Persistence.HasseDiagram -- | Type representing a node in a Hasse diagram. Hasse diagrams are being -- used to represent simplicial complexes so each node represents a -- simplex. Contents of the tuple in order: Vector of references to -- vertices of the underlying directed graph, vector of references to the -- simplices faes in the next lowest level of the Hasse diagram, vector -- of references to "parent" simplices (simplices who have this simplex -- as a face) in the next highest level of the Hasse diagram. type Node = (Vector Int, Vector Int, Vector Int) -- | Type representing an admissible Hasse diagram. Each entry in the -- vector represents a level in the Hasse diagram. type HasseDiagram = Vector (Vector Node) -- | Simple printing function for Hasse diagrams. hsd2String :: HasseDiagram -> String -- | Given the number of vertices in a directed graph, and pairs -- representing the direction of each edge, construct a 1-dimensional -- simplicial complex in the canonical way. Betti numbers of this -- simplicial complex can be used to count cycles and connected -- components. dGraph2sc :: Int -> [(Int, Int)] -> SimplicialComplex -- | Given the number of vertices in a directed graph, and pairs -- representing the direction of each edge (initial, terminal), construct -- a Hasse diagram representing the graph. encodeDirectedGraph :: Int -> [(Int, Int)] -> HasseDiagram -- | Given a Hasse diagram representing a directed graph, construct the -- diagram representing the directed clique/flag complex of the graph. -- Algorithm adapted from the one shown in the supplementary materials of -- this paper: -- https://www.frontiersin.org/articles/10.3389/fncom.2017.00048/full directedFlagComplex :: HasseDiagram -> HasseDiagram -- | Convert a Hasse diagram to a simplicial complex. hDiagram2sc :: HasseDiagram -> SimplicialComplex -- | This module contains functions for constructing filtrations and -- computing persistent homology, persistence landscapes, and computing -- bottleneck distance between barcode diagrams. -- -- A filtration is a finite sequence of simplicial complexes where each -- complex is a subset of the next. This means that a filtration can be -- thought of as a single simplicial complex where each of the simplices -- is labeled with a "filtration index" that represents the index in the -- sequence where that simplex enters the filtration. -- -- One way to create a filtration, given a simplicial complex, a metric -- for the vertices, and a list of distances, is to loop through the -- distances from greatest to least: create a simplicial complex each -- iteration which excludes simplices that contain pairs of vertices -- which are further than the current distance apart. This method will -- produce a filtration of Vietoris-Rips complexes - each filtration -- index will correspond to a Rips complex whose scale is the -- corresponding distance. This filtration represents the topology of the -- data at each of the scales with which it was constructed. -- -- NOTE: It's important that, even though the smallest filtration index -- represents the smallest scale at which the data is being anaylzed, all -- functions in this library receive your list of scales sorted in -- *decreasing* order. -- -- An essential thing to note in this library is the distinction between -- "fast" and "light" functions. Light functions call the metric every -- time distance between two points is required, which is a lot. Fast -- functions store the distances between points and access them in -- constant time, BUT this means they use O(n^2) memory with respect to -- the number of data points, so it's a really bad idea to use this -- optimization on substantially large data if you don't have a lot of -- RAM. -- -- Persistent homology is the main event of topological data analysis. It -- allows one to identify clusters, tunnels, cavities, and higher -- dimensional holes that persist in the data throughout many scales. The -- output of the persistence algorithm is a barcode diagram. A single -- barcode represents the filtration index where a feature appears and -- the index where it disappears (if it does). Alternatively, a barcode -- can represent the scale at which a feature and the scale at which it -- ends. Thus, short barcodes are typically interpretted as sampling -- irregularities and long barcodes are interpretted as actual features -- of whatever the underlying data set represents. In this context, what -- a feature *is* depends on which dimension the barcode diagram is; -- 0-dimensional features are connected components, 1-dimensional -- features are loops or tunnels, 2-dimensional features are hollow -- volumes, and higher dimensional features correspond to -- heigher-dimensional cavities. -- -- After you've got the barcodes of a data set, you might want to compare -- it with that of a different data set. This is the purpose of -- bottleneck distance, which corresponds to the Hausdorff distance -- between barcode diagrams. -- -- Another way to compare barcode diagrams is by using persistence -- landscapes. The peristence landscape of a barcode diagram is a finite -- sequence of piecewise-linear, real-valued functions. This means they -- can be used to take averages and compute distances between barcode -- diagrams. See "A Persistence Landscapes Toolbox For Topological -- Statistics" by Bubenik and Dlotko for more information. -- -- WARNING: The persistence landscape functions have not been fully -- tested. Use them with caution. If you get any errors or unexpected -- output, please don't hesitate to email me. module Persistence.Filtration -- | This type synonym exists to make other synonyms more concise. Each -- simplex in a filtration is represented as a triple: its filtration -- index, the indices of its vertices in the original data, and the -- indices of its faces in the next lowest dimension. Edges do not have -- reference to their faces, as it would be redundant with their -- vertices. All simplices are sorted according to filtration index upon -- construction of the filtration. In each dimension, all simplices are -- sorted in increasing order of filtration index, and every simplices -- face indices are sorted in decreasing order; both of these facts are -- critical to the computation of persistent homology. type FilterSimplex = (Int, Vector Int, Vector Int) -- | A type representing a filtration whose vertices all have filtration -- index 0. Slightly faster and slightly less memory usage. The first -- component is simply the number of vertices. The second component is a -- vector with an entry for each dimension of simplices, starting at -- dimension 1 for edges. type SimpleFiltration = (Int, Vector (Vector FilterSimplex)) -- | Representation of a filtration which, unlike SimpleFiltration, can -- cope with vertices that have a non-zero filtration index. Vertices of -- the filtration are represented like all other simplices except that -- they don't their own have vertices or faces. -- -- Note that, since this library currently only deals with static -- pointcloud data, all of the filtration construction functions produce -- vertices whose filtration index is 0. Thus, if you want to use this -- type you will have to construct the instances yourself. type Filtration = Vector (Vector FilterSimplex) -- | A type extending the number line to positive and negative infinity. -- Used for representing infinite barcodes, bottleneck distance, and -- persistence landscapes. data Extended a Finite :: a -> Extended a Infinity :: Extended a MinusInfty :: Extended a -- | (x, Finite y) is a topological feature that appears at the index or -- scale x and disappears at the index or scale y. (x, Infinity) begins -- at x and doesn't disappear. type BarCode a = (a, Extended a) -- | A Persistence landscape is a certain type of piecewise linear function -- based on a barcode diagram. It can be represented as a list of -- critical points paired with critical values. Useful for taking -- averages and differences between barcode diagrams. type Landscape = Vector (Vector (Extended Double, Extended Double)) -- | Shows all the information in a simplex. sim2String :: FilterSimplex -> String -- | Shows all the information in a filtration. filtr2String :: Either SimpleFiltration Filtration -> String -- | Gets the simplicial complex specified by the filtration index. This is -- O(n) with respect to the number of simplices. getComplex :: Int -> Either SimpleFiltration Filtration -> SimplicialComplex -- | Return the dimension of the highest dimensional simplex in the -- filtration (constant time). getDimension :: Either SimpleFiltration Filtration -> Int -- | Convert a simple filtration into an ordinary filtration. simple2Filtr :: SimpleFiltration -> Filtration -- | This function creates a filtration out of a simplicial complex by -- removing simplices that contain edges that are too long for each scale -- in the list. This is really a helper function to be called by -- makeRipsFiltrationFast, but I decided to expose it in case you have a -- simplicial complex and weighted graph lying around. The scales MUST be -- in decreasing order. filterByWeightsFast :: Ord a => Either (Vector a) [a] -> (SimplicialComplex, Graph a) -> SimpleFiltration -- | This function constructs a filtration of the Vietoris-Rips complexes -- associated with the scales. Note that this a fast function, meaning it -- uses O(n^2) memory to quickly access distances where n is the number -- of data points. makeRipsFiltrationFast :: (Ord a, Eq b) => Either (Vector a) [a] -> (b -> b -> a) -> Either (Vector b) [b] -> SimpleFiltration -- | Same as above except it uses parallelism when computing the -- Vietoris-Rips complex of the largest scale. makeRipsFiltrationFastPar :: (Ord a, Eq b) => Either (Vector a) [a] -> (b -> b -> a) -> Either (Vector b) [b] -> SimpleFiltration -- | The same as filterbyWeightsFast except it uses far less memory at the -- cost of speed. Note that the scales must be in decreasing order. filterByWeightsLight :: Ord a => Either (Vector a) [a] -> (b -> b -> a) -> Either (Vector b) [b] -> SimplicialComplex -> SimpleFiltration -- | Constructs the filtration of Vietoris-Rips complexes corresponding to -- each of the scales. makeRipsFiltrationLight :: (Ord a, Eq b) => Either (Vector a) [a] -> (b -> b -> a) -> Either (Vector b) [b] -> SimpleFiltration -- | Same as above except it uses parallelism when computing the -- Vietoris-Rips complex of the largest scale. makeRipsFiltrationLightPar :: (Ord a, Eq b) => Either (Vector a) [a] -> (b -> b -> a) -> Either (Vector b) [b] -> SimpleFiltration -- | The nth entry in the list will describe the n-dimensional topology of -- the filtration. That is, the first list will represent clusters, the -- second list will represent tunnels or punctures, the third will -- represent hollow volumes, and the nth index list will represent -- n-dimensional holes in the data. Features are encoded by the -- filtration indices where they appear and disappear. indexBarCodes :: Filtration -> Vector (Vector (BarCode Int)) -- | Same as above except this function acts on filtrations whose vertices -- all have filtration index zero (for a very slight speedup). indexBarCodesSimple :: SimpleFiltration -> Vector (Vector (BarCode Int)) -- | The nth entry in the list will describe the n-dimensional topology of -- the filtration. However, features are encoded by the scales where they -- appear and disappear. For consistency, scales must be in decreasing -- order. scaleBarCodes :: Either (Vector a) [a] -> Filtration -> Vector (Vector (BarCode a)) -- | Same as above except acts only on filtrations whose vertices all have -- filtration index 0. Note that scales must be in decreasing order. scaleBarCodesSimple :: Either (Vector a) [a] -> SimpleFiltration -> Vector (Vector (BarCode a)) -- | The standard (Euclidean) metric between index barcodes. The distance -- between infinite and finite barcodes is infinite, and the distance -- between two infinite barcodes is the absolute value of the difference -- of their fst component. indexMetric :: BarCode Int -> BarCode Int -> Extended Double -- | Given a metric, return the Hausdorff distance (referred to as -- bottleneck distance in TDA) between the two sets. Returns nothing if -- either list of barcodes is empty. bottleNeckDistance :: Ord b => (BarCode a -> BarCode a -> Extended b) -> Vector (BarCode a) -> Vector (BarCode a) -> Maybe (Extended b) -- | Get's all the bottleneck distances; a good way to determine the -- similarity of the topology of two filtrations. bottleNeckDistances :: Ord b => (BarCode a -> BarCode a -> Extended b) -> Vector (Vector (BarCode a)) -> Vector (Vector (BarCode a)) -> Vector (Maybe (Extended b)) -- | Compute the persistence landscape of the barcodes for a single -- dimension. calcLandscape :: Vector (BarCode Int) -> Landscape -- | Evaluate the nth function in the landscape for the given point. evalLandscape :: Landscape -> Int -> Extended Double -> Extended Double -- | Evaluate all the real-valued functions in the landscape. evalLandscapeAll :: Landscape -> Extended Double -> Vector (Extended Double) -- | Compute a linear combination of the landscapes. If the coefficient -- list is too short, the rest of the coefficients are assumed to be -- zero. If it is too long, the extra coefficients are discarded. linearComboLandscapes :: [Double] -> [Landscape] -> Landscape -- | Average the persistence landscapes. avgLandscapes :: [Landscape] -> Landscape -- | Subtract the second landscape from the first. diffLandscapes :: Landscape -> Landscape -> Landscape -- | If p>=1 then it will compute the L^p norm on the given interval. -- Uses trapezoidal approximation. You should ensure that the stepsize -- partitions the interval evenly. normLp :: Extended Double -> (Double, Double) -> Double -> Landscape -> Maybe Double -- | Given the same information as above, computes the L^p distance between -- the two landscapes. One way to compare the topologies of two -- filtrations. metricLp :: Extended Double -> (Double, Double) -> Double -> Landscape -> Landscape -> Maybe Double instance GHC.Show.Show a => GHC.Show.Show (Persistence.Filtration.Extended a) instance GHC.Classes.Eq a => GHC.Classes.Eq (Persistence.Filtration.Extended a) instance (GHC.Classes.Ord a, GHC.Classes.Eq a) => GHC.Classes.Ord (Persistence.Filtration.Extended a) instance GHC.Num.Num a => GHC.Num.Num (Persistence.Filtration.Extended a)