-- Hoogle documentation, generated by Haddock
-- See Hoogle, http://www.haskell.org/hoogle/
-- | A versatile library for topological data anlysis.
--
-- A topological data anlysis library motivated by flexibility when it
-- comes to the type of data being analyzed. If your data comes with a
-- meaningful binary function into into an ordered set, you can use
-- Persistence to analyze your data. The library also provides functions
-- for analyzing directed/undirected, weighted/unweighted graphs. See the
-- README for resources on learning about topological data anlysis.
@package Persistence
@version 2.0
-- | This module provides functions for constructing neighborhood graphs,
-- clique complexes, and Vietoris-Rips complexes, as well as the
-- computation of Betti numbers.
--
-- A simplicial complex is like a generalization of a graph, where you
-- can have any number of n-dimensional simplices (line-segments,
-- triangles, tetrahedrons), glued along their boundaries so that the
-- intersection of any two simplices is a simplicial complex. These
-- objects are fundamental to topological data analysis.
--
-- An important thing to note about this module, and the library in
-- general, is the difference between "fast" and "light" functions. Fast
-- functions encode the metric in a 2D vector, so that distances don't
-- need to be computed over and over again and can instead be accessed in
-- constant time. Unfortunately, this takes O(n^2) memory so I would
-- strongly recomend against using it for larger data sets unless you
-- have a lot of RAM. Light functions do not use this speed
-- optimization.
--
-- The neighborhood graph of point cloud data is a graph where an edge
-- exists between two data points if and only if the vertices fall within
-- a given distance of each other. Graphs here are more of a helper data
-- type for constructing filtrations, which is why they have a somewhat
-- odd representation. Not to worry, though, I've supplied a way of
-- encoding graphs of a more generic representation.
--
-- The clique complex of a graph is a simplicial complex whose simplices
-- are the complete subgraphs of the given graph. The Vietoris-Rips
-- complex is the clique complex of the neighborhood graph.
--
-- Betti numbers measure the number of n-dimensional holes in a given
-- simplicial complex. The 0th Betti number is the number of connected
-- components, or clusters; the 1st Betti number is the number of loops
-- or tunnels; the 2nd Betti number measures voids or hollow volumes; and
-- if you have high-dimensional data, you might have higher Betti numbers
-- representing the number of high-dimensional holes.
module Persistence.SimplicialComplex
-- | This is type synonym exists to make other synonyms more concise. A
-- simplex is represented as a pair: the vector of its vertices (their
-- index in the original data set), and the vector of the indices of the
-- faces in the next lowest dimension of the simplicial complex.
-- 1-simplices are the exception: they do not store reference to their
-- faces because it would be redundant with their vertices.
type Simplex = (Vector Int, Vector Int)
-- | The first component of the pair is the number of vertices. The second
-- component is a vector whose entries are vectors of simplices of the
-- same dimension. Index 0 of the vecor corresponds to dimension 1
-- because there is no need to store individual vertices.
type SimplicialComplex = (Int, Vector (Vector Simplex))
-- | This represents the (symmetric) adjacency matrix of some weighted,
-- undirected graph. The type a is whatever distance is in your data
-- analysis procedure. The reason graphs are represented like this is
-- because their main purpose is too speed up the construction of
-- simplicial complexes and filtrations. If the clique complex of this
-- graph were to be constructed, only the adjacencies would matter. But
-- if a filtration is constructed all the distances will be required over
-- and over again - this allows them to be accessed in constant time.
type Graph a = Vector (Vector (a, Bool))
-- | Show all the information in a simplicial complex.
sc2String :: SimplicialComplex -> String
-- | Get the dimension of the highest dimensional simplex (constant time).
getDim :: SimplicialComplex -> Int
-- | Takes the number of vertices and each edge paired with its weight to
-- output the graph encoded as a 2D vector. If only you have an
-- unweighted graph, you can still encode your graph by simply letting
-- the type a be (). If you have a weighted graph but there isn't a
-- distance between every vertex, you can use the Extended type
-- (essentially the same as Maybe) from the Filtration module which is
-- already an instance of Ord.
encodeWeightedGraph :: Int -> (Int -> Int -> (a, Bool)) -> Graph a
-- | Given a weighted graph, construct a 1-dimensional simplicial complex
-- in the canonical way. Betti numbers of this simplicial complex can be
-- used to count cycles and connected components.
wGraph2sc :: Graph a -> SimplicialComplex
-- | The first argument is a scale, the second is a metric, and the third
-- is the data. This function records the distance between every element
-- of the data and whether or not it is smaller than the given scale.
makeNbrhdGraph :: (Ord a, Eq b) => a -> (b -> b -> a) -> Either (Vector b) [b] -> Graph a
-- | Parallel version of the above.
makeNbrhdGraphPar :: (Ord a, Eq b) => a -> (b -> b -> a) -> Either (Vector b) [b] -> Graph a
-- | Makes a simplicial complex where the simplices are the complete
-- subgraphs (cliques) of the given graph. Mainly a helper function for
-- makeRipsComplexFast, but it might be useful if you happen to have a
-- graph you want to analyze. I highly recomend using the parallel
-- version, as this process is very expensive.
makeCliqueComplex :: Graph a -> SimplicialComplex
-- | Parallel version of the above.
makeCliqueComplexPar :: Graph a -> SimplicialComplex
-- | Constructs the Vietoris-Rips complex given a scale, metric, and data
-- set. Also uses O(n^2) memory (where n is the number of data points)
-- for a graph storing all the distances between data points.
makeRipsComplexFast :: (Ord a, Eq b) => a -> (b -> b -> a) -> Either (Vector b) [b] -> (SimplicialComplex, Graph a)
-- | Parallel version of the above.
makeRipsComplexFastPar :: (Ord a, Eq b) => a -> (b -> b -> a) -> Either (Vector b) [b] -> (SimplicialComplex, Graph a)
-- | Constructs the Vietoris-Rips complex given a scale, metric, and data
-- set.
makeRipsComplexLight :: (Ord a, Eq b) => a -> (b -> b -> a) -> Either (Vector b) [b] -> SimplicialComplex
-- | Parallel version of the above.
makeRipsComplexLightPar :: (Ord a, Eq b) => a -> (b -> b -> a) -> Either (Vector b) [b] -> SimplicialComplex
-- | The nth index of the output corresponds to the nth Betti number.
--
-- The zeroth Betti number is the number of connected components, the 1st
-- is the number of tunnels/ punctures, the 2nd is the number of hollow
-- volumes, and so on for higher-dimensional holes.
bettiNumbers :: SimplicialComplex -> [Int]
-- | Calculates all of the Betti numbers in parallel.
bettiNumbersPar :: SimplicialComplex -> [Int]
-- | This module allows one to do computations involving directed graphs.
-- Namely, it allows you to convert a directed graph (presented in a
-- generic way) into a simplicial complex and provides functions for
-- constructing the "directed clique complex," see below.
--
-- This module uses algorithms for admissible Hasse diagrams. A Hasse
-- diagram is admissible if it is stratified and oriented. A diagram is
-- stratified if all the vertices can be arranged in rows such that all
-- the sources of each vertex are in the next highest row and all the
-- targets are in the next lowest row. A diagram is oriented if every
-- vertex has a linear ordering on its targets.
--
-- A node in the diagram is represented as a tuple: the indices of the
-- level 0 nodes in the diagram that are reachable from this node, the
-- indices of targets in the next lowest level, and the indices of the
-- sources in the next highest level. The entire diagram is simply an
-- array of arrays representing each particular level; index 0 represents
-- level 0, etc.
--
-- Any directed graph can be encoded as an admissible Hasse diagram with
-- 2 levels. The edges are level 1 and the vertices are level 0. The
-- ordering on the targets of a node representing an edge is simply the
-- terminal vertex first and the initial vertex second. This may be
-- counterintuitive, but its helpful to interpret an arrow between two
-- vertices as the "<" operator. This induces a linear ordering on the
-- vertices of any acyclic complete subgraph - which is what the nodes in
-- the Hasse diagram of the directed clique complex represent.
--
-- Any oriented simplicial complex can also be encoded as an admissible
-- Hasse diagram. A node is a simplex, the targets are the faces of the
-- simplex, and the sources are simplices of which the given simplex is a
-- face.
--
-- The main feature of this module is an algorithm which takes the Hasse
-- diagram of a directed graph and generates the Hasse diagram of the
-- directed flag complex - the simplicial complex whose simplices are
-- acyclic complete subgraphs of the given graph. Here acyclic refers to
-- a directed graph without any sequence of arrows whose heads and tails
-- match up and which has the same start and end vertex.
--
-- The idea is that, if your directed graph represents any kind of
-- information flow, "sub-modules" in the network are groups of nodes
-- that simply take input, process it, and then output it without
-- spinning the information around at all. These "sub-modules" are the
-- directed cliques/flags which I've been referring to as acyclic
-- complete subgraphs up to this point. Constructing a simplicial complex
-- out of them will allow you to both simplify the 1-dimensional topology
-- of the network and possibly detect higher-dimensional topological
-- features.
--
-- The algorithm for constructing the directed clique complex comes from
-- this paper by Markram et al:
-- https://www.frontiersin.org/articles/10.3389/fncom.2017.00048/full.
module Persistence.HasseDiagram
-- | Type representing a node in a Hasse diagram. Hasse diagrams are being
-- used to represent simplicial complexes so each node represents a
-- simplex. Contents of the tuple in order: Vector of references to
-- vertices of the underlying directed graph, vector of references to the
-- simplices faes in the next lowest level of the Hasse diagram, vector
-- of references to "parent" simplices (simplices who have this simplex
-- as a face) in the next highest level of the Hasse diagram.
type Node = (Vector Int, Vector Int, Vector Int)
-- | Type representing an admissible Hasse diagram. Each entry in the
-- vector represents a level in the Hasse diagram.
type HasseDiagram = Vector (Vector Node)
-- | Simple printing function for Hasse diagrams.
hsd2String :: HasseDiagram -> String
-- | Given the number of vertices in a directed graph, and pairs
-- representing the direction of each edge, construct a 1-dimensional
-- simplicial complex in the canonical way. Betti numbers of this
-- simplicial complex can be used to count cycles and connected
-- components.
dGraph2sc :: Int -> [(Int, Int)] -> SimplicialComplex
-- | Given the number of vertices in a directed graph, and pairs
-- representing the direction of each edge (initial, terminal), construct
-- a Hasse diagram representing the graph.
encodeDirectedGraph :: Int -> [(Int, Int)] -> HasseDiagram
-- | Given a Hasse diagram representing a directed graph, construct the
-- diagram representing the directed clique/flag complex of the graph.
-- Algorithm adapted from the one shown in the supplementary materials of
-- this paper:
-- https://www.frontiersin.org/articles/10.3389/fncom.2017.00048/full
directedFlagComplex :: HasseDiagram -> HasseDiagram
-- | Convert a Hasse diagram to a simplicial complex.
hDiagram2sc :: HasseDiagram -> SimplicialComplex
-- | This module contains functions for constructing filtrations and
-- computing persistent homology, persistence landscapes, and computing
-- bottleneck distance between barcode diagrams.
--
-- A filtration is a finite sequence of simplicial complexes where each
-- complex is a subset of the next. This means that a filtration can be
-- thought of as a single simplicial complex where each of the simplices
-- is labeled with a "filtration index" that represents the index in the
-- sequence where that simplex enters the filtration.
--
-- One way to create a filtration, given a simplicial complex, a metric
-- for the vertices, and a list of distances, is to loop through the
-- distances from greatest to least: create a simplicial complex each
-- iteration which excludes simplices that contain pairs of vertices
-- which are further than the current distance apart. This method will
-- produce a filtration of Vietoris-Rips complexes - each filtration
-- index will correspond to a Rips complex whose scale is the
-- corresponding distance. This filtration represents the topology of the
-- data at each of the scales with which it was constructed.
--
-- NOTE: It's important that, even though the smallest filtration index
-- represents the smallest scale at which the data is being anaylzed, all
-- functions in this library receive your list of scales sorted in
-- *decreasing* order.
--
-- An essential thing to note in this library is the distinction between
-- "fast" and "light" functions. Light functions call the metric every
-- time distance between two points is required, which is a lot. Fast
-- functions store the distances between points and access them in
-- constant time, BUT this means they use O(n^2) memory with respect to
-- the number of data points, so it's a really bad idea to use this
-- optimization on substantially large data if you don't have a lot of
-- RAM.
--
-- Persistent homology is the main event of topological data analysis. It
-- allows one to identify clusters, tunnels, cavities, and higher
-- dimensional holes that persist in the data throughout many scales. The
-- output of the persistence algorithm is a barcode diagram. A single
-- barcode represents the filtration index where a feature appears and
-- the index where it disappears (if it does). Alternatively, a barcode
-- can represent the scale at which a feature and the scale at which it
-- ends. Thus, short barcodes are typically interpretted as sampling
-- irregularities and long barcodes are interpretted as actual features
-- of whatever the underlying data set represents. In this context, what
-- a feature *is* depends on which dimension the barcode diagram is;
-- 0-dimensional features are connected components, 1-dimensional
-- features are loops or tunnels, 2-dimensional features are hollow
-- volumes, and higher dimensional features correspond to
-- heigher-dimensional cavities.
--
-- After you've got the barcodes of a data set, you might want to compare
-- it with that of a different data set. This is the purpose of
-- bottleneck distance, which corresponds to the Hausdorff distance
-- between barcode diagrams.
--
-- Another way to compare barcode diagrams is by using persistence
-- landscapes. The peristence landscape of a barcode diagram is a finite
-- sequence of piecewise-linear, real-valued functions. This means they
-- can be used to take averages and compute distances between barcode
-- diagrams. See "A Persistence Landscapes Toolbox For Topological
-- Statistics" by Bubenik and Dlotko for more information.
--
-- WARNING: The persistence landscape functions have not been fully
-- tested. Use them with caution. If you get any errors or unexpected
-- output, please don't hesitate to email me.
module Persistence.Filtration
-- | This type synonym exists to make other synonyms more concise. Each
-- simplex in a filtration is represented as a triple: its filtration
-- index, the indices of its vertices in the original data, and the
-- indices of its faces in the next lowest dimension. Edges do not have
-- reference to their faces, as it would be redundant with their
-- vertices. All simplices are sorted according to filtration index upon
-- construction of the filtration. In each dimension, all simplices are
-- sorted in increasing order of filtration index, and every simplices
-- face indices are sorted in decreasing order; both of these facts are
-- critical to the computation of persistent homology.
type FilterSimplex = (Int, Vector Int, Vector Int)
-- | A type representing a filtration whose vertices all have filtration
-- index 0. Slightly faster and slightly less memory usage. The first
-- component is simply the number of vertices. The second component is a
-- vector with an entry for each dimension of simplices, starting at
-- dimension 1 for edges.
type SimpleFiltration = (Int, Vector (Vector FilterSimplex))
-- | Representation of a filtration which, unlike SimpleFiltration, can
-- cope with vertices that have a non-zero filtration index. Vertices of
-- the filtration are represented like all other simplices except that
-- they don't their own have vertices or faces.
--
-- Note that, since this library currently only deals with static
-- pointcloud data, all of the filtration construction functions produce
-- vertices whose filtration index is 0. Thus, if you want to use this
-- type you will have to construct the instances yourself.
type Filtration = Vector (Vector FilterSimplex)
-- | A type extending the number line to positive and negative infinity.
-- Used for representing infinite barcodes, bottleneck distance, and
-- persistence landscapes.
data Extended a
Finite :: a -> Extended a
Infinity :: Extended a
MinusInfty :: Extended a
-- | (x, Finite y) is a topological feature that appears at the index or
-- scale x and disappears at the index or scale y. (x, Infinity) begins
-- at x and doesn't disappear.
type BarCode a = (a, Extended a)
-- | A Persistence landscape is a certain type of piecewise linear function
-- based on a barcode diagram. It can be represented as a list of
-- critical points paired with critical values. Useful for taking
-- averages and differences between barcode diagrams.
type Landscape = Vector (Vector (Extended Double, Extended Double))
-- | Shows all the information in a simplex.
sim2String :: FilterSimplex -> String
-- | Shows all the information in a filtration.
filtr2String :: Either SimpleFiltration Filtration -> String
-- | Gets the simplicial complex specified by the filtration index. This is
-- O(n) with respect to the number of simplices.
getComplex :: Int -> Either SimpleFiltration Filtration -> SimplicialComplex
-- | Return the dimension of the highest dimensional simplex in the
-- filtration (constant time).
getDimension :: Either SimpleFiltration Filtration -> Int
-- | Convert a simple filtration into an ordinary filtration.
simple2Filtr :: SimpleFiltration -> Filtration
-- | This function creates a filtration out of a simplicial complex by
-- removing simplices that contain edges that are too long for each scale
-- in the list. This is really a helper function to be called by
-- makeRipsFiltrationFast, but I decided to expose it in case you have a
-- simplicial complex and weighted graph lying around. The scales MUST be
-- in decreasing order.
filterByWeightsFast :: Ord a => Either (Vector a) [a] -> (SimplicialComplex, Graph a) -> SimpleFiltration
-- | This function constructs a filtration of the Vietoris-Rips complexes
-- associated with the scales. Note that this a fast function, meaning it
-- uses O(n^2) memory to quickly access distances where n is the number
-- of data points.
makeRipsFiltrationFast :: (Ord a, Eq b) => Either (Vector a) [a] -> (b -> b -> a) -> Either (Vector b) [b] -> SimpleFiltration
-- | Same as above except it uses parallelism when computing the
-- Vietoris-Rips complex of the largest scale.
makeRipsFiltrationFastPar :: (Ord a, Eq b) => Either (Vector a) [a] -> (b -> b -> a) -> Either (Vector b) [b] -> SimpleFiltration
-- | The same as filterbyWeightsFast except it uses far less memory at the
-- cost of speed. Note that the scales must be in decreasing order.
filterByWeightsLight :: Ord a => Either (Vector a) [a] -> (b -> b -> a) -> Either (Vector b) [b] -> SimplicialComplex -> SimpleFiltration
-- | Constructs the filtration of Vietoris-Rips complexes corresponding to
-- each of the scales.
makeRipsFiltrationLight :: (Ord a, Eq b) => Either (Vector a) [a] -> (b -> b -> a) -> Either (Vector b) [b] -> SimpleFiltration
-- | Same as above except it uses parallelism when computing the
-- Vietoris-Rips complex of the largest scale.
makeRipsFiltrationLightPar :: (Ord a, Eq b) => Either (Vector a) [a] -> (b -> b -> a) -> Either (Vector b) [b] -> SimpleFiltration
-- | The nth entry in the list will describe the n-dimensional topology of
-- the filtration. That is, the first list will represent clusters, the
-- second list will represent tunnels or punctures, the third will
-- represent hollow volumes, and the nth index list will represent
-- n-dimensional holes in the data. Features are encoded by the
-- filtration indices where they appear and disappear.
indexBarCodes :: Filtration -> Vector (Vector (BarCode Int))
-- | Same as above except this function acts on filtrations whose vertices
-- all have filtration index zero (for a very slight speedup).
indexBarCodesSimple :: SimpleFiltration -> Vector (Vector (BarCode Int))
-- | The nth entry in the list will describe the n-dimensional topology of
-- the filtration. However, features are encoded by the scales where they
-- appear and disappear. For consistency, scales must be in decreasing
-- order.
scaleBarCodes :: Either (Vector a) [a] -> Filtration -> Vector (Vector (BarCode a))
-- | Same as above except acts only on filtrations whose vertices all have
-- filtration index 0. Note that scales must be in decreasing order.
scaleBarCodesSimple :: Either (Vector a) [a] -> SimpleFiltration -> Vector (Vector (BarCode a))
-- | The standard (Euclidean) metric between index barcodes. The distance
-- between infinite and finite barcodes is infinite, and the distance
-- between two infinite barcodes is the absolute value of the difference
-- of their fst component.
indexMetric :: BarCode Int -> BarCode Int -> Extended Double
-- | Given a metric, return the Hausdorff distance (referred to as
-- bottleneck distance in TDA) between the two sets. Returns nothing if
-- either list of barcodes is empty.
bottleNeckDistance :: Ord b => (BarCode a -> BarCode a -> Extended b) -> Vector (BarCode a) -> Vector (BarCode a) -> Maybe (Extended b)
-- | Get's all the bottleneck distances; a good way to determine the
-- similarity of the topology of two filtrations.
bottleNeckDistances :: Ord b => (BarCode a -> BarCode a -> Extended b) -> Vector (Vector (BarCode a)) -> Vector (Vector (BarCode a)) -> Vector (Maybe (Extended b))
-- | Compute the persistence landscape of the barcodes for a single
-- dimension.
calcLandscape :: Vector (BarCode Int) -> Landscape
-- | Evaluate the nth function in the landscape for the given point.
evalLandscape :: Landscape -> Int -> Extended Double -> Extended Double
-- | Evaluate all the real-valued functions in the landscape.
evalLandscapeAll :: Landscape -> Extended Double -> Vector (Extended Double)
-- | Compute a linear combination of the landscapes. If the coefficient
-- list is too short, the rest of the coefficients are assumed to be
-- zero. If it is too long, the extra coefficients are discarded.
linearComboLandscapes :: [Double] -> [Landscape] -> Landscape
-- | Average the persistence landscapes.
avgLandscapes :: [Landscape] -> Landscape
-- | Subtract the second landscape from the first.
diffLandscapes :: Landscape -> Landscape -> Landscape
-- | If p>=1 then it will compute the L^p norm on the given interval.
-- Uses trapezoidal approximation. You should ensure that the stepsize
-- partitions the interval evenly.
normLp :: Extended Double -> (Double, Double) -> Double -> Landscape -> Maybe Double
-- | Given the same information as above, computes the L^p distance between
-- the two landscapes. One way to compare the topologies of two
-- filtrations.
metricLp :: Extended Double -> (Double, Double) -> Double -> Landscape -> Landscape -> Maybe Double
instance GHC.Show.Show a => GHC.Show.Show (Persistence.Filtration.Extended a)
instance GHC.Classes.Eq a => GHC.Classes.Eq (Persistence.Filtration.Extended a)
instance (GHC.Classes.Ord a, GHC.Classes.Eq a) => GHC.Classes.Ord (Persistence.Filtration.Extended a)
instance GHC.Num.Num a => GHC.Num.Num (Persistence.Filtration.Extended a)