Persistence-1.1.1: Quickly detect clusters and holes in data.

Copyright(c) Eben Cowley 2018
LicenseBSD 3 Clause
Maintainereben.cowley42@gmail.com
Stabilityexperimental
Safe HaskellNone
LanguageHaskell2010

Persistence

Description

This module contains functions for constructing filtrations and computing persistent homology, as well as a few utility functions for working with filtrations.

A filtration is a finite sequence of simplicial complexes where each complex is a subset of the next. This means that a filtration can be thought of as a single simplicial complex where each of the simplices is labeled with a "filtration index" that represents the index in the sequence where that simplex enters the filtration.

One way to create a filtration, given a simplicial complex, a metric for the vertices, and a list of distances, is to loop through the distances from greatest to least: create a simplicial complex each iteration which excludes simplices that contain pairs of vertices which are further than the current distance apart. This method will produce a filtration of Vietoris-Rips complexes - each filtration index will correspond to a VR complex whose scale is the corresponding distance.

An essential thing to note about the way this library is set up is the distinction between "fast" and "light" functions. Light functions call the metric every time distance between two points is required, which is a lot. Fast functions store the distances between points and access them in constant time, BUT this means they use O(n^2) memory with respect to the number of data points, so it's a really bad idea to use this optimization on substantially large data.

IMPORTANT NOTE: This library currently only handles filtrations where all the vertices have filtration index 0. Since I haven't thought of any way filtrations that don't respect this property could come up in applications, I have chosen to exclude them in favor of less memory usage and a slight speedup.

Persistent homology is the main event of topological data analysis. It allows one to identify clusters, holes, and voids that persist in the data throughout many scales. The output of the persistence algorithm is a barcode diagram, which currently have a somewhat crude representation in this library. A single barcode represents the filtration index where a feature appears and the index where it disappears (if it does). Thus, short barcodes are typically interpretted as noise and long barcodes are interpretted as actual features.

After you've got the persistent homology of a data set, you might want to compare it with that of a different data set. That's why this release includes two versions of "bottleneck distance," one works only if the number of features in each data set is the same and the other works regardless.

Synopsis

Documentation

type Filtration = (Int, Vector (Vector (Int, Vector Int, Vector Int))) Source #

The first component is simply the number of vertices (all vertices are assumed to have filtration index 0). The second component is a vector with an entry for each dimension of simplices, starting at dimension 1 for edges. Each simplex is represented as a triple: its filtration index, the indices of its vertices in the original data, and the indices of its faces in the next lowest dimension. Edges do not have reference to their faces, as it would be redundant with their vertices. All simplices are sorted according to filtration index upon construction of the filtration.

type BarCode = (Int, Maybe Int) Source #

(i, Just j) is a feature that appears at filtration index i and disappears at index j, (i, Nothing) begins at i and doesn't disappear.

data Extended a Source #

Type for representing inifinite bottleneck distance.

Instances

Eq a => Eq (Extended a) Source # 

Methods

(==) :: Extended a -> Extended a -> Bool #

(/=) :: Extended a -> Extended a -> Bool #

(Ord a, Eq a) => Ord (Extended a) Source # 

Methods

compare :: Extended a -> Extended a -> Ordering #

(<) :: Extended a -> Extended a -> Bool #

(<=) :: Extended a -> Extended a -> Bool #

(>) :: Extended a -> Extended a -> Bool #

(>=) :: Extended a -> Extended a -> Bool #

max :: Extended a -> Extended a -> Extended a #

min :: Extended a -> Extended a -> Extended a #

sim2String :: (Int, Vector Int, Vector Int) -> String Source #

Shows all the information in a simplex.

filtr2String :: Filtration -> String Source #

Shows all the information in a filtration.

getComplex :: Int -> Filtration -> SimplicialComplex Source #

Gets the simplicial complex specified by the filtration index. This is O(n) with respect to the number of simplices.

getNumSimplices :: [Int] -> [Int] -> Filtration -> Int Source #

The first argument is a list of dimensions, the second argument is a list of filtration indices. The function returns the number of simplices in the filtration whose dimension and index exist in the respective lists.

getDimension :: Filtration -> Int Source #

Return the dimension of the highest dimensional simplex in the filtration (constant time).

filterByWeightsFast :: Ord a => [a] -> (SimplicialComplex, Graph a) -> Filtration Source #

Given a list of scales, a simplicial complex, and a weighted graph (see SimplicialComplex) which encodes a metric on the vertices, this function creates a filtration out of a simplicial complex by removing simplices that contain edges that are too long for each scale in the list. This is really a helper function to be called by makeVRFiltrationFast, but I decided to expose it in case you have a simplicial complex and weighted graph lying around. The scales MUST be in decreasing order for this function.

makeVRFiltrationFast :: (Ord a, Eq b) => [a] -> (b -> b -> a) -> [b] -> Filtration Source #

Given a list of scales, a metric, and a data set, this function constructs a filtration of the Vietoris-Rips complexes associated with the scales. The scales MUST be in decreasing order. Note that this a fast function, meaning it uses O(n^2) memory to quickly access distances where n is the number of data points.

filterByWeightsLight :: Ord a => [a] -> (b -> b -> a) -> [b] -> SimplicialComplex -> Filtration Source #

The same as filterbyWeightsFast except it uses far less memory at the cost of speed. Note that the scales must be in decreasing order.

makeVRFiltrationLight :: (Ord a, Eq b) => [a] -> (b -> b -> a) -> [b] -> Filtration Source #

Given a list of scales in decreasing order, a metric, and a data set, this constructs the filtration of Vietoris-Rips complexes corresponding to the scales.

persistentHomology :: Filtration -> [[BarCode]] Source #

Each index in the list is a list of barcodes whose dimension corresponds to the index. That is, the first list will represent clusters, the second list will represent tunnels or punctures, the third will represent hollow volumes, and the nth index list will represent n-dimensional holes in the data.

bottelNeckDistance :: [BarCode] -> [BarCode] -> Extended Double Source #

Return the maximum of minimum distances bewteen the bar codes. It's important to note that the function isn't "unsafe" in the sense that it will throw an exception, it will just give you a distance regardless of whether or not there is the same number of barcodes is in each list.

bottelNeckDistances :: [[BarCode]] -> [[BarCode]] -> [Extended Double] Source #

Get's all the bottle neck distances; a good way to determine the similarity of the topology of two filtrations.

safeBottelNeckDistance :: [BarCode] -> [BarCode] -> Maybe (Extended Double) Source #

If the number of barcodes is the same, return the maximum of minimum distances bewteen the bar codes. Otherwise return nothing.

safeBottelNeckDistances :: [[BarCode]] -> [[BarCode]] -> [Maybe (Extended Double)] Source #

Safely get all the bottle neck distances; a good way to determine the similarity of the topology of two filtrations.