-- Hoogle documentation, generated by Haddock
-- See Hoogle, http://www.haskell.org/hoogle/
-- | Quickly detect clusters and holes in data.
--
-- Persistence is a topological data analysis library motivated by
-- flexibility when it comes to the type of data being analyzed. If you
-- have data that comes with a meaningful function into something of the
-- Ord typeclass, you can use Persistence to detect clusters and holes in
-- the data. You can also use the library to analyze the topology of
-- directed/undirected weighted/unweighted graphs, and compare topologies
-- of different data sets.
@package Persistence
@version 1.1
-- | This module provides functions for constructing neighborhood graphs,
-- clique complexes, Vietoris-Rips complexes, as well as the computation
-- of Betti numbers.
--
-- An important thing to know about this module is the difference between
-- "fast" and "light" functions. Fast functions encode the metric in a 2D
-- vector, so that distances don't need to be computed over and over
-- again and can instead be accessed in constant time. Unfortunately,
-- this takes O(n^2) memory so I would strongly recomend against using it
-- for larger data sets; this is why "light" functions exist.
--
-- A neighborhood graph is a graph where an edge exists between two
-- vertices if and only if the vertices fall within a given distance of
-- each other. Graphs here are more of a helper data type for
-- constructing filtrations, which is why they have a somewhat odd
-- representation. Not to worry though, I've supplied a way of encoding
-- graphs of a more generic representation.
--
-- The clique complex of a graph is a simplicial complex whose simplices
-- are the complete subgraphs of the given graph. The Vietoris-Rips
-- complex is the clique complex of the neighborhood graph.
--
-- Betti numbers measure the number of n-dimensional holes in a given
-- simplicial complex. The 0th Betti number is the number of connected
-- components, or clusters; the 1st Betti number is the number of loops
-- or tunnels; the 2nd Betti number measures voids or hollow volumes; and
-- if you have high-dimensional data, you might have higher Betti numbers
-- representing the number of high-dimensional holes.
module SimplicialComplex
-- | The first component of the pair is the number of vertices. The second
-- component is a vector whose entries are vectors of simplices of the
-- same dimension. Index 0 of the vecor corresponds to dimension 1
-- because there is no need to store individual vertices. A simplex is
-- represented as a pair: the vector of its vertices (represent by their
-- index in the original data set), and the vector of the indices of the
-- faces in the next lowest dimension. Edges are the exception to the
-- last part - they do not store reference to their faces because it
-- would be redundant with their vertices.
type SimplicialComplex = (Int, Vector (Vector (Vector Int, Vector Int)))
-- | This represents the (symmetric) adjacency matrix of some weighted
-- undirected graph. The type a is whatever distance is in your
-- data analysis regime. The reason graphs are represented like this is
-- because their main function is too speed up the construction of
-- simplicial complexes and filtrations. If the clique complex of this
-- graph were to be constructed, only the adjacencies would matter. But
-- if a filtration is constructed all the distances will be required over
-- and over again - this allows them to be accessed in constant time.
type Graph a = Vector (Vector (a, Bool))
-- | Show all the information in a simplicial complex.
sc2String :: SimplicialComplex -> String
-- | Get the dimension of the highest dimensional simplex (constant time).
getDim :: SimplicialComplex -> Int
-- | Takes the number of vertices and each edge paired with its weight to
-- output the graph encoded as a 2D vector. If only you have an
-- unweighted graph, you can still encode your graph by simply letting
-- the type a be `()`. If you have a weighted graph but there
-- isn't a distance between every vertex, you can use the
-- Extended type (essentially the same as Maybe) from the
-- Persistence module which is already an instance of Ord.
encodeWeightedGraph :: Int -> (Int -> Int -> (a, Bool)) -> Graph a
-- | The first argument is a scale, the second is a metric, and the third
-- is the data. This function records the distance between every element
-- of the data and whether or not it is smaller than the given scale.
makeNbrhdGraph :: (Ord a, Eq b) => a -> (b -> b -> a) -> [b] -> Graph a
-- | Makes a simplicial complex where the simplices are the complete
-- subgraphs (cliques) of the given graph. Mainly a helper function for
-- makeVRComplexFast, but it might be useful if you happen to have a
-- graph you want to analyze. This utilizes any available processors in
-- parallel because the construction is quite expensive.
makeCliqueComplex :: Graph a -> SimplicialComplex
-- | Constructs the Vietoris-Rips complex given a scale, metric, and data
-- set. Also uses O(n^2) memory (where n is the number of data points)
-- for a graph storing all the distances between data points. This
-- utilizes any available processors in parallel because the construction
-- is quite expensive.
makeVRComplexFast :: (Ord a, Eq b) => a -> (b -> b -> a) -> [b] -> (SimplicialComplex, Graph a)
-- | Constructs the Vietoris-Rips complex given a scale, metric, and data
-- set. This utilizes any available processors in parallel because the
-- construction is quite expensive.
makeVRComplexLight :: (Ord a, Eq b) => a -> (b -> b -> a) -> [b] -> SimplicialComplex
-- | The nth index of the output corresponds to the nth Betti number. The
-- zeroth Betti number is the number of connected components, the 1st is
-- the number of tunnels/punctures, the 2nd is the number of hollow
-- volumes, and so on for higher-dimensional holes.
bettiNumbers :: SimplicialComplex -> [Int]
-- | Calculates all of the Betti numbers in parallel.
bettiNumbersPar :: SimplicialComplex -> [Int]
-- | This module contains functions for constructing filtrations and
-- computing persistent homology, as well as a few utility functions for
-- working with filtrations.
--
-- A filtration is a finite sequence of simplicial complexes where each
-- complex is a subset of the next. This means that a filtration can be
-- thought of as a single simplicial complex where each of the simplices
-- is labeled with a "filtration index" that represents the index in the
-- sequence where that simplex enters the filtration.
--
-- One way to create a filtration, given a simplicial complex, a metric
-- for the vertices, and a list of distances, is to loop through the
-- distances from greatest to least: create a simplicial complex each
-- iteration which excludes simplices that contain pairs of vertices
-- which are further than the current distance apart. This method will
-- produce a filtration of Vietoris-Rips complexes - each filtration
-- index will correspond to a VR complex whose scale is the corresponding
-- distance.
--
-- An essential thing to note about the way this library is set up is the
-- distinction between "fast" and "light" functions. Light functions call
-- the metric every time distance between two points is required, which
-- is a lot. Fast functions store the distances between points and access
-- them in constant time, BUT this means they use O(n^2) memory with
-- respect to the number of data points, so it's a really bad idea to use
-- this optimization on substantially large data.
--
-- IMPORTANT NOTE: This library currently only handles filtrations where
-- all the vertices have filtration index 0. Since I haven't thought of
-- any way filtrations that don't respect this property could come up in
-- applications, I have chosen to exclude them in favor of less memory
-- usage and a slight speedup.
--
-- Persistent homology is the main event of topological data analysis. It
-- allows one to identify clusters, holes, and voids that persist in the
-- data throughout many scales. The output of the persistence algorithm
-- is a barcode diagram, which currently have a somewhat crude
-- representation in this library. A single barcode represents the
-- filtration index where a feature appears and the index where it
-- disappears (if it does). Thus, short barcodes are typically
-- interpretted as noise and long barcodes are interpretted as actual
-- features.
--
-- After you've got the persistent homology of a data set, you might want
-- to compare it with that of a different data set. That's why this
-- release includes two versions of "bottleneck distance," one works only
-- if the number of features in each data set is the same and the other
-- works regardless.
module Persistence
-- | The first component is simply the number of vertices (all vertices are
-- assumed to have filtration index 0). The second component is a vector
-- with an entry for each dimension of simplices, starting at dimension 1
-- for edges. Each simplex is represented as a triple: its filtration
-- index, the indices of its vertices in the original data, and the
-- indices of its faces in the next lowest dimension. Edges do not have
-- reference to their faces, as it would be redundant with their
-- vertices. All simplices are sorted according to filtration index upon
-- construction of the filtration.
type Filtration = (Int, Vector (Vector (Int, Vector Int, Vector Int)))
-- | (i, Just j) is a feature that appears at filtration index i and
-- disappears at index j, (i, Nothing) begins at i and doesn't disappear.
type BarCode = (Int, Maybe Int)
-- | Type for representing inifinite bottleneck distance.
data Extended a
-- | Shows all the information in a simplex.
sim2String :: (Int, Vector Int, Vector Int) -> String
-- | Shows all the information in a filtration.
filtr2String :: Filtration -> String
-- | Gets the simplicial complex specified by the filtration index. This is
-- O(n) with respect to the number of simplices.
getComplex :: Int -> Filtration -> SimplicialComplex
-- | The first argument is a list of dimensions, the second argument is a
-- list of filtration indices. The function returns the number of
-- simplices in the filtration whose dimension and index exist in the
-- respective lists.
getNumSimplices :: [Int] -> [Int] -> Filtration -> Int
-- | Return the dimension of the highest dimensional simplex in the
-- filtration (constant time).
getDimension :: Filtration -> Int
-- | Given a list of scales, a simplicial complex, and a weighted graph
-- (see SimplicialComplex) which encodes a metric on the vertices, this
-- function creates a filtration out of a simplicial complex by removing
-- simplices that contain edges that are too long for each scale in the
-- list. This is really a helper function to be called by
-- makeVRFiltrationFast, but I decided to expose it in case you have a
-- simplicial complex and weighted graph lying around. The scales MUST be
-- in decreasing order for this function.
filterByWeightsFast :: Ord a => [a] -> (SimplicialComplex, Graph a) -> Filtration
-- | Given a list of scales, a metric, and a data set, this function
-- constructs a filtration of the Vietoris-Rips complexes associated with
-- the scales. The scales MUST be in decreasing order. Note that this a
-- fast function, meaning it uses O(n^2) memory to quickly access
-- distances where n is the number of data points.
makeVRFiltrationFast :: (Ord a, Eq b) => [a] -> (b -> b -> a) -> [b] -> Filtration
-- | The same as filterbyWeightsFast except it uses far less memory at the
-- cost of speed. Note that the scales must be in decreasing order.
filterByWeightsLight :: Ord a => [a] -> (b -> b -> a) -> [b] -> SimplicialComplex -> Filtration
-- | Given a list of scales in decreasing order, a metric, and a data set,
-- this constructs the filtration of Vietoris-Rips complexes
-- corresponding to the scales.
makeVRFiltrationLight :: (Ord a, Eq b) => [a] -> (b -> b -> a) -> [b] -> Filtration
-- | Each index in the list is a list of barcodes whose dimension
-- corresponds to the index. That is, the first list will represent
-- clusters, the second list will represent tunnels or punctures, the
-- third will represent hollow volumes, and the nth index list will
-- represent n-dimensional holes in the data.
persistentHomology :: Filtration -> [[BarCode]]
-- | Return the maximum of minimum distances bewteen the bar codes. It's
-- important to note that the function isn't "unsafe" in the sense that
-- it will throw an exception, it will just give you a distance
-- regardless of whether or not there is the same number of barcodes is
-- in each list.
bottelNeckDistance :: [BarCode] -> [BarCode] -> Extended Double
-- | Get's all the bottle neck distances; a good way to determine the
-- similarity of the topology of two filtrations.
bottelNeckDistances :: [[BarCode]] -> [[BarCode]] -> [Extended Double]
-- | If the number of barcodes is the same, return the maximum of minimum
-- distances bewteen the bar codes. Otherwise return nothing.
safeBottelNeckDistance :: [BarCode] -> [BarCode] -> Maybe (Extended Double)
-- | Safely get all the bottle neck distances; a good way to determine the
-- similarity of the topology of two filtrations.
safeBottelNeckDistances :: [[BarCode]] -> [[BarCode]] -> [Maybe (Extended Double)]
instance GHC.Classes.Eq a => GHC.Classes.Eq (Persistence.Extended a)
instance (GHC.Classes.Ord a, GHC.Classes.Eq a) => GHC.Classes.Ord (Persistence.Extended a)
-- | This module implements algorithms for admissible Hasse diagrams. A
-- Hasse diagram is admissible if it is stratified and oriented. A
-- diagram is stratified if all the vertices can be arranged in rows such
-- that all the sources of each vertex are in the next highest row and
-- all the targets are in the next lowest row. A diagram is oriented if
-- every vertex has a linear ordering on its targets.
--
-- A node in the diagram is represented as a tuple: the indices of the
-- level 0 nodes in the diagram that are reachable from this node, the
-- indices of targets in the next lowest level, and the indices of the
-- sources in the next highest level. The entire diagram is simply an
-- array where each entry is an array representing a particular level;
-- index 0 represents level 0, etc.
--
-- Any directed graph can be encoded as an admissible Hasse diagram with
-- 2 levels. The edges are level 1 and the vertices are level 0. The
-- ordering on the targets of a node representing an edge is simply the
-- terminal vertex first and the initial vertex second. This may be
-- counterintuitive, but its helpful to interpret an arrow between two
-- vertices as the "<" operator. This induces a linear ordering on the
-- vertices of any acyclic complete subgraph - which is what the nodes in
-- the Hasse diagram of the directed clique complex represent.
--
-- Any oriented simplicial complex can also be encoded as an admissible
-- Hasse diagram. A node is a simplex, the targets are the faces of the
-- simplex, and the sources are simplices of which the given simplex is a
-- face.
--
-- The main feature of this module is an algorithm which takes the Hasse
-- diagram of a directed graph and generates the Hasse diagram of the
-- directed flag complex - the simplicial complex whose simplices are
-- acyclic complete subgraphs of the given graph. Here acyclic refers to
-- a directed graph without any sequence of arrows whose heads and tails
-- match up an which has the same start and end vertex.
--
-- The idea is that, if your directed graph represents any kind of
-- information flow, "sub-modules" in the network that simply take input,
-- process it using its nodes, and then output it without spinning the
-- information around at all. These "sub-modules" are the directed
-- cliques/flags which I've been referring to as acyclic complete
-- subgraphs up to this point. Constructing a simplicial complex out of
-- them will allow you to both simplify the 1-dimensional topology of the
-- network and possibly detect higher-dimensional topological features.
module HasseDiagram
-- | Type representing a node in a Hasse diagram. Hasse diagrams are being
-- used to represent simplicial complexes so each node represents a
-- simplex Contents of the tuple in order: Vector of references to
-- vertices of the underlying directed graph, vector of references to the
-- simplices faes in the next lowest level of the Hasse diagram, vector
-- of references to "parent" simplices (simplices who have this simplex
-- as a face) in the next highest level of the Hasse diagram.
type Node = (Vector Int, Vector Int, Vector Int)
-- | Type representing an admissible Hasse diagram. Each entry in the
-- vector represents a level in the Hasse diagram.
type HasseDiagram = Vector (Vector Node)
-- | Simple printing function for Hasse diagrams.
hsd2String :: HasseDiagram -> String
-- | Given the number of vertices in a directed graph, and pairs
-- representing the direction of each edge (initial, terminal), construct
-- a Hasse diagram representing the graph.
encodeDirectedGraph :: Int -> [(Int, Int)] -> HasseDiagram
-- | Given a Hasse diagram representing a directed graph, construct the
-- diagram representing the directed clique/flag complex of the graph.
directedFlagComplex :: HasseDiagram -> HasseDiagram
-- | Convert a Hasse diagram to a simplicial complex.
toSimplicialComplex :: HasseDiagram -> SimplicialComplex