Copyright | (c) William Yager 2015 |
---|---|

License | MIT |

Maintainer | will (dot) yager (at) gmail (dot) com |

Stability | provisional |

Portability | portable |

Safe Haskell | Safe |

Language | Haskell2010 |

This module provides a data structure and associated functions for representing discrete probability distributions.

All time and space complexity metrics are given in terms of `n`

. In this
case, `n`

refers to the number of unique outcomes inserted into the tree.
If one were to construct a tree by inserting a billion of the same
outcome, `n`

would still be 1.

The data structure is optimized for fast sampling from the distribution.
Sampling ranges from `O(1)`

to `O(log(n))`

depending on the distribution.

Under the hood, the distribution is represented by a perfectly balanced binary tree. The tree enforces a heap property, where more likely outcomes are closer to the top than less likely outcomes. Because we're more likely to sample from those outcomes, we minimize the amount of time spent traversing the tree.

When a duplicate outcome is inserted into the tree, the tree's "dups"
counter is incremented. When more than half the tree is duplicate entries,
the entire tree is rebuilt from scratch. Using amortized complexity
analysis, we can show that insertion is, at worst, `log(n)`

amortized
complexity. This prevents the size of tree from increasing to more than
`O(n)`

, even with many duplicate outcomes inserted.

## Synopsis

- data Distribution p o
- sample :: (Ord p, Num p, Random p, MonadRandom m) => Distribution p o -> m (Maybe o)
- cumulate :: Num p => Distribution p o -> p
- normalize :: Fractional p => Distribution p o -> Maybe (Distribution p o)
- lookup :: (Ord o, Num p) => Distribution p o -> o -> p
- empty :: Num p => Distribution p o
- insert :: (Ord o, Num p, Ord p) => (o, p) -> Distribution p o -> Distribution p o
- fromList :: (Ord o, Num p, Ord p) => [(o, p)] -> Distribution p o
- reweight :: (Ord o, Num p, Ord p) => (o -> p -> p) -> Distribution p o -> Distribution p o
- toList :: (Ord o, Num p) => Distribution p o -> [(o, p)]
- foldrWithP :: ((o, p) -> b -> b) -> b -> Distribution p o -> b
- joint :: (Ord o1, Ord o2, Num p, Ord p) => Distribution p o1 -> Distribution p o2 -> Distribution p (o1, o2)
- sum :: (Ord o, Num p, Ord p) => Distribution p o -> Distribution p o -> Distribution p o
- invariants :: (Num p, Ord p, Show p, Ord e, Show e) => Distribution p e -> Either String ()

# The distribution type

data Distribution p o Source #

A probability distribution with probabilities of type `p`

and
outcomes/events of type `o`

.

## Instances

(Num p, Show p, Ord o, Show o) => Show (Distribution p o) Source # | |

Defined in Numeric.Probability.Distribution showsPrec :: Int -> Distribution p o -> ShowS # show :: Distribution p o -> String # showList :: [Distribution p o] -> ShowS # |

# Probability operations

sample :: (Ord p, Num p, Random p, MonadRandom m) => Distribution p o -> m (Maybe o) Source #

Take a sample from the distribution. Can be used with e.g. `evalRand`

or `evalRandIO`

from `Control.Monad.Random`

. `O(log(n))`

for a uniform
distribution (worst case), but approaches `O(1)`

with less balanced
distributions.
Returns Nothing on an empty distribution.

cumulate :: Num p => Distribution p o -> p Source #

The sum of all probabilities in the distribution. `O(1)`

normalize :: Fractional p => Distribution p o -> Maybe (Distribution p o) Source #

Normalizes the distribution.
After normalizing,

is 1. `cumulate`

distribution`O(n)`

Returns Nothing if distribution is empty.

lookup :: (Ord o, Num p) => Distribution p o -> o -> p Source #

Given an outcome, returns the probability. Note that the probability
is not always normalized. If you want the probability to be in the 0-1
range, you should divide it by `cumulate dist`

(the sum of the probability
of all outcomes)

# Building

empty :: Num p => Distribution p o Source #

The empty distribution. `O(1)`

insert :: (Ord o, Num p, Ord p) => (o, p) -> Distribution p o -> Distribution p o Source #

Insert an outcome into the distribution.
Inserting `(o,p1)`

and `(o,p2)`

results in the same sampled distribution as
inserting `(o,p1+p2)`

. `O(log(n))`

amortized.

# Modifying

reweight :: (Ord o, Num p, Ord p) => (o -> p -> p) -> Distribution p o -> Distribution p o Source #

Reweights the probabilities in the distribution based on
the given function. `n*log(n)`

# Reducing

foldrWithP :: ((o, p) -> b -> b) -> b -> Distribution p o -> b Source #

A right-associative fold on the tree structure, including the
probabilities. Note that outcomes may be repeated within the data structure.
If you want identical outcomes to be lumped together, fold on the list
produced by

. `toList`

`O(n)`

.

# Combining

joint :: (Ord o1, Ord o2, Num p, Ord p) => Distribution p o1 -> Distribution p o2 -> Distribution p (o1, o2) Source #

Creates a new distribution that's the joint distribution of the two provided.
`O(nm*log(nm))`

amortized.

sum :: (Ord o, Num p, Ord p) => Distribution p o -> Distribution p o -> Distribution p o Source #

Creates a new distribution by summing the probabilities of the outcomes
in the two provided. `O((n+m)log(n+m))`

amortized.