boltzmann-samplers-0.1.1.0: Uniform random generators

Boltzmann.Data

Description

Generic Boltzmann samplers.

Here, the words "sampler" and "generator" are used interchangeably.

Given an algebraic datatype:

data A = A1 B C | A2 D

a Boltzmann sampler is recursively defined by choosing a constructor with some fixed distribution, and independently generating values for the corresponding fields with the same method.

A key component is the aforementioned distribution, defined for every type such that the resulting generator produces a finite value in the end. These distributions are obtained from a precomputed object called oracle, which we will not describe further here.

Oracles depend on the target size of the generated data (except for singular samplers), and can be fairly expensive to compute repeatedly, hence some of the functions below attempt to avoid (re)computing too many of them even when the required size changes.

When these functions are specialized, oracles are memoized and will be reused for different sizes.

Synopsis

# Documentation

type Size' = Int Source #

The size of a value is its number of constructors.

Here, however, the Size' type is interpreted differently to make better use of QuickCheck's size parameter provided by the sized combinator, so that we generate non-trivial data even at very small size values.

For infinite types, with objects of unbounded sizes > minSize, given a parameter delta :: Size', the produced values have an average size close to minSize + delta.

For example, values of type Either () [Bool] have at least two constructors, so

generator delta :: Gen (Either () [Bool])

will target sizes close to 2 + delta; the offset becomes less noticeable as delta grows to infinity.

For finite types with sizes in [minSize, maxSize], the target expected size is obtained by clamping a Size' to [0, 99] and applying an affine mapping.

# Main functions

### Suffixes

S
Singular sampler.

This works with recursive tree-like structures, as opposed to (lists of) structures with bounded size. More precisely, the generating function of the given type should have a finite radius of convergence, with a singularity of a certain kind (see Duchon et al., reference in the README), so that the oracle can be evaluated at that point.

This has the advantage of using the same oracle for all size parameters, which simply specify a target size interval.

P
Generator of pointed values.

It usually has a flatter distribution of sizes than a simple Boltzmann sampler, making it an efficient alternative to rejection sampling.

It also works on more types, particularly lists and finite types, but relies on multiple oracles.

R
Rejection sampling.

These generators filter out values whose sizes are not within some interval. In the first two sections, that interval is implicit: [(1-epsilon)*size', (1+epsilon)*size'], for epsilon = 0.1.

The generator restarts as soon as it has produced more constructors than the upper bound, this strategy is called ceiled rejection sampling.

# Pointing

The pointing of a type t is a derived type whose values are essentially values of type t, with one of their constructors being "pointed". Alternatively, we may turn every constructor into variants that indicate the position of points.

-- Original type
data Tree = Node Tree Tree | Leaf
-- Pointing of Tree
data Tree'
= Tree' Tree -- Point at the root
| Node'0 Tree' Tree -- Point to the left
| Node'1 Tree Tree' -- Point to the right

Pointed values are easily mapped back to the original type by erasing the point. Pointing makes larger values occur much more frequently, while preserving the uniformness of the distribution conditionally to a fixed size.

generatorSR :: (Data a, MonadRandomLike m) => Size' -> m a Source #

generatorSR :: Int -> Gen a

Singular ceiled rejection sampler.

generatorP :: (Data a, MonadRandomLike m) => Size' -> m a Source #

generatorP :: Int -> Gen a

Generator of pointed values.

generatorPR :: (Data a, MonadRandomLike m) => Size' -> m a Source #

Pointed generator with rejection.

generatorR :: (Data a, MonadRandomLike m) => Size' -> m a Source #

Generator with rejection and dynamic average size.

## Fixed size

The ' suffix indicates functions which do not do any precomputation before passing the size parameter.

This means that oracles are computed from scratch for every size value, which may incur a significant overhead.

generatorP' :: (Data a, MonadRandomLike m) => Size' -> m a Source #

Pointed generator.

generatorPR' :: (Data a, MonadRandomLike m) => Size' -> m a Source #

Pointed generator with rejection.

generatorR' :: (Data a, MonadRandomLike m) => Size' -> m a Source #

Ceiled rejection sampler with given average size.

generator' :: (Data a, MonadRandomLike m) => Size' -> m a Source #

Basic boltzmann sampler with no optimization.

# Generators with aliases

Boltzmann samplers can normally be defined only for types a such that:

• they are instances of Data;
• the set of types of subterms of values of type a is finite;
• and all of these types have at least one finite value (i.e., values with finitely many constructors).

Examples of misbehaving types are:

• a -> b -- Not Data
• data E a = L a | R (E [a]) -- Contains a, [a], [[a]], [[[a]]], etc.
• data I = C I -- No finite value

# Alias

The Alias type works around these limitations (AliasR for rejection samplers). This existential wrapper around a user-defined function f :: a -> m b makes boltzmann-samplers view occurences of the type b as a when processing a recursive system of types, possibly stopping some infinite unrolling of type definitions. When a value of type b needs to be generated, it generates an a which is passed to f.

let
as = [aliasR $\() -> return (L []) :: Gen (E [[Int]])] in generatorSRWith as asGen :: Size -> Gen (E Int) Another use case is to plug in user-defined generators where the default is not satisfactory, for example, to generate positive Ints: let as = [alias$ \() -> choose (0, 100) :: Gen Int)]
in
generatorPWith as asGen :: Size -> Gen [Int]

or to modify the weights assigned to some types. In particular, in some cases it seems preferable to make String (and Text) have the same weight as Int and ().

let

## Alias

alias :: (Monad m, Data a, Data b) => (a -> m b) -> Alias m Source #

Main constructor for Alias.

aliasR :: (Monad m, Data a, Data b) => (a -> m b) -> AliasR m Source #

Main constructor for AliasR.

coerceAlias :: Coercible m n => Alias m -> Alias n Source #

coerceAlias :: Alias m -> Alias (AMonadRandom m)

coerceAliases :: Coercible m n => [Alias m] -> [Alias n] Source #

coerceAliases :: [Alias m] -> [Alias (AMonadRandom m)]

data Alias m where Source #

Constructors

 Alias :: (Data a, Data b) => !(m a -> m b) -> Alias m

Instances

 Show (Alias m) Source # Dummy instance for debugging. MethodsshowsPrec :: Int -> Alias m -> ShowS #show :: Alias m -> String #showList :: [Alias m] -> ShowS #

type AliasR m = Alias (RejectT m) Source #