hakaru-0.3.0: A probabilistic programming language

Language.Hakaru.Disintegrate

Description

Disintegration via lazy partial evaluation.

N.B., the forward direction of disintegration is not just partial evaluation! In the version discussed in the paper we must also ensure that no heap-bound variables occur in the result, which requires using HNFs rather than WHNFs. That condition is sound, but a bit too strict; we could generalize this to handle cases where there may be heap-bound variables remaining in neutral terms, provided we (a) don't end up trying to go both forward and backward on the same variable, and (more importantly) (b) end up with the proper Jacobian. The paper version is rigged to ensure that whenever we recurse into two subexpressions (e.g., the arguments to addition) one of them has a Jacobian of zero, thus when going from x*y to dx*y + x*dy one of the terms cancels out.

Developer's Note: To help keep the code clean, we use the worker/wrapper transform. However, due to complexities in typechecking GADTs, this often confuses GHC if you don't give just the right type signature on definitions. This confusion shows up whenever you get error messages about an "ambiguous" choice of ABT instance, or certain types of "couldn't match a with a1" error messages. To eliminate these issues we use -XScopedTypeVariables. In particular, the abt type variable must be bound by the wrapper (i.e., the top-level definition), and the workers should just refer to that same type variable rather than quantifying over abother abt type. In addition, whatever other type variables there are (e.g., the xs and a of an abt xs a argument) should be polymorphic in the workers and should not reuse the other analogous type variables bound by the wrapper.

Developer's Note: In general, we'd like to emit weights and guards "as early as possible"; however, determining when that actually is can be tricky. If we emit them way-too-early then we'll get hygiene errors because bindings for the variables they use have not yet been emitted. We can fix these hygiene erors by calling atomize, to ensure that all the necessary bindings have already been emitted. But that may still emit things too early, because emitting th variable-binding statements now means that we can't go forward/backward on them later on; which may cause us to bot unnecessarily. One way to avoid this bot issue is to emit guards/weights later than necessary, by actually pushing them onto the context (and then emitting them whenever we residualize the context). This guarantees we don't emit too early; but the tradeoff is we may end up generating duplicate code by emitting too late. One possible (currently unimplemented) solution to that code duplication issue is to let these statements be emitted too late, but then have a post-processing step to lift guards/weights up as high as they can go. To avoid problems about testing programs/expressions for equality, we can use a hash-consing trick so we keep track of the identity of guard/weight statements, then we can simply compare those identities and only after the lifting do we replace the identity hash with the actual statement.

Synopsis

# the Hakaru API

disintegrateWithVar :: ABT Term abt => Text -> Sing a -> abt '[] (HMeasure (HPair a b)) -> [abt '[] (a :-> HMeasure b)] Source #

Disintegrate a measure over pairs with respect to the lebesgue measure on the first component. That is, for each measure kernel n <- disintegrate m we have that m == bindx lebesgue n. The first two arguments give the hint and type of the lambda-bound variable in the result. If you want to automatically fill those in, then see disintegrate.

N.B., the resulting functions from a to 'HMeasure b are indeed measurable, thus it is safe/appropriate to use Hakaru's (':->) rather than Haskell's (->).

BUG: Actually, disintegration is with respect to the Borel measure on the first component of the pair! Alas, we don't really have a clean way of describing this since we've no primitive MeasureOp for Borel measures.

Developer's Note: This function fills the role that the old runDisintegrate did (as opposed to the old function called disintegrate). [Once people are familiar enough with the new code base and no longer recall what the old code base was doing, this note should be deleted.]

disintegrate :: ABT Term abt => abt '[] (HMeasure (HPair a b)) -> [abt '[] (a :-> HMeasure b)] Source #

A variant of disintegrateWithVar which automatically computes the type via typeOf.

densityWithVar :: ABT Term abt => Text -> Sing a -> abt '[] (HMeasure a) -> [abt '[] (a :-> HProb)] Source #

Return the density function for a given measure. The first two arguments give the hint and type of the lambda-bound variable in the result. If you want to automatically fill those in, then see density.

TODO: is the resulting function guaranteed to be measurable? If so, update this documentation to reflect that fact; if not, then we should make it into a Haskell function instead.

density :: ABT Term abt => abt '[] (HMeasure a) -> [abt '[] (a :-> HProb)] Source #

A variant of densityWithVar which automatically computes the type via typeOf.

observe :: ABT Term abt => abt '[] (HMeasure a) -> abt '[] a -> [abt '[] (HMeasure a)] Source #

Constrain a measure such that it must return the observed value. In other words, the resulting measure returns the observed value with weight according to its density in the original measure, and gives all other values weight zero.

determine :: ABT Term abt => [abt '[] a] -> Maybe (abt '[] a) Source #

Arbitrarily choose one of the possible alternatives. In the future, this function should be replaced by a better one that takes some sort of strategy for deciding which alternative to choose.

# Implementation details

perform :: forall abt. ABT Term abt => MeasureEvaluator abt (Dis abt) Source #

Simulate performing HMeasure actions by simply emiting code for those actions, returning the bound variable.

This is the function called (|>>) in the disintegration paper.

atomize :: ABT Term abt => TermEvaluator abt (Dis abt) Source #

The goal of this function is to ensure the correctness criterion that given any term to be emitted, the resulting term is semantically equivalent but contains no heap-bound variables. That correctness criterion is necessary to ensure hygiene/scoping.

This particular implementation calls evaluate recursively, giving us something similar to full-beta reduction. However, that is considered an implementation detail rather than part of the specification of what the function should do. Also, it's a gross hack and prolly a big part of why we keep running into infinite looping issues.

This name is taken from the old finally tagless code, where "atomic" terms are (among other things) emissible; i.e., contain no heap-bound variables.

BUG: this function infinitely loops in certain circumstances (namely when dealing with neutral terms)

constrainValue :: ABT Term abt => abt '[] a -> abt '[] a -> Dis abt () Source #

Given an emissible term v0 (the first argument) and another term e0 (the second argument), compute the constraints such that e0 must evaluate to v0. This is the function called (<|) in the disintegration paper, though notably we swap the argument order so that the "value" is the first argument.

N.B., this function assumes (and does not verify) that the first argument is emissible. So callers (including recursive calls) must guarantee this invariant, by calling atomize as necessary.

TODO: capture the emissibility requirement on the first argument in the types, to help avoid accidentally passing the arguments in the wrong order!

constrainOutcome :: forall abt a. ABT Term abt => abt '[] a -> abt '[] (HMeasure a) -> Dis abt () Source #

This is a helper function for constrainValue to handle SBind statements (just as the perform argument to evaluate is a helper for handling SBind statements).

N.B., We assume that the first argument, v0, is already atomized. So, this must be ensured before recursing, but we can assume it's already been done by the IH. Technically, we con't care whether the first argument is in normal form or not, just so long as it doesn't contain any heap-bound variables.

This is the function called (<<|) in the paper, though notably we swap the argument order.

TODO: find some way to capture in the type that the first argument must be emissible, to help avoid accidentally passing the arguments in the wrong order!

TODO: under what circumstances is constrainOutcome x m different from constrainValue x =<< perform m? If they're always the same, then we should just use that as the definition in order to avoid repeating ourselves