mcmc-0.2.4: Sample from a posterior using Markov chain Monte Carlo

Copyright(c) Dominik Schrempf 2020
LicenseGPL-3.0-or-later
Maintainerdominik.schrempf@gmail.com
Stabilityunstable
Portabilityportable
Safe HaskellNone
LanguageHaskell2010

Mcmc

Contents

Description

Creation date: Tue May 5 18:01:15 2020.

A short introduction to update mechanisms using the Metropolis-Hastings algorithm (see Geyer, C. J., 2011; Introduction to Markov Chain Monte Carlo. In Handbook of Markov Chain Monte Carlo (pp. 45), Chapman & Hall/CRC).

For examples, please see mcmc-examples.

The import of this module alone should cover most use cases.

Synopsis

Proposals

A Proposal is an instruction about how to advance a given Markov chain so that it possibly reaches a new state. That is, Proposals specify how the chain traverses the state space. As far as this MCMC library is concerned, Proposals are elementary updates in that they cannot be decomposed into smaller updates.

Proposals can be combined to form composite updates, a technique often referred to as composition. On the other hand, mixing (used in the sense of mixture models) is the random choice of a Proposal (or a composition of Proposals) from a given set.

The composition and mixture of Proposals allows specification of nearly all MCMC algorithms involving a single chain (i.e., population methods such as particle filters are excluded). In particular, Gibbs samplers of all sorts can be specified using this procedure.

This library enables composition and mixture of Proposals via the Cycle data type. Essentially, a Cycle is a set of Proposals. The chain advances after the completion of each Cycle, which is called an iteration, and the iteration counter is increased by one.

The Proposals in a Cycle can be executed in the given order or in a random sequence which allows, for example, specification of a fixed scan Gibbs sampler, or a random sequence scan Gibbs sampler, respectively. See Order.

Note that it is of utter importance that the given Cycle enables traversal of the complete state space. Otherwise, the Markov chain will not converge to the correct stationary posterior distribution.

Proposals are named according to what they do, i.e., how they change the state of a Markov chain, and not according to the intrinsically used probability distributions. For example, slideSymmetric is a sliding proposal. Under the hood, it uses the normal distribution with mean zero and given variance. The sampled variate is added to the current value of the variable (hence, the name slide). The same nomenclature is used by RevBayes [1]. The probability distributions and intrinsic properties of a specific proposal are specified in detail in the documentation.

The other method, which is used intrinsically, is more systematic, but also a little bit more complicated: we separate between the proposal distribution and how the state is affected. And here, I am not only referring to the accessor (i.e., the lens), but also to the operator (addition, multiplication, any other binary operator). For example, the sliding proposal (without tuning information) is implemented as

slideSimple :: Lens' a Double -> Double -> Double -> Double -> ProposalSimple a
slideSimple l m s t = genericContinuous l (normalDistr m (s * t)) (+) (-)

This specification is more involved. Especially since we need to know the probability of jumping back, and so we need to know the inverse operator. However, it also allows specification of new proposals with great ease.

1
Höhna, S., Landis, M. J., Heath, T. A., Boussau, B., Lartillot, N., Moore, B. R., Huelsenbeck, J. P., …, Revbayes: bayesian phylogenetic inference using graphical models and an interactive model-specification language, Systematic Biology, 65(4), 726–736 (2016). http://dx.doi.org/10.1093/sysbio/syw021

data Proposal a Source #

A Proposal is an instruction about how the Markov chain will traverse the state space a. Essentially, it is a probability mass or probability density conditioned on the current state (i.e., a kernel).

A Proposal may be tuneable in that it contains information about how to enlarge or shrink the step size to tune the acceptance ratio.

Instances
Eq (Proposal a) Source # 
Instance details

Defined in Mcmc.Proposal

Methods

(==) :: Proposal a -> Proposal a -> Bool #

(/=) :: Proposal a -> Proposal a -> Bool #

Ord (Proposal a) Source # 
Instance details

Defined in Mcmc.Proposal

Methods

compare :: Proposal a -> Proposal a -> Ordering #

(<) :: Proposal a -> Proposal a -> Bool #

(<=) :: Proposal a -> Proposal a -> Bool #

(>) :: Proposal a -> Proposal a -> Bool #

(>=) :: Proposal a -> Proposal a -> Bool #

max :: Proposal a -> Proposal a -> Proposal a #

min :: Proposal a -> Proposal a -> Proposal a #

Show (Proposal a) Source # 
Instance details

Defined in Mcmc.Proposal

Methods

showsPrec :: Int -> Proposal a -> ShowS #

show :: Proposal a -> String #

showList :: [Proposal a] -> ShowS #

(@~) :: Lens' b a -> Proposal a -> Proposal b Source #

Convert a proposal from one data type to another using a lens.

For example:

scaleFirstEntryOfTuple = _1 @~ scale

scale Source #

Arguments

:: Double

Shape.

-> Double

Scale.

-> String

Name.

-> Int

Weight.

-> Bool

Enable tuning.

-> Proposal Double 

Multiplicative proposal with Gamma distributed kernel.

scaleUnbiased Source #

Arguments

:: Double

Shape.

-> String

Name.

-> Int

Weight.

-> Bool

Enable tuning.

-> Proposal Double 

Multiplicative proposal with Gamma distributed kernel.

The scale of the Gamma distributions is set to (shape)^{-1}, so that the mean of the Gamma distribution is 1.0.

scaleContrarily Source #

Arguments

:: Double

Shape.

-> Double

Scale.

-> String

Name.

-> Int

Weight.

-> Bool

Enable tuning.

-> Proposal (Double, Double) 

Multiplicative proposal with Gamma distributed kernel.

The two values are scaled contrarily so that their product stays constant. Contrary proposals are useful when parameters are confounded.

scaleBactrian Source #

Arguments

:: Double

Spike parameter.

-> Double

Standard deviation.

-> String

Name.

-> Int

Weight.

-> Bool

Enable tuning.

-> Proposal Double 

Multiplicative proposal with kernel similar to the silhouette of a Bactrian camel. See slideBactrian.

slide Source #

Arguments

:: Double

Mean.

-> Double

Standard deviation.

-> String

Name.

-> Int

Weight.

-> Bool

Enable tuning.

-> Proposal Double 

Additive proposal with normally distributed kernel.

slideSymmetric Source #

Arguments

:: Double

Standard deviation.

-> String

Name.

-> Int

Weight.

-> Bool

Enable tuning.

-> Proposal Double 

Additive proposal with normally distributed kernel with mean zero. This proposal is very fast, because the Metropolis-Hastings ratio does not include calculation of the forwards and backwards kernels.

slideUniform Source #

Arguments

:: Double

Delta.

-> String

Name.

-> Int

Weight.

-> Bool

Enable tuning.

-> Proposal Double 

Additive proposal with uniformly distributed kernel. This proposal is very fast, because the Metropolis-Hastings ratio does not include calculation of the forwards and backwards kernels.

slideContrarily Source #

Arguments

:: Double

Mean.

-> Double

Standard deviation.

-> String

Name.

-> Int

Weight.

-> Bool

Enable tuning.

-> Proposal (Double, Double) 

Additive proposal with normally distributed kernel.

The two values are slid contrarily so that their sum stays constant. Contrary proposals are useful when parameters are confounded.

slideBactrian Source #

Arguments

:: Double

Spike parameter.

-> Double

Standard deviation.

-> String

Name.

-> Int

Weight.

-> Bool

Enable tuning.

-> Proposal Double 

Additive symmetric proposal with kernel similar to the silhouette of a Bactrian camel.

The Bactrian kernel is a mixture of two symmetrically arranged normal distributions. The spike parameter loosely determines the standard deviations of the individual humps while the second parameter refers to the standard deviation of the complete Bactrian kernel.

See https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3845170/.

data Cycle a Source #

In brief, a Cycle is a list of proposals.

The state of the Markov chain will be logged only after all Proposals in the Cycle have been completed, and the iteration counter will be increased by one. The order in which the Proposals are executed is specified by Order. The default is RandomO.

Proposals must have unique names, so that they can be identified.

fromList :: [Proposal a] -> Cycle a Source #

Create a Cycle from a list of Proposals.

data Order Source #

Define the order in which Proposals are executed in a Cycle. The total number of Proposals per Cycle may differ between Orders (e.g., compare RandomO and RandomReversibleO).

Constructors

RandomO

Shuffle the Proposals in the Cycle. The Proposals are replicated according to their weights and executed in random order. If a Proposal has weight w, it is executed exactly w times per iteration.

SequentialO

The Proposals are executed sequentially, in the order they appear in the Cycle. Proposals with weight w>1 are repeated immediately w times (and not appended to the end of the list).

RandomReversibleO

Similar to RandomO. However, a reversed copy of the list of shuffled Proposals is appended such that the resulting Markov chain is reversible. Note: the total number of Proposals executed per cycle is twice the number of RandomO.

SequentialReversibleO

Similar to SequentialO. However, a reversed copy of the list of sequentially ordered Proposals is appended such that the resulting Markov chain is reversible.

Instances
Eq Order Source # 
Instance details

Defined in Mcmc.Proposal

Methods

(==) :: Order -> Order -> Bool #

(/=) :: Order -> Order -> Bool #

Show Order Source # 
Instance details

Defined in Mcmc.Proposal

Methods

showsPrec :: Int -> Order -> ShowS #

show :: Order -> String #

showList :: [Order] -> ShowS #

Default Order Source # 
Instance details

Defined in Mcmc.Proposal

Methods

def :: Order #

setOrder :: Order -> Cycle a -> Cycle a Source #

Set the order of Proposals in a Cycle.

Initialization

The Status contains all information to run an MCMC chain. It is constructed using the function status.

The Status of a Markov chain includes information about current state (Item) and iteration, the history of the chain (Trace), the Acceptance ratios, and the random number generator.

Further, the Status includes auxiliary variables and functions such as the prior and likelihood functions, instructions to move around the state space (see above) and to monitor the MCMC run, as well as some auxiliary information.

status Source #

Arguments

:: String

Name of the Markov chain; used as file prefix.

-> (a -> Log Double)

The prior function.

-> (a -> Log Double)

The likelihood function.

-> Cycle a

A list of Proposals executed in forward order. The chain will be logged after each cycle.

-> Monitor a

A Monitor observing the chain.

-> a

The initial state in the state space a.

-> Maybe Int

Number of burn in iterations; deactivate burn in with Nothing.

-> Maybe Int

Auto tuning period (only during burn in); deactivate auto tuning with Nothing.

-> Int

Number of normal iterations excluding burn in. Note that auto tuning only happens during burn in.

-> GenIO

A source of randomness. For reproducible runs, make sure to use generators with the same, fixed seed.

-> Status a 

Initialize the Status of a Markov chain Monte Carlo run.

saveWith :: Int -> Status a -> Status a Source #

Save the Markov chain with trace of given length.

force :: Status a -> Status a Source #

Overwrite existing files; it is not necessary to use force, when a chain is continued.

quiet :: Status a -> Status a Source #

Do not print anything to standard output. Do not create log file. File monitors and batch monitors are executed normally.

debug :: Status a -> Status a Source #

Be verbose.

Monitor

A Monitor describes which part of the Markov chain should be logged and where. There are three different types: - MonitorStdOut: Log to standard output. - MonitorFile: Log to a file. - MonitorBatch: Log summary statistics such as the mean of the last - states to a file.

data Monitor a Source #

A Monitor describes which part of the Markov chain should be logged and where. Further, they allow output of summary statistics per iteration in a flexible way.

Constructors

Monitor (MonitorStdOut a) [MonitorFile a] [MonitorBatch a] 

data MonitorStdOut a Source #

Monitor to standard output; constructed with monitorStdOut.

monitorStdOut Source #

Arguments

:: [MonitorParameter a]

Instructions about which parameters to log.

-> Int

Logging period.

-> MonitorStdOut a 

Monitor to standard output.

data MonitorFile a Source #

Monitor to a file; constructed with monitorFile.

monitorFile Source #

Arguments

:: String

Name; used as part of the file name.

-> [MonitorParameter a]

Instructions about which parameters to log.

-> Int

Logging period.

-> MonitorFile a 

Monitor parameters to a file.

data MonitorBatch a Source #

Monitor to a file, but calculate batch means for the given batch size; constructed with monitorBatch.

Batch monitors are slow at the moment because the monitored parameter has to be extracted from the state for each iteration.

monitorBatch Source #

Arguments

:: String

Name; used as part of the file name.

-> [MonitorParameterBatch a]

Instructions about which parameters to log and how to calculate the batch means.

-> Int

Batch size.

-> MonitorBatch a 

Monitor parameters to a file, see MonitorBatch.

Prior distributions

module Mcmc.Prior

Algorithms

At the moment, the library is tailored to the Metropolis-Hastings algorithm (mh) since it covers most use cases. However, implementation of more algorithms is planned in the future.

mh Source #

Arguments

:: ToJSON a 
=> Status a

Initial (or last) status of the Markov chain.

-> IO (Status a) 

Run a Markov chain for a given number of Metropolis-Hastings steps.

mhContinue Source #

Arguments

:: ToJSON a 
=> Int

Additional number of Metropolis-Hastings steps.

-> Status a

Loaded status of the Markov chain.

-> IO (Status a) 

Continue a Markov chain for a given number of Metropolis-Hastings steps.

At the moment, when an MCMC run is continued, the old .mcmc file is deleted. This behavior may change in the future.

This means that an interrupted continuation also breaks previous runs. This step is necessary because, otherwise, incomplete monitor files are left on disk, if a continuation is canceled. Subsequent continuations would append to the incomplete monitor files and produce garbage.

Misc

loadStatus :: FromJSON a => (a -> Log Double) -> (a -> Log Double) -> Cycle a -> Monitor a -> FilePath -> IO (Status a) Source #

Load a Status from file.

Important information that cannot be saved and has to be provided again when a chain is restored: - prior function - likelihood function - cycle - monitor

To avoid incomplete continued runs, the .mcmc file is removed after load.