Copyright: Patrick Steele 2015
License: MIT (see the LICENSE file)
Maintainer: prs233@cornell.edu
Safe Haskell: None
Language: Haskell98
Algorithms and data structures for expressing and solving Markov decision processes (MDPs).
See the following for references on the algorithms implemented, along with general terminology.
 "Dynamic Programming and Optimal Control, Vol. II", by Dimitri P. Bertsekas, Athena Scientific, Belmont, Massachusetts.
 "Stochastic Dynamic Programming and the Control of Queueing Systems", by Linn I. Sennott, A Wiley Interscience Publication, New York.
The module Algorithms.MDP.Examples contains implementations of several example problems from these texts.
To actually solve an MDP, use (for example) the valueIteration function from the Algorithms.MDP.ValueIteration module.
 data MDP a b t = MDP {}
 mkDiscountedMDP :: Eq b => [a] -> [b] -> Transitions a b t -> Costs a b t -> ActionSet a b -> t -> MDP a b t
 mkUndiscountedMDP :: (Eq b, Num t) => [a] -> [b] -> Transitions a b t -> Costs a b t -> ActionSet a b -> MDP a b t
 type Transitions a b t = b -> a -> a -> t
 type Costs a b t = b -> a -> t
 type ActionSet a b = a -> [b]
 type CF a b t = Vector (a, b, t)
 data CFBounds a b t = CFBounds {}
 cost :: Eq a => a -> CF a b t -> t
 action :: Eq a => a -> CF a b t -> b
 optimalityGap :: Num t => CFBounds a b t -> t
 verifyStochastic :: (Ord t, Num t) => MDP a b t -> t -> Either (MDPError a b t) ()
 data MDPError a b t = MDPError {
   _negativeProbability :: [(b, a, a, t)],
   _notOneProbability :: [(b, a, t)]
 }
Markov decision processes
A Markov decision process.
An MDP consists of a state space, an action space, state- and action-dependent costs, and state- and action-dependent transition probabilities. The goal is to compute a policy (a mapping from states to actions) that minimizes the total discounted cost of the problem, assuming a given discount factor in the range (0, 1].
Here the type variable a represents the type of the states, b represents the type of the actions, and t represents the numeric type used in computations. Generally choosing t to be a Double is fine, although there is no reason a higher-precision type cannot be used.
This type should not be constructed directly; use the mkDiscountedMDP or mkUndiscountedMDP constructors instead.
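For intuition on the discount factor: in a hypothetical single-state, single-action MDP where every stage incurs a constant cost c, the total discounted cost under a discount factor alpha < 1 is the geometric series c + alpha·c + alpha²·c + … = c / (1 − alpha). The sketch below is plain Haskell and does not use this library:

```haskell
-- Closed form of the infinite discounted series: c / (1 - alpha).
discountedTotal :: Double -> Double -> Double
discountedTotal c alpha = c / (1 - alpha)

-- Truncating the series after n stages approximates the closed form.
truncatedTotal :: Double -> Double -> Int -> Double
truncatedTotal c alpha n = sum [c * alpha ^ k | k <- [0 .. n - 1]]

main :: IO ()
main = do
  let c = 2.0
      alpha = 0.9
  print (discountedTotal c alpha)
  print (truncatedTotal c alpha 1000)
```

Note that at alpha = 1 the series diverges for nonzero per-stage costs, so the undiscounted case requires a different analysis.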
mkDiscountedMDP Source #
:: Eq b
=> [a]  The state space
-> [b]  The action space
-> Transitions a b t  The transition probabilities
-> Costs a b t  The action-dependent costs
-> ActionSet a b  The state-dependent actions
-> t  The discount factor
-> MDP a b t  The resulting discounted MDP
Creates a discounted MDP.
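As a sketch of how the arguments to mkDiscountedMDP fit together, consider a hypothetical two-state machine-replacement problem. The type synonyms are copied from this module so the snippet is self-contained; the constructor call itself is left in a comment, since it requires the library to be installed:

```haskell
-- Type synonyms mirroring this module's Transitions, Costs, and ActionSet.
type Transitions a b t = b -> a -> a -> t
type Costs a b t = b -> a -> t
type ActionSet a b = a -> [b]

data State = Good | Bad deriving (Eq, Show)
data Act = Keep | Replace deriving (Eq, Show)

-- trans action from to: a kept machine degrades; replacement resets it.
trans :: Transitions State Act Double
trans Keep    Good Good = 0.8
trans Keep    Good Bad  = 0.2
trans Keep    Bad  Bad  = 1.0
trans Replace _    Good = 1.0
trans _       _    _    = 0.0

-- Running a bad machine is expensive; replacement has a fixed cost.
costs :: Costs State Act Double
costs Keep    Good = 0.0
costs Keep    Bad  = 5.0
costs Replace _    = 2.0

actions :: ActionSet State Act
actions _ = [Keep, Replace]

main :: IO ()
main =
  -- Sanity check: every (action, state) transition row sums to one.
  print (and [ abs (sum [trans b s s' | s' <- [Good, Bad]] - 1) < 1e-12
             | b <- [Keep, Replace], s <- [Good, Bad] ])

-- With the library in scope, the MDP would then be built as:
--   mdp = mkDiscountedMDP [Good, Bad] [Keep, Replace] trans costs actions 0.9
```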
mkUndiscountedMDP Source #
:: (Eq b, Num t)
=> [a]  The state space
-> [b]  The action space
-> Transitions a b t  The transition probabilities
-> Costs a b t  The action-dependent costs
-> ActionSet a b  The state-dependent actions
-> MDP a b t  The resulting undiscounted MDP
Creates an undiscounted MDP.
Types
type Transitions a b t = b -> a -> a -> t Source #
A type representing an action- and state-dependent probability vector.
type CF a b t = Vector (a, b, t) Source #
A cost function is a vector containing (state, action, cost) triples. Each triple describes the cost of taking the action in that state.
data CFBounds a b t Source #
A cost function with error bounds. The cost in a (state, action, cost) triple is guaranteed to lie in the range [cost + lb, cost + ub].
Utility functions
cost :: Eq a => a -> CF a b t -> t Source #
Get the cost associated with a state.
This function is only defined over the state values passed in to the original MDP.
action :: Eq a => a -> CF a b t -> b Source #
Get the action associated with a state.
This function is only defined over the state values passed in to the original MDP.
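The lookup behavior of cost and action can be sketched as follows. In the library a CF is a Data.Vector of triples; here a plain list stands in so the snippet needs no extra packages, and the lookups (hypothetical helpers costOf and actionOf) are likewise partial outside the original state space:

```haskell
-- A plain list stands in for the library's Vector-based CF.
type CF a b t = [(a, b, t)]

-- Find the cost recorded for a state (partial outside the state space).
costOf :: Eq a => a -> CF a b t -> t
costOf s cf = head [c | (s', _, c) <- cf, s' == s]

-- Find the action recorded for a state.
actionOf :: Eq a => a -> CF a b t -> b
actionOf s cf = head [b | (s', b, _) <- cf, s' == s]

main :: IO ()
main = do
  let cf = [(0 :: Int, "keep", 1.5 :: Double), (1, "replace", 2.0)]
  print (costOf 1 cf)
  print (actionOf 0 cf)
```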
optimalityGap :: Num t => CFBounds a b t -> t Source #
Compute the optimality gap associated with a CFBounds.
This error is absolute, not relative.
Validation
verifyStochastic :: (Ord t, Num t) => MDP a b t -> t -> Either (MDPError a b t) () Source #
Verifies that the MDP is stochastic, returning the offending (action, state) pairs if it is not.
An (action, state) pair is non-stochastic if any transition out of the state occurs with negative probability, or if the total probability of all transitions out of the state does not sum to one (within the given tolerance).
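The check that verifyStochastic performs can be sketched directly over the Transitions and ActionSet synonyms. This is a hypothetical reimplementation for illustration (the function name nonStochastic and the example badTrans are not part of the library):

```haskell
type Transitions a b t = b -> a -> a -> t
type ActionSet a b = a -> [b]

-- Collect every (action, state) pair whose outgoing transition row has a
-- negative entry or fails to sum to one within the given tolerance.
nonStochastic
  :: (Ord t, Num t)
  => [a]               -- the state space
  -> ActionSet a b
  -> Transitions a b t
  -> t                 -- tolerance on the row sums
  -> [(b, a)]
nonStochastic states acts trans tol =
  [ (b, s)
  | s <- states
  , b <- acts s
  , let row = [trans b s s' | s' <- states]
  , any (< 0) row || abs (sum row - 1) > tol
  ]

-- A deliberately broken chain: state 1's row sums to only 0.9.
badTrans :: Transitions Int () Double
badTrans () 0 1 = 1.0
badTrans () 1 0 = 0.9
badTrans _  _ _ = 0.0

main :: IO ()
main = print (nonStochastic [0, 1] (const [()]) badTrans 1e-9)
```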
An error describing the ways in which an MDP can be poorly defined.
An MDP can be poorly defined by having negative transition probabilities, or by having the total probability associated with a state and action differ from one.