mdp-0.1.1.0: Tools for solving Markov Decision Processes.

Safe Haskell None Haskell98

Algorithms.MDP.ValueIteration

Description

This module provides several flavors of the value iteration algorithm for solving MDPs.

Synopsis

# Value iteration algorithms

Arguments

 :: (Ord t, Num t) => MDP a b t The MDP to solve -> [CF a b t] An converging sequence of cost functions

Compute an infinite sequence of estimates of cost functions converging to the true cost function.

This method should only be used on discounted MDPs (e.g. an MDP with a discount factor less than one).

Arguments

 :: (Read t, Ord t, Fractional t) => MDP a b t The MDP to solve -> [CFBounds a b t] A converging sequence of cost functions.

An implementation of value iteration that computes monotonic error bounds.

The error bounds provided at each iteration are additive in each state. That is, given a cost estimate c for a given state and lower and upper bounds lb and ub, the true cost is guaranteed to be in the interval [c + lb, c + ub].

Arguments

 :: (Ord t, Fractional t, Read t) => MDP a b t The MDP to solve -> [CFBounds a b t] A converging sequence of cost functions

Relative value iteration for undiscounted MDPs.

# Helper functions for value iteration

Arguments

 :: (Ord t, Num t) => MDP a b t The MDP to solve -> CF a b t The current cost function estimate -> CF a b t The next cost function estimate

Computes the next estimate of the cost function.

relativeValueIterate :: (Ord t, Fractional t) => MDP a b t -> CFBounds a b t -> CFBounds a b t Source #

Computes the next estimate of the cost function and associated error bounds.

undiscountedRVI :: (Ord t, Fractional t) => MDP a b t -> Int -> CFBounds a b t -> CFBounds a b t Source #

Performs a single iterate of relative value iteration for the undiscounted problem.