{- | This module shows how to solve several example problems using this
library. -}
module Algorithms.MDP.Examples
       ( -- * A discounted problem

         {- | We consider the problem defined in
"Algorithms.MDP.Examples.Ex_3_1"; this example comes from Bertsekas
p. 22. We will solve this problem using regular value iteration.

Having constructed the MDP, we can do this using the 'valueIteration'
function.

@
import Algorithms.MDP.Examples.Ex_3_1
import Algorithms.MDP.ValueIteration

iterations :: [CF State Control Double]
iterations = valueIteration mdp
@

The iterates returned contain estimates of the cost of being in each
state. To see the estimated cost of state A over the first 10
iterations, we could write

@
estimates :: [Double]
estimates = map (cost A) (take 10 iterations)
@

-}

         -- * A discounted problem with error bounds

         {- | We consider the same example as above, but this time we use
relative value iteration to compute error bounds on the costs. This
allows us to obtain an accurate cost estimate in fewer iterations.

Since we have already defined the problem, we do this via the
'relativeValueIteration' function.

@
import Algorithms.MDP.Examples.Ex_3_1
import Algorithms.MDP.ValueIteration

iterations :: [CFBounds State Control Double]
iterations = relativeValueIteration mdp
@

The iterates returned contain estimates of the cost of being in each
state, along with associated error bounds. To see the cost of state A
over the first 10 iterations, adjusted for the error bounds, we could
write

@
estimate state (CFBounds cf lb ub) = (z + lb, z + ub)
  where z = cost state cf

estimates :: [(Double, Double)]
estimates = map (estimate A) (take 10 iterations)
@

Note that the lower and upper bounds returned in the first iteration
are always negative and positive infinity, respectively, so it can be
useful to consider only the tail of the iterations (for example, via
@drop 1 iterations@).

-}

         -- * An average cost problem

         {- | We consider the problem defined in
"Algorithms.MDP.Examples.Ex_3_2"; this example comes from Bertsekas
p. 210. Here we are interested in computing the long-run average cost
of an undiscounted MDP. For this we use the
'undiscountedRelativeValueIteration' function.

@
import Algorithms.MDP.Examples.Ex_3_2
import Algorithms.MDP.ValueIteration

iterations :: [CFBounds State Control Double]
iterations = undiscountedRelativeValueIteration mdp
@

We can compute cost estimates in the same fashion as above.

@
estimate state (CFBounds _ lb ub) = (lb, ub)

estimates :: [(Double, Double)]
estimates = map (estimate A) (take 10 iterations)
@

It is important to note that in this problem the cost function
contained in each 'CFBounds' value is not to be interpreted as a
vector of costs, but rather as a differential cost vector; the
estimates above, however, retain the same interpretation.

-}

         -- * A continuous-time undiscounted problem

         {- | We now consider a family of problems described by Sennott
p. 248. Here we are interested in first converting a CTMDP to an MDP
via uniformization, and then computing the long-run average cost of
the optimal policy.

To begin, we construct one of the scenarios provided (each scenario is
simply an instance of the problem with particular parameters). We then
convert the scenario to an MDP using the 'uniformize' function.

@
import Algorithms.MDP.Examples.MM1
import Algorithms.MDP.CTMDP
import Algorithms.MDP.ValueIteration

scenario :: CTMDP State Action Double
scenario = mkInstance scenario1

mdp :: MDP State Action Double
mdp = uniformize scenario
@
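For reference, uniformization is the standard construction for
reducing a continuous-time chain to a discrete-time one (see
e.g. Sennott): after fixing a uniformization rate @nu@ at least as
large as the total transition rate out of every state, the rate
@q(i, a, j)@ of jumping from state @i@ to state @j@ under action @a@
becomes the transition probability @q(i, a, j)@ divided by @nu@, and
the remaining probability mass is assigned to the self-transition
from @i@ back to @i@. (The precise convention used by 'uniformize',
in particular how costs are rescaled, may differ in its details.)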
As above, we can use the 'undiscountedRelativeValueIteration'
function to compute cost estimates; here @estimate@ needs no state
argument, since only the error bounds are reported.

@
iterations :: [CFBounds State Action Double]
iterations = undiscountedRelativeValueIteration mdp

estimate (CFBounds _ lb ub) = (lb, ub)

estimates :: [(Double, Double)]
estimates = map estimate (take 10 iterations)
@

-}
       ) where

import Algorithms.MDP.ValueIteration()
import Algorithms.MDP.CTMDP()
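{-
A note on convergence: each of the bounded iterations above can be
truncated as soon as the gap between its error bounds is small enough.
A minimal sketch (the helper name 'firstWithin' is introduced here for
illustration and is not part of the library's API):

> -- | The first iterate whose error bounds differ by at most the given
> -- tolerance.  This diverges if the bounds never tighten to within
> -- the tolerance.
> firstWithin :: (Num t, Ord t) => t -> [CFBounds a b t] -> CFBounds a b t
> firstWithin tol = head . dropWhile (\(CFBounds _ lb ub) -> ub - lb > tol)

For example, @firstWithin 1e-6 iterations@ selects the first iterate
whose cost estimates are accurate to within @1e-6@.
-}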