mdp-0.1.1.0
Copyright: Patrick Steele, 2015
License: MIT (see the LICENSE file)
Maintainer: prs233@cornell.edu

Algorithms.MDP

MDPError: An error describing the ways an MDP can be poorly defined. An MDP can be poorly defined by having negative transition probabilities, or by having the total probability associated with a state and action exceed one.

MDP: A Markov decision process. An MDP consists of a state space, an action space, state- and action-dependent costs, and state- and action-dependent transition probabilities. The goal is to compute a policy -- a mapping from states to actions -- which minimizes the total discounted cost of the problem, assuming a given discount factor in the range (0, 1].

Here the type variable a represents the type of the states, b represents the type of the actions, and t represents the numeric type used in computations. Generally choosing t to be Double is fine, although there is no reason a higher-precision type cannot be used.

This type should not be constructed directly; use the mkDiscountedMDP or mkUndiscountedMDP constructors instead.

CFBounds: A cost function with error bounds. The cost in a (state, action, cost) triple is guaranteed to lie in the range [cost + lb, cost + ub].

CF: A cost function, i.e. a vector containing (state, action, cost) triples. Each triple describes the cost of taking the action in that state.

ActionSet: A type representing the allowed actions in a state.

Costs: A type representing an action- and state-dependent cost.

Transitions: A type representing an action- and state-dependent probability vector.

cost: Get the cost associated with a state. This function is only defined over the state values passed in to the original MDP.

action: Get the action associated with a state. This function is only defined over the state values passed in to the original MDP.

optimalityGap: Compute the optimality gap associated with a CFBounds. This error is absolute, not relative.

mkDiscountedMDP: Creates a discounted MDP. Its arguments are the state space, the action space, the transition probabilities, the action-dependent costs, the state-dependent actions, and the discount factor; it returns the resulting discounted MDP.

mkUndiscountedMDP: Creates an undiscounted MDP. Its arguments are the state space, the action space, the transition probabilities, the action-dependent costs, and the state-dependent actions; it returns the resulting MDP.

verifyStochastic: Verifies that the MDP is stochastic. An MDP is stochastic if all transition probabilities are non-negative and the transition probabilities out of a state under a legal action sum to one. Sums are verified to within the given tolerance. A companion helper returns the non-stochastic (action, state) pairs in an MDP: an (action, state) pair is not stochastic if any transition out of the state occurs with negative probability, or if the total probability of all possible transitions is not 1 (within the given tolerance).
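To make the construction concrete, here is a minimal sketch of building a small two-state, two-action discounted MDP in the spirit of Algorithms.MDP.Examples.Ex_3_1. The argument order follows the description of mkDiscountedMDP above, but the concrete shapes assumed for the transition, cost, and action-set functions, the result type, and all numeric values are illustrative assumptions rather than the package's exact API.

    -- A minimal sketch, assuming transitions are functions of the form
    -- action -> source state -> destination state -> probability, costs are
    -- action -> state -> cost, and the action set is state -> [action]; the
    -- exact types in the package may differ.
    import Algorithms.MDP

    data State   = A | B   deriving (Eq, Show)
    data Control = U1 | U2 deriving (Eq, Show)

    -- Transition probabilities (illustrative numbers).
    transition :: Control -> State -> State -> Double
    transition U1 _ to = if to == A then 0.75 else 0.25
    transition U2 _ to = if to == A then 0.25 else 0.75

    -- Cost of choosing a control in a state (illustrative numbers).
    costs :: Control -> State -> Double
    costs U1 A = 2.0
    costs U1 B = 1.0
    costs U2 A = 0.5
    costs U2 B = 3.0

    -- Every control is available in every state.
    actionSet :: State -> [Control]
    actionSet _ = [U1, U2]

    -- Assemble the MDP with a discount factor of 0.9.  The result type is
    -- assumed to be MDP State Control Double.
    example :: MDP State Control Double
    example = mkDiscountedMDP [A, B] [U1, U2] transition costs actionSet 0.9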
Algorithms.MDP.CTMDP

Rates: A function mapping an action and a state to a transition rate.

CTMDP: A continuous-time Markov decision process. A CTMDP is a continuous-time analog of an MDP in which each stage lasts a variable amount of time: the duration of a stage is exponentially distributed, with a state- and action-dependent rate parameter. Instead of a single cost associated with a state and an action, the costs of a CTMDP are split into fixed costs and rate costs. Fixed costs are incurred when an action is chosen, while rate costs are paid for the duration of the stage.

Here the type variable a represents the type of the states, b represents the type of the actions, and t represents the numeric type used in computations. Generally choosing t to be Double is fine, although there is no reason a higher-precision type cannot be used.

This type should not be constructed directly; use the mkCTMDP constructor instead.

mkCTMDP: Create a CTMDP. Its arguments are the state space, the action space, the transition probabilities, the transition rates, the action-dependent fixed costs, the action-dependent rate costs, the state-dependent actions, and the discount factor in (0, 1]; it returns the resulting CTMDP.

uniformize: Convert a CTMDP into an MDP.
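As a rough illustration of the continuous-time workflow, the sketch below assembles a tiny two-state CTMDP and converts it to a discrete-time MDP with uniformize. The argument order follows the description of mkCTMDP above; the function shapes assumed for the transitions, rates, and costs, the signature assumed for uniformize, and all numeric values are assumptions made for the example.

    -- A minimal sketch, assuming rates and costs are functions of the form
    -- action -> state -> t and transitions are action -> state -> state -> t;
    -- the exact types in the package may differ.
    import Algorithms.MDP.CTMDP
    import Algorithms.MDP (MDP)

    data Machine = Working | Broken    deriving (Eq, Show)
    data Choice  = Repair  | DoNothing deriving (Eq, Show)

    -- Probability of jumping to the given state when the stage ends.
    jump :: Choice -> Machine -> Machine -> Double
    jump Repair    _       to = if to == Working then 1.0 else 0.0
    jump DoNothing Working to = if to == Working then 0.9 else 0.1
    jump DoNothing Broken  to = if to == Broken  then 1.0 else 0.0

    -- Exponential rate at which the current stage ends.
    stageRates :: Choice -> Machine -> Double
    stageRates Repair    _ = 2.0
    stageRates DoNothing _ = 1.0

    -- One-time cost incurred when the action is chosen.
    fixedCosts :: Choice -> Machine -> Double
    fixedCosts Repair    _ = 5.0
    fixedCosts DoNothing _ = 0.0

    -- Cost paid per unit time for the duration of the stage.
    rateCosts :: Choice -> Machine -> Double
    rateCosts _ Broken  = 10.0
    rateCosts _ Working = 0.0

    actions :: Machine -> [Choice]
    actions _ = [Repair, DoNothing]

    ctmdp :: CTMDP Machine Choice Double
    ctmdp = mkCTMDP [Working, Broken] [Repair, DoNothing]
                    jump stageRates fixedCosts rateCosts actions 0.9

    -- Uniformization turns the continuous-time problem into a discrete-time
    -- MDP that the value-iteration routines can solve (signature assumed).
    discretized :: MDP Machine Choice Double
    discretized = uniformize ctmdp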
Algorithms.MDP.Examples.MM1

An M/M/1 queueing control example.

Action: There are a number of services we can provide each customer, and if there are no customers we do nothing (the NullAction).

State: The state space is the count of customers in the queue.

Scenario: A description of an MDP instance; its fields include the arrival rate, the service rates, the service costs, the holding costs, the maximum number of waiting customers, and a scenario cost.

mkInstance: Generate an MDP from a Scenario.

scenario1 through scenario8: Specific scenarios.

Algorithms.MDP.Examples.Ex_3_1

Control: There are two distinct actions (U1 and U2) we can take in each state.

State: There are two distinct states (A and B).

transition: The transition matrix.

costs: The costs associated with each state and action.

alpha: The discount factor.

states: The available states.

controls: The available actions.

mdp: The MDP representing the problem.

Algorithms.MDP.Examples.Ex_3_2

mdp: The MDP representing the problem.

Algorithms.MDP.ValueIteration

valueIteration: Computes an infinite sequence of estimates of cost functions converging to the true cost function. Its argument is the MDP to solve; it returns a converging sequence of cost functions. This method should only be used on discounted MDPs (i.e. an MDP with a discount factor strictly less than one).

valueIterate: Computes the next estimate of the cost function. Its arguments are the MDP to solve and the current cost function estimate; it returns the next cost function estimate.

choiceFor: Finds the action that minimizes the one-step payoff using the given cost function. Its arguments are the MDP we are solving, the current cost function, and the state for which we choose an action; it returns the choice of action and the associated cost.

costForAction: Computes the cost implied by choosing an action in the given state. Its arguments are the MDP we are solving, the current cost function, the index of the state, and the index of the action; it returns the estimated cost.

relativeValueIteration: An implementation of value iteration that computes monotonic error bounds. The error bounds provided at each iteration are additive in each state: given a cost estimate c for a state and lower and upper bounds lb and ub, the true cost is guaranteed to lie in the interval [c + lb, c + ub]. Its argument is the MDP to solve; it returns a converging sequence of cost functions with bounds.

relativeValueIterate: Computes the next estimate of the cost function and the associated error bounds.

undiscountedRelativeValueIteration: Relative value iteration for undiscounted MDPs. Its argument is the MDP to solve; it returns a converging sequence of cost functions.

undiscountedRVI: Performs a single iterate of relative value iteration for the undiscounted problem.

inner: Computes the inner product between two vectors.
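Tying the pieces together, the sketch below shows one way the iteration routines might be driven on the `example` MDP constructed in the earlier sketch. It assumes the returned sequences are ordinary lazy lists and that relativeValueIteration yields CFBounds values, which is consistent with the documentation above but not verified against the package.

    -- A minimal sketch; assumes the State, Control, and example definitions
    -- from the earlier sketch are in scope.
    import Algorithms.MDP (CF, CFBounds, optimalityGap)
    import Algorithms.MDP.ValueIteration (valueIteration, relativeValueIteration)

    -- Take the estimate after 100 sweeps of plain value iteration
    -- (assumes the sequence is a lazy list).
    approxCosts :: CF State Control Double
    approxCosts = valueIteration example !! 100

    -- With relativeValueIteration we also get error bounds, so we can stop
    -- once the absolute optimality gap is small enough (result type assumed).
    boundedCosts :: CFBounds State Control Double
    boundedCosts =
      head (dropWhile (\cf -> optimalityGap cf > 1e-6)
                      (relativeValueIteration example))

Because the sequences are lazy, the caller chooses the stopping rule: a fixed iteration count for plain value iteration, or a tolerance on the optimality gap when error bounds are available.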