Data type specifying the environment in which the Q learner operates. envExecute is the function used to execute an action at a particular state; it returns the new state and the reward associated with the (state, action) pair. envPossible returns the actions possible at any given state.

Wrapper around Double, specifying a reward value.

Wrapper around Int, specifying an action index.

Wrapper around Int, specifying a state index.

Data type specifying the parameters and Q table for a particular Q learner. qAlpha is the learning rate applied at each iterative update. qGamma is the discount rate on rewards. qGrid is a matrix (number of states by number of actions) that holds the Q(s,a) function learned by this Q learner. qEpsilon is a function that maps the number of iterations left to the epsilon used by the epsilon-greedy strategy (it can return 1 uniformly if an epsilon-greedy strategy is not wanted).

Given alpha, gamma, the number of states and the maximum number of actions possible at any state, returns a QLearner initialized with a zero Q-table.

Given the envExecute and envPossible functions, constructs an Environment. This exists purely for uniformity of the API; using the Environment data type constructor directly is equivalent.

Given an Environment, a Q learner and the state the Q learner is currently in, returns the Q learner with an updated Q table and the new state of the Q learner within the Environment. Also takes the number of time steps left, which is used for the epsilon computation.

Same as "moveLearner", but prints the Q table and the current state after moving the QLearner.

Repeatedly moves (i.e.
moves the given number of times) the qLearner and prints the Q table at every move until a stop state is encountered.

Returns the maximum number of characters needed to "show" an element of the given vector.

Returns the maximum number of characters needed to "show" an element of the given 2D matrix.

Internal function that pads a string with spaces so that it reaches a given length.

Internal function that pretty-prints a row vector, given the maximum number of characters each element of the row can take up.

Internal function that pretty-prints the Q-table, given the maximum number of characters a single element can take up.

Pretty-prints the Q-table.

Creates a table of Q(s,a) values, each element representing the expected value of a given state and action pair. Takes the number of possible states and the number of actions as arguments.

Finds the action with the highest Q(s,a) value for a particular state and returns that action index.

Returns the largest Q(s,a) value for a particular state.

Updates the Q(s, a) value based on the previous value of Q(s, a), given s (the state at which an action was executed), a (the action executed at that state), r (the reward attained for the state, action pair), s' (the new state) and gamma (the discount factor for the rewards).

Creates an s x s grid of rewards. Used for grid searches.

Takes a Q table and the current state, and returns the new Q table along with the new state index. Takes a function "execute" that takes a state, action pair and returns the reward and new state associated with that state and action pair. The argument "possible" is a function that returns the list of actions that are possible at a particular state. For example, we can't go off the grid when we're at the edge of a grid, so such an action would not be part of the possible actions.
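The Q-table helpers described above (highest-valued action, largest value, and the Q(s, a) update) can be illustrated with a small sketch. This is not the library's code: it uses plain lists instead of vectors, the names `QTable`, `maxValueAt`, `bestActionAt` and `updateQ` are hypothetical, and it assumes the standard Q-learning rule Q(s,a) <- Q(s,a) + alpha * (r + gamma * max Q(s',.) - Q(s,a)), which the doc strings do not spell out.

```haskell
-- Hypothetical list-based sketch; NOT the Data.QLearn implementation.
import Data.List (maximumBy)
import Data.Ord (comparing)

-- Q table: one row per state, one column per action (assumed layout).
type QTable = [[Double]]

-- Largest Q(s, a) value in the row for state s.
maxValueAt :: QTable -> Int -> Double
maxValueAt q s = maximum (q !! s)

-- Index of an action with the highest Q(s, a) value in state s.
bestActionAt :: QTable -> Int -> Int
bestActionAt q s = fst (maximumBy (comparing snd) (zip [0 ..] (q !! s)))

-- Assumed standard Q-learning update for the entry (s, a):
--   Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
-- Every other entry of the table is returned unchanged.
updateQ :: Double -> Double -> QTable -> Int -> Int -> Double -> Int -> QTable
updateQ alpha gamma q s a r s' =
  [ [ if (i, j) == (s, a) then new else v | (j, v) <- zip [0 ..] row ]
  | (i, row) <- zip [0 :: Int ..] q
  ]
  where
    old = q !! s !! a
    new = old + alpha * (r + gamma * maxValueAt q s' - old)
```

For instance, with alpha = 0.5, gamma = 0.9, a table `[[0,0],[0,1]]`, and a step from state 0 via action 1 with reward 2 into state 1, the updated entry is 0.5 * (2 + 0.9 * 1), roughly 1.45.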
TODO make params tunable

Takes the width and height of a 2D matrix (as integers) and a linear index, and converts the linear index to a 2D index.

Takes a 2D coordinate and turns it into a linear coordinate.

Takes the number of rows, the number of cols (in a grid), the current state (specified as a linear index) and an action, and determines the next state (also a linear index). The action can be one of the following: 0: move up, 1: move down, 2: move left, 3: move right. Note that this does not perform any bounds checking. In addition, if the action is invalid, a -1 state is returned.

Takes a grid describing reward values (often from environments that look like grids), a state and an action, and returns the new state and the new reward.

Takes a grid of reward values (i.e. each point in this grid is a state and each state has a reward associated with it) and functions as an "execute" for qLearnIter.

Creates a V.Vector (V.Vector Double) from a [[Double]]. Used to create grid-based environments for the agent.

A grid consisting of some numbers, used primarily for examples. Here's what it looks like:
  [[1.0,2.0,3.0,4.0],
   [5.0,6.0,7.0,8.0],
   [12.0,11.0,10.0,9.0],
   [13.0,14.0,15.0,16.0]]

An "envPossible" function for use in the Environment data type, specifically for environments that look like grids.
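The grid helpers described above (index conversion, applying an action, and listing the possible actions) can be sketched as follows. The names `toRowCol`, `toLinear`, `stepOnGrid` and `possibleOnGrid` are hypothetical, and the row-major layout is an assumption; the doc strings do not state which layout the library uses, nor its exact signatures. Because row-major conversion needs only the number of columns, the number of rows appears only in the bounds check.

```haskell
-- Hypothetical sketch of grid indexing; NOT the Data.QLearn implementation.
-- Assumes a row-major layout: linear index = row * cols + col.

-- Convert a linear index into a (row, col) pair.
toRowCol :: Int -> Int -> (Int, Int)
toRowCol cols i = i `divMod` cols

-- Convert a (row, col) pair back into a linear index.
toLinear :: Int -> (Int, Int) -> Int
toLinear cols (r, c) = r * cols + c

-- Apply an action to a state given as a linear index.
-- 0: up, 1: down, 2: left, 3: right; any other action yields -1.
-- As in the description above, no bounds checking is performed here.
stepOnGrid :: Int -> Int -> Int -> Int
stepOnGrid cols state action =
  case action of
    0 -> toLinear cols (r - 1, c)
    1 -> toLinear cols (r + 1, c)
    2 -> toLinear cols (r, c - 1)
    3 -> toLinear cols (r, c + 1)
    _ -> -1
  where
    (r, c) = toRowCol cols state

-- Actions that keep the agent on a rows x cols grid.
possibleOnGrid :: Int -> Int -> Int -> [Int]
possibleOnGrid rows cols state =
  [ a
  | (a, ok) <- zip [0 ..] [r > 0, r < rows - 1, c > 0, c < cols - 1]
  , ok
  ]
  where
    (r, c) = toRowCol cols state
```

Keeping the bounds check in a separate function mirrors the split described above: the step function stays unchecked, and the list of possible actions is what prevents a learner from walking off the edge.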
Data.QLearn

Environment, Reward, Action, State, Stop, QLearner, initQLearner, initEnvironment, moveLearner, moveLearnerAndPrint, moveLearnerPrintRepeat, executeGrid, gridFromList, testGrid, possibleGrid, maxSpaceRow, maxSpaceMat, padSpaces, prettyPrintRow, prettyPrintQ', prettyPrintQ, createZeroQ, maxAction, maxActionValue, updatedQ, createGrid, qLearnIter, linearTo2D, twoDToLinear, applyGridAction, executeOnGrid, envExecute, envPossible, getRewardValue, getActionValue, getStateValue, qAlpha, qGamma, qEpsilon, qGrid, unwrapExecute, unwrapPossible, updateQRow, indexQ, multIndex, unwrapMaybe, randomAction, createRewardTable, qRandomIter, gridPossibleX, gridPossibleY, gridPossible, qPrint, checkEpsilon, pick, qEpsilonPrint, epsilon