Machine learning algorithms in C++
GridWorld Class Reference

Class to solve the grid world toy problem as a Markov decision process. More...

#include <GridWorld.hpp>


Public Member Functions

void policyIteration (size_t height, size_t width, vector< pair< size_t, size_t >> goals, double gamma=1, double threshold=.001, bool verbose=true)
 Policy iteration method. More...
 
void valueIteration (size_t height, size_t width, vector< pair< size_t, size_t >> goals, double gamma=1, double threshold=.000001, bool verbose=true)
 Value iteration method. More...
 
void MonteCarloEstimatingStarts (size_t height, size_t width, vector< pair< size_t, size_t >> goals, double gamma=1, unsigned maxIters=10000, bool verbose=true)
 Monte Carlo Estimating Starts algorithm for finding an optimal policy. More...
 
void Sarsa (size_t height, size_t width, vector< pair< size_t, size_t >> goals, double gamma=1, double alpha=.3, double epsilon=.8, unsigned int maxIters=10000, bool verbose=true)
 Temporal difference method for finding the optimal policy using SARSA. More...
 
void QLearning (size_t height, size_t width, vector< pair< size_t, size_t >> goals, double gamma=1, double alpha=.3, double epsilon=.8, unsigned int maxIters=10000, bool verbose=true)
 Temporal difference method for finding the optimal policy using Q-Learning. More...
 
const MatrixD & getV () const
 
const MatrixD & getQ () const
 
const MatrixD & getRewards () const
 
const MatrixD & getPolicy () const
 
double getGamma () const
 
unsigned long getNStates () const
 
const vector< pair< size_t, size_t > > & getGoals () const
 
const vector< ActionType > & getActions () const
 

Private Types

enum  ActionType { UP = 0, DOWN = 1, LEFT = 2, RIGHT = 3 }
 

Private Member Functions

void initialize (size_t height, size_t width, vector< pair< size_t, size_t >> goals, double gamma=1)
 
size_t fromCoord (size_t row, size_t col)
 Transforms row x column coordinates from the grid world into a raster representation. More...
 
pair< size_t, size_t > toCoord (size_t s)
 Transforms a raster coordinate from the grid world into its corresponding row x column representation. More...
 
double actionValue (size_t s, ActionType a)
 Gets the q value of action a on state s More...
 
Matrix< double > normalizeToOne (MatrixD m)
 Normalizes a matrix so its sum equals 1. More...
 
vector< double > actionValuesForState (size_t s)
 Gets the q values of all actions for a given state. More...
 
Matrix< double > policyIncrement (size_t s)
 Creates a new policy for a given state giving preference to the actions with maximum value. More...
 
string prettifyPolicy ()
 
void iterativePolicyEvaluation (double threshold, bool verbose)
 Iterative policy evaluation implemented as described in Sutton and Barto, 2017. More...
 
bool isGoal (size_t s)
 Informs whether a state is a goal state in the grid world. More...
 
double transition (size_t currentState, ActionType action, size_t nextState)
 Returns the transition probability to nextState, given currentState and action More...
 
size_t applyAction (size_t currentState, ActionType action)
 Returns the next state that results from applying an action to a state. More...
 
MatrixD policyForState (size_t s)
 Gets the policy for state s More...
 
size_t getNonGoalState ()
 Selects a random non-goal state. More...
 
ActionType eGreedy (size_t s, double epsilon)
 Selects an action for a state s following an e-greedy policy. More...
 
double bestQForState (size_t s)
 Gets the best action value for state s. More...
 
MatrixD getOptimalPolicyFromQ ()
 Updates the policy matrix according to the action values from the Q matrix. More...
 

Private Attributes

MatrixD V
 
MatrixD Q
 
MatrixD rewards
 
MatrixD policy
 
double gamma
 
unsigned long nStates
 
vector< pair< size_t, size_t > > goals
 
vector< ActionType > actions = {UP, DOWN, LEFT, RIGHT}
 

Detailed Description

Class to solve the grid world toy problem as a Markov decision process.

Author
Douglas De Rizzo Meneghetti (douglasrizzom@gmail.com)
Date
2017-12-04 Implementation of the grid world as a Markov decision process

Definition at line 18 of file GridWorld.hpp.

Member Enumeration Documentation

◆ ActionType

enum GridWorld::ActionType [private]
Enumerator
UP 
DOWN 
LEFT 
RIGHT 

Definition at line 25 of file GridWorld.hpp.

Member Function Documentation

◆ actionValue()

double GridWorld::actionValue(size_t s, ActionType a) [inline, private]

Gets the q value of action a on state s

Parameters
s: a state
a: an action
Returns
action value of a in s

Definition at line 81 of file GridWorld.hpp.

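For illustration, the one-step backup this method describes can be written as a standalone function. This is a sketch under the assumption that rewards and state values are stored per state and that the transition model gives T(s, a, s'), as documented for transition() below; all names here are illustrative, not the actual class members:

#include <cstddef>
#include <functional>
#include <vector>

// q(s, a) = sum over s' of T(s, a, s') * (r(s') + gamma * V(s'))
double actionValueSketch(
    std::size_t s, int a, double gamma,
    const std::vector<double> &rewards,  // r(s') for every state s'
    const std::vector<double> &V,        // current state-value estimates
    const std::function<double(std::size_t, int, std::size_t)> &T) {
  double q = 0;
  for (std::size_t s2 = 0; s2 < V.size(); ++s2)
    q += T(s, a, s2) * (rewards[s2] + gamma * V[s2]);
  return q;
}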

◆ actionValuesForState()

vector<double> GridWorld::actionValuesForState(size_t s) [inline, private]

Gets the q values of all actions for a given state.

Parameters
s: a state
Returns
a vector containing the action values in s

Definition at line 115 of file GridWorld.hpp.


◆ applyAction()

size_t GridWorld::applyAction(size_t currentState, ActionType action) [inline, private]

Returns the next state that results from applying an action to a state.

Parameters
currentState: a state
action: an action
Returns
future state resulting from applying action to currentState

Definition at line 273 of file GridWorld.hpp.

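As a sketch of what such a move looks like under raster indexing: the action shifts the agent one cell, and, by assumption here (this page does not show the actual border handling), the agent stays put when a move would leave the grid. All names are illustrative:

#include <cstddef>

enum ActionType { UP = 0, DOWN = 1, LEFT = 2, RIGHT = 3 };

// Illustrative: move one cell in the chosen direction, staying in place
// at the borders (an assumption, not necessarily the class's behaviour).
std::size_t applyActionSketch(std::size_t s, ActionType a,
                              std::size_t height, std::size_t width) {
  std::size_t row = s / width, col = s % width;   // toCoord
  switch (a) {
    case UP:    if (row > 0)          --row; break;
    case DOWN:  if (row + 1 < height) ++row; break;
    case LEFT:  if (col > 0)          --col; break;
    case RIGHT: if (col + 1 < width)  ++col; break;
  }
  return row * width + col;                       // fromCoord
}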

◆ bestQForState()

double GridWorld::bestQForState(size_t s) [inline, private]

Gets the best action value for state s.

Action values are taken from the Q matrix.

Parameters
s: a state
Returns
best action value for s

Definition at line 356 of file GridWorld.hpp.


◆ eGreedy()

ActionType GridWorld::eGreedy(size_t s, double epsilon) [inline, private]

Selects an action for a state s following an e-greedy policy.

Action values are taken from the Q matrix.

Parameters
s: a state
epsilon: e-greedy parameter
Returns
an action

Definition at line 327 of file GridWorld.hpp.

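The selection rule itself is standard. A self-contained sketch over one row of a Q table (the names and storage layout are illustrative, not the class's MatrixD):

#include <cstddef>
#include <random>
#include <vector>

// With probability epsilon pick a uniformly random action; otherwise
// pick the greedy (highest-valued) action for this state.
int eGreedySketch(const std::vector<double> &qRow, double epsilon,
                  std::mt19937 &rng) {
  std::uniform_real_distribution<double> coin(0.0, 1.0);
  if (coin(rng) < epsilon) {
    std::uniform_int_distribution<int> any(0, (int)qRow.size() - 1);
    return any(rng);  // explore
  }
  int best = 0;
  for (int a = 1; a < (int)qRow.size(); ++a)
    if (qRow[a] > qRow[best]) best = a;  // exploit
  return best;
}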

◆ fromCoord()

size_t GridWorld::fromCoord(size_t row, size_t col) [inline, private]

Transforms row x column coordinates from the grid world into a raster representation.

Parameters
row: a row index in the grid
col: a column index in the grid
Returns
the raster (single index) representation of the coordinate (row, col)

Definition at line 59 of file GridWorld.hpp.

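Row-major (raster) indexing is the usual convention for this pair of transforms. Assuming that is what the class uses, the two mappings look like this, with toCoord() below as the inverse (names are illustrative):

#include <cstddef>
#include <utility>

// State s of a height x width grid laid out row by row.
std::size_t fromCoordSketch(std::size_t row, std::size_t col,
                            std::size_t width) {
  return row * width + col;
}

std::pair<std::size_t, std::size_t> toCoordSketch(std::size_t s,
                                                  std::size_t width) {
  return {s / width, s % width};  // (row, col)
}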

◆ getActions()

const vector<ActionType>& GridWorld::getActions() const [inline]

Definition at line 707 of file GridWorld.hpp.

◆ getGamma()

double GridWorld::getGamma() const [inline]

Definition at line 698 of file GridWorld.hpp.

◆ getGoals()

const vector<pair<size_t, size_t> >& GridWorld::getGoals() const [inline]

Definition at line 704 of file GridWorld.hpp.

◆ getNonGoalState()

size_t GridWorld::getNonGoalState() [inline, private]

Selects a random non-goal state.

Returns
a random non-goal state

Definition at line 309 of file GridWorld.hpp.


◆ getNStates()

unsigned long GridWorld::getNStates() const [inline]

Definition at line 701 of file GridWorld.hpp.

◆ getOptimalPolicyFromQ()

MatrixD GridWorld::getOptimalPolicyFromQ() [inline, private]

Updates the policy matrix according to the action values from the Q matrix.

Definition at line 370 of file GridWorld.hpp.


◆ getPolicy()

const MatrixD& GridWorld::getPolicy() const [inline]

Definition at line 695 of file GridWorld.hpp.

◆ getQ()

const MatrixD& GridWorld::getQ() const [inline]

Definition at line 689 of file GridWorld.hpp.

◆ getRewards()

const MatrixD& GridWorld::getRewards() const [inline]

Definition at line 692 of file GridWorld.hpp.

◆ getV()

const MatrixD& GridWorld::getV() const [inline]

Definition at line 686 of file GridWorld.hpp.

◆ initialize()

void GridWorld::initialize(size_t height, size_t width, vector<pair<size_t, size_t>> goals, double gamma = 1) [inline, private]

Definition at line 28 of file GridWorld.hpp.


◆ isGoal()

bool GridWorld::isGoal(size_t s) [inline, private]

Informs whether a state is a goal state in the grid world.

Parameters
s: a state
Returns
true if s is a goal, otherwise false

Definition at line 244 of file GridWorld.hpp.


◆ iterativePolicyEvaluation()

void GridWorld::iterativePolicyEvaluation(double threshold, bool verbose) [inline, private]

Iterative policy evaluation implemented as described in Sutton and Barto, 2017.

Parameters
threshold: threshold that decides whether convergence has been reached
verbose: if true, prints the number of evaluation iterations to stdout

Definition at line 211 of file GridWorld.hpp.

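A sketch of the sweep described in Sutton and Barto, under the assumption that the policy stores one probability per action and that q(s, a) is computed as in actionValue(); the container types are stand-ins for the class's matrices:

#include <algorithm>
#include <cmath>
#include <cstddef>
#include <functional>
#include <vector>

// Repeatedly replace V(s) with the expected one-step return under the
// current policy; stop when the largest single update drops below
// `threshold`.
void iterativePolicyEvaluationSketch(
    std::vector<double> &V,
    const std::vector<std::vector<double>> &policy,  // policy[s][a]
    double threshold,
    const std::function<double(std::size_t, int)> &q) {
  double delta;
  do {
    delta = 0;
    for (std::size_t s = 0; s < V.size(); ++s) {
      double newV = 0;
      for (std::size_t a = 0; a < policy[s].size(); ++a)
        newV += policy[s][a] * q(s, (int)a);       // E_pi[q(s, .)]
      delta = std::max(delta, std::fabs(newV - V[s]));
      V[s] = newV;
    }
  } while (delta >= threshold);
}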

◆ MonteCarloEstimatingStarts()

void GridWorld::MonteCarloEstimatingStarts(size_t height, size_t width, vector<pair<size_t, size_t>> goals, double gamma = 1, unsigned maxIters = 10000, bool verbose = true) [inline]

Monte Carlo Estimating Starts algorithm for finding an optimal policy.

Parameters
height: height of the grid world to be generated
width: width of the grid world to be generated
goals: vector containing the coordinates of goal states
gamma: discount factor
maxIters: maximum number of iterations
verbose: if true, prints the current policy to stdout each second

Definition at line 509 of file GridWorld.hpp.

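The heart of the algorithm is the first-visit return average. A sketch of that inner update for a single finished episode, with illustrative containers (the class stores action values in its Q matrix instead):

#include <cstddef>
#include <map>
#include <tuple>
#include <utility>
#include <vector>

// Walk one episode backwards, accumulating the discounted return G, and
// average G into Q(s, a) at the first visit of each state-action pair.
// `ret` keeps the running (sum, count) of returns per pair.
void monteCarloUpdateSketch(
    const std::vector<std::tuple<std::size_t, int, double>> &episode,
    double gamma,
    std::map<std::pair<std::size_t, int>, std::pair<double, unsigned>> &ret,
    std::vector<std::vector<double>> &Q) {
  double G = 0;
  for (int t = (int)episode.size() - 1; t >= 0; --t) {
    const auto &[s, a, r] = episode[t];
    G = gamma * G + r;
    bool firstVisit = true;
    for (int u = 0; u < t; ++u)  // first-visit check
      if (std::get<0>(episode[u]) == s && std::get<1>(episode[u]) == a)
        firstVisit = false;
    if (firstVisit) {
      auto &[sum, n] = ret[{s, a}];
      sum += G;
      ++n;
      Q[s][a] = sum / n;  // running average of first-visit returns
    }
  }
}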

◆ normalizeToOne()

Matrix<double> GridWorld::normalizeToOne(MatrixD m) [inline, private]

Normalizes a matrix so its sum equals 1.

Parameters
m: a matrix
Returns
a matrix, the same dimensions as m, with all elements normalized so their sum equals 1

Definition at line 106 of file GridWorld.hpp.

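Over a flat container the operation is simply the following (a sketch assuming a nonzero sum; a vector stands in for MatrixD):

#include <vector>

std::vector<double> normalizeToOneSketch(std::vector<double> m) {
  double sum = 0;
  for (double x : m) sum += x;   // total of all elements
  for (double &x : m) x /= sum;  // rescale so the elements sum to 1
  return m;
}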

◆ policyForState()

MatrixD GridWorld::policyForState(size_t s) [inline, private]

Gets the policy for state s

Parameters
s: a state
Returns
a matrix containing the policy for s

Definition at line 301 of file GridWorld.hpp.


◆ policyIncrement()

Matrix<double> GridWorld::policyIncrement(size_t s) [inline, private]

Creates a new policy for a given state giving preference to the actions with maximum value.

This method uses the current values of the value matrix V to generate the new policy.

Parameters
s: a state
Returns
a matrix in which each element represents the probability of selecting the given action, given the action value

Definition at line 130 of file GridWorld.hpp.

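One common way to realize this, consistent with the normalizeToOne() helper above: give every maximal-value action equal probability and every other action zero. The sketch below assumes that behaviour and is not taken from the implementation:

#include <cstddef>
#include <vector>

std::vector<double> policyIncrementSketch(const std::vector<double> &qs) {
  double best = qs[0];
  for (double q : qs) if (q > best) best = q;  // max action value
  std::size_t ties = 0;
  for (double q : qs) if (q == best) ++ties;   // size of the argmax set
  std::vector<double> pi(qs.size(), 0.0);
  for (std::size_t a = 0; a < qs.size(); ++a)
    if (qs[a] == best) pi[a] = 1.0 / ties;     // uniform over the argmax set
  return pi;
}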

◆ policyIteration()

void GridWorld::policyIteration(size_t height, size_t width, vector<pair<size_t, size_t>> goals, double gamma = 1, double threshold = .001, bool verbose = true) [inline]

Policy iteration method.

This method evaluates policy using iterative policy evaluation, updating the value matrix V, and updates policy with the new value for V.

Parameters
height: height of the grid world to be generated
width: width of the grid world to be generated
goals: vector containing the coordinates of goal states
gamma: discount factor
threshold: threshold that will dictate the convergence of the method
verbose: if true, prints the number of iterations to stdout

Definition at line 403 of file GridWorld.hpp.

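The outer loop alternates the two steps until the policy is stable. A schematic sketch, with `evaluate` and `improve` standing in for iterativePolicyEvaluation() and per-state policyIncrement():

#include <functional>

// `improve` returns true while the greedy update still changes the policy.
void policyIterationSketch(const std::function<void()> &evaluate,
                           const std::function<bool()> &improve) {
  bool changed = true;
  while (changed) {
    evaluate();           // policy evaluation: V <- V_pi
    changed = improve();  // policy improvement: pi <- greedy(V)
  }
}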

◆ prettifyPolicy()

string GridWorld::prettifyPolicy() [inline, private]
Returns
a string with the policy represented as arrow characters

Definition at line 156 of file GridWorld.hpp.


◆ QLearning()

void GridWorld::QLearning(size_t height, size_t width, vector<pair<size_t, size_t>> goals, double gamma = 1, double alpha = .3, double epsilon = .8, unsigned int maxIters = 10000, bool verbose = true) [inline]
inline

Temporal difference method for finding the optimal policy using Q-Learning.

Parameters
height: height of the grid world to be generated
width: width of the grid world to be generated
goals: vector containing the coordinates of goal states
gamma: discount factor
alpha: learning rate
epsilon: e-greedy parameter
maxIters: maximum number of iterations
verbose: if true, prints the current policy to stdout each second

Definition at line 650 of file GridWorld.hpp.

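The per-step update is the classic off-policy backup. A sketch for a single observed transition (s, a, r, s'), with Q stored as a nested vector for illustration:

#include <cstddef>
#include <vector>

// Q(s, a) += alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))
void qLearningUpdateSketch(std::vector<std::vector<double>> &Q,
                           std::size_t s, int a, double r, std::size_t s2,
                           double alpha, double gamma) {
  double best = Q[s2][0];
  for (double q : Q[s2]) if (q > best) best = q;  // max_a' Q(s', a')
  Q[s][a] += alpha * (r + gamma * best - Q[s][a]);
}

Taking the max over the successor's actions, rather than the action the e-greedy policy actually takes next, is what distinguishes this from the SARSA update below.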

◆ Sarsa()

void GridWorld::Sarsa(size_t height, size_t width, vector<pair<size_t, size_t>> goals, double gamma = 1, double alpha = .3, double epsilon = .8, unsigned int maxIters = 10000, bool verbose = true) [inline]
inline

Temporal difference method for finding the optimal policy using SARSA.

Parameters
height: height of the grid world to be generated
width: width of the grid world to be generated
goals: vector containing the coordinates of goal states
gamma: discount factor
alpha: learning rate
epsilon: e-greedy parameter
maxIters: maximum number of iterations
verbose: if true, prints the current policy to stdout each second

Definition at line 607 of file GridWorld.hpp.

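The per-step update is the classic on-policy backup: a' is the action the e-greedy policy actually selects in s'. A sketch for one transition, with an illustrative Q layout:

#include <cstddef>
#include <vector>

// Q(s, a) += alpha * (r + gamma * Q(s', a') - Q(s, a))
void sarsaUpdateSketch(std::vector<std::vector<double>> &Q,
                       std::size_t s, int a, double r,
                       std::size_t s2, int a2,
                       double alpha, double gamma) {
  Q[s][a] += alpha * (r + gamma * Q[s2][a2] - Q[s][a]);
}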

◆ toCoord()

pair<size_t, size_t> GridWorld::toCoord(size_t s) [inline, private]

Transforms a raster coordinate from the grid world into its corresponding row x column representation.

Parameters
s: a state in raster representation
Returns
the (row, column) coordinate corresponding to s

Definition at line 68 of file GridWorld.hpp.


◆ transition()

double GridWorld::transition(size_t currentState, ActionType action, size_t nextState) [inline, private]

Returns the transition probability to nextState, given currentState and action

Parameters
currentState: the current state
action: an action to be applied in currentState
nextState: the possible next state
Returns
probability that applying action in currentState leads to nextState

Definition at line 257 of file GridWorld.hpp.

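Since applyAction() maps a state-action pair to a single successor, the grid world appears deterministic; under that assumption the probability collapses to an indicator. A sketch, not the actual code:

#include <cstddef>
#include <functional>

double transitionSketch(
    std::size_t currentState, int action, std::size_t nextState,
    const std::function<std::size_t(std::size_t, int)> &applyAction) {
  // Probability 1 if the action's unique successor is nextState, else 0.
  return applyAction(currentState, action) == nextState ? 1.0 : 0.0;
}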

◆ valueIteration()

void GridWorld::valueIteration(size_t height, size_t width, vector<pair<size_t, size_t>> goals, double gamma = 1, double threshold = .000001, bool verbose = true) [inline]

Value iteration method.

This method alternates between one step of evaluation of policy, updating the value matrix V, and one step of policy update, using the new value for V.

Parameters
height: height of the grid world to be generated
width: width of the grid world to be generated
goals: vector containing the coordinates of goal states
gamma: discount factor
threshold: threshold that will dictate the convergence of the method
verbose: if true, prints the number of iterations to stdout

Definition at line 455 of file GridWorld.hpp.

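A sketch of the combined sweep, assuming q(s, a) is computed as in actionValue(); the container types are stand-ins for the class's matrices:

#include <algorithm>
#include <cmath>
#include <cstddef>
#include <functional>
#include <vector>

// V(s) <- max_a q(s, a) for every state, repeated until the largest
// change falls below `threshold`; the greedy policy is then read off
// the converged V.
void valueIterationSketch(std::vector<double> &V, std::size_t nActions,
                          double threshold,
                          const std::function<double(std::size_t, int)> &q) {
  double delta;
  do {
    delta = 0;
    for (std::size_t s = 0; s < V.size(); ++s) {
      double best = q(s, 0);
      for (int a = 1; a < (int)nActions; ++a) best = std::max(best, q(s, a));
      delta = std::max(delta, std::fabs(best - V[s]));
      V[s] = best;
    }
  } while (delta >= threshold);
}

Compared with policy iteration above, each evaluation step is truncated to a single greedy backup per state; note the tighter default threshold (.000001 versus .001 for policyIteration()).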

Field Documentation

◆ actions

vector<ActionType> GridWorld::actions = {UP, DOWN, LEFT, RIGHT} [private]

Definition at line 26 of file GridWorld.hpp.

◆ gamma

double GridWorld::gamma [private]

Definition at line 21 of file GridWorld.hpp.

◆ goals

vector<pair<size_t, size_t> > GridWorld::goals [private]

Definition at line 23 of file GridWorld.hpp.

◆ nStates

unsigned long GridWorld::nStates [private]

Definition at line 22 of file GridWorld.hpp.

◆ policy

MatrixD GridWorld::policy [private]

Definition at line 20 of file GridWorld.hpp.

◆ Q

MatrixD GridWorld::Q [private]

Definition at line 20 of file GridWorld.hpp.

◆ rewards

MatrixD GridWorld::rewards [private]

Definition at line 20 of file GridWorld.hpp.

◆ V

MatrixD GridWorld::V [private]

Definition at line 20 of file GridWorld.hpp.


The documentation for this class was generated from the following file:
GridWorld.hpp