Class to solve the grid world toy problem as a Markov decision process.
More...
#include <GridWorld.hpp>

Public Member Functions

void policyIteration (size_t height, size_t width, vector< pair< size_t, size_t >> goals, double gamma=1, double threshold=.001, bool verbose=true)
    Policy iteration method. More...
void valueIteration (size_t height, size_t width, vector< pair< size_t, size_t >> goals, double gamma=1, double threshold=.000001, bool verbose=true)
    Value iteration method. More...
void MonteCarloEstimatingStarts (size_t height, size_t width, vector< pair< size_t, size_t >> goals, double gamma=1, unsigned maxIters=10000, bool verbose=true)
    Monte Carlo Estimating Starts algorithm for finding an optimal policy. More...
void Sarsa (size_t height, size_t width, vector< pair< size_t, size_t >> goals, double gamma=1, double alpha=.3, double epsilon=.8, unsigned int maxIters=10000, bool verbose=true)
    Temporal difference method for finding the optimal policy using SARSA. More...
void QLearning (size_t height, size_t width, vector< pair< size_t, size_t >> goals, double gamma=1, double alpha=.3, double epsilon=.8, unsigned int maxIters=10000, bool verbose=true)
    Temporal difference method for finding the optimal policy using Q-Learning. More...
const MatrixD & getV () const
const MatrixD & getQ () const
const MatrixD & getRewards () const
const MatrixD & getPolicy () const
double getGamma () const
unsigned long getNStates () const
const vector< pair< size_t, size_t > > & getGoals () const
const vector< ActionType > & getActions () const

Private Member Functions

void initialize (size_t height, size_t width, vector< pair< size_t, size_t >> goals, double gamma=1)
size_t fromCoord (size_t row, size_t col)
    Transforms row x column coordinates from the grid world into a raster representation. More...
pair< size_t, size_t > toCoord (size_t s)
    Transforms a raster coordinate from the grid world into its corresponding row x column representation. More...
double actionValue (size_t s, ActionType a)
    Gets the q value of action a on state s. More...
Matrix< double > normalizeToOne (MatrixD m)
    Normalizes a matrix so its sum equals 1. More...
vector< double > actionValuesForState (size_t s)
    Gets the q values of all actions for a given state. More...
Matrix< double > policyIncrement (size_t s)
    Creates a new policy for a given state, giving preference to the actions with maximum value. More...
string prettifyPolicy ()
void iterativePolicyEvaluation (double threshold, bool verbose)
    Iterative policy evaluation implemented as described in Sutton and Barto, 2017. More...
bool isGoal (size_t s)
    Informs whether a state is a goal state in the grid world. More...
double transition (size_t currentState, ActionType action, size_t nextState)
    Returns the transition probability to nextState, given currentState and action. More...
size_t applyAction (size_t currentState, ActionType action)
    Returns the next state that results from applying an action to a state. More...
MatrixD policyForState (size_t s)
    Gets the policy for state s. More...
size_t getNonGoalState ()
    Selects a random non-goal state. More...
ActionType eGreedy (size_t s, double epsilon)
    Selects an action for a state s following an e-greedy policy. More...
double bestQForState (size_t s)
    Gets the best action value for state s. More...
MatrixD getOptimalPolicyFromQ ()
    Updates the policy matrix according to the action values from the Q matrix. More...
|
Class to solve the grid world toy problem as a Markov decision process.
- Author
- Douglas De Rizzo Meneghetti (douglasrizzom@gmail.com)
- Date
- 2017-12-04 Implementation of the grid world as a Markov decision process
Definition at line 18 of file GridWorld.hpp.
◆ ActionType
◆ actionValue()

double GridWorld::actionValue (size_t s, ActionType a)  [inline, private]

Gets the q value of action a on state s.

Returns: the action value of a in s.

Definition at line 81 of file GridWorld.hpp.
◆ actionValuesForState()

vector<double> GridWorld::actionValuesForState (size_t s)  [inline, private]

Gets the q values of all actions for a given state.

Returns: a vector containing the action values in s.

Definition at line 115 of file GridWorld.hpp.
◆ applyAction()

size_t GridWorld::applyAction (size_t currentState, ActionType action)  [inline, private]

Returns the next state that results from applying an action to a state.

Parameters:
    currentState: a state
    action: an action

Returns: the future state resulting from applying action to currentState.

Definition at line 273 of file GridWorld.hpp.
◆ bestQForState()

double GridWorld::bestQForState (size_t s)  [inline, private]

Gets the best action value for state s. Action values are taken from the Q matrix.

Returns: the best action value for s.

Definition at line 356 of file GridWorld.hpp.
◆ eGreedy()

ActionType GridWorld::eGreedy (size_t s, double epsilon)  [inline, private]

Selects an action for a state s following an e-greedy policy. Action values are taken from the Q matrix.

Parameters:
    s: a state
    epsilon: e-greedy parameter

Returns: an action.

Definition at line 327 of file GridWorld.hpp.
◆ fromCoord()

size_t GridWorld::fromCoord (size_t row, size_t col)  [inline, private]

Transforms row x column coordinates from the grid world into a raster representation.

Definition at line 59 of file GridWorld.hpp.
◆ getActions()

const vector<ActionType>& GridWorld::getActions () const  [inline]

◆ getGamma()

double GridWorld::getGamma () const  [inline]

◆ getGoals()

const vector<pair<size_t, size_t> >& GridWorld::getGoals () const  [inline]

◆ getNonGoalState()

size_t GridWorld::getNonGoalState ()  [inline, private]

Selects a random non-goal state.

Returns: a random non-goal state.

Definition at line 309 of file GridWorld.hpp.

◆ getNStates()

unsigned long GridWorld::getNStates () const  [inline]

◆ getOptimalPolicyFromQ()

MatrixD GridWorld::getOptimalPolicyFromQ ()  [inline, private]

Updates the policy matrix according to the action values from the Q matrix.

Definition at line 370 of file GridWorld.hpp.

◆ getPolicy()

const MatrixD& GridWorld::getPolicy () const  [inline]

◆ getQ()

const MatrixD& GridWorld::getQ () const  [inline]

◆ getRewards()

const MatrixD& GridWorld::getRewards () const  [inline]

◆ getV()

const MatrixD& GridWorld::getV () const  [inline]
◆ initialize()

void GridWorld::initialize (size_t height, size_t width, vector< pair< size_t, size_t >> goals, double gamma = 1)  [inline, private]
◆ isGoal()

bool GridWorld::isGoal (size_t s)  [inline, private]

Informs whether a state is a goal state in the grid world.

Returns: true if s is a goal, otherwise false.

Definition at line 244 of file GridWorld.hpp.
◆ iterativePolicyEvaluation()

void GridWorld::iterativePolicyEvaluation (double threshold, bool verbose)  [inline, private]

Iterative policy evaluation implemented as described in Sutton and Barto, 2017.

Parameters:
    threshold: threshold that decides whether convergence has been reached
    verbose: if true, prints to stdout the number of evaluation iterations

Definition at line 211 of file GridWorld.hpp.
◆ MonteCarloEstimatingStarts()

void GridWorld::MonteCarloEstimatingStarts (size_t height, size_t width, vector< pair< size_t, size_t >> goals, double gamma = 1, unsigned maxIters = 10000, bool verbose = true)  [inline]

Monte Carlo Estimating Starts algorithm for finding an optimal policy.

Parameters:
    height: height of the grid world to be generated
    width: width of the grid world to be generated
    goals: vector containing the coordinates of goal states
    gamma: discount factor
    maxIters: maximum number of iterations
    verbose: if true, prints to stdout the current policy each second

Definition at line 509 of file GridWorld.hpp.
◆ normalizeToOne()

Matrix<double> GridWorld::normalizeToOne (MatrixD m)  [inline, private]

Normalizes a matrix so its sum equals 1.

Returns: a matrix with the same dimensions as m, with all elements normalized so their sum equals 1.

Definition at line 106 of file GridWorld.hpp.
◆ policyForState()

MatrixD GridWorld::policyForState (size_t s)  [inline, private]

Gets the policy for state s.

Returns: a matrix containing the policy for s.

Definition at line 301 of file GridWorld.hpp.
◆ policyIncrement()

Matrix<double> GridWorld::policyIncrement (size_t s)  [inline, private]

Creates a new policy for a given state, giving preference to the actions with maximum value. This method uses the current values of the value matrix V to generate the new policy.

Returns: a matrix in which each element represents the probability of selecting the given action, given the action value.

Definition at line 130 of file GridWorld.hpp.
◆ policyIteration()

void GridWorld::policyIteration (size_t height, size_t width, vector< pair< size_t, size_t >> goals, double gamma = 1, double threshold = .001, bool verbose = true)  [inline]

Policy iteration method. This method evaluates policy using iterative policy evaluation, updating the value matrix V, and then updates policy with the new values in V.

Parameters:
    height: height of the grid world to be generated
    width: width of the grid world to be generated
    goals: vector containing the coordinates of goal states
    gamma: discount factor
    threshold: threshold that will dictate the convergence of the method
    verbose: if true, prints to stdout the number of iterations

Definition at line 403 of file GridWorld.hpp.
◆ prettifyPolicy()

string GridWorld::prettifyPolicy ()  [inline, private]

Returns: a string with the policy represented as arrow characters.

Definition at line 156 of file GridWorld.hpp.
◆ QLearning()

void GridWorld::QLearning (size_t height, size_t width, vector< pair< size_t, size_t >> goals, double gamma = 1, double alpha = .3, double epsilon = .8, unsigned int maxIters = 10000, bool verbose = true)  [inline]

Temporal difference method for finding the optimal policy using Q-Learning.

Parameters:
    height: height of the grid world to be generated
    width: width of the grid world to be generated
    goals: vector containing the coordinates of goal states
    gamma: discount factor
    alpha: learning rate
    epsilon: e-greedy parameter
    maxIters: maximum number of iterations
    verbose: if true, prints to stdout the current policy each second

Definition at line 650 of file GridWorld.hpp.
◆ Sarsa()

void GridWorld::Sarsa (size_t height, size_t width, vector< pair< size_t, size_t >> goals, double gamma = 1, double alpha = .3, double epsilon = .8, unsigned int maxIters = 10000, bool verbose = true)  [inline]

Temporal difference method for finding the optimal policy using SARSA.

Parameters:
    height: height of the grid world to be generated
    width: width of the grid world to be generated
    goals: vector containing the coordinates of goal states
    gamma: discount factor
    alpha: learning rate
    epsilon: e-greedy parameter
    maxIters: maximum number of iterations
    verbose: if true, prints to stdout the current policy each second

Definition at line 607 of file GridWorld.hpp.
◆ toCoord()

pair<size_t, size_t> GridWorld::toCoord (size_t s)  [inline, private]

Transforms a raster coordinate from the grid world into its corresponding row x column representation.

Definition at line 68 of file GridWorld.hpp.
◆ transition()

double GridWorld::transition (size_t currentState, ActionType action, size_t nextState)  [inline, private]

Returns the transition probability to nextState, given currentState and action.

Parameters:
    currentState: the current state
    action: an action to be applied in currentState
    nextState: the possible next state

Returns: the probability that applying action in currentState leads to nextState.

Definition at line 257 of file GridWorld.hpp.
◆ valueIteration()

void GridWorld::valueIteration (size_t height, size_t width, vector< pair< size_t, size_t >> goals, double gamma = 1, double threshold = .000001, bool verbose = true)  [inline]

Value iteration method. This method alternates between one step of evaluation of policy, updating the value matrix V, and one step of policy update, using the new values in V.

Parameters:
    height: height of the grid world to be generated
    width: width of the grid world to be generated
    goals: vector containing the coordinates of goal states
    gamma: discount factor
    threshold: threshold that will dictate the convergence of the method
    verbose: if true, prints to stdout the number of iterations

Definition at line 455 of file GridWorld.hpp.
◆ actions
◆ gamma
◆ goals

vector<pair<size_t, size_t> > GridWorld::goals  [private]

◆ nStates

unsigned long GridWorld::nStates  [private]

◆ policy

MatrixD GridWorld::policy  [private]

◆ rewards

MatrixD GridWorld::rewards  [private]
The documentation for this class was generated from the following file: GridWorld.hpp