Class to solve the grid world toy problem as a Markov decision process.
More...
#include <GridWorld.hpp>

Public Member Functions

void policyIteration (size_t height, size_t width, vector< pair< size_t, size_t >> goals, double gamma=1, double threshold=.001, bool verbose=true)
    Policy iteration method. More...
void valueIteration (size_t height, size_t width, vector< pair< size_t, size_t >> goals, double gamma=1, double threshold=.000001, bool verbose=true)
    Value iteration method. More...
void MonteCarloEstimatingStarts (size_t height, size_t width, vector< pair< size_t, size_t >> goals, double gamma=1, unsigned maxIters=10000, bool verbose=true)
    Monte Carlo Estimating Starts algorithm for finding an optimal policy. More...
void Sarsa (size_t height, size_t width, vector< pair< size_t, size_t >> goals, double gamma=1, double alpha=.3, double epsilon=.8, unsigned int maxIters=10000, bool verbose=true)
    Temporal difference method for finding the optimal policy using SARSA. More...
void QLearning (size_t height, size_t width, vector< pair< size_t, size_t >> goals, double gamma=1, double alpha=.3, double epsilon=.8, unsigned int maxIters=10000, bool verbose=true)
    Temporal difference method for finding the optimal policy using Q-Learning. More...
const MatrixD & getV () const
const MatrixD & getQ () const
const MatrixD & getRewards () const
const MatrixD & getPolicy () const
double getGamma () const
unsigned long getNStates () const
const vector< pair< size_t, size_t > > & getGoals () const
const vector< ActionType > & getActions () const

Private Member Functions

void initialize (size_t height, size_t width, vector< pair< size_t, size_t >> goals, double gamma=1)
size_t fromCoord (size_t row, size_t col)
    Transforms row x column coordinates from the grid world into a raster representation. More...
pair< size_t, size_t > toCoord (size_t s)
    Transforms a raster coordinate from the grid world into its corresponding row x column representation. More...
double actionValue (size_t s, ActionType a)
    Gets the q value of action a on state s. More...
Matrix< double > normalizeToOne (MatrixD m)
    Normalizes a matrix so its sum equals 1. More...
vector< double > actionValuesForState (size_t s)
    Gets the q values of all actions for a given state. More...
Matrix< double > policyIncrement (size_t s)
    Creates a new policy for a given state, giving preference to the actions with maximum value. More...
string prettifyPolicy ()
void iterativePolicyEvaluation (double threshold, bool verbose)
    Iterative policy evaluation implemented as described in Sutton and Barto, 2017. More...
bool isGoal (size_t s)
    Informs whether a state is a goal state in the grid world. More...
double transition (size_t currentState, ActionType action, size_t nextState)
    Returns the transition probability to nextState, given currentState and action. More...
size_t applyAction (size_t currentState, ActionType action)
    Returns the next state that results from applying an action to a state. More...
MatrixD policyForState (size_t s)
    Gets the policy for state s. More...
size_t getNonGoalState ()
    Selects a random non-goal state. More...
ActionType eGreedy (size_t s, double epsilon)
    Selects an action for a state s following an e-greedy policy. More...
double bestQForState (size_t s)
    Gets the best action value for state s. More...
MatrixD getOptimalPolicyFromQ ()
    Updates the policy matrix according to the action values from the Q matrix. More...
|
Class to solve the grid world toy problem as a Markov decision process.
- Author
- Douglas De Rizzo Meneghetti (douglasrizzom@gmail.com)
- Date
- 2017-12-04 Implementation of the grid world as a Markov decision process
Definition at line 18 of file GridWorld.hpp.
◆ ActionType
◆ actionValue()

double GridWorld::actionValue (size_t s, ActionType a)  [inline, private]

Gets the q value of action a on state s.

Returns: the action value of a in s.

Definition at line 81 of file GridWorld.hpp.
◆ actionValuesForState()

vector<double> GridWorld::actionValuesForState (size_t s)  [inline, private]

Gets the q values of all actions for a given state.

Returns: a vector containing the action values in s.

Definition at line 115 of file GridWorld.hpp.
◆ applyAction()

size_t GridWorld::applyAction (size_t currentState, ActionType action)  [inline, private]

Returns the next state that results from applying an action to a state.

Parameters:
    currentState: a state
    action: an action

Returns: the future state resulting from applying action to currentState.

Definition at line 273 of file GridWorld.hpp.
◆ bestQForState()

double GridWorld::bestQForState (size_t s)  [inline, private]

Gets the best action value for state s. Action values are taken from the Q matrix.

Returns: the best action value for s.

Definition at line 356 of file GridWorld.hpp.
◆ eGreedy()

ActionType GridWorld::eGreedy (size_t s, double epsilon)  [inline, private]

Selects an action for a state s following an e-greedy policy. Action values are taken from the Q matrix.

Parameters:
    s: a state
    epsilon: e-greedy parameter

Returns: an action.

Definition at line 327 of file GridWorld.hpp.
◆ fromCoord()

size_t GridWorld::fromCoord (size_t row, size_t col)  [inline, private]

Transforms row x column coordinates from the grid world into a raster representation.

Definition at line 59 of file GridWorld.hpp.
◆ getActions()

const vector<ActionType>& GridWorld::getActions () const  [inline]

◆ getGamma()

double GridWorld::getGamma () const  [inline]

◆ getGoals()

const vector<pair<size_t, size_t> >& GridWorld::getGoals () const  [inline]

◆ getNonGoalState()

size_t GridWorld::getNonGoalState ()  [inline, private]

Selects a random non-goal state.

Returns: a random non-goal state.

Definition at line 309 of file GridWorld.hpp.

◆ getNStates()

unsigned long GridWorld::getNStates () const  [inline]

◆ getOptimalPolicyFromQ()

MatrixD GridWorld::getOptimalPolicyFromQ ()  [inline, private]

Updates the policy matrix according to the action values from the Q matrix.

Definition at line 370 of file GridWorld.hpp.

◆ getPolicy()

const MatrixD& GridWorld::getPolicy () const  [inline]

◆ getQ()

const MatrixD& GridWorld::getQ () const  [inline]

◆ getRewards()

const MatrixD& GridWorld::getRewards () const  [inline]

◆ getV()

const MatrixD& GridWorld::getV () const  [inline]
◆ initialize()

void GridWorld::initialize (size_t height, size_t width, vector< pair< size_t, size_t >> goals, double gamma = 1)  [inline, private]
◆ isGoal()

bool GridWorld::isGoal (size_t s)  [inline, private]

Informs whether a state is a goal state in the grid world.

Returns: true if s is a goal, otherwise false.

Definition at line 244 of file GridWorld.hpp.
◆ iterativePolicyEvaluation()

void GridWorld::iterativePolicyEvaluation (double threshold, bool verbose)  [inline, private]

Iterative policy evaluation implemented as described in Sutton and Barto, 2017.

Parameters:
    threshold: threshold that decides whether convergence has been reached
    verbose: if true, prints to stdout the number of evaluation iterations

Definition at line 211 of file GridWorld.hpp.
◆ MonteCarloEstimatingStarts()

void GridWorld::MonteCarloEstimatingStarts (size_t height, size_t width, vector< pair< size_t, size_t >> goals, double gamma = 1, unsigned maxIters = 10000, bool verbose = true)  [inline]

Monte Carlo Estimating Starts algorithm for finding an optimal policy.

Parameters:
    height: height of the grid world to be generated
    width: width of the grid world to be generated
    goals: vector containing the coordinates of goal states
    gamma: discount factor
    maxIters: maximum number of iterations
    verbose: if true, prints to stdout the current policy each second

Definition at line 509 of file GridWorld.hpp.
◆ normalizeToOne()

Matrix<double> GridWorld::normalizeToOne (MatrixD m)  [inline, private]

Normalizes a matrix so its sum equals 1.

Returns: a matrix with the same dimensions as m, with all elements normalized so their sum equals 1.

Definition at line 106 of file GridWorld.hpp.
◆ policyForState()

MatrixD GridWorld::policyForState (size_t s)  [inline, private]

Gets the policy for state s.

Returns: a matrix containing the policy for s.

Definition at line 301 of file GridWorld.hpp.
◆ policyIncrement()

Matrix<double> GridWorld::policyIncrement (size_t s)  [inline, private]

Creates a new policy for a given state, giving preference to the actions with maximum value. This method uses the current values of the value matrix V to generate the new policy.

Returns: a matrix in which each element represents the probability of selecting the given action, given the action value.

Definition at line 130 of file GridWorld.hpp.
◆ policyIteration()

void GridWorld::policyIteration (size_t height, size_t width, vector< pair< size_t, size_t >> goals, double gamma = 1, double threshold = .001, bool verbose = true)  [inline]

Policy iteration method. This method evaluates policy using iterative policy evaluation, updating the value matrix V, and then updates policy with the new values in V.

Parameters:
    height: height of the grid world to be generated
    width: width of the grid world to be generated
    goals: vector containing the coordinates of goal states
    gamma: discount factor
    threshold: threshold that will dictate the convergence of the method
    verbose: if true, prints to stdout the number of iterations

Definition at line 403 of file GridWorld.hpp.
◆ prettifyPolicy()

string GridWorld::prettifyPolicy ()  [inline, private]

Returns: a string with the policy represented as arrow characters.

Definition at line 156 of file GridWorld.hpp.
◆ QLearning()

void GridWorld::QLearning (size_t height, size_t width, vector< pair< size_t, size_t >> goals, double gamma = 1, double alpha = .3, double epsilon = .8, unsigned int maxIters = 10000, bool verbose = true)  [inline]

Temporal difference method for finding the optimal policy using Q-Learning.

Parameters:
    height: height of the grid world to be generated
    width: width of the grid world to be generated
    goals: vector containing the coordinates of goal states
    gamma: discount factor
    alpha: learning rate
    epsilon: e-greedy parameter
    maxIters: maximum number of iterations
    verbose: if true, prints to stdout the current policy each second

Definition at line 650 of file GridWorld.hpp.
◆ Sarsa()

void GridWorld::Sarsa (size_t height, size_t width, vector< pair< size_t, size_t >> goals, double gamma = 1, double alpha = .3, double epsilon = .8, unsigned int maxIters = 10000, bool verbose = true)  [inline]

Temporal difference method for finding the optimal policy using SARSA.

Parameters:
    height: height of the grid world to be generated
    width: width of the grid world to be generated
    goals: vector containing the coordinates of goal states
    gamma: discount factor
    alpha: learning rate
    epsilon: e-greedy parameter
    maxIters: maximum number of iterations
    verbose: if true, prints to stdout the current policy each second

Definition at line 607 of file GridWorld.hpp.
◆ toCoord()

pair<size_t, size_t> GridWorld::toCoord (size_t s)  [inline, private]

Transforms a raster coordinate from the grid world into its corresponding row x column representation.

Definition at line 68 of file GridWorld.hpp.
◆ transition()

double GridWorld::transition (size_t currentState, ActionType action, size_t nextState)  [inline, private]

Returns the transition probability to nextState, given currentState and action.

Parameters:
    currentState: the current state
    action: an action to be applied in currentState
    nextState: the possible next state

Returns: the probability that applying action in currentState leads to nextState.

Definition at line 257 of file GridWorld.hpp.
◆ valueIteration()

void GridWorld::valueIteration (size_t height, size_t width, vector< pair< size_t, size_t >> goals, double gamma = 1, double threshold = .000001, bool verbose = true)  [inline]

Value iteration method. This method alternates between one step of evaluation of policy, updating the value matrix V, and one step of policy update, using the new values in V.

Parameters:
    height: height of the grid world to be generated
    width: width of the grid world to be generated
    goals: vector containing the coordinates of goal states
    gamma: discount factor
    threshold: threshold that will dictate the convergence of the method
    verbose: if true, prints to stdout the number of iterations

Definition at line 455 of file GridWorld.hpp.
◆ actions
◆ gamma
◆ goals

vector<pair<size_t, size_t> > GridWorld::goals  [private]

◆ nStates

unsigned long GridWorld::nStates  [private]

◆ policy

MatrixD GridWorld::policy  [private]

◆ rewards

MatrixD GridWorld::rewards  [private]
The documentation for this class was generated from the following file: GridWorld.hpp