Four-Room
|                   |                                             |
|-------------------|---------------------------------------------|
| Action Space      | Discrete(4)                                 |
| Observation Shape | (14,)                                       |
| Observation High  | [13 13 13 13 13 13 13 13 13 13 13 13 13 13] |
| Observation Low   | [0 0 0 0 0 0 0 0 0 0 0 0 0 0]               |
| Reward Shape      | (3,)                                        |
| Reward High       | [1. 1. 1.]                                  |
| Reward Low        | [0. 0. 0.]                                  |
| Import            | `mo_gymnasium.make("four-room-v0")`         |
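A minimal usage sketch; the environment id `four-room-v0` is assumed from the MO-Gymnasium registry and may differ in your installed version:

```python
import mo_gymnasium as mo_gym

# The id "four-room-v0" is assumed from the MO-Gymnasium registry;
# adjust it if your installed version registers a different name.
env = mo_gym.make("four-room-v0")

obs, info = env.reset(seed=42)
print(obs.shape)  # (14,): 2D agent position plus collected-item flags

# The reward is a length-3 numpy vector (one component per shape type),
# not a scalar, so a plain Gymnasium training loop needs adapting.
obs, vector_reward, terminated, truncated, info = env.step(env.action_space.sample())
print(vector_reward)
```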
Description
A discretized version of the gridworld environment introduced in [1]. An agent learns to collect shapes that carry positive reward while avoiding those with negative reward, and then to travel to a fixed goal. The gridworld is split into four rooms separated by walls with passageways.
References
[1] Barreto, André, et al. “Successor Features for Transfer in Reinforcement Learning.” NIPS. 2017.
Observation Space
The observation contains the 2D position of the agent in the gridworld, plus a binary vector indicating which of the collectible items have been collected (12 item flags, which together with the two position entries gives the (14,) observation shape).
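Under that layout the observation could be unpacked as in the sketch below; the ordering of position entries before item flags is inferred from the (14,) shape, not stated in the source:

```python
import numpy as np

def split_observation(obs: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    # Assumed layout: the first two entries are the agent's 2D grid
    # position and the remaining twelve are binary item-collection
    # flags. This ordering is an inference, not confirmed by the docs.
    position, collected = obs[:2], obs[2:]
    return position, collected
```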
Action Space
The action space is discrete with 4 actions: left, up, right, down.
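For readability the four indices can be named; the mapping below assumes the indices follow the order listed above and should be verified against the environment implementation:

```python
from enum import IntEnum

class FourRoomAction(IntEnum):
    # Assumes indices follow the documented order (left, up, right,
    # down); check against the environment source before relying on it.
    LEFT = 0
    UP = 1
    RIGHT = 2
    DOWN = 3

# Usage: env.step(FourRoomAction.UP)
```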
Reward Space
The reward is a 3-dimensional vector with one component per collectible shape:

- +1 if a blue square was collected, else 0
- +1 if a green triangle was collected, else 0
- +1 if a red circle was collected, else 0
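Because the reward is a vector, multi-objective methods typically scalarize it before standard RL training; a minimal linear-scalarization sketch, where the weights are hypothetical preference values and not from the source:

```python
import numpy as np

# Illustrative preference weights over (blue square, green triangle,
# red circle); these values are hypothetical, chosen only as an example.
weights = np.array([0.5, 0.3, 0.2])

def scalarize(vector_reward: np.ndarray) -> float:
    # Linear scalarization: a weighted sum of the three reward components.
    return float(np.dot(weights, vector_reward))

print(scalarize(np.array([1.0, 0.0, 1.0])))  # 0.5 + 0.2 = 0.7
```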
Starting State
The agent starts in the lower left of the map.
Episode Termination
The episode terminates when the agent reaches the goal state, G.
Arguments
`maze`: array containing the gridworld map. See `MAZE` for an example.
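A sketch of passing the `maze` argument; the import path for the default `MAZE` constant is an assumption about the package layout and should be checked:

```python
import mo_gymnasium as mo_gym
# The module path below is an assumption about the package layout.
from mo_gymnasium.envs.four_room.four_room import MAZE

# Pass a custom map through the `maze` keyword; reusing the packaged
# default here only demonstrates the call signature.
env = mo_gym.make("four-room-v0", maze=MAZE)
```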
Credits
Code adapted from: Mike Gimelfarb’s source.