Breakable-Bottles#
Action Space | Discrete(3) |
Observation Space | Dict('bottles_carrying': Discrete(3), 'bottles_delivered': Discrete(2), 'bottles_dropped': MultiBinary(3), 'location': Discrete(5)) |
Reward Shape | (3,) |
Reward High | [0. 50. 0.] |
Reward Low | [-inf 0. -1.] |
Import | |
Description#
This environment implements the problems UnbreakableBottles and BreakableBottles defined in Section 4.1.2 of the paper Potential-based multiobjective reinforcement learning approaches to low-impact agents for AI safety.
Action Space#
The action space is a discrete space with 3 actions:
0: move left
1: move right
2: pick up a bottle
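The action encoding above can be sketched in plain Python. The names `Action` and `apply_move` are hypothetical, the default of 5 locations is taken from `Discrete(5)` in the summary table, and clamping at the two ends of the corridor is an assumption: the text does not say what happens when the agent tries to move past either end.

```python
from enum import IntEnum


class Action(IntEnum):
    """Hypothetical names for the three discrete actions listed above."""
    LEFT = 0
    RIGHT = 1
    PICKUP = 2


def apply_move(location: int, action: int, size: int = 5) -> int:
    """Return the location after applying an action.

    Assumption: locations are numbered 0..size-1 and moves past either
    end are clamped. PICKUP leaves the location unchanged.
    """
    if action == Action.LEFT:
        return max(location - 1, 0)
    if action == Action.RIGHT:
        return min(location + 1, size - 1)
    if action == Action.PICKUP:
        return location
    raise ValueError(f"invalid action: {action}")
```

Because `Action` is an `IntEnum`, the raw integers 0, 1 and 2 can be passed directly, which matches how a `Discrete(3)` action would typically arrive.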
Observation Space#
The observation space is a dictionary with 4 keys:
location: the current location of the agent
bottles_carrying: the number of bottles the agent is currently carrying (0, 1 or 2)
bottles_delivered: the number of bottles the agent has delivered (0 or 1; the episode terminates on the second delivery, so 2 is never observed in a non-terminal state)
bottles_dropped: for each location, a boolean flag indicating if that location currently contains a bottle
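As an illustration of the four keys, the following sketch builds the starting observation (see Starting State) as a plain dictionary and checks it against the ranges given above. The helper names are hypothetical. The length of `bottles_dropped` is an assumption: the summary table lists `MultiBinary(3)` alongside `Discrete(5)` locations, which would be consistent with `size - 2` flags if the two end locations carry no flag.

```python
def initial_observation(size: int = 5) -> dict:
    """Build a plain-dict observation for the starting state.

    Assumption: bottles_dropped holds size - 2 flags (3 for 5 locations).
    """
    return {
        "location": 0,
        "bottles_carrying": 0,
        "bottles_delivered": 0,
        "bottles_dropped": [0] * (size - 2),
    }


def is_valid_observation(obs: dict, size: int = 5) -> bool:
    """Check an observation against the value ranges listed in the text."""
    return (
        0 <= obs["location"] < size
        and obs["bottles_carrying"] in (0, 1, 2)
        and obs["bottles_delivered"] in (0, 1)
        and len(obs["bottles_dropped"]) == size - 2
        and all(flag in (0, 1) for flag in obs["bottles_dropped"])
    )
```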
Reward Space#
The reward space has 3 dimensions:
time penalty: -1 for each time step (configurable through the time_penalty argument)
bottle reward: bottle_reward for each bottle delivered
potential: while carrying 2 bottles, the agent drops a bottle with probability prob_drop. A potential-based penalty is applied for bottles left on the ground.
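The layout of the reward vector can be sketched as follows. The function names are hypothetical, and the potential term is passed in as a plain number because the text does not give its formula. The component order (time penalty, bottle reward, potential) follows the list above and agrees with the bounds in the summary table.

```python
def step_reward(
    delivered_now: int,
    potential_term: float,
    bottle_reward: float,
    time_penalty: float = -1.0,
) -> list:
    """Assemble one 3-dimensional reward vector.

    delivered_now: number of bottles delivered on this step.
    potential_term: value of the potential-based penalty for this step
        (supplied by the caller; its exact definition is not given here).
    """
    return [time_penalty, bottle_reward * delivered_now, potential_term]


def episode_return(step_rewards: list) -> list:
    """Sum per-step reward vectors component by component."""
    return [sum(component) for component in zip(*step_rewards)]
```

For example, three steps with a single delivery on the last one and an illustrative `bottle_reward` of 25 would accumulate a time component of -3 and a bottle component of 25.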
Starting State#
The agent starts at location 0, carrying no bottles, having delivered no bottles and having dropped no bottles.
Episode Termination#
The episode terminates when the agent has delivered 2 bottles.
Arguments#
size: the number of locations in the environment
prob_drop: the probability of dropping a bottle while carrying 2 bottles
time_penalty: the time penalty for each time step
bottle_reward: the reward for delivering a bottle
unbreakable_bottles: if True, a bottle which is dropped in a location can be picked up again (so the outcome of dropping a bottle is reversible), otherwise a dropped bottle cannot be picked up.
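The roles of `prob_drop` and `unbreakable_bottles` can be illustrated with a minimal sketch. This is not the environment's implementation; `maybe_drop` and `can_pick_up_dropped` are hypothetical helpers that only encode what the argument descriptions state.

```python
import random


def maybe_drop(bottles_carrying: int, prob_drop: float, rng: random.Random) -> bool:
    """Return True if a bottle is dropped on this step.

    Per the argument description, a drop can only happen while the agent
    carries 2 bottles, and then with probability prob_drop.
    """
    return bottles_carrying == 2 and rng.random() < prob_drop


def can_pick_up_dropped(bottle_on_ground: bool, unbreakable_bottles: bool) -> bool:
    """A dropped bottle can be recovered only when unbreakable_bottles is True."""
    return bottle_on_ground and unbreakable_bottles
```

Passing an explicit `random.Random` instance keeps the sketch reproducible under a fixed seed.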
Credits#
This environment was originally contributed by Robert Klassert.