Breakable-Bottles¶
Action Space | Discrete(3)
Observation Space | Dict('bottles_carrying': Discrete(3), 'bottles_delivered': Discrete(3), 'bottles_dropped': MultiBinary(3), 'location': Discrete(5))
Reward Shape | (3,)
Reward High | [ 0. 50. 0.]
Reward Low | [-inf 0. -1.]
Import |
Description¶
This environment implements the UnbreakableBottles and BreakableBottles problems defined in Section 4.1.2 of the paper "Potential-based multiobjective reinforcement learning approaches to low-impact agents for AI safety".
Action Space¶
The action space is a discrete space with 3 actions:
0: move left
1: move right
2: pick up a bottle
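As a small illustration, the three action indices can be given readable names with an enum. The names `Action`, `LEFT`, `RIGHT` and `PICK_UP` are hypothetical labels chosen for this sketch; only the integer values and their meanings come from the list above.

```python
from enum import IntEnum


class Action(IntEnum):
    """Hypothetical names for the three discrete actions."""

    LEFT = 0     # move left
    RIGHT = 1    # move right
    PICK_UP = 2  # pick up a bottle


# IntEnum members compare equal to plain ints, so they can be used
# wherever an integer action index is expected.
chosen = Action.PICK_UP
print(int(chosen), chosen.name)
```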
Observation Space¶
The observation space is a dictionary with 4 keys:
location: the current location of the agent
bottles_carrying: the number of bottles the agent is currently carrying (0, 1 or 2)
bottles_delivered: the number of bottles the agent has delivered (0, 1 or 2)
bottles_dropped: one boolean flag per tracked location (MultiBinary(3) in the default configuration), indicating whether that location currently contains a dropped bottle
Note that this observation space differs from the one listed in the paper above. The paper lists the possible values of bottles_delivered as (0 or 1) rather than (0, 1 or 2), because it did not take the terminal state, in which 2 bottles have been delivered, into account when calculating the observation space. As a result, the observation space of this implementation is larger than the one specified in the paper: 360 possible states instead of 240.
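The two state counts can be checked by multiplying the number of values of each observation component. The sketch below uses only the sizes given in the observation space above; the variable names are illustrative.

```python
from math import prod

# Number of values each observation component can take (default configuration).
component_sizes = {
    "location": 5,            # Discrete(5)
    "bottles_carrying": 3,    # Discrete(3): 0, 1 or 2
    "bottles_delivered": 3,   # Discrete(3): 0, 1 or 2
    "bottles_dropped": 2**3,  # MultiBinary(3): 3 independent boolean flags
}
states_here = prod(component_sizes.values())

# The paper lists bottles_delivered as (0 or 1), i.e. 2 values instead of 3.
paper_sizes = {**component_sizes, "bottles_delivered": 2}
states_paper = prod(paper_sizes.values())

print(states_here, states_paper)  # 360 240
```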
Reward Space¶
The reward space has 3 dimensions:
time penalty: -1 for each time step
bottle reward: bottle_reward for each bottle delivered
potential: a potential-based penalty applied for bottles left on the ground. While carrying 2 bottles, the agent has a small probability of dropping a bottle.
Starting State¶
The agent starts at location 0, carrying no bottles, having delivered no bottles and having dropped no bottles.
Episode Termination¶
The episode terminates when the agent has delivered 2 bottles.
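The starting state and the termination condition can be written down as a plain dictionary and a predicate. This is a minimal sketch, not the environment's own code: `initial_state` and `is_terminal` are hypothetical helper names, and the list of 3 flags assumes the default MultiBinary(3) layout for bottles_dropped.

```python
def initial_state(num_drop_flags: int = 3) -> dict:
    """Starting observation: location 0, nothing carried, delivered or dropped."""
    return {
        "location": 0,
        "bottles_carrying": 0,
        "bottles_delivered": 0,
        "bottles_dropped": [0] * num_drop_flags,  # assumed 3 flags by default
    }


def is_terminal(state: dict) -> bool:
    """The episode ends once 2 bottles have been delivered."""
    return state["bottles_delivered"] == 2


state = initial_state()
print(is_terminal(state))  # False
```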
Arguments¶
size: the number of locations in the environment
prob_drop: the probability of dropping a bottle while carrying 2 bottles
time_penalty: the time penalty for each time step
bottle_reward: the reward for delivering a bottle
unbreakable_bottles: if True, a dropped bottle can be picked up again, so dropping a bottle is reversible; otherwise, a dropped bottle cannot be picked up.
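To make the roles of prob_drop and unbreakable_bottles concrete, here is a rough sketch of how they could enter a simulation step. It assumes the drop is sampled once per step and only while exactly 2 bottles are carried; the helper names `drops_bottle` and `can_pick_up_dropped` and the value 0.1 are hypothetical, and the environment's actual logic may differ.

```python
import random


def drops_bottle(bottles_carrying: int, prob_drop: float, rng: random.Random) -> bool:
    """Return True if a bottle is dropped on this step (illustrative only)."""
    if bottles_carrying < 2:
        return False  # drops are only possible while carrying 2 bottles
    return rng.random() < prob_drop


def can_pick_up_dropped(unbreakable_bottles: bool) -> bool:
    """A dropped bottle is recoverable only in the unbreakable variant."""
    return unbreakable_bottles


rng = random.Random(0)  # seeded for reproducibility
# 0.1 is an arbitrary example value, not a documented default.
drops = sum(drops_bottle(2, 0.1, rng) for _ in range(10_000))
print(f"empirical drop rate: {drops / 10_000:.3f}")
```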
Credits¶
This environment was originally a contribution of Robert Klassert. The home asset is from https://limezu.itch.io/serenevillagerevamped. The gold, enemy and gem assets are from https://ninjikin.itch.io/treasure. The bottles pixel art was created with the assistance of DALL·E 2.