Breakable-Bottles


Action Space	Discrete(3)
Observation Space	Dict('bottles_carrying': Discrete(3), 'bottles_delivered': Discrete(2), 'bottles_dropped': MultiBinary(3), 'location': Discrete(5))
Reward Shape	(3,)
Reward High	[ 0. 50. 0.]
Reward Low	[-inf 0. -1.]
Import	`mo_gymnasium.make("breakable-bottles-v0")`

Description

This environment implements the UnbreakableBottles and BreakableBottles problems defined in Section 4.1.2 of the paper "Potential-based multiobjective reinforcement learning approaches to low-impact agents for AI safety".

The action space is a discrete space with 3 actions (Discrete(3)).

The observation space is a dictionary with 4 keys:

location: the current location of the agent
bottles_carrying: the number of bottles the agent is currently carrying (0, 1 or 2)
bottles_delivered: the number of bottles the agent has delivered (0 or 1)
bottles_dropped: for each location, a boolean flag indicating if that location currently contains a bottle
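The four keys and their ranges can be checked with a small validator. This is only a sketch: the bounds are copied from the Observation Space row of the table at the top of this page, the function name `is_valid_observation` is hypothetical, and the observation is treated as a plain dictionary rather than as an object returned by the library.

```python
# Bounds copied from the Observation Space row of the table on this page.
LOCATION_COUNT = 5   # location: Discrete(5)
MAX_CARRYING = 2     # bottles_carrying: Discrete(3) -> 0, 1 or 2
MAX_DELIVERED = 1    # bottles_delivered: Discrete(2) -> 0 or 1
DROPPED_FLAGS = 3    # bottles_dropped: MultiBinary(3)

EXPECTED_KEYS = {
    "location",
    "bottles_carrying",
    "bottles_delivered",
    "bottles_dropped",
}


def is_valid_observation(obs):
    """Return True if obs has exactly the documented keys and value ranges."""
    if set(obs) != EXPECTED_KEYS:
        return False
    dropped = list(obs["bottles_dropped"])
    return (
        0 <= int(obs["location"]) < LOCATION_COUNT
        and 0 <= int(obs["bottles_carrying"]) <= MAX_CARRYING
        and 0 <= int(obs["bottles_delivered"]) <= MAX_DELIVERED
        and len(dropped) == DROPPED_FLAGS
        and all(int(flag) in (0, 1) for flag in dropped)
    )


# Example observation: agent at location 0, nothing carried, delivered or dropped.
example_obs = {
    "location": 0,
    "bottles_carrying": 0,
    "bottles_delivered": 0,
    "bottles_dropped": [0, 0, 0],
}
print(is_valid_observation(example_obs))  # True
```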

The reward space has 3 dimensions:

time penalty: -1 for each time step
bottle reward: bottle_reward for each bottle delivered
potential: a potential-based penalty applied for bottles left on the ground (while carrying multiple bottles, there is a small probability of dropping them)
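Many single-objective algorithms need a scalar reward, so the 3-dimensional vector is often collapsed with a weighted sum. The sketch below shows plain linear scalarization, which is a generic technique and not necessarily the approach used in the paper. The component order (time penalty, bottle reward, potential) is assumed from the Reward High and Reward Low rows, and the weights are hypothetical.

```python
def scalarize(reward, weights):
    """Linear scalarization: weighted sum of the reward components."""
    if len(reward) != len(weights):
        raise ValueError("reward and weights must have the same length")
    return sum(float(w) * float(r) for w, r in zip(weights, reward))


# Assumed component order, inferred from the Reward High/Low rows:
# index 0 = time penalty, index 1 = bottle reward, index 2 = potential.
step_reward = [-1.0, 0.0, 0.0]  # an ordinary time step with nothing delivered
weights = [1.0, 1.0, 10.0]      # hypothetical weights, not taken from the source

print(scalarize(step_reward, weights))  # -1.0
```

Raising the third weight makes the agent more averse to leaving bottles on the ground relative to saving time.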

The agent starts at location 0, carrying no bottles, having delivered no bottles and having dropped no bottles.

The episode terminates when the agent has delivered 2 bottles.
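A rollout loop that runs until termination and accumulates the vector return might look like the sketch below. It assumes `env` follows the Gymnasium API (`reset()` returns an observation and an info dict, `step()` returns a 5-tuple) with a 3-component reward; `run_episode`, `policy` and `max_steps` are hypothetical names introduced for illustration.

```python
def run_episode(env, policy, max_steps=200):
    """Roll out one episode and return (vector_return, steps_taken).

    `env` is assumed to follow the Gymnasium API with a vector-valued reward;
    `policy` maps an observation to one of the 3 discrete actions.
    """
    obs, _info = env.reset()
    vector_return = [0.0, 0.0, 0.0]
    for step in range(1, max_steps + 1):
        obs, reward, terminated, truncated, _info = env.step(policy(obs))
        # Sum each reward component separately instead of scalarizing.
        vector_return = [
            total + float(r) for total, r in zip(vector_return, reward)
        ]
        if terminated or truncated:
            return vector_return, step
    # The step budget ran out before the episode ended.
    return vector_return, max_steps
```

With MO-Gymnasium installed, the environment created by the call in the Import row could be passed as `env`, together with a policy such as `lambda obs: env.action_space.sample()`.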

The environment accepts the following arguments:

size: the number of locations in the environment
prob_drop: the probability of dropping a bottle while carrying 2 bottles
time_penalty: the time penalty for each time step
bottle_reward: the reward for delivering a bottle
unbreakable_bottles: if True, a bottle which is dropped in a location can be picked up again (so the outcome of dropping a bottle is reversible), otherwise a dropped bottle cannot be picked up.
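These arguments can be grouped into a small configuration object before being handed to the environment. The sketch below is illustrative only: `BottlesConfig` is a hypothetical helper, and the field values are example settings, not confirmed defaults of the library.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class BottlesConfig:
    """Hypothetical container for the arguments listed above."""

    size: int = 5                      # example value, matches location: Discrete(5)
    prob_drop: float = 0.1             # example value, not a confirmed default
    time_penalty: float = -1.0         # -1 per step, as in the reward description
    bottle_reward: float = 25.0        # example value, not a confirmed default
    unbreakable_bottles: bool = False  # True makes dropped bottles recoverable

    def __post_init__(self):
        # prob_drop is a probability, so it must lie in [0, 1].
        if not 0.0 <= self.prob_drop <= 1.0:
            raise ValueError("prob_drop must be a probability in [0, 1]")


# Configuration for the reversible variant, where dropped bottles can be picked up.
reversible = BottlesConfig(unbreakable_bottles=True)
kwargs = asdict(reversible)
```

Assuming `mo_gymnasium.make` forwards extra keyword arguments to the environment constructor, as `gymnasium.make` does, the dictionary could be passed as `mo_gymnasium.make("breakable-bottles-v0", **kwargs)`.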

This environment was originally a contribution of Robert Klassert.