Action Space


Observation Shape


Observation High

[11 11]

Observation Low

[0 0]

Reward Shape


Reward High

[23.7 -1. ]

Reward Low

[ 0. -1.]




The Deep Sea Treasure environment is classic MORL problem in which the agent controls a submarine in a 2D grid world.

Observation Space#

The observation space is a 2D discrete box with values in [0, 10] for the x and y coordinates of the submarine.

Action Space#

The actions is a discrete space where:

  • 0: up

  • 1: down

  • 2: left

  • 3: right

Reward Space#

The reward is 2-dimensional:

  • time penalty: -1 at each time step

  • treasure value: the value of the treasure at the current position

Starting State#

The starting state is always the same: (0, 0)

Episode Termination#

The episode terminates when the agent reaches a treasure.


  • dst_map: the map of the deep sea treasure. Default is the convex map from Yang et al. (2019). To change, use mo_gymnasium.make("DeepSeaTreasure-v0", dst_map=CONCAVE_MAP | MIRRORED_MAP).

  • float_state: if True, the state is a 2D continuous box with values in [0.0, 1.0] for the x and y coordinates of the submarine.


The code was adapted from: Yang’s source. The background art is from The submarine art was created with the assistance of DALL·E 2.