MO-Reacher


Action Space	Discrete(9)
Observation Shape	(6,)
Observation High	inf
Observation Low	-inf
Reward Shape	(4,)
Reward High	[1. 1. 1. 1.]
Reward Low	[-1. -1. -1. -1.]
Import	`mo_gymnasium.make("mo-reacher-v5")`

Description

Multi-objective version of the Reacher-v4 environment.

Observation Space

The observation is 6-dimensional and contains:

sine and cosine of the angles of the central and elbow joints (4 values)
angular velocities of the central and elbow joints (2 values)
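As a rough illustration of how such a 6-element vector could be assembled, the sketch below builds one from two joint angles and two angular velocities. The function name `build_observation` is hypothetical, and the ordering of the elements is an assumption made for this example; the environment's actual ordering may differ.

```python
import math


def build_observation(theta_central, theta_elbow, vel_central, vel_elbow):
    """Assemble a 6-element observation from joint angles (radians) and velocities.

    NOTE: the element ordering below is assumed for illustration only.
    """
    return [
        math.cos(theta_central),  # cosine of the central joint angle
        math.cos(theta_elbow),    # cosine of the elbow joint angle
        math.sin(theta_central),  # sine of the central joint angle
        math.sin(theta_elbow),    # sine of the elbow joint angle
        vel_central,              # angular velocity of the central joint
        vel_elbow,                # angular velocity of the elbow joint
    ]


obs = build_observation(0.0, math.pi / 2, 0.1, -0.2)
```

Encoding each angle as a sine/cosine pair avoids the discontinuity that a raw angle would have when it wraps around.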

Action Space

The action space is discrete and contains the 3^2 = 9 possible actions obtained by applying positive (+1), negative (-1), or zero (0) torque to each of the two joints.
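The nine actions can be enumerated as every pairing of the three torque values over the two joints. The following is a minimal sketch, assuming one possible ordering of the pairs; `ACTION_TABLE` and `decode_action` are hypothetical names, and the mapping from action index to torque pair used by the environment itself may be different.

```python
from itertools import product

# The three torque choices available for each joint.
TORQUES = (-1, 0, 1)

# All 3^2 = 9 (central_torque, elbow_torque) pairs.
# NOTE: this index ordering is assumed for illustration only.
ACTION_TABLE = list(product(TORQUES, repeat=2))


def decode_action(index):
    """Return the (central, elbow) torque pair for a discrete action index."""
    return ACTION_TABLE[index]
```

Under this assumed ordering, `decode_action(4)` returns `(0, 0)`, which applies no torque to either joint.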

Reward Space

The reward is 4-dimensional and is based on the distance between the tip of the arm and each of the four target locations. For each i in {1, 2, 3, 4}, it is computed as:

    r_i = 1  - 4 * || finger_tip_coord - target_i ||^2
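The formula can be sketched in plain Python as follows. The function name `reacher_rewards` and the coordinates in `EXAMPLE_TARGETS` are placeholders chosen for this example, not the environment's actual target positions.

```python
def reacher_rewards(finger_tip, targets):
    """Compute r_i = 1 - 4 * ||finger_tip - target_i||^2 for each target."""
    rewards = []
    for target in targets:
        # Squared Euclidean distance between the fingertip and this target.
        sq_dist = sum((f - t) ** 2 for f, t in zip(finger_tip, target))
        rewards.append(1.0 - 4.0 * sq_dist)
    return rewards


# Hypothetical target coordinates, for illustration only.
EXAMPLE_TARGETS = [(0.25, 0.0), (-0.25, 0.0), (0.0, 0.25), (0.0, -0.25)]

rewards = reacher_rewards((0.25, 0.0), EXAMPLE_TARGETS)
```

With the fingertip placed exactly on the first placeholder target, `rewards` is `[1.0, 0.0, 0.5, 0.5]`: each component reaches its maximum of 1 only when the fingertip coincides with the corresponding target, and decreases quadratically with distance.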

Version History

See https://gymnasium.farama.org/environments/mujoco/reacher/#version-history