MO-Reacher¶
Action Space | Discrete(9) |
Observation Shape | (6,) |
Observation High | inf |
Observation Low | -inf |
Reward Shape | (4,) |
Reward High | [1. 1. 1. 1.] |
Reward Low | [-1. -1. -1. -1.] |
Import | |
Description¶
Multi-objective version of the Reacher-v4 environment.
Observation Space¶
The observation is 6-dimensional and contains:
the sine and cosine of the angles of the central and elbow joints (4 values)
the angular velocities of the central and elbow joints (2 values)
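As a rough illustration of how these six values fit together, the sketch below assembles an observation from two joint angles and two angular velocities. The function name `build_observation`, the ordering of the entries (sines, then cosines, then velocities), and the absence of any scaling are assumptions made for this example; the environment's actual layout may differ.

```python
import math


def build_observation(theta_central, theta_elbow, vel_central, vel_elbow):
    """Assemble a 6-dimensional observation from joint angles and velocities.

    The ordering (sines, cosines, velocities) is assumed for illustration
    only; it is not taken from the environment's source.
    """
    return [
        math.sin(theta_central),
        math.sin(theta_elbow),
        math.cos(theta_central),
        math.cos(theta_elbow),
        vel_central,
        vel_elbow,
    ]


# Both joints at angle zero and at rest.
obs = build_observation(0.0, 0.0, 0.0, 0.0)
print(len(obs))  # 6
```

Encoding each angle as a sine/cosine pair avoids the discontinuity that a raw angle has when it wraps around a full turn.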
Action Space¶
The action space is discrete and contains the 3^2 = 9 possible actions obtained by applying positive (+1), negative (-1), or zero (0) torque to each of the two joints.
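The nine actions can be enumerated as every pair of per-joint torques. The sketch below is one way to do this; the names `TORQUES`, `ACTIONS`, and `decode_action`, and in particular the mapping from action index to torque pair, are assumptions for illustration and may not match the ordering used inside the environment.

```python
from itertools import product

# The three torque choices available for each joint.
TORQUES = (-1, 0, 1)

# All (central_torque, elbow_torque) pairs: 3 * 3 = 9 actions.
# The index order produced by product() is an assumed ordering.
ACTIONS = list(product(TORQUES, repeat=2))


def decode_action(index):
    """Return the (central, elbow) torque pair for a discrete action index."""
    return ACTIONS[index]


print(len(ACTIONS))      # 9
print(decode_action(4))  # (0, 0)
```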
Reward Space¶
The reward is 4-dimensional and is based on the squared distance between the tip of the arm and each of the four target locations. For each i in {1, 2, 3, 4}, it is computed as:
r_i = 1 - 4 * || finger_tip_coord - target_i ||^2
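The formula above can be sketched in a few lines of plain Python. The function name `reacher_rewards` and the target coordinates used in the example are hypothetical, chosen only so the arithmetic is easy to check by hand; they are not the environment's actual target positions.

```python
def reacher_rewards(finger_tip, targets):
    """Compute r_i = 1 - 4 * ||finger_tip - target_i||^2 for each target."""
    rewards = []
    for target_x, target_y in targets:
        # Squared Euclidean distance between the fingertip and this target.
        squared_distance = (finger_tip[0] - target_x) ** 2 + (finger_tip[1] - target_y) ** 2
        rewards.append(1.0 - 4.0 * squared_distance)
    return rewards


# Hypothetical targets placed on the axes, 0.25 units from the origin.
example_targets = [(0.25, 0.0), (-0.25, 0.0), (0.0, 0.25), (0.0, -0.25)]

# Fingertip sitting exactly on the first target.
print(reacher_rewards((0.25, 0.0), example_targets))  # [1.0, 0.0, 0.5, 0.5]
```

Each component reaches its maximum of 1 when the fingertip is exactly on the corresponding target and decreases quadratically with distance, so moving toward one target generally lowers the reward for the others.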
Version History:¶
See https://gymnasium.farama.org/environments/mujoco/reacher/#version-history