Reward High: [1. 1. 1. 1.]

Reward Low: [-1. -1. -1. -1.]

MuJoCo version of mo-reacher-v0, based on the Reacher-v4 environment.

Observation Space

The observation is 6-dimensional and contains:

  • sin and cos of the angles of the central and elbow joints

  • angular velocity of the central and elbow joints
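As a rough illustration of how the six components fit together, the sketch below assembles such a vector from two joint angles and two angular velocities. The function name `build_observation` and the ordering (sines, then cosines, then velocities) are assumptions made for this example; the environment itself may order or scale the entries differently.

```python
import numpy as np


def build_observation(joint_angles, joint_velocities):
    """Assemble a 6-dimensional observation (hypothetical helper).

    Assumed layout: [sin(central), sin(elbow), cos(central), cos(elbow),
    vel(central), vel(elbow)]. The real environment may use another order.
    """
    theta = np.asarray(joint_angles, dtype=np.float64)
    omega = np.asarray(joint_velocities, dtype=np.float64)
    return np.concatenate([np.sin(theta), np.cos(theta), omega])


# Central joint at 0 rad, elbow joint at pi/2 rad.
obs = build_observation([0.0, np.pi / 2], [0.5, -0.5])
print(obs.shape)
```

Encoding each angle as a sine/cosine pair avoids the discontinuity that a raw angle has when it wraps around.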

Action Space

The action space is discrete and contains the 3^2=9 possible actions based on applying positive (+1), negative (-1) or zero (0) torque to each of the two joints.
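One way to picture the nine actions is as the Cartesian product of the three torque values over the two joints. The sketch below enumerates them; the names `TORQUES`, `ACTION_TABLE`, and `decode_action`, as well as the index-to-torque ordering, are assumptions for illustration and may not match the environment's internal mapping.

```python
from itertools import product

# Torque applied to a single joint: negative, zero, or positive.
TORQUES = (-1.0, 0.0, 1.0)

# All (central_torque, elbow_torque) pairs: 3^2 = 9 entries.
# The ordering here is an assumption, not the environment's documented mapping.
ACTION_TABLE = list(product(TORQUES, repeat=2))


def decode_action(index):
    """Map a discrete action index in [0, 8] to a torque pair (hypothetical helper)."""
    return ACTION_TABLE[index]


for i, torque_pair in enumerate(ACTION_TABLE):
    print(i, torque_pair)
```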

Reward Space

The reward is 4-dimensional and is defined by the distance between the tip of the arm and each of the four target locations. For each i in {1,2,3,4} it is computed as:

    r_i = 1  - 4 * || finger_tip_coord - target_i ||^2
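The formula above can be sketched in NumPy as follows. The helper name `vector_reward` and the coordinates in `EXAMPLE_TARGETS` are placeholders chosen for this example; they are not the target positions used by the environment.

```python
import numpy as np


def vector_reward(finger_tip_coord, targets):
    """Compute r_i = 1 - 4 * ||finger_tip_coord - target_i||^2 for every target."""
    tip = np.asarray(finger_tip_coord, dtype=np.float64)
    targets = np.asarray(targets, dtype=np.float64)
    # Squared Euclidean distance from the fingertip to each target (one per row).
    sq_dist = np.sum((targets - tip) ** 2, axis=1)
    return 1.0 - 4.0 * sq_dist


# Hypothetical target positions, for illustration only.
EXAMPLE_TARGETS = [[0.1, 0.0], [-0.1, 0.0], [0.0, 0.1], [0.0, -0.1]]

# Fingertip sitting exactly on the first target.
rewards = vector_reward([0.1, 0.0], EXAMPLE_TARGETS)
print(rewards)
```

Each component reaches its maximum of 1 when the fingertip is exactly on the corresponding target and decreases quadratically with distance, so the four objectives generally conflict: moving toward one target moves the tip away from at least one other.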