Action Space

Box(-1.0, 1.0, (1,), float32)

Observation Shape

(2,)

Observation High

[0.6 0.07]

Observation Low

[-1.2 -0.07]

Reward Shape

(2,)

Reward High

[0. 0.]

Reward Low

[-1. -1.]



A continuous version of the MountainCar environment, where the goal is to reach the top of the mountain.

See source for more information.

Reward space:

The reward space is a 2D vector containing the time penalty and the fuel reward.

  • time penalty: -1.0 for each time step

  • fuel reward: -||action||^2, i.e. the negative of the squared norm of the action vector
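
As a rough illustration of the two components listed above, the following sketch computes the reward vector for a single step. It is not the environment's actual implementation: the helper name `reward_vector` and the constant `TIME_PENALTY` are hypothetical, and the sketch assumes the time penalty is applied on every step exactly as described.

```python
from typing import Sequence, Tuple

# Assumption taken from the description above: -1.0 for each time step.
TIME_PENALTY = -1.0


def reward_vector(action: Sequence[float]) -> Tuple[float, float]:
    """Return (time penalty, fuel reward) for one step (illustrative only)."""
    # Squared Euclidean norm of the action vector, ||action||^2.
    squared_norm = sum(a * a for a in action)
    # The fuel reward is the negative of that squared norm.
    fuel_reward = 0.0 - squared_norm
    return TIME_PENALTY, fuel_reward


if __name__ == "__main__":
    # The action space is Box(-1.0, 1.0, (1,)), so each action has one entry.
    for action in ([0.0], [0.5], [-1.0]):
        print(action, reward_vector(action))
```

For actions inside the action space bounds of [-1.0, 1.0], the fuel reward computed this way stays within [-1.0, 0.0], which matches the Reward Low and Reward High values listed for this environment.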