Wrappers

MO-Gymnasium provides a few wrappers inspired by Gymnasium’s wrappers. They are all available directly from the mo_gymnasium module.

LinearReward

class mo_gymnasium.LinearReward(env: Env, weight: ndarray | None = None)

Makes the environment return a scalar reward: the dot product of the reward vector and the weight vector.
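
As a rough illustration of the scalarization described above, the sketch below computes the dot product in plain Python. It does not call MO-Gymnasium; the helper `scalarize` and the sample values are hypothetical and only show the arithmetic.

```python
def scalarize(reward_vector, weight):
    """Return the dot product of a vector reward and a weight vector."""
    if len(reward_vector) != len(weight):
        raise ValueError("reward and weight must have the same length")
    return sum(r * w for r, w in zip(reward_vector, weight))


# Hypothetical two-objective reward and weight vector
vector_reward = [1.0, -2.0]
weight = [0.75, 0.25]

scalar_reward = scalarize(vector_reward, weight)
print(scalar_reward)
```

Changing the weight vector changes the trade-off between objectives without touching the underlying environment.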

MONormalizeReward

class mo_gymnasium.MONormalizeReward(env: Env, idx: int, gamma: float = 0.99, epsilon: float = 1e-08)

Wrapper to normalize the reward component at index idx. Does not touch other reward components.

MOClipReward

class mo_gymnasium.MOClipReward(env: Env, idx: int, min_r, max_r)

Clips reward[idx] to the interval [min_r, max_r].
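
The effect of clipping a single component can be sketched in plain Python as follows. This is an illustration of the arithmetic only, not the wrapper's implementation; `clip_component` and the sample reward are hypothetical.

```python
def clip_component(reward_vector, idx, min_r, max_r):
    """Return a copy of the reward with only entry `idx` clipped to [min_r, max_r]."""
    clipped = list(reward_vector)
    clipped[idx] = max(min_r, min(max_r, clipped[idx]))
    return clipped


# Hypothetical three-objective reward; only the first component is clipped
original = [5.0, -3.0, 0.2]
clipped = clip_component(original, idx=0, min_r=-1.0, max_r=1.0)
print(clipped)
```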

MOSyncVectorEnv

class mo_gymnasium.MOSyncVectorEnv(env_fns: Iterator[callable], copy: bool = True)

Vectorized environment that serially runs multiple environments.
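
To make "serially runs multiple environments" concrete, here is a minimal sketch that steps several toy environments one after another and batches their vector rewards. `ToyEnv` and `step_serially` are hypothetical stand-ins with a simplified `step` signature; they are not part of MO-Gymnasium and do not reproduce its API.

```python
class ToyEnv:
    """Hypothetical two-objective environment, not part of MO-Gymnasium."""

    def __init__(self, scale):
        self.scale = scale
        self.t = 0

    def reset(self):
        self.t = 0
        return 0.0

    def step(self, action):
        self.t += 1
        obs = float(self.t)
        reward = [self.scale * action, -1.0]  # one entry per objective
        done = self.t >= 3
        return obs, reward, done


def step_serially(envs, actions):
    """Step each environment in turn and collect the results per field."""
    observations, rewards, dones = [], [], []
    for env, action in zip(envs, actions):
        obs, reward, done = env.step(action)
        observations.append(obs)
        rewards.append(reward)
        dones.append(done)
    return observations, rewards, dones


envs = [ToyEnv(scale=1.0), ToyEnv(scale=2.0)]
for env in envs:
    env.reset()

observations, rewards, dones = step_serially(envs, actions=[1.0, 1.0])
print(rewards)  # nested list with one row per environment
```

The batched reward has one row per environment and one column per objective, which matches the `(num_envs, dim_reward)` shape used for vectorized statistics elsewhere on this page.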

MORecordEpisodeStatistics

class mo_gymnasium.MORecordEpisodeStatistics(env: Env, gamma: float = 1.0, deque_size: int = 100)

This wrapper will keep track of cumulative rewards and episode lengths.

After the completion of an episode, info will look like this:

>>> info = {
...     "episode": {
...         "r": "<cumulative reward (array)>",
...         "dr": "<discounted reward (array)>",
...         "l": "<episode length (scalar)>", # contrary to Gymnasium, these are not a numpy array
...         "t": "<elapsed time since beginning of episode (scalar)>"
...     },
... }
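
The quantities behind "r", "dr", and "l" can be sketched as below. The sketch assumes the usual definition of a discounted return (each step's reward weighted by gamma raised to the step index), which the text above does not spell out; `episode_returns` and the sample rewards are hypothetical and do not call MO-Gymnasium.

```python
def episode_returns(rewards, gamma=1.0):
    """Accumulate per-objective cumulative and discounted returns for one episode."""
    dim = len(rewards[0])
    cumulative = [0.0] * dim
    discounted = [0.0] * dim
    discount = 1.0  # gamma ** t, assumed discounting scheme
    for reward in rewards:
        for i in range(dim):
            cumulative[i] += reward[i]
            discounted[i] += discount * reward[i]
        discount *= gamma
    return cumulative, discounted, len(rewards)


# Hypothetical three-step episode with two objectives
episode_rewards = [[1.0, 0.0], [1.0, 2.0], [1.0, 4.0]]
r, dr, l = episode_returns(episode_rewards, gamma=0.5)
print(r, dr, l)
```

With gamma equal to 1.0 (the wrapper's default), the cumulative and discounted returns in this sketch coincide.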

For vectorized environments, the output has the following form (make sure to wrap the environments into a vector environment first, and only then apply MORecordEpisodeStatistics):

>>> infos = {
...     "final_observation": "<array of length num-envs>",
...     "_final_observation": "<boolean array of length num-envs>",
...     "final_info": "<array of length num-envs>",
...     "_final_info": "<boolean array of length num-envs>",
...     "episode": {
...         "r": "<array of cumulative reward (2d array, shape (num_envs, dim_reward))>",
...         "dr": "<array of discounted reward (2d array, shape (num_envs, dim_reward))>",
...         "l": "<array of episode length (array)>",
...         "t": "<array of elapsed time since beginning of episode (array)>"
...     },
...     "_episode": "<boolean array of length num-envs>"
... }
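
One way to read such a dictionary is to use the boolean "_episode" mask to pick out the environments that have statistics to report. The sketch below assumes that a True entry marks an environment whose episode just finished (the usual Gymnasium vector convention); the sample `infos` values and the helper `finished_episodes` are hypothetical, and plain lists stand in for arrays.

```python
# Hypothetical infos for two environments; only the first finished an episode
infos = {
    "episode": {
        "r": [[3.0, 6.0], [0.0, 0.0]],
        "dr": [[1.75, 2.0], [0.0, 0.0]],
        "l": [3, 0],
        "t": [0.01, 0.0],
    },
    "_episode": [True, False],
}


def finished_episodes(infos):
    """Return the statistics of every environment flagged in the "_episode" mask."""
    if "episode" not in infos:
        return []
    stats = infos["episode"]
    return [
        {"env": i, "r": stats["r"][i], "dr": stats["dr"][i], "l": stats["l"][i]}
        for i, finished in enumerate(infos["_episode"])
        if finished
    ]


for entry in finished_episodes(infos):
    print(entry["env"], entry["r"], entry["dr"], entry["l"])
```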