Wrappers¶
A few wrappers inspired by Gymnasium’s wrappers are available in MO-Gymnasium. They are all importable directly from the mo_gymnasium.wrappers module.
LinearReward¶
- class mo_gymnasium.wrappers.LinearReward(env: Env, weight: ndarray | None = None)¶
Makes the env return a scalar reward, which is the dot-product between the reward vector and the weight vector.
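A minimal usage sketch (assuming the deep-sea-treasure-v0 environment from the example further below, whose reward is a 2-dimensional vector; the weight values here are arbitrary):
>>> import numpy as np
>>> import mo_gymnasium as mo_gym
>>> from mo_gymnasium.wrappers import LinearReward
>>> env = mo_gym.make("deep-sea-treasure-v0")
>>> scalar_env = LinearReward(env, weight=np.array([0.8, 0.2]))
>>> obs, info = scalar_env.reset()
>>> obs, reward, terminated, truncated, info = scalar_env.step(scalar_env.action_space.sample())
>>> # reward is now a scalar: the dot product of the reward vector and the weight vector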
MONormalizeReward¶
- class mo_gymnasium.wrappers.MONormalizeReward(env: Env, idx: int, gamma: float = 0.99, epsilon: float = 1e-08)¶
Wrapper to normalize the reward component at index idx. Does not touch other reward components.
This code is heavily inspired by Gymnasium’s, except that it extracts the reward component at the given idx, normalizes it, and reinjects it.
(!) This smooths the moving average of the reward, which can be useful for training stability. But it does not “normalize” the reward in the sense of making it have a mean of 0 and a standard deviation of 1.
Example
>>> import mo_gymnasium as mo_gym
>>> from mo_gymnasium.wrappers import MONormalizeReward
>>> env = mo_gym.make("deep-sea-treasure-v0")
>>> norm_treasure_env = MONormalizeReward(env, idx=0)
>>> both_norm_env = MONormalizeReward(norm_treasure_env, idx=1)
>>> both_norm_env.reset()  # This one normalizes both rewards
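Stepping the doubly-wrapped environment then returns a reward vector in which each component has been passed through its own running normalizer (a sketch; the random action is only for illustration):
>>> obs, vector_reward, terminated, truncated, info = both_norm_env.step(both_norm_env.action_space.sample())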
MOClipReward¶
- class mo_gymnasium.wrappers.MOClipReward(env: Env, idx: int, min_r, max_r)¶
Clips reward[idx] to [min_r, max_r].
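For instance, to clip only the first reward component to [-1, 1] (a sketch; the bounds are arbitrary and the environment id is reused from the example above):
>>> import mo_gymnasium as mo_gym
>>> from mo_gymnasium.wrappers import MOClipReward
>>> env = mo_gym.make("deep-sea-treasure-v0")
>>> clipped_env = MOClipReward(env, idx=0, min_r=-1.0, max_r=1.0)  # other reward components are left untouched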
MORecordEpisodeStatistics¶
- class mo_gymnasium.wrappers.MORecordEpisodeStatistics(env: Env, gamma: float = 1.0, buffer_length: int = 100, stats_key: str = 'episode')¶
This wrapper will keep track of cumulative rewards and episode lengths.
After the completion of an episode, info will look like this:
>>> info = {
...     "episode": {
...         "r": "<cumulative reward (array)>",
...         "dr": "<discounted reward (array)>",
...         "l": "<episode length (scalar)>",
...         "t": "<elapsed time since beginning of episode (scalar)>"
...     },
... }
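A sketch of reading the recorded statistics back after an episode (the random-action loop is only illustrative):
>>> import mo_gymnasium as mo_gym
>>> from mo_gymnasium.wrappers import MORecordEpisodeStatistics
>>> env = MORecordEpisodeStatistics(mo_gym.make("deep-sea-treasure-v0"), gamma=0.99)
>>> obs, info = env.reset()
>>> terminated = truncated = False
>>> while not (terminated or truncated):
...     obs, reward, terminated, truncated, info = env.step(env.action_space.sample())
>>> info["episode"]["r"]  # array of cumulative rewards, one entry per objective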
MOMaxAndSkipObservation¶
- class mo_gymnasium.wrappers.MOMaxAndSkipObservation(env: Env[ObsType, ActType], skip: int = 4)¶
This wrapper will return only every skip-th frame (frameskipping) and return the max between the two last observations.
Note: This wrapper is based on the wrapper from stable-baselines3: https://stable-baselines3.readthedocs.io/en/master/_modules/stable_baselines3/common/atari_wrappers.html#MaxAndSkipEnv
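A minimal sketch (deep-sea-treasure-v0 is reused here only because it already appears above; in practice this wrapper is mainly useful for frame-based environments):
>>> import mo_gymnasium as mo_gym
>>> from mo_gymnasium.wrappers import MOMaxAndSkipObservation
>>> env = mo_gym.make("deep-sea-treasure-v0")
>>> skipped_env = MOMaxAndSkipObservation(env, skip=4)  # returns only every 4th frame, taking the max of the two last observations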