Vector Wrappers

Similar to the standard (non-vector) wrappers, MO-Gymnasium provides a few wrappers specifically designed to work with vectorized environments. They are all available directly from the mo_gymnasium.wrappers.vector module.
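
For example, the two wrappers documented below can be imported directly:

>>> from mo_gymnasium.wrappers.vector import MOSyncVectorEnv, MORecordEpisodeStatistics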

MOSyncVectorEnv

class mo_gymnasium.wrappers.vector.MOSyncVectorEnv(env_fns: Iterator[callable], copy: bool = True)

Vectorized environment that serially runs multiple environments.

Example

>>> import mo_gymnasium as mo_gym
>>> envs = mo_gym.wrappers.vector.MOSyncVectorEnv([
...     lambda: mo_gym.make("deep-sea-treasure-v0") for _ in range(4)
... ])
>>> envs
MOSyncVectorEnv(num_envs=4)
>>> obs, infos = envs.reset()
>>> obs
array([[0, 0], [0, 0], [0, 0], [0, 0]], dtype=int32)
>>> obs, rewards, terminateds, truncateds, infos = envs.step([0, 1, 2, 3])
>>> obs
array([[0, 0], [1, 0], [0, 0], [0, 3]], dtype=int32)
>>> rewards
array([[0., -1.], [0.7, -1.], [0., -1.], [0., -1.]], dtype=float32)
>>> terminateds
array([False,  True, False, False])
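
Following Gymnasium's vector API, each sub-environment is reset automatically once its episode ends, so a rollout loop can simply keep stepping. A minimal sketch continuing the example above (the reward accumulation here is purely illustrative and ignores episode boundaries; see MORecordEpisodeStatistics below for proper per-episode returns):

>>> import numpy as np
>>> cum_rewards = np.zeros((4, 2), dtype=np.float32)  # (num_envs, dim_reward) for deep-sea-treasure-v0
>>> for _ in range(10):
...     obs, rewards, terminateds, truncateds, infos = envs.step(envs.action_space.sample())
...     cum_rewards += rewards  # rewards has shape (num_envs, dim_reward)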

MORecordEpisodeStatistics

class mo_gymnasium.wrappers.vector.MORecordEpisodeStatistics(env: VectorEnv, gamma: float = 1.0, buffer_length: int = 100, stats_key: str = 'episode')

This wrapper keeps track of cumulative multi-objective rewards and episode lengths.

At the end of any episode within the vectorized env, the statistics of that episode are added to info under the episode key, and the _episode key is a boolean mask marking which sub-environments have a terminated or truncated episode.

For vectorized environments, the output takes the following form (be careful to vectorize the environment first, before applying MORecordEpisodeStatistics):

>>> infos = { 
...     "episode": {
...         "r": "<array of cumulative reward for each done sub-environment (2d array, shape (num_envs, dim_reward))>",
...         "dr": "<array of discounted reward for each done sub-environment (2d array, shape (num_envs, dim_reward))>",
...         "l": "<array of episode length for each done sub-environment (array)>",
...         "t": "<array of elapsed time since beginning of episode for each done sub-environment (array)>"
...     },
...     "_episode": "<boolean array of length num-envs>"
... }
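
For instance, a minimal usage sketch (the random actions and gamma=0.99 below are arbitrary illustrative choices, not requirements of the wrapper):

>>> import numpy as np
>>> import mo_gymnasium as mo_gym
>>> from mo_gymnasium.wrappers.vector import MOSyncVectorEnv, MORecordEpisodeStatistics
>>> envs = MORecordEpisodeStatistics(
...     MOSyncVectorEnv([lambda: mo_gym.make("deep-sea-treasure-v0") for _ in range(4)]),
...     gamma=0.99,
... )
>>> obs, infos = envs.reset(seed=42)
>>> for _ in range(200):
...     obs, rewards, terminateds, truncateds, infos = envs.step(envs.action_space.sample())
...     if "episode" in infos:
...         # rows of the sub-envs that just finished, shape (n_done, dim_reward)
...         finished_returns = infos["episode"]["r"][infos["_episode"]]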

Moreover, the most recent episode returns and lengths are stored in buffers that can be accessed via wrapped_env.return_queue and wrapped_env.length_queue, respectively.

Variables:
  • return_queue – The cumulative rewards of the last buffer_length-many episodes

  • length_queue – The lengths of the last buffer_length-many episodes
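
Continuing the sketch above, the buffers can be inspected once some episodes have completed (converting the deques to arrays is an illustrative choice, not part of the API):

>>> returns = np.array(envs.return_queue)  # one reward vector per recorded episode
>>> lengths = np.array(envs.length_queue)  # one length per recorded episode
>>> mean_return = returns.mean(axis=0)     # per-objective mean over recorded episodes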