Vector Wrappers

Similar to the normal wrappers, MO-Gymnasium provides a few wrappers that are specifically designed to work with vectorized environments. They are all available directly from the `mo_gymnasium.wrappers.vector` module.
MOSyncVectorEnv

- class mo_gymnasium.wrappers.vector.MOSyncVectorEnv(env_fns: Iterator[callable], copy: bool = True)
Vectorized environment that serially runs multiple environments.
Example
>>> import mo_gymnasium as mo_gym
>>> envs = mo_gym.wrappers.vector.MOSyncVectorEnv([
...     lambda: mo_gym.make("deep-sea-treasure-v0") for _ in range(4)
... ])
>>> envs
MOSyncVectorEnv(num_envs=4)
>>> obs, infos = envs.reset()
>>> obs
array([[0, 0],
       [0, 0],
       [0, 0],
       [0, 0]], dtype=int32)
>>> _ = envs.action_space.seed(42)
>>> actions = envs.action_space.sample()
>>> obs, rewards, terminateds, truncateds, infos = envs.step([0, 1, 2, 3])
>>> obs
array([[0, 0],
       [1, 0],
       [0, 0],
       [0, 3]], dtype=int32)
>>> rewards
array([[0., -1.],
       [0.7, -1.],
       [0., -1.],
       [0., -1.]], dtype=float32)
>>> terminateds
array([False,  True, False, False])
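Beyond the doctest above, a rollout loop looks the same as with any Gymnasium vector env; the main difference is that `rewards` is a 2D array of shape `(num_envs, reward_dim)` rather than a 1D array. A minimal sketch (the horizon, random actions, and return accumulation are illustrative only; note that sub-environments auto-reset on termination, so this simple accumulation mixes episodes):

```python
import numpy as np
import mo_gymnasium as mo_gym
from mo_gymnasium.wrappers.vector import MOSyncVectorEnv

# Four serial copies of a multi-objective environment.
envs = MOSyncVectorEnv([lambda: mo_gym.make("deep-sea-treasure-v0") for _ in range(4)])

obs, infos = envs.reset(seed=42)
returns = np.zeros((envs.num_envs, 2), dtype=np.float32)  # deep-sea-treasure has 2 objectives

for _ in range(10):
    actions = envs.action_space.sample()
    obs, rewards, terminateds, truncateds, infos = envs.step(actions)
    returns += rewards  # rewards has shape (num_envs, reward_dim)

envs.close()
```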
MORecordEpisodeStatistics

- class mo_gymnasium.wrappers.vector.MORecordEpisodeStatistics(env: VectorEnv, gamma: float = 1.0, buffer_length: int = 100, stats_key: str = 'episode')
This wrapper will keep track of cumulative rewards and episode lengths.
At the end of any episode within the vectorized env, the statistics of the episode will be added to `info` under the key `episode`, and the `_episode` key is used to indicate the environment index which has a terminated or truncated episode. For vectorized environments, the output will be in the following form (be careful to wrap the env into a vector env first, before applying MORecordEpisodeStatistics):
>>> infos = {
...     "episode": {
...         "r": "<array of cumulative reward for each done sub-environment (2d array, shape (num_envs, dim_reward))>",
...         "dr": "<array of discounted reward for each done sub-environment (2d array, shape (num_envs, dim_reward))>",
...         "l": "<array of episode length for each done sub-environment (array)>",
...         "t": "<array of elapsed time since beginning of episode for each done sub-environment (array)>"
...     },
...     "_episode": "<boolean array of length num-envs>"
... }
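For instance, a sketch of how these keys can be consumed during training (the environment id, step count, and random policy are illustrative assumptions):

```python
import mo_gymnasium as mo_gym
from mo_gymnasium.wrappers.vector import MOSyncVectorEnv, MORecordEpisodeStatistics

# Vectorize first, then record statistics on the vector env.
envs = MORecordEpisodeStatistics(
    MOSyncVectorEnv([lambda: mo_gym.make("deep-sea-treasure-v0") for _ in range(4)]),
    gamma=0.99,
)

obs, infos = envs.reset(seed=1)
for _ in range(200):
    obs, rewards, terminateds, truncateds, infos = envs.step(envs.action_space.sample())
    if "episode" in infos:
        done = infos["_episode"]  # boolean mask of finished sub-envs
        print("vector returns:", infos["episode"]["r"][done])
        print("discounted returns:", infos["episode"]["dr"][done])

envs.close()
```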
Moreover, the most recent rewards and episode lengths are stored in buffers that can be accessed via `wrapped_env.return_queue` and `wrapped_env.length_queue` respectively (see the sketch after the variable list).

- Variables:
  - return_queue – The cumulative rewards of the last `buffer_length`-many episodes
  - length_queue – The lengths of the last `buffer_length`-many episodes
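Continuing the sketch above (assuming `envs` is the wrapped vector env from the previous example), the buffers can be used to report moving averages once some episodes have completed:

```python
import numpy as np

# Each entry in return_queue is a vector return, so the buffer converts
# to an array of shape (n_recent_episodes, reward_dim).
if len(envs.return_queue) > 0:
    recent_returns = np.array(envs.return_queue)
    recent_lengths = np.array(envs.length_queue)
    print("mean vector return:", recent_returns.mean(axis=0))
    print("mean episode length:", recent_lengths.mean())
```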