Water-Reservoir
| Action Space | Box(0.0, inf, (1,), float32) |
| --- | --- |
| Observation Shape | (1,) |
| Observation High | [inf] |
| Observation Low | [0.] |
| Reward Shape | (2,) |
| Reward High | [0. 0.] |
| Reward Low | [-inf -inf] |
| Import | `mo_gymnasium.make("water-reservoir-v0")` |
Description
A water reservoir environment. The agent executes a continuous action corresponding to the amount of water released by the dam.
A. Castelletti, F. Pianosi and M. Restelli, “Tree-based Fitted Q-iteration for Multi-Objective Markov Decision problems,” The 2012 International Joint Conference on Neural Networks (IJCNN), Brisbane, QLD, Australia, 2012, pp. 1-8, doi: 10.1109/IJCNN.2012.6252759.
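Below is a minimal interaction sketch. It assumes MO-Gymnasium is installed and that the environment is registered under the id shown in the Import row; the release value of 30.0 is purely illustrative.

```python
import numpy as np
import mo_gymnasium as mo_gym

env = mo_gym.make("water-reservoir-v0")
obs, info = env.reset(seed=42)  # obs is the current reservoir level, shape (1,)

action = np.array([30.0], dtype=np.float32)  # amount of water to release (illustrative)
obs, vector_reward, terminated, truncated, info = env.step(action)
print(obs, vector_reward)  # vector_reward holds one value per objective
env.close()
```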
Observation Space
The observation is a float corresponding to the current level of the reservoir.
Action Space
The action is a float corresponding to the amount of water released by the dam. If `normalized_action` is True, the action is a float between 0 and 1 corresponding to the fraction of water released by the dam.
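As a sketch, the two variants can be compared by inspecting the action space after construction; the exact bounds in the comments are assumptions based on the table and description on this page.

```python
import mo_gymnasium as mo_gym

raw = mo_gym.make("water-reservoir-v0", normalized_action=False)
norm = mo_gym.make("water-reservoir-v0", normalized_action=True)

print(raw.action_space)   # expected: Box(0.0, inf, (1,), float32), absolute release
print(norm.action_space)  # expected: Box(0.0, 1.0, (1,), float32), fraction of the release
```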
Reward Space
There are up to 4 rewards:
- cost due to the excess level with respect to a flooding threshold (upstream)
- deficit in the water supply with respect to the water demand
- deficit in the hydroelectric supply with respect to the hydroelectric demand
- cost due to the excess level with respect to a flooding threshold (downstream)

By default, only the first two are used.
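A sketch of how the number of objectives shows up in the reward returned by step(); passing nO this way is an assumption based on the Arguments section below.

```python
import numpy as np
import mo_gymnasium as mo_gym

env = mo_gym.make("water-reservoir-v0", nO=4)
env.reset(seed=0)
_, vector_reward, _, _, _ = env.step(np.array([10.0], dtype=np.float32))
print(vector_reward.shape)  # expected: (4,), one cost per objective; (2,) with the default nO=2
env.close()
```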
Starting State
The reservoir is initialized with a random level between 0 and 160.
Arguments
- `render_mode`: The render mode to use. Can be 'human', 'rgb_array' or 'ansi'.
- `time_limit`: The maximum number of steps until the episode is truncated.
- `nO`: The number of objectives to use. Can be 2, 3 or 4.
- `penalize`: Whether to penalize the agent for selecting an action out of bounds.
- `normalized_action`: Whether to normalize the action space to a [0, 1] fraction.
- `initial_state`: The initial state of the reservoir. If None, a random state is used.
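A construction sketch passing these arguments through make(); the values below are illustrative assumptions, not recommended settings.

```python
import numpy as np
import mo_gymnasium as mo_gym

env = mo_gym.make(
    "water-reservoir-v0",
    render_mode="ansi",
    time_limit=200,
    nO=2,
    penalize=False,
    normalized_action=True,
    initial_state=np.array([50.0], dtype=np.float32),
)
obs, info = env.reset(seed=1)
print(obs)  # with a fixed initial_state, the episode should start from that level
env.close()
```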
Credits
Code from: Mathieu Reymond. Ported from: Simone Parisi.
Sky background image from: Paulina Riva (https://opengameart.org/content/sky-background)