WaterReservoir#
| Property | Value |
|----------------------|----------------------------------------------|
| Action Space | `Box(0.0, inf, (1,), float32)` |
| Observation Shape | `(1,)` |
| Observation High | `[inf]` |
| Observation Low | `[0.]` |
| Reward Shape | `(2,)` |
| Reward High | `[0. 0.]` |
| Reward Low | `[-inf -inf]` |
| Import | `mo_gymnasium.make("water-reservoir-v0")` |
Description#
A water reservoir environment. The agent executes a continuous action corresponding to the amount of water released by the dam.

Based on: A. Castelletti, F. Pianosi and M. Restelli, "Tree-based Fitted Q-iteration for Multi-Objective Markov Decision problems," The 2012 International Joint Conference on Neural Networks (IJCNN), Brisbane, QLD, Australia, 2012, pp. 1-8, doi: 10.1109/IJCNN.2012.6252759.
Observation Space#
The observation is a float corresponding to the current level of the reservoir.
Action Space#
The action is a float corresponding to the amount of water released by the dam. If `normalized_action` is True, the action is a float between 0 and 1 corresponding to the percentage of water released by the dam.
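As a minimal sketch of the normalized-action convention described above, a fraction in [0, 1] can be mapped to an absolute release amount. The `max_release` bound here is a hypothetical illustration; the actual environment derives its release limits from its own dynamics and the current reservoir level.

```python
def denormalize_action(action: float, max_release: float = 100.0) -> float:
    """Map a normalized action in [0, 1] to an absolute water release.

    `max_release` is a purely illustrative upper bound, not a value
    taken from the environment's implementation.
    """
    # Clip first so out-of-range inputs stay within the valid fraction.
    clipped = min(max(action, 0.0), 1.0)
    return clipped * max_release
```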
Reward Space#
There are up to 4 rewards:

- cost due to excess level with respect to a flooding threshold (upstream)
- deficit in the water supply with respect to the water demand
- deficit in hydroelectric supply with respect to the hydroelectric demand
- cost due to excess level with respect to a flooding threshold (downstream)

By default, only the first two are used.
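Since all objectives are costs, the reward vector is non-positive (hence the reward space upper bound of 0). The sketch below shows one plausible shape for the two default signals; the threshold and demand values are illustrative assumptions, and the environment's actual penalty functions may differ (e.g. squared excess).

```python
def flooding_cost(level: float, threshold: float = 50.0) -> float:
    """Negative cost for exceeding a flooding threshold (assumed value)."""
    return -max(0.0, level - threshold)

def supply_deficit(release: float, demand: float = 50.0) -> float:
    """Negative cost for unmet water demand (assumed value)."""
    return -max(0.0, demand - release)
```

Both functions return 0 when the constraint is satisfied and grow more negative as the violation grows, matching the `[-inf, 0]` reward bounds.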
Starting State#
The reservoir is initialized with a random level between 0 and 160.
Arguments#
- `render_mode`: The render mode to use. Can be "human", "rgb_array", or "ansi".
- `time_limit`: The maximum number of steps until the episode is truncated.
- `nO`: The number of objectives to use. Can be 2, 3, or 4.
- `penalize`: Whether to penalize the agent for selecting an action out of bounds.
- `normalized_action`: Whether to normalize the action space as a percentage [0, 1].
- `initial_state`: The initial state of the reservoir. If None, a random state is used.
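To illustrate the `penalize` option, a hypothetical helper below shows one common way an environment can handle an out-of-bounds action: clip it to the valid range and, if penalization is enabled, report a penalty. The function name, signature, and unit penalty are assumptions for illustration, not the environment's actual implementation.

```python
def process_action(action: float, low: float, high: float,
                   penalize: bool = False) -> tuple[float, float]:
    """Clip an action into [low, high]; optionally flag a penalty.

    Hypothetical helper: when `penalize` is True and the action was
    out of bounds, a (here, unit) penalty is returned alongside the
    clipped action so the caller can subtract it from the reward.
    """
    clipped = min(max(action, low), high)
    out_of_bounds = clipped != action
    penalty = 1.0 if (penalize and out_of_bounds) else 0.0
    return clipped, penalty
```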
Credits#
Code from: Mathieu Reymond. Ported from: Simone Parisi.
Sky background image from: Paulina Riva (https://opengameart.org/content/skybackground)