# Resource-Gathering
| | |
|---|---|
| Action Space | Discrete(4) |
| Observation Shape | (4,) |
| Observation High | [5 5 5 5] |
| Observation Low | [0 0 0 0] |
| Reward Shape | (3,) |
| Reward High | [0. 1. 1.] |
| Reward Low | [-1. 0. 0.] |
| Import | |
## Description
From Barrett, Leon & Narayanan, Srini (2008). "Learning all optimal policies with multiple criteria." Proceedings of the 25th International Conference on Machine Learning, 41-47. doi:10.1145/1390156.1390162.
## Observation Space
The observation is a vector of 4 discrete values; a short unpacking example follows the list:

- 0: The x coordinate of the agent
- 1: The y coordinate of the agent
- 2: Flag indicating whether the agent has collected the gold
- 3: Flag indicating whether the agent has collected the diamond
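A minimal sketch of how such an observation can be unpacked. The variable names are illustrative only; the environment returns a plain array, not named fields.

```python
# Example observation: agent at (2, 3), gold collected, diamond not collected.
obs = [2, 3, 1, 0]

x, y, has_gold, has_diamond = obs
print(f"agent at ({x}, {y}), gold: {bool(has_gold)}, diamond: {bool(has_diamond)}")
```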
## Action Space
The action space is discrete with 4 actions; a mapping sketch follows the list:

- 0: Move up
- 1: Move down
- 2: Move left
- 3: Move right
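As a sketch, the action indices can be mapped to grid displacements. Note that the coordinate convention used here (e.g. whether "up" decreases the y coordinate) is an assumption for illustration and may not match the environment's internals.

```python
# Assumed mapping from action index to (dx, dy) displacement on the grid.
ACTION_DELTAS = {
    0: (0, -1),  # up
    1: (0, 1),   # down
    2: (-1, 0),  # left
    3: (1, 0),   # right
}

def apply_action(x, y, action):
    """Return the agent's next position for a given action (sketch only)."""
    dx, dy = ACTION_DELTAS[action]
    return x + dx, y + dy

print(apply_action(2, 3, 0))  # (2, 2)
```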
## Reward Space
The reward is a 3-dimensional vector; a scalarization sketch follows the list:

- 0: -1 if killed by an enemy, else 0
- 1: +1 if returned home with the gold, else 0
- 2: +1 if returned home with the diamond, else 0
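Because the reward is a vector, multi-objective methods typically scalarize it in some way. Below is a minimal linear-scalarization sketch; the weight values are arbitrary examples and not part of the environment.

```python
import numpy as np

# Example preference weights over [enemy penalty, gold, diamond].
weights = np.array([0.2, 0.4, 0.4])

# Example vector reward: returned home with the gold only.
vector_reward = np.array([0.0, 1.0, 0.0])

scalar_reward = float(np.dot(weights, vector_reward))
print(scalar_reward)  # 0.4
```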
## Starting State
The agent starts at the home position with no gold or diamond.
## Episode Termination
The episode terminates when the agent returns home, or when the agent is killed by an enemy.
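A rollout sketch tying the pieces together. The environment id `"resource-gathering-v0"` is an assumption (it is not stated on this page) and should be checked against the installed MO-Gymnasium version.

```python
import mo_gymnasium as mo_gym

# Assumed environment id; verify against your MO-Gymnasium installation.
env = mo_gym.make("resource-gathering-v0")

obs, info = env.reset(seed=42)
terminated = truncated = False
while not (terminated or truncated):
    action = env.action_space.sample()  # random policy for illustration
    # The reward is a 3-element vector: [enemy penalty, gold, diamond].
    obs, vector_reward, terminated, truncated, info = env.step(action)

env.close()
```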
## Credits
The home asset is from https://limezu.itch.io/serenevillagerevamped. The gold, enemy, and gem assets are from https://ninjikin.itch.io/treasure.