In this exercise, we use the included racetrack_environment
to write our first reinforcement learning algorithm.
The algorithm is Monte-Carlo learning, applied in both an on-policy and an off-policy fashion. The exercise consists of the following parts:
- policy evaluation using first-visit Monte-Carlo
- on-policy epsilon-greedy control using first-visit Monte-Carlo
- off-policy epsilon-greedy control using Monte-Carlo with weighted importance sampling
- extra challenge
(Source: https://media.giphy.com/media/UqZ4imFIoljlr5O2sM/giphy.gif)
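
As a warm-up for the first task, here is a minimal sketch of first-visit Monte-Carlo policy evaluation. It does not use the racetrack_environment interface. The function name `first_visit_mc_evaluation`, the episode format (a list of `(state, reward)` pairs) and the toy episodes are assumptions made purely for illustration. The idea is to walk each episode backwards, accumulate the discounted return, and average that return over the first visit to each state.

```python
from collections import defaultdict


def first_visit_mc_evaluation(episodes, gamma=1.0):
    """Estimate state values by averaging first-visit returns.

    episodes: list of episodes; each episode is a list of (state, reward)
    pairs, where reward is received after leaving that state.
    (Hypothetical format, not the racetrack_environment API.)
    """
    returns_sum = defaultdict(float)
    visit_count = defaultdict(int)

    for episode in episodes:
        # Record the time step at which each state occurs first.
        first_visit = {}
        for t, (state, _) in enumerate(episode):
            first_visit.setdefault(state, t)

        # Walk backwards so the return g can be built up incrementally.
        g = 0.0
        for t in range(len(episode) - 1, -1, -1):
            state, reward = episode[t]
            g = gamma * g + reward
            # Only the first visit of a state contributes to its average.
            if first_visit[state] == t:
                returns_sum[state] += g
                visit_count[state] += 1

    return {s: returns_sum[s] / visit_count[s] for s in returns_sum}


# Toy data: a reward of -1 per step, with state "A" visited twice in the first episode.
toy_episodes = [
    [("A", -1.0), ("B", -1.0), ("A", -1.0)],
    [("B", -1.0)],
]
values = first_visit_mc_evaluation(toy_episodes)
print(values)
```

In the first toy episode, the second visit to "A" is ignored, so the estimate for "A" comes only from the return of its first visit. The estimate for "B" is the average of its first-visit returns in both episodes.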