Some important concepts and algorithms in RL, all summarized in one place. A PDF version is also available here.
- Bandits: settings, exploration-exploitation, UCB, Thompson Sampling
- RL Framework: Markov Decision Process, Markov Property, Bellman Equations
- Dynamic Programming: Policy Evaluation, Policy Iteration, Value Iteration
- Value-Based
  - Tabular environments: Tabular Q-learning, SARSA, TD-learning, eligibility traces
  - Approximate Q-learning: DQN, prioritized experience replay, Double DQN, Rainbow, DRQN
- Policy Gradients
  - On-Policy: REINFORCE, Actor-Critic (with compatible functions, GAE), A2C/A3C, TRPO, PPO
  - Off-Policy: Policy gradient theorem, ACER, importance sampling
  - Continuous Action Spaces: DDPG, Q-Prop
- Reinforcement Learning and advanced Deep Learning (RLD), Sorbonne University course, by Sylvain Lamprier
- Spinning Up in Deep RL, OpenAI
- UCL Course on RL, David Silver's Lecture
Contributions are welcome! If you find a typo or an error, feel free to raise an issue.
If you would like to contribute to the code and make changes directly (e.g. adding algorithms, adding a new section, etc.), start by cloning the repository:
git clone https://github.com/alxthm/rl-cheatsheet.git
Since all the sources and figures are included in the repo, you can make modifications and build the document locally. For this, you need a full TeX distribution (if you do not have one, you can install it here), and you can then edit the LaTeX files with any IDE (e.g. Visual Studio Code).
If you'd rather avoid installing LaTeX, you can also use Overleaf. For this, compress the rl-cheatsheet folder into a ZIP archive and upload it to Overleaf (New Project -> Upload Project).