This repo contains PyTorch implementations of reinforcement learning models, written for personal skill development.
REINFORCE is a policy gradient method that calculates the policy gradient at the end of every episode and updates the agent's parameters accordingly.
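As a rough illustration of the end-of-episode computation, the sketch below builds the discounted return for each step and the scalar loss whose gradient is the REINFORCE estimate. It is plain Python rather than this repo's PyTorch code; the function names and the `gamma` default are assumptions made for the example.

```python
# Minimal sketch of the REINFORCE episode loss (not the repo's actual code).
# Function names and the default discount factor are illustrative assumptions.

def discounted_returns(rewards, gamma=0.99):
    """Return G_t = r_t + gamma * G_{t+1} for every step of one episode."""
    returns = []
    running = 0.0
    # Walk the episode backwards so each return reuses the one after it.
    for reward in reversed(rewards):
        running = reward + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


def reinforce_loss(log_probs, rewards, gamma=0.99):
    """Return -sum_t log pi(a_t | s_t) * G_t for one finished episode."""
    returns = discounted_returns(rewards, gamma)
    return -sum(lp * g for lp, g in zip(log_probs, returns))
```

In a PyTorch version, `log_probs` would be tensors that carry gradients, so calling `backward()` on this loss and stepping an optimizer would perform the parameter update.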
Deep Deterministic Policy Gradient (DDPG) is an off-policy, actor-critic policy gradient method. As in Deep Q-Learning, both the actor (policy) and the critic (value function) keep a current network and a target network, and the target network is updated gradually toward the current one. DDPG also uses an experience replay buffer. The critic's loss is computed from the temporal difference (TD) error, and the actor is updated through the critic's value estimates, as in standard DDPG.
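The pieces described above can be sketched in plain Python: a replay buffer, the gradual (soft) target update, and the TD target used for the critic loss. This is a simplified illustration, not this repo's implementation; the class and function names, the `tau` and `gamma` defaults, and the use of dicts of floats in place of network parameters are all assumptions.

```python
# Minimal sketch of DDPG building blocks (not the repo's actual code).
# All names and default hyperparameters here are illustrative assumptions.
import random
from collections import deque


class ReplayBuffer:
    """Fixed-size store of (state, action, reward, next_state, done) tuples."""

    def __init__(self, capacity):
        # A bounded deque drops the oldest transition once capacity is reached.
        self.buffer = deque(maxlen=capacity)

    def push(self, transition):
        self.buffer.append(transition)

    def sample(self, batch_size):
        # Uniform sampling without replacement breaks temporal correlation.
        return random.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)


def soft_update(target, current, tau=0.001):
    """Move each target parameter a fraction tau toward the current one."""
    return {name: (1.0 - tau) * target[name] + tau * current[name]
            for name in target}


def td_target(reward, next_q, done, gamma=0.99):
    """Bootstrapped target: r + gamma * Q_target(s', a'), or r if terminal."""
    return reward + gamma * (0.0 if done else next_q)


def critic_loss(q_values, targets):
    """Mean squared TD error over a batch."""
    errors = [(q - y) ** 2 for q, y in zip(q_values, targets)]
    return sum(errors) / len(errors)
```

Here `next_q` stands for the target critic's estimate at the next state, evaluated at the target actor's action. Using slowly moving target networks for that estimate is what keeps the bootstrapped target from chasing the network being trained.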
- Python 3.5.2
- PyTorch 0.2.0
- NumPy
- OpenAI Gym
- MuJoCo 1.5.0
- Tensorboard
Reward per episode on HalfCheetah-v1
Visualization of learned policy on HalfCheetah-v1