Reinforcement Learning Models in PyTorch

Description

This repo contains PyTorch implementations of reinforcement learning models for personal skill development.

REINFORCE
Deep Deterministic Policy Gradient

Background

REINFORCE is a policy gradient method that calculates the policy gradient at the end of every episode and updates the agent's parameters accordingly.
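As a minimal sketch of the end-of-episode computation described above (not the code in this repo), the snippet below computes discounted returns for one finished episode and the surrogate loss whose gradient gives the REINFORCE estimate. The function names, the `gamma` default, and the use of plain Python floats in place of PyTorch tensors are all assumptions made for illustration.

```python
def discounted_returns(rewards, gamma=0.99):
    """Compute G_t = r_t + gamma * G_{t+1} for every step of one episode.

    `gamma` is an assumed discount factor, not a value taken from this repo.
    """
    returns = []
    running = 0.0
    # Walk the episode backwards so each return reuses the one after it.
    for reward in reversed(rewards):
        running = reward + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


def reinforce_loss(log_probs, returns):
    """Surrogate loss: -sum_t log pi(a_t | s_t) * G_t.

    Minimizing this with gradient descent follows the policy gradient.
    In a PyTorch version, `log_probs` would be tensors that carry gradients.
    """
    return -sum(lp * g for lp, g in zip(log_probs, returns))


if __name__ == "__main__":
    episode_rewards = [1.0, 1.0, 1.0]
    print(discounted_returns(episode_rewards, gamma=0.5))
```

Because the returns depend on every later reward, the update can only happen once the episode has ended, which is why REINFORCE updates once per episode.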

Deep Deterministic Policy Gradient (DDPG) is an off-policy, actor-critic policy gradient method. As in Deep Q-Learning, a target model and a current model are kept for both the actor (policy) and the critic (value) functions, and the target models are gradually updated toward the current ones. DDPG also uses an experience replay buffer. The critic loss is computed from the temporal difference error signal, and the actor is updated using the critic's value estimates.
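The three ingredients mentioned above (replay buffer, gradual target update, temporal difference target) can be sketched as follows. This is an illustrative outline, not this repo's implementation: the class and function names are hypothetical, parameters are plain lists of floats rather than PyTorch tensors, and the `tau` and `gamma` defaults are assumed values. The gradual update is shown as an exponential moving average, which is one common way to do it.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-size store of past transitions, sampled uniformly at random."""

    def __init__(self, capacity):
        # A deque with maxlen drops the oldest transition once full.
        self.storage = deque(maxlen=capacity)

    def add(self, state, action, reward, next_state, done):
        self.storage.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        return random.sample(list(self.storage), batch_size)

    def __len__(self):
        return len(self.storage)


def soft_update(target_params, current_params, tau=0.001):
    """Move each target parameter a fraction `tau` toward the current one."""
    return [tau * c + (1.0 - tau) * t
            for t, c in zip(target_params, current_params)]


def td_target(reward, next_q, done, gamma=0.99):
    """Bootstrapped target r + gamma * Q'(s', mu'(s')), cut off at episode end.

    `next_q` stands for the target critic's value of the target actor's action.
    """
    return reward + gamma * (1.0 - float(done)) * next_q


if __name__ == "__main__":
    buffer = ReplayBuffer(capacity=2)
    for step in range(3):
        buffer.add(step, 0.0, 1.0, step + 1, False)
    print(len(buffer), td_target(1.0, 2.0, False, gamma=0.5))
```

The temporal difference error for the critic is then the difference between `td_target(...)` and the current critic's estimate for the sampled state and action.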

Dependencies

Python 3.5.2
PyTorch 0.2.0
NumPy
OpenAI Gym
MuJoCo 1.5.0
Tensorboard

Results

DDPG

Reward per episode on HalfCheetah-v1

Visualization of learned policy on HalfCheetah-v1

Useful References

Policy gradient methods
David Silver RL at UCL
