

Reproduce PPO with PARL

Based on PARL, we have reproduced the PPO deep reinforcement learning algorithm, achieving results on the Mujoco benchmarks comparable to those reported in the paper.

Paper: PPO in Proximal Policy Optimization Algorithms

Mujoco/Atari games introduction

PARL currently supports the open-source version of Mujoco provided by DeepMind, so users do not need to download Mujoco binaries, install mujoco-py, or obtain a license. For more details, please visit Mujoco.
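
For a quick check that the open-source Mujoco backend works on your machine, you can create an environment directly through gym. This is only a sanity-check sketch; it assumes gym>=0.26 and the open-source mujoco pip package, which are our assumptions rather than versions stated in this README.

# Sanity check for the DeepMind (open-source) Mujoco backend.
# Assumes the `gym` (>=0.26 API) and `mujoco` pip packages are installed.
import gym

env = gym.make('HalfCheetah-v4')
obs, info = env.reset(seed=0)
obs, reward, terminated, truncated, info = env.step(env.action_space.sample())
print(obs.shape, reward)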

Benchmark result

1. Mujoco games results

[mujoco-result: learning curves on the Mujoco benchmarks]

2. Atari games results

[atari-result: learning curves on the Atari benchmarks]

  • Each experiment was run three times with different seeds

How to use

Mujoco-Dependencies:

Atari-Dependencies:

Training:

# To train an agent for a discrete-action game (Atari: PongNoFrameskip-v4 by default)
python train.py

# To train an agent for a continuous-action game (Mujoco)
python train.py --env 'HalfCheetah-v4' --continuous_action --train_total_steps 1000000
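
The --continuous_action flag essentially switches which kind of policy head the agent samples actions from. Below is a minimal, framework-free sketch of that distinction (our own illustration, not the repo's code): Atari-style discrete games sample an action index from a categorical distribution, while Mujoco-style continuous games sample a real-valued action vector from a Gaussian.

# Minimal sketch (not the repo's code) of what --continuous_action toggles.
import numpy as np

def sample_discrete_action(logits):
    # Categorical policy head: softmax over logits, then sample an action index (Atari).
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    return np.random.choice(len(probs), p=probs)

def sample_continuous_action(mean, std):
    # Gaussian policy head: sample a real-valued action vector (Mujoco).
    return np.random.normal(mean, std)

print(sample_discrete_action(np.array([0.1, 1.2, -0.3])))      # e.g. 1
print(sample_continuous_action(np.zeros(6), 0.5 * np.ones(6)))  # 6-dim action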

Distributed Training

You can accelerate training by setting xparl_addr and env_num > 1 when environment simulation runs very slowly and becomes the bottleneck.
First, start a local cluster with 8 CPUs:

xparl start --port 8010 --cpu_num 8

Note that if you have started a master before, you don't have to run the above command. For more information about the cluster, please refer to our documentation.
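
Under the hood, the script parallelizes environment simulation with PARL's remote actors: once parl.connect is called, instances of a class decorated with @parl.remote_class are created on the CPUs of the cluster started above, and their methods execute in those remote processes. The sketch below only illustrates this mechanism with a hypothetical EnvWorker class; it is not the actual actor used by train.py.

# Illustration of PARL remote actors (hypothetical EnvWorker, not the repo's actor).
import parl

@parl.remote_class
class EnvWorker(object):
    """Simulates one environment instance on a remote CPU in the cluster."""
    def __init__(self, env_name):
        self.env_name = env_name

    def rollout_length(self):
        # Placeholder for collecting a rollout in the remote process.
        return 128

# Connect to the cluster started above; decorated classes are then
# instantiated on cluster workers instead of the local process.
parl.connect('localhost:8010')
workers = [EnvWorker('PongNoFrameskip-v4') for _ in range(8)]
steps = [w.rollout_length() for w in workers]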

Then we can start the distributed training by running:

# To train an agent with distributed (parallel) environments

# for discrete-action games (Atari)
python train.py --env "PongNoFrameskip-v4" --env_num 8 --xparl_addr 'localhost:8010'

# for continuous-action games (Mujoco)
python train.py --env 'HalfCheetah-v4' --continuous_action --train_total_steps 1000000 --env_num 5 --xparl_addr 'localhost:8010'
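
If you want to see how many CPUs of the cluster are in use during training, recent PARL versions provide a status command (this is our suggestion, not part of the original instructions):

xparl status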