Based on PARL, we reproduce the PPO deep reinforcement learning algorithm, reaching the same level of performance as the original paper on the Mujoco benchmarks.
Paper: PPO in [Proximal Policy Optimization Algorithms](https://arxiv.org/abs/1707.06347)
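For readers less familiar with PPO, its core is the clipped surrogate objective from the paper. Below is a minimal sketch of that loss in Paddle; the function name and arguments are illustrative and do not reflect PARL's actual implementation:

```python
import paddle

def ppo_clip_loss(logp_new, logp_old, advantages, clip_ratio=0.2):
    # Probability ratio r_t = pi_theta(a|s) / pi_theta_old(a|s),
    # computed in log space for numerical stability.
    ratio = paddle.exp(logp_new - logp_old)
    # Clipped surrogate objective: min(r_t * A_t, clip(r_t, 1-eps, 1+eps) * A_t).
    clipped = paddle.clip(ratio, 1.0 - clip_ratio, 1.0 + clip_ratio) * advantages
    # Negate because optimizers minimize, while PPO maximizes the surrogate.
    return -paddle.minimum(ratio * advantages, clipped).mean()
```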
PARL currently supports the open-source version of Mujoco provided by DeepMind, so users no longer need to download Mujoco binaries, install mujoco-py, or obtain a license. For more details, please visit the Mujoco website.
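As a quick sanity check that the open-source bindings work, a Mujoco environment can be created directly through gym (assuming the `gym>=0.26.0` and `mujoco` packages from the dependency list below are installed):

```python
import gym

# No mujoco-py or license key required with the open-source bindings.
env = gym.make('HalfCheetah-v4')
obs, info = env.reset(seed=0)  # gym>=0.26 returns (obs, info)
obs, reward, terminated, truncated, info = env.step(env.action_space.sample())
```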
- Each experiment was run three times with different seeds
Mujoco dependencies:
- python3.7+
- paddle>=2.3.1
- parl>=2.1.1
- gym>=0.26.0
- mujoco>=2.2.2
Atari dependencies:
- paddle>=2.3.1
- parl>=2.1.1
- gym==0.18.0
- atari-py==0.2.6
- opencv-python
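Note that the Atari setup pins an older gym (0.18.0), whose API differs from the gym>=0.26 used for Mujoco: `reset` returns only the observation and `step` returns a 4-tuple. A small sketch of the older interface for reference:

```python
import gym

# gym==0.18 (Atari setup): old-style API.
env = gym.make('PongNoFrameskip-v4')
obs = env.reset()  # returns the observation only
obs, reward, done, info = env.step(env.action_space.sample())  # 4-tuple
```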
# To train an agent for discrete action game (Atari: PongNoFrameskip-v4 by default)
python train.py
# To train an agent for continuous action game (Mujoco)
python train.py --env 'HalfCheetah-v4' --continuous_action --train_total_steps 1000000
You can accelerate training by setting `xparl_addr` and `env_num > 1` when the environment simulation runs very slowly.
First, we can start a local cluster with 8 CPUs:
xparl start --port 8010 --cpu_num 8
Note that if you have already started a master, you do not need to run the above command again. For more information about the cluster, please refer to our documentation.
Then we can start the distributed training by running:
# To train an agent distributedly
# for discrete action game (Atari games)
python train.py --env "PongNoFrameskip-v4" --env_num 8 --xparl_addr 'localhost:8010'
# for continuous action game (Mujoco games)
python train.py --env 'HalfCheetah-v4' --continuous_action --train_total_steps 1000000 --env_num 5 --xparl_addr 'localhost:8010'
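Conceptually, what `xparl_addr` and `env_num` enable is running the environments as remote actors on the cluster, so slow simulation steps are parallelized across CPUs. A simplified sketch of this mechanism using PARL's distributed API (illustrative only, not the repo's actual training code):

```python
import parl

@parl.remote_class
class RemoteEnv:
    # Each instance runs in a separate process on the xparl cluster.
    def __init__(self, env_name):
        import gym
        self.env = gym.make(env_name)

    def reset(self):
        return self.env.reset()

    def step(self, action):
        return self.env.step(action)

# Connect to the local cluster started above, then create remote actors.
parl.connect('localhost:8010')
envs = [RemoteEnv('HalfCheetah-v4') for _ in range(5)]
```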