Based on PARL, we reproduce the PPO deep reinforcement learning algorithm, reaching the same level of performance as the original paper on the Mujoco benchmarks.
Paper: PPO in [Proximal Policy Optimization Algorithms](https://arxiv.org/abs/1707.06347)
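For readers less familiar with PPO, its core is the clipped surrogate objective from the paper. Below is a minimal sketch of that loss in Paddle; the function name and arguments are illustrative and do not reflect PARL's actual implementation:

```python
import paddle

def ppo_clip_loss(logp_new, logp_old, advantages, clip_ratio=0.2):
    # Probability ratio r_t = pi_theta(a|s) / pi_theta_old(a|s),
    # computed in log space for numerical stability.
    ratio = paddle.exp(logp_new - logp_old)
    # Clipped surrogate objective: min(r_t * A_t, clip(r_t, 1-eps, 1+eps) * A_t).
    clipped = paddle.clip(ratio, 1.0 - clip_ratio, 1.0 + clip_ratio) * advantages
    # Negate because optimizers minimize, while PPO maximizes the surrogate.
    return -paddle.minimum(ratio * advantages, clipped).mean()
```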
PARL currently supports the open-source version of Mujoco provided by DeepMind, so users no longer need to download Mujoco binaries, install mujoco-py, or obtain a license. For more details, please visit the Mujoco website.
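As a quick sanity check that the open-source bindings work, a Mujoco environment can be created directly through gym (assuming the `gym>=0.26.0` and `mujoco` packages from the dependency list below are installed):

```python
import gym

# No mujoco-py or license key required with the open-source bindings.
env = gym.make('HalfCheetah-v4')
obs, info = env.reset(seed=0)  # gym>=0.26 returns (obs, info)
obs, reward, terminated, truncated, info = env.step(env.action_space.sample())
```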
- Each experiment was run three times with different seeds
Mujoco dependencies:
- python3.7+
- paddle>=2.3.1
- parl>=2.1.1
- gym>=0.26.0
- mujoco>=2.2.2
Atari dependencies:
- paddle>=2.3.1
- parl>=2.1.1
- gym==0.18.0
- atari-py==0.2.6
- opencv-python
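Note that the Atari setup pins an older gym (0.18.0), whose API differs from the gym>=0.26 used for Mujoco: `reset` returns only the observation and `step` returns a 4-tuple. A small sketch of the older interface for reference:

```python
import gym

# gym==0.18 (Atari setup): old-style API.
env = gym.make('PongNoFrameskip-v4')
obs = env.reset()  # returns the observation only
obs, reward, done, info = env.step(env.action_space.sample())  # 4-tuple
```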
# To train an agent for discrete action game (Atari: PongNoFrameskip-v4 by default)
python train.py
# To train an agent for continuous action game (Mujoco)
python train.py --env 'HalfCheetah-v4' --continuous_action --train_total_steps 1000000
You can accelerate training by setting `xparl_addr` and `env_num > 1` when the environment simulation runs very slowly.
First, we can start a local cluster with 8 CPUs:
xparl start --port 8010 --cpu_num 8
Note that if you have already started a master, you do not need to run the above command again. For more information about the cluster, please refer to our documentation.
Then we can start the distributed training by running:
# To train an agent distributedly
# for discrete action game (Atari games)
python train.py --env "PongNoFrameskip-v4" --env_num 8 --xparl_addr 'localhost:8010'
# for continuous action game (Mujoco games)
python train.py --env 'HalfCheetah-v4' --continuous_action --train_total_steps 1000000 --env_num 5 --xparl_addr 'localhost:8010'
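Conceptually, what `xparl_addr` and `env_num` enable is running the environments as remote actors on the cluster, so slow simulation steps are parallelized across CPUs. A simplified sketch of this mechanism using PARL's distributed API (illustrative only, not the repo's actual training code):

```python
import parl

@parl.remote_class
class RemoteEnv:
    # Each instance runs in a separate process on the xparl cluster.
    def __init__(self, env_name):
        import gym
        self.env = gym.make(env_name)

    def reset(self):
        return self.env.reset()

    def step(self, action):
        return self.env.step(action)

# Connect to the local cluster started above, then create remote actors.
parl.connect('localhost:8010')
envs = [RemoteEnv('HalfCheetah-v4') for _ in range(5)]
```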