Based on PARL, we reproduce the CQL deep reinforcement learning algorithm, matching the results reported in the paper on continuous-control datasets from the D4RL benchmark.
Paper: CQL in [Conservative Q-Learning for Offline Reinforcement Learning](https://arxiv.org/abs/2006.04779)
- D4RL datasets: The algorithm is evaluated on D4RL, one of the most commonly used benchmarks for offline RL. See the D4RL repository to learn more about the datasets; note that D4RL requires MuJoCo as a dependency. For more on using D4RL, refer to its guide and the loading sketch after this list.
- MuJoCo simulator: See the MuJoCo website to learn more about the simulator and how to obtain a license.
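If you are new to D4RL, the following minimal sketch (assuming `gym` and `d4rl` are installed as listed below) shows how a dataset-backed environment is created and how its offline transitions are retrieved; the environment name is just one example from the naming pattern used in the training commands further down.

```python
# Minimal D4RL loading sketch (assumes d4rl and mujoco-py are installed).
import gym
import d4rl  # importing d4rl registers its environments with gym

# Any supported dataset name works; halfcheetah-medium-expert-v0 is this repo's default.
env = gym.make('halfcheetah-medium-expert-v0')

# qlearning_dataset() returns transitions formatted for offline / off-policy training.
dataset = d4rl.qlearning_dataset(env)
print(dataset['observations'].shape)  # (N, obs_dim)
print(dataset['actions'].shape)       # (N, act_dim)
print(dataset['rewards'].shape)       # (N,)
print(dataset['terminals'].shape)     # (N,)
```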
- python3.5+
- parl>2.0.3
- paddlepaddle>=2.0.4
- gym==0.20.0
- mujoco-py==2.0.2.8
- d4rl (install from source)
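Putting the list above together, one possible install sequence looks like the sketch below; the d4rl clone URL points at the upstream rail-berkeley repository, and the version pins simply mirror the list above.

```bash
pip install "parl>2.0.3" "paddlepaddle>=2.0.4" gym==0.20.0 mujoco-py==2.0.2.8

# d4rl is installed from source
git clone https://github.com/rail-berkeley/d4rl.git
cd d4rl
pip install -e .
```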
```bash
# To train on halfcheetah-medium-expert-v0 (default), or any
# [halfcheetah/hopper/walker2d/ant]-[random/medium/expert/medium-expert/medium-replay]-[v0/v2] dataset
python train.py --env [ENV_NAME]

# To reproduce the reported performance
python train.py --env [ENV_NAME] --with_automatic_entropy_tuning
```
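For example, to train with automatic entropy tuning on the hopper medium-replay dataset (the environment name here is just one instance of the pattern above):

```bash
python train.py --env hopper-medium-replay-v0 --with_automatic_entropy_tuning
```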