Based on PARL, we reproduce the CQL deep reinforcement learning algorithm, matching the results reported in the paper on continuous-control datasets from the D4RL benchmark.
Paper: CQL in [Conservative Q-Learning for Offline Reinforcement Learning](https://arxiv.org/abs/2006.04779)
- D4RL datasets: The algorithm is evaluated on D4RL, one of the most commonly used benchmarks for offline RL. See the D4RL repository to learn more about the datasets; note that D4RL requires MuJoCo as a dependency. For more on using D4RL, refer to its guide and the loading sketch after this list.
- MuJoCo simulator: See the MuJoCo website to learn more about the simulator and how to obtain a license.
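If you are new to D4RL, the following minimal sketch (assuming `gym` and `d4rl` are installed as listed below) shows how a dataset-backed environment is created and how its offline transitions are retrieved; the environment name is just one example from the naming pattern used in the training commands further down.

```python
# Minimal D4RL loading sketch (assumes d4rl and mujoco-py are installed).
import gym
import d4rl  # importing d4rl registers its environments with gym

# Any supported dataset name works; halfcheetah-medium-expert-v0 is this repo's default.
env = gym.make('halfcheetah-medium-expert-v0')

# qlearning_dataset() returns transitions formatted for offline / off-policy training.
dataset = d4rl.qlearning_dataset(env)
print(dataset['observations'].shape)  # (N, obs_dim)
print(dataset['actions'].shape)       # (N, act_dim)
print(dataset['rewards'].shape)       # (N,)
print(dataset['terminals'].shape)     # (N,)
```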
- python3.5+
- parl>2.0.3
- paddlepaddle>=2.0.4
- gym==0.20.0
- mujoco-py==2.0.2.8
- d4rl (install from source)
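Putting the list above together, one possible install sequence looks like the sketch below; the d4rl clone URL points at the upstream rail-berkeley repository, and the version pins simply mirror the list above.

```bash
pip install "parl>2.0.3" "paddlepaddle>=2.0.4" gym==0.20.0 mujoco-py==2.0.2.8

# d4rl is installed from source
git clone https://github.com/rail-berkeley/d4rl.git
cd d4rl
pip install -e .
```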
```bash
# To train on halfcheetah-medium-expert-v0 (default), or any
# [halfcheetah/hopper/walker2d/ant]-[random/medium/expert/medium-expert/medium-replay]-[v0/v2] dataset
python train.py --env [ENV_NAME]

# To reproduce the reported performance
python train.py --env [ENV_NAME] --with_automatic_entropy_tuning
```
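For example, to train with automatic entropy tuning on the hopper medium-replay dataset (the environment name here is just one instance of the pattern above):

```bash
python train.py --env hopper-medium-replay-v0 --with_automatic_entropy_tuning
```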