Soft actor-critic with beta policy via implicit reparameterization gradients

lucadellalib/sac-beta

Soft Actor-Critic with Beta Policy via Implicit Reparameterization Gradients

This project investigates the use of soft actor-critic (SAC) with the beta policy, which, unlike the normal policy, does not suffer from boundary effect bias and has been shown to converge faster. Implicit reparameterization approaches based on automatic differentiation and optimal mass transport are used to draw samples from the policy in a differentiable manner, as required by SAC. For the experimental evaluation we use four MuJoCo continuous control tasks (Ant-v4, HalfCheetah-v4, Hopper-v4 and Walker2d-v4).
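To make the boundary effect concrete, here is a minimal standard-library sketch (not code from this repository). It assumes a one-dimensional action space with hypothetical bounds `[-1, 1]` and arbitrary distribution parameters, and it compares a beta sample rescaled to the bounds with a normal sample that has to be clipped. It only illustrates the sampling side and does not implement the reparameterization gradients.

```python
import random

def beta_action(alpha, beta, low, high, rng):
    """Draw u ~ Beta(alpha, beta) on [0, 1] and rescale it affinely to [low, high]."""
    u = rng.betavariate(alpha, beta)
    return low + (high - low) * u

def clipped_normal_action(mean, std, low, high, rng):
    """Draw a ~ Normal(mean, std) and clip it to [low, high], as the environment would."""
    a = rng.gauss(mean, std)
    return min(max(a, low), high)

rng = random.Random(0)
low, high = -1.0, 1.0  # hypothetical action bounds
n = 10_000

beta_samples = [beta_action(2.0, 2.0, low, high, rng) for _ in range(n)]
normal_samples = [clipped_normal_action(0.8, 0.5, low, high, rng) for _ in range(n)]

# Fraction of samples that end up exactly on a bound.
beta_on_bound = sum(a in (low, high) for a in beta_samples) / n
normal_on_bound = sum(a in (low, high) for a in normal_samples) / n

print(f"beta samples on a bound:   {beta_on_bound:.3f}")
print(f"normal samples on a bound: {normal_on_bound:.3f}")
print(f"mean of clipped normal:    {sum(normal_samples) / n:.3f} (unclipped mean: 0.8)")
```

The beta samples stay inside the bounds by construction, whereas a sizeable share of the normal samples is clipped, so the executed actions no longer follow the distribution the policy was optimized for.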


🛠️️ Installation

First, install Miniconda. Then clone the repository (or download and extract it), open a terminal in <path-to-repository> and run:

conda env create -f environment.yml

Project dependencies (pinned to specific versions to reduce compatibility and reproducibility issues) will be installed in a Conda virtual environment named sac-beta.

To activate it, run:

conda activate sac-beta

To deactivate it, run:

conda deactivate

To permanently delete it, run:

conda remove --name sac-beta --all

▶️ Quickstart

Running an experiment

To train one of the available algorithms on a MuJoCo task, open a terminal in scripts and run:

conda activate sac-beta
python <algorithm>.py --task <task>

Logs and experimental results (metrics, checkpoints, etc.) can be found in the auto-generated logs and experiments directories, respectively.

Reproducing the experimental results

The experiments were run on a CentOS Linux 7 machine with a 20-core Intel Xeon Gold 6148 Skylake CPU @ 2.40 GHz, 32 GB of RAM and an NVIDIA Tesla V100 SXM2 GPU with 16 GB of memory, using CUDA Toolkit 11.4.2.

Performance comparison

NOTE: run_experiment.py starts several processes in parallel under the hood, one for each experiment (make sure to have enough RAM and/or GPU memory, or adapt the script to your needs).

To reproduce the experimental results, open a terminal and run:

conda activate sac-beta

python run_experiment.py sac_beta_ad Ant-v4
python run_experiment.py sac_beta_omt Ant-v4
python run_experiment.py sac_normal Ant-v4
python run_experiment.py sac_tanh_normal Ant-v4

python run_experiment.py sac_beta_ad HalfCheetah-v4
python run_experiment.py sac_beta_omt HalfCheetah-v4
python run_experiment.py sac_normal HalfCheetah-v4
python run_experiment.py sac_tanh_normal HalfCheetah-v4

python run_experiment.py sac_beta_ad Hopper-v4
python run_experiment.py sac_beta_omt Hopper-v4
python run_experiment.py sac_normal Hopper-v4
python run_experiment.py sac_tanh_normal Hopper-v4

python run_experiment.py sac_beta_ad Walker2d-v4
python run_experiment.py sac_beta_omt Walker2d-v4
python run_experiment.py sac_normal Walker2d-v4
python run_experiment.py sac_tanh_normal Walker2d-v4
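
The sixteen commands above are the product of four algorithms and four tasks. As a convenience, they could be generated with a small helper such as the sketch below. The helper and its names (`build_commands`, `ALGORITHMS`, `TASKS`) are hypothetical and not part of the repository; it only prints the commands, so nothing is launched by accident.

```python
import itertools
import shlex

# Algorithm and task names as they appear in the commands above.
ALGORITHMS = ["sac_beta_ad", "sac_beta_omt", "sac_normal", "sac_tanh_normal"]
TASKS = ["Ant-v4", "HalfCheetah-v4", "Hopper-v4", "Walker2d-v4"]

def build_commands(algorithms=ALGORITHMS, tasks=TASKS):
    """Return one argument list per (task, algorithm) pair, in task-major order."""
    return [
        ["python", "run_experiment.py", algorithm, task]
        for task, algorithm in itertools.product(tasks, algorithms)
    ]

if __name__ == "__main__":
    for command in build_commands():
        # Dry run: print the command instead of executing it.
        # To execute, one could call subprocess.run(command, check=True) here.
        print(shlex.join(command))
```

Since run_experiment.py already starts several processes per invocation, running the commands one after another rather than all at once is probably the safer default on a machine with limited memory.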

Wait for the experiments to finish. To plot the results, open a terminal and run:

python plotter.py --root-dir ../experiments/Ant-v4 --smooth 1 --shaded-std --legend-pattern "^([\w-]+)" --title Ant-v4 -u --output-path Ant-v4.pdf
python plotter.py --root-dir ../experiments/HalfCheetah-v4 --smooth 1 --shaded-std --legend-pattern "$^" --title HalfCheetah-v4 --ylabel "" -u --output-path HalfCheetah-v4.pdf
python plotter.py --root-dir ../experiments/Hopper-v4 --smooth 1 --shaded-std --legend-pattern "$^" --title Hopper-v4 --ylabel "" -u --output-path Hopper-v4.pdf
python plotter.py --root-dir ../experiments/Walker2d-v4 --smooth 1 --shaded-std --legend-pattern "$^" --title Walker2d-v4 --ylabel "" -u --output-path Walker2d-v4.pdf
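
The two `--legend-pattern` values used above behave very differently as regular expressions. The sketch below shows what each one matches with Python's `re` module. The run name is made up for illustration, and how plotter.py applies the pattern internally is an assumption not confirmed here; presumably a pattern that matches nothing leaves the plot without legend entries, so that only the first figure carries the legend.

```python
import re

# Hypothetical run name; the real directory layout may differ.
run_name = "sac_beta_omt/seed-0"

# "^([\w-]+)" captures the leading run of word characters and hyphens.
first = re.search(r"^([\w-]+)", run_name)
print(first.group(1))

# "$^" requires the end of the string to come before its start,
# so it can only match the empty string.
nothing = re.search(r"$^", run_name)
print(nothing)
```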

Ablation study

NOTE: run_experiment.py starts several processes in parallel under the hood, one for each experiment (make sure to have enough RAM and/or GPU memory, or adapt the script to your needs).

To reproduce the experimental results, open a terminal and run:

conda activate sac-beta

python run_experiment.py sac_beta_omt Ant-v4 --experiment-dir ../experiments/ablation
python run_experiment.py sac_beta_omt_no_clip Ant-v4 --experiment-dir ../experiments/ablation
python run_experiment.py sac_beta_omt_non_concave Ant-v4 --experiment-dir ../experiments/ablation
python run_experiment.py sac_beta_omt_softplus Ant-v4 --experiment-dir ../experiments/ablation

Wait for the experiments to finish. To plot the results, open a terminal and run:

python plotter.py --root-dir ../experiments/ablation/Ant-v4 --smooth 1 --shaded-std --legend-pattern "^([\w-]+)" --title Ant-v4 --fig-length 5 --fig-width 3 -u --output-path ablation.pdf

📧 Contact

[email protected]