This project investigates the use of soft actor-critic (SAC) with the beta policy, which, compared to the normal policy, does not suffer from boundary effect bias and has been shown to converge faster. Implicit reparameterization approaches based on automatic differentiation and optimal mass transport are used to draw samples from the policy in a differentiable manner, as required by SAC. For the experimental evaluation we use four MuJoCo continuous control tasks.
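The boundary effect mentioned above can be illustrated with a minimal sketch, using only the Python standard library. It is not the repository's implementation and it does not cover differentiable sampling. The function names, distribution parameters and action bounds are assumptions chosen for illustration. A beta sample lives on a bounded interval and can be rescaled to the action range, whereas a normal sample has to be clipped, which piles probability mass onto the boundary.

```python
import random


def sample_beta_action(alpha, beta, low, high, rng):
    """Draw u ~ Beta(alpha, beta) on [0, 1] and rescale it to [low, high]."""
    u = rng.betavariate(alpha, beta)
    return low + (high - low) * u


def sample_clipped_normal_action(mean, std, low, high, rng):
    """Draw x ~ Normal(mean, std) and clip it to [low, high]."""
    x = rng.gauss(mean, std)
    return min(max(x, low), high)


rng = random.Random(0)
low, high = -1.0, 1.0  # hypothetical action bounds
n = 10_000

beta_actions = [sample_beta_action(2.0, 2.0, low, high, rng) for _ in range(n)]
normal_actions = [
    sample_clipped_normal_action(0.8, 0.5, low, high, rng) for _ in range(n)
]

# Beta samples never need clipping: they are inside the bounds by construction.
beta_in_bounds = all(low <= a <= high for a in beta_actions)

# Fraction of normal samples that were clipped onto a boundary value.
clipped_fraction = sum(a in (low, high) for a in normal_actions) / n

print(f"beta samples within bounds: {beta_in_bounds}")
print(f"normal samples clipped to a boundary: {clipped_fraction:.1%}")
```

With a mean close to the upper bound, a sizeable share of the normal samples ends up exactly on the boundary, while the beta samples stay within the bounds without any clipping.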
First of all, install Miniconda.
Clone or download and extract the repository, navigate to <path-to-repository>, open a terminal and run:
conda env create -f environment.yml
Project dependencies (pinned to specific versions to reduce compatibility and reproducibility issues) will be installed in a Conda virtual environment named sac-beta.
To activate it, run:
conda activate sac-beta
To deactivate it, run:
conda deactivate
To permanently delete it, run:
conda remove -n sac-beta --all
To train one of the available algorithms on a MuJoCo task, open a terminal in the scripts directory and run:
conda activate sac-beta
python <algorithm>.py --task <task>
Logs and experimental results (metrics, checkpoints, etc.) can be found in the auto-generated logs and experiments directories, respectively.
The experiments were run on a CentOS Linux 7 machine with a 20-core Intel Xeon Gold 6148 Skylake CPU @ 2.40 GHz, 32 GB of RAM and an NVIDIA Tesla V100 SXM2 GPU with 16 GB of memory, using CUDA Toolkit 11.4.2.
NOTE: run_experiment.py starts several processes in parallel under the hood, one for each experiment (make sure you have enough RAM and/or GPU memory, or adapt the script to your needs).
To reproduce the experimental results, open a terminal and run:
conda activate sac-beta
python run_experiment.py sac_beta_ad Ant-v4
python run_experiment.py sac_beta_omt Ant-v4
python run_experiment.py sac_normal Ant-v4
python run_experiment.py sac_tanh_normal Ant-v4
python run_experiment.py sac_beta_ad HalfCheetah-v4
python run_experiment.py sac_beta_omt HalfCheetah-v4
python run_experiment.py sac_normal HalfCheetah-v4
python run_experiment.py sac_tanh_normal HalfCheetah-v4
python run_experiment.py sac_beta_ad Hopper-v4
python run_experiment.py sac_beta_omt Hopper-v4
python run_experiment.py sac_normal Hopper-v4
python run_experiment.py sac_tanh_normal Hopper-v4
python run_experiment.py sac_beta_ad Walker2d-v4
python run_experiment.py sac_beta_omt Walker2d-v4
python run_experiment.py sac_normal Walker2d-v4
python run_experiment.py sac_tanh_normal Walker2d-v4
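The commands above are the Cartesian product of four algorithms and four tasks. As a convenience, the list can be generated with a short Python sketch. The helper name experiment_commands is hypothetical and not part of the repository. The script only prints the commands and does not run them.

```python
from itertools import product

ALGORITHMS = ["sac_beta_ad", "sac_beta_omt", "sac_normal", "sac_tanh_normal"]
TASKS = ["Ant-v4", "HalfCheetah-v4", "Hopper-v4", "Walker2d-v4"]


def experiment_commands(tasks=TASKS, algorithms=ALGORITHMS):
    """Return one run_experiment.py command per (task, algorithm) pair."""
    return [
        f"python run_experiment.py {algorithm} {task}"
        for task, algorithm in product(tasks, algorithms)
    ]


commands = experiment_commands()
for command in commands:
    print(command)
```

Passing a subset of tasks or algorithms to experiment_commands yields a shorter list, which may help when memory is limited.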
Wait for the experiments to finish. To plot the results, open a terminal and run:
python plotter.py --root-dir ../experiments/Ant-v4 --smooth 1 --shaded-std --legend-pattern "^([\w-]+)" --title Ant-v4 -u --output-path Ant-v4.pdf
python plotter.py --root-dir ../experiments/HalfCheetah-v4 --smooth 1 --shaded-std --legend-pattern "$^" --title HalfCheetah-v4 --ylabel "" -u --output-path HalfCheetah-v4.pdf
python plotter.py --root-dir ../experiments/Hopper-v4 --smooth 1 --shaded-std --legend-pattern "$^" --title Hopper-v4 --ylabel "" -u --output-path Hopper-v4.pdf
python plotter.py --root-dir ../experiments/Walker2d-v4 --smooth 1 --shaded-std --legend-pattern "$^" --title Walker2d-v4 --ylabel "" -u --output-path Walker2d-v4.pdf
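The two values passed to --legend-pattern above are ordinary regular expressions. The sketch below shows how they behave under Python's re module. How plotter.py actually applies the pattern is an assumption here, and both the helper legend_label and the example experiment name are hypothetical.

```python
import re

LABEL_PATTERN = r"^([\w-]+)"  # captures the leading run of word characters and hyphens
EMPTY_PATTERN = r"$^"         # matches only an empty string


def legend_label(pattern, name):
    """Return the first captured group (or the whole match), or None if no match."""
    match = re.search(pattern, name)
    if match is None:
        return None
    return match.group(1) if match.groups() else match.group(0)


name = "sac_beta_omt/seed_0"  # hypothetical experiment name

print(legend_label(LABEL_PATTERN, name))  # leading identifier of the name
print(legend_label(EMPTY_PATTERN, name))  # None: no non-empty name matches
```

Under this assumption, the first pattern yields a label for each curve, while "$^" matches no experiment name, which would explain why it is used for the plots that should not repeat the legend.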
NOTE: run_experiment.py starts several processes in parallel under the hood, one for each experiment (make sure you have enough RAM and/or GPU memory, or adapt the script to your needs).
To reproduce the ablation results, open a terminal and run:
conda activate sac-beta
python run_experiment.py sac_beta_omt Ant-v4 --experiment-dir ../experiments/ablation
python run_experiment.py sac_beta_omt_no_clip Ant-v4 --experiment-dir ../experiments/ablation
python run_experiment.py sac_beta_omt_non_concave Ant-v4 --experiment-dir ../experiments/ablation
python run_experiment.py sac_beta_omt_softplus Ant-v4 --experiment-dir ../experiments/ablation
Wait for the experiments to finish. To plot the results, open a terminal and run:
python plotter.py --root-dir ../experiments/ablation/Ant-v4 --smooth 1 --shaded-std --legend-pattern "^([\w-]+)" --title Ant-v4 --fig-length 5 --fig-width 3 -u --output-path ablation.pdf