Code for the paper "Safe Model-Based Reinforcement Learning with an Uncertainty-Aware Reachability Certificate", co-authored by Dongjie Yu*, Wenjun Zou*, Yujie Yang*, Haitong Ma, Shengbo Eben Li, Yuming Yin, Jianyu Chen and Jingliang Duan.
The paper has been accepted by IEEE Transactions on Automation Science and Engineering. Check the final version here. Congrats to all co-authors!
The code is based on SMBPO by Garrett Thomas. We thank him for his wonderful and clear implementation.
| Branch name | Usage |
|---|---|
| `drpo-other_env-viz` | DRPO implementation for `quadrotor` and `cartpole-move`; also for ablation_1 on different modules; training-curve visualization. |
| `drpo-safetygym-viz` | DRPO implementation for `safetygym-car` and `safetygym-point`; also for ablation_1 on different modules; ablation_2 on different |
| `csc-other_env`, `csc-safetygym` | Conservative Safety Critics and MBPO-Lagrangian implementation for different envs. |
| `smbpo`, `smbpo-safetygym` | SMBPO and MBPO implementation for different envs. |
| `drpo-safetygym-ablation_3-constraints` | Ablation_3 on different constraint formulations (intermediate policy or shield policy). |

All other branches are deprecated.
- Install MuJoCo and mujoco-py.
- Clone safe-control-gym and safety-gym, then run `pip install -e .` in both directories to install the two environments. Note that we made changes (such as time-up settings) to these envs, so they differ from the versions developed by the original authors; you need to install our repositories to run the DRPO code.
- Run `pip install -r requirements.txt`.
- Set `ROOT_DIR` in `./src/defaults.py` to `/your/path/to/this/repository`. This is where experiments' logs and checkpoints will be placed.
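A consolidated sketch of the installation steps above (the local paths are placeholders; use our forks of the two environment repositories referenced above):

```bash
# Install the two modified environments from our forks.
cd /path/to/safe-control-gym && pip install -e .
cd /path/to/safety-gym && pip install -e .

# Install the remaining dependencies of this repository.
cd /your/path/to/this/repository && pip install -r requirements.txt

# Finally, edit ROOT_DIR in ./src/defaults.py to point at this repository.
```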
Run `python main.py -c config/ENV.json` or `sh run-exp_name.sh` in the command line.
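For example, assuming the config files follow the `ENV` names listed below, training on the quadrotor task with its default configuration would be:

```bash
python main.py -c config/quadrotor.json
```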
- **More envs** Now we only support `ENV=cartpole-move, quadrotor, safetygym-car, safetygym-point`. But you are free to customize your own env as long as you implement it with `check_done`, `check_violation` and `get_constrained_values` on top of basic gym envs (see the sketch after this list). Remember to put it in `./src/env` and add it in `./src/shared.py`.
- **Change hyper-parameters** You are free to fine-tune hyper-parameters in three ways: (1) change values in the different `.py` files; (2) change values in `./config/ENV.json`; and (3) change values in the command line with `python main.py -c config/ENV.json -s PARAM VALUE`. Use `.` to specify a hierarchical structure in the config, e.g. `-s alg_cfg.horizon 10`. The priorities of the three ways are from low to high (e.g., a value in (1) will be overridden by the value specified in (3)).
- Experiment results will be stored in `./ENV/{time}_{alg_name}_{random_seed}`, together with configs, checkpoints, training and evaluation data.
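As referenced above, here is a minimal sketch of a custom env. The env name, dynamics and constraint are hypothetical, and the method signatures are assumptions; check the existing envs in `./src/env` for the exact interface the training code expects:

```python
import gym
import numpy as np


class MyCustomEnv(gym.Env):
    """Hypothetical example env implementing the three extra methods."""

    def __init__(self):
        self.observation_space = gym.spaces.Box(-1.0, 1.0, shape=(2,), dtype=np.float32)
        self.action_space = gym.spaces.Box(-1.0, 1.0, shape=(2,), dtype=np.float32)
        self.state = np.zeros(2, dtype=np.float32)

    def reset(self):
        self.state = np.zeros(2, dtype=np.float32)
        return self.state.copy()

    def step(self, action):
        # Toy dynamics: integrate the action into the state.
        self.state = np.clip(self.state + 0.05 * np.asarray(action), -1.0, 1.0)
        reward = -float(np.linalg.norm(self.state))
        done = self.check_done(self.state)
        info = {"violation": self.check_violation(self.state)}
        return self.state.copy(), reward, done, info

    def check_done(self, state):
        # Episode terminates when the state leaves the nominal region.
        return bool(np.any(np.abs(state) >= 1.0))

    def check_violation(self, state):
        # Safety constraint is violated outside |state[0]| <= 0.8.
        return bool(np.abs(state[0]) > 0.8)

    def get_constrained_values(self, state):
        # Signed distance to the constraint boundary (positive means violation).
        return np.array([np.abs(state[0]) - 0.8], dtype=np.float32)
```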
- Check and run the command lines in `./src/tester.py`; the results will be stored in the corresponding log directories.
- Now you can run the Python files in `./src/viz_cartpole` and `./src/viz_quadrotor` to see the learned multipliers, reachability certificates and the test trajectories. Images of `cartpole-move` will be stored in the tester directory in the logs, while trajectories of `quadrotor` will be stored in `./src/viz_quadrotor`.
- Collect the results of each algorithm in `./logs/ENV/ALGO/{time}_{alg_name}_{random_seed1}`, `./logs/ENV/ALGO/{time}_{alg_name}_{random_seed2}`, etc.
- See `./src/viz_curves.ipynb`: add your algorithms to `alg_list` in `help_func()`, then run `plot_eval_results_of_all_alg_n_runs(ENV)` and watch the curves.
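A rough sketch of the corresponding notebook usage. This is an assumption about the call, not the notebook's exact content: it presumes `plot_eval_results_of_all_alg_n_runs` takes the env name as a string and that `alg_list` enumerates the `ALGO` directory names you collected above; check `viz_curves.ipynb` for the actual definitions:

```python
# Hypothetical notebook cell; adapt to the real definitions in viz_curves.ipynb.
ENV = "quadrotor"

# Inside help_func(), extend alg_list with the ALGO directory names collected
# under ./logs/ENV, e.g. alg_list = ["drpo", "smbpo", ...].

plot_eval_results_of_all_alg_n_runs(ENV)
```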
When contributing to this repository, please first discuss the change you wish to make with me via an issue, email, or any other method before making the change. Also, feel free to fork/star the repository and make any changes.
@ARTICLE{yu2023drpo,
author={Yu, Dongjie and Zou, Wenjun and Yang, Yujie and Ma, Haitong and Li, Shengbo Eben and Yin, Yuming and Chen, Jianyu and Duan, Jingliang},
journal={IEEE Transactions on Automation Science and Engineering},
title={Safe Model-Based Reinforcement Learning With an Uncertainty-Aware Reachability Certificate},
year={2023},
volume={},
number={},
pages={1-14},
doi={10.1109/TASE.2023.3292388}
}