feat: add new algorithms #52

Merged (40 commits, Dec 23, 2022)

Changes from 21 commits

Commits
- `07766ea` refactor: change on_policy structure (Gaiejj, Dec 22, 2022)
- `2bb1e22` feat: add Saute algorithm (Gaiejj, Dec 22, 2022)
- `462cdc3` feat: add saute wrapper (Gaiejj, Dec 22, 2022)
- `8b201e3` feat: add new algorithms (Gaiejj, Dec 22, 2022)
- `f8d99f0` add new algorithms (Gaiejj, Dec 22, 2022)
- `63a6275` add new algorithms (Gaiejj, Dec 22, 2022)
- `c6b4b32` docs: update README.md (Gaiejj, Dec 22, 2022)
- `24d656a` refactor: correct comments (Gaiejj, Dec 22, 2022)
- `d328045` refactor: correct comments (Gaiejj, Dec 22, 2022)
- `c2756ce` docs: update README.md (XuehaiPan, Dec 22, 2022)
- `cce970c` chore(algorithms): rerender `__init__.py` (XuehaiPan, Dec 22, 2022)
- `4e12cbd` docs: update dictionary (XuehaiPan, Dec 22, 2022)
- `a3d1695` refactor: reformat the comments (Gaiejj, Dec 22, 2022)
- `bd9d75b` chore(algorithms): make registration immutable (XuehaiPan, Dec 22, 2022)
- `d4b3317` feat: add __init__.py (Gaiejj, Dec 22, 2022)
- `90a2ba8` refactor: reformat the comments (Gaiejj, Dec 22, 2022)
- `41fad8f` refactor: reformat the comments (Gaiejj, Dec 22, 2022)
- `6f3a0d9` refactor: reformat the comments (Gaiejj, Dec 22, 2022)
- `42f856c` refactor: reformat the comments (Gaiejj, Dec 22, 2022)
- `dbec13e` refactor: reformat the comments (Gaiejj, Dec 22, 2022)
- `034f8c3` docs: update dictionary (XuehaiPan, Dec 22, 2022)
- `35b3f23` chore(algorithms): rerender `__init__.py` (XuehaiPan, Dec 22, 2022)
- `1ee26d0` chore(algorithms): remove module references (XuehaiPan, Dec 22, 2022)
- `e7f11cd` fix: reformat the comments (Gaiejj, Dec 22, 2022)
- `2fda913` docs: update README.md (XuehaiPan, Dec 22, 2022)
- `660519e` docs: update docstrings (XuehaiPan, Dec 22, 2022)
- `d96d4c4` chore(env_wrapper, tests): make registration immutable (Gaiejj, Dec 22, 2022)
- `5447946` docs: update docstrings (Gaiejj, Dec 22, 2022)
- `d7aa5a3` docs: update docstrings (XuehaiPan, Dec 22, 2022)
- `83733f0` fix(model): fix compute bugs (rockmagma02, Dec 22, 2022)
- `070ef9d` Merge pull request #1 from rockmagma02/model-bugs-fix (Gaiejj, Dec 22, 2022)
- `5054d6b` refactor: reframe algorithms (Gaiejj, Dec 22, 2022)
- `7148693` feat: add configs tool function (Gaiejj, Dec 23, 2022)
- `65cc1db` docs: update docstring (Gaiejj, Dec 23, 2022)
- `ec55b57` docs: update docstrings (XuehaiPan, Dec 23, 2022)
- `bab3964` style: cleanup `__init__` arguments (XuehaiPan, Dec 23, 2022)
- `c59a5e9` style: cleanup `__init__` arguments (XuehaiPan, Dec 23, 2022)
- `c5e8fdd` fix: test for namedtuple (XuehaiPan, Dec 23, 2022)
- `363d40c` chore: appease linters (XuehaiPan, Dec 23, 2022)
- `6654837` docs(README): update README.md (XuehaiPan, Dec 23, 2022)
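
Two of the commits above (`2bb1e22`, `462cdc3`) introduce a Saute-style algorithm and environment wrapper. As a rough orientation only, the sketch below illustrates the state-augmentation idea from the SauteRL paper (append the remaining safety budget to the observation and replace the task reward once the budget is spent); it is not the wrapper added in this PR, and the class name, methods, and five-tuple environment interface used here are assumptions.

```python
# Hypothetical sketch of a Saute-style wrapper (state augmentation with a safety
# budget), shown only to illustrate the idea behind the "feat: add saute wrapper"
# commit; it is NOT the code added in this PR, and the step interface
# (obs, reward, cost, terminated, info) is an assumption.
import numpy as np


class SauteWrapperSketch:
    """Append the remaining cost budget to observations and reshape the reward."""

    def __init__(self, env, cost_limit: float, horizon: int):
        self.env = env
        self.cost_limit = cost_limit
        self.horizon = horizon
        self.budget = 1.0  # normalized remaining safety budget

    def reset(self):
        obs = self.env.reset()
        self.budget = 1.0
        return np.append(obs, self.budget)

    def step(self, action):
        obs, reward, cost, terminated, info = self.env.step(action)
        # Spend budget in proportion to the incurred per-step cost.
        self.budget -= cost / self.cost_limit / self.horizon
        if self.budget <= 0.0:
            # Once the budget is exhausted the task reward is replaced by a
            # penalty (zero here; the paper discusses other choices).
            reward = 0.0
        return np.append(obs, self.budget), reward, cost, terminated, info
```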
94 changes: 48 additions & 46 deletions README.md
@@ -24,7 +24,7 @@ The simulation environment around OmniSafe and a series of reliable algorithm im

- [Overview](#overview)
- [Implemented Algorithms](#implemented-algorithms)
- [Published in 2022](#published-in-2022)
- [Published **in 2022**](#published-in-2022)
- [List of Algorithms](#list-of-algorithms)
- [SafeRL Environments](#saferl-environments)
- [Safety Gymnasium](#safety-gymnasium)
@@ -49,11 +49,11 @@ Here we provide a table for comparison of **OmniSafe's algorithm core** and exis

| SafeRL<br/>Platform | Backend | Engine | # Safe Algo. | Parallel<br/> CPU/GPU | New Gym API<sup>**(4)**</sup> | Vision Input |
| :---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | :-----: | :---------------------------: | ------------------- | :-------------------: | :---------------------------: | :-----------------: |
| [Safety-Gym](https://github.com/openai/safety-gym)<br/>![GitHub last commit](https://img.shields.io/github/last-commit/openai/safety-gym?label=last%20update) | TF1 | `mujoco-py`<sup>**(1)**</sup> | 3 | CPU Only (`mpi4py`) | :x: | minimally supported |
| [safe-control-gym](https://github.com/utiasDSL/safe-control-gym)<br/>![GitHub last commit](https://img.shields.io/github/last-commit/utiasDSL/safe-control-gym?label=last%20update) | PyTorch | PyBullet | 5<sup>**(2)**</sup> | | :x: | :x: |
| Velocity-Constraints<sup>**(3)**</sup> | N/A | N/A | N/A | N/A | :x: | :x: |
| [mujoco-circle](https://github.com/ymzhang01/mujoco-circle)<br/>![GitHub last commit](https://img.shields.io/github/last-commit/ymzhang01/mujoco-circle?label=last%20update) | PyTorch | N/A | 0 | N/A | :x: | :x: |
| OmniSafe<br/>![GitHub last commit](https://img.shields.io/github/last-commit/PKU-MARL/omnisafe?label=last%20update) | PyTorch | **MuJoCo 2.3.0+** | **25+** | `torch.distributed` | :heavy_check_mark: | :heavy_check_mark: |
| [Safety-Gym](https://github.com/openai/safety-gym)<br/>![GitHub last commit](https://img.shields.io/github/last-commit/openai/safety-gym?label=last%20update) | TF1 | `mujoco-py`<sup>**(1)**</sup> | 3 | CPU Only (`mpi4py`) | | minimally supported |
| [safe-control-gym](https://github.com/utiasDSL/safe-control-gym)<br/>![GitHub last commit](https://img.shields.io/github/last-commit/utiasDSL/safe-control-gym?label=last%20update) | PyTorch | PyBullet | 5<sup>**(2)**</sup> | | | |
| Velocity-Constraints<sup>**(3)**</sup> | N/A | N/A | N/A | N/A | | |
| [mujoco-circle](https://github.com/ymzhang01/mujoco-circle)<br/>![GitHub last commit](https://img.shields.io/github/last-commit/ymzhang01/mujoco-circle?label=last%20update) | PyTorch | N/A | 0 | N/A | | |
| OmniSafe<br/>![GitHub last commit](https://img.shields.io/github/last-commit/PKU-MARL/omnisafe?label=last%20update) | PyTorch | **MuJoCo 2.3.0+** | **25+** | `torch.distributed` | ✅ | ✅ |

<sup>(1): Maintenance (expect bug fixes and minor updates), the last commit is 19 Nov 2021. Safety Gym depends on `mujoco-py` 2.0.2.7, which was updated on Oct 12, 2019.</sup><br/>
<sup>(2): We only count the safe's algorithm.</sup><br/>
@@ -68,61 +68,63 @@ The supported interface algorithms currently include:

### Published **in 2022**

- 😃 **[AAAI 2023]** Augmented Proximal Policy Optimization for Safe Reinforcement Learning (APPO) **The original author of the paper contributed code**
- 😃 **[NeurIPS 2022]** [Constrained Update Projection Approach to Safe Policy Optimization (CUP)](https://arxiv.org/abs/2209.07089) **The original author of the paper contributed code**
- 😞 **Under Test**[NeurIPS 2022] [Effects of Safety State Augmentation on
- [X] **[AAAI 2023]** Augmented Proximal Policy Optimization for Safe Reinforcement Learning (APPO) **The original author of the paper contributed code**
- [X] **[NeurIPS 2022]** [Constrained Update Projection Approach to Safe Policy Optimization (CUP)](https://arxiv.org/abs/2209.07089) **The original author of the paper contributed code**
- [ ] **[NeurIPS 2022]** (Under Testing) [Effects of Safety State Augmentation on
Safe Exploration (Swimmer)](https://arxiv.org/abs/2206.02675)
- 😃 **[NeurIPS 2022]** [Model-based Safe Deep Reinforcement Learning via a Constrained Proximal Policy Optimization Algorithm](https://arxiv.org/abs/2210.07573)
- 😞 **Under Test**[ICML 2022] [Sauté RL: Almost Surely Safe Reinforcement Learning Using State Augmentation (SauteRL)](https://arxiv.org/abs/2202.06558)
- 😞 **Under Test**[ICML 2022] [Constrained Variational Policy Optimization for Safe Reinforcement Learning (CVPO)](https://arxiv.org/abs/2201.11927)
- 😃 **[IJCAI 2022]** [Penalized Proximal Policy Optimization for Safe Reinforcement Learning](https://arxiv.org/abs/2205.11814) **The original author of the paper contributed code**
- **[ICLR 2022]** [Constrained Policy Optimization via Bayesian World Models (LAMBDA)](https://arxiv.org/abs/2201.09802)
- **[AAAI 2022]** [Conservative and Adaptive Penalty for Model-Based Safe Reinforcement Learning (CAP)](https://arxiv.org/abs/2112.07701)

- [X] **[NeurIPS 2022]** [Model-based Safe Deep Reinforcement Learning via a Constrained Proximal Policy Optimization Algorithm](https://arxiv.org/abs/2210.07573)
- [ ] **[ICML 2022]** (Under Testing) [Sauté RL: Almost Surely Safe Reinforcement Learning Using State Augmentation (SauteRL)](https://arxiv.org/abs/2202.06558)
- [ ] **[ICML 2022]** (Under Testing) [Constrained Variational Policy Optimization for Safe Reinforcement Learning (CVPO)](https://arxiv.org/abs/2201.11927)
- [X] **[IJCAI 2022]** [Penalized Proximal Policy Optimization for Safe Reinforcement Learning](https://arxiv.org/abs/2205.11814) **The original author of the paper contributed code**
- [ ] **[ICLR 2022]** [Constrained Policy Optimization via Bayesian World Models (LAMBDA)](https://arxiv.org/abs/2201.09802)
- [ ] **[AAAI 2022]** [Conservative and Adaptive Penalty for Model-Based Safe Reinforcement Learning (CAP)](https://arxiv.org/abs/2112.07701)

### List of Algorithms

> On Policy Safe
- :heavy_check_mark:[The Lagrange version of PPO (PPO-Lag)](https://cdn.openai.com/safexp-short.pdf)
- :heavy_check_mark:[The Lagrange version of TRPO (TRPO-Lag)](https://cdn.openai.com/safexp-short.pdf)
- :heavy_check_mark:[ICML 2017][Constrained Policy Optimization (CPO)](https://proceedings.mlr.press/v70/achiam17a)
- :heavy_check_mark:[ICLR 2019][Reward Constrained Policy Optimization (RCPO)](https://openreview.net/forum?id=SkfrvsA9FX)
- :heavy_check_mark:[ICML 2020][Responsive Safety in Reinforcement Learning by PID Lagrangian Methods (PID-Lag)](https://arxiv.org/abs/2007.03964)
- :heavy_check_mark:[NeurIPS 2020][First Order Constrained Optimization in Policy Space (FOCOPS)](https://arxiv.org/abs/2002.06506)
- :heavy_check_mark:[AAAI 2020][IPO: Interior-point Policy Optimization under Constraints (IPO)](https://arxiv.org/abs/1910.09615)
- :heavy_check_mark:[ICLR 2020][Projection-Based Constrained Policy Optimization (PCPO)](https://openreview.net/forum?id=rke3TJrtPS)
- :heavy_check_mark:[ICML 2021][CRPO: A New Approach for Safe Reinforcement Learning with Convergence Guarantee](https://arxiv.org/abs/2011.05869)

- [X] [The Lagrange version of PPO (PPO-Lag)](https://cdn.openai.com/safexp-short.pdf)
- [X] [The Lagrange version of TRPO (TRPO-Lag)](https://cdn.openai.com/safexp-short.pdf)
- [X] **[ICML 2017]** [Constrained Policy Optimization (CPO)](https://proceedings.mlr.press/v70/achiam17a)
- [X] **[ICLR 2019]** [Reward Constrained Policy Optimization (RCPO)](https://openreview.net/forum?id=SkfrvsA9FX)
- [X] **[ICML 2020]** [Responsive Safety in Reinforcement Learning by PID Lagrangian Methods (PID-Lag)](https://arxiv.org/abs/2007.03964)
- [X] **[NeurIPS 2020]** [First Order Constrained Optimization in Policy Space (FOCOPS)](https://arxiv.org/abs/2002.06506)
- [X] **[AAAI 2020]** [IPO: Interior-point Policy Optimization under Constraints (IPO)](https://arxiv.org/abs/1910.09615)
- [X] **[ICLR 2020]** [Projection-Based Constrained Policy Optimization (PCPO)](https://openreview.net/forum?id=rke3TJrtPS)
- [X] **[ICML 2021]** [CRPO: A New Approach for Safe Reinforcement Learning with Convergence Guarantee](https://arxiv.org/abs/2011.05869)

> Off Policy Safe
- :heavy_check_mark:The Lagrange version of TD3 (TD3-Lag)
- :heavy_check_mark:The Lagrange version of DDPG (DDPG-Lag)
- :heavy_check_mark:The Lagrange version of SAC (SAC-Lag)
- :heavy_check_mark:[ICML 2019][Lyapunov-based Safe Policy Optimization for Continuous Control (SDDPG)](https://arxiv.org/abs/1901.10031)
- :heavy_check_mark:[ICML 2019][Lyapunov-based Safe Policy Optimization for Continuous Control (SDDPG-modular)](https://arxiv.org/abs/1901.10031)
- [ICML 2022] [Constrained Variational Policy Optimization for Safe Reinforcement Learning (CVPO)](https://arxiv.org/abs/2201.11927)

- [X] The Lagrange version of TD3 (TD3-Lag)
- [X] The Lagrange version of DDPG (DDPG-Lag)
- [X] The Lagrange version of SAC (SAC-Lag)
- [X] **[ICML 2019]** [Lyapunov-based Safe Policy Optimization for Continuous Control (SDDPG)](https://arxiv.org/abs/1901.10031)
- [X] **[ICML 2019]** [Lyapunov-based Safe Policy Optimization for Continuous Control (SDDPG-modular)](https://arxiv.org/abs/1901.10031)
- [ ] **[ICML 2022]** [Constrained Variational Policy Optimization for Safe Reinforcement Learning (CVPO)](https://arxiv.org/abs/2201.11927)

> Model Base Safe

- [NeurIPS 2021][Safe Reinforcement Learning by Imagining the Near Future (SMBPO)](https://arxiv.org/abs/2202.07789)
- :heavy_check_mark:[CoRL 2021 Oral][Learning Off-Policy with Online Planning (SafeLoop)](https://arxiv.org/abs/2008.10066)
- :heavy_check_mark:[AAAI 2022][Conservative and Adaptive Penalty for Model-Based Safe Reinforcement Learning (CAP)](https://arxiv.org/abs/2112.07701)
- [NeurIPS 2022][Model-based Safe Deep Reinforcement Learning via a Constrained Proximal Policy Optimization Algorithm](https://arxiv.org/abs/2210.07573)
- [ICLR 2022] [Constrained Policy Optimization via Bayesian World Models (LAMBDA)](https://arxiv.org/abs/2201.09802)
- [ ] **[NeurIPS 2021]** [Safe Reinforcement Learning by Imagining the Near Future (SMBPO)](https://arxiv.org/abs/2202.07789)
- [X] **[CoRL 2021 (Oral)]** [Learning Off-Policy with Online Planning (SafeLoop)](https://arxiv.org/abs/2008.10066)
- [X] **[AAAI 2022]** [Conservative and Adaptive Penalty for Model-Based Safe Reinforcement Learning (CAP)](https://arxiv.org/abs/2112.07701)
- [ ] **[NeurIPS 2022]** [Model-based Safe Deep Reinforcement Learning via a Constrained Proximal Policy Optimization Algorithm](https://arxiv.org/abs/2210.07573)
- [ ] **[ICLR 2022]** [Constrained Policy Optimization via Bayesian World Models (LAMBDA)](https://arxiv.org/abs/2201.09802)

> Offline Safe
- :heavy_check_mark:[The Lagrange version of BCQ (BCQ-Lag)](https://arxiv.org/abs/1812.02900)
- :heavy_check_mark:[The Constrained version of CRR (C-CRR)](https://proceedings.neurips.cc/paper/2020/hash/588cb956d6bbe67078f29f8de420a13d-Abstract.html)
- [AAAI 2022] [Constraints Penalized Q-learning for Safe Offline Reinforcement Learning CPQ](https://arxiv.org/abs/2107.09003)
- [ICLR 2022 spotlight] [COptiDICE: Offline Constrained Reinforcement Learning via Stationary Distribution Correction Estimation](https://arxiv.org/abs/2204.08957?context=cs.AI)
- [ICML 2022][Constrained Offline Policy Optimization (COPO)](https://proceedings.mlr.press/v162/polosky22a.html)

- [X] [The Lagrange version of BCQ (BCQ-Lag)](https://arxiv.org/abs/1812.02900)
- [X] [The Constrained version of CRR (C-CRR)](https://proceedings.neurips.cc/paper/2020/hash/588cb956d6bbe67078f29f8de420a13d-Abstract.html)
- [ ] **[AAAI 2022]** [Constraints Penalized Q-learning for Safe Offline Reinforcement Learning CPQ](https://arxiv.org/abs/2107.09003)
- [ ] **[ICLR 2022 (Spotlight)]** [COptiDICE: Offline Constrained Reinforcement Learning via Stationary Distribution Correction Estimation](https://arxiv.org/abs/2204.08957?context=cs.AI)
- [ ] **[ICML 2022]** [Constrained Offline Policy Optimization (COPO)](https://proceedings.mlr.press/v162/polosky22a.html)

> Other
- :heavy_check_mark:[Safe Exploration in Continuous Action Spaces (Safety Layer)](https://arxiv.org/abs/1801.08757)
- [RA-L 2021] [Recovery RL: Safe Reinforcement Learning with Learned Recovery Zones](https://arxiv.org/abs/2010.15920)
- [ICML 2022] [Sauté RL: Almost Surely Safe Reinforcement Learning Using State Augmentation (SauteRL)](https://arxiv.org/abs/2202.06558)
- [NeurIPS 2022] [Effects of Safety State Augmentation on
Safe Exploration](https://arxiv.org/abs/2206.02675)

- [X] [Safe Exploration in Continuous Action Spaces (Safety Layer)](https://arxiv.org/abs/1801.08757)
- [ ] **[RA-L 2021]** [Recovery RL: Safe Reinforcement Learning with Learned Recovery Zones](https://arxiv.org/abs/2010.15920)
- [ ] **[ICML 2022]** [Sauté RL: Almost Surely Safe Reinforcement Learning Using State Augmentation (SauteRL)](https://arxiv.org/abs/2202.06558)
- [ ] **[NeurIPS 2022]** [Effects of Safety State Augmentation on
Safe Exploration](https://arxiv.org/abs/2206.02675)

--------------------------------------------------------------------------------

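Many of the checked entries in the README lists above are Lagrangian variants (PPO-Lag, TRPO-Lag, TD3-Lag, DDPG-Lag, SAC-Lag, BCQ-Lag). As a rough illustration of the idea they share, and not of OmniSafe's actual implementation, a minimal penalized objective and dual-variable update could look like the sketch below; the function names, the fixed step size, and the rescaling by the multiplier are assumptions.

```python
# Illustrative-only sketch of the generic Lagrangian relaxation used by the
# *-Lag algorithms listed above; names and hyperparameters are assumptions and
# this is not OmniSafe's implementation.
import torch


def lagrangian_policy_loss(
    ratio: torch.Tensor,       # new/old action probability ratio
    reward_adv: torch.Tensor,  # reward advantages
    cost_adv: torch.Tensor,    # cost advantages
    lagrange_mult: float,
) -> torch.Tensor:
    """Penalized surrogate: maximize reward advantage, penalize cost advantage."""
    reward_loss = -(ratio * reward_adv).mean()
    cost_loss = (ratio * cost_adv).mean()
    # Rescale so the loss magnitude stays comparable as the multiplier grows.
    return (reward_loss + lagrange_mult * cost_loss) / (1.0 + lagrange_mult)


def update_lagrange_multiplier(
    lagrange_mult: float, mean_ep_cost: float, cost_limit: float, step_size: float = 0.01
) -> float:
    """Dual ascent: grow the multiplier while the cost constraint is violated."""
    return max(0.0, lagrange_mult + step_size * (mean_ep_cost - cost_limit))
```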
41 changes: 41 additions & 0 deletions docs/source/spelling_wordlist.txt
@@ -182,3 +182,44 @@ Binbin
Zhou
Pengfei
Yaodong
buf
Aivar
Sootla
Alexander
Cowen
Taher
Jafferjee
Ziyan
Wang
Mguni
Jun
Haitham
Ammar
Sun
Ziping
Xu
Meng
Fang
Zhenghao
Peng
Jiadong
Guo
Bo
lei
MDP
Bolei
Bou
Hao
Tuomas
Haarnoja
Aurick
Meger
Herke
Fujimoto
Lyapunov
Yinlam
Ofir
Nachum
Aleksandra
Duenez
Ghavamzadeh
12 changes: 9 additions & 3 deletions examples/train_policy.py
@@ -24,18 +24,24 @@
parser.add_argument(
'--algo',
type=str,
metavar='ALGO',
default='PPOLag',
help='Choose from: {PolicyGradient, PPO, PPOLag, NaturalPG,'
' TRPO, TRPOLag, PDO, NPGLag, CPO, PCPO, FOCOPS, CPPOPid,CUP',
help='Algorithm to train',
choices=omnisafe.ALGORITHMS['all'],
)
parser.add_argument(
'--env-id',
type=str,
metavar='ENV',
default='SafetyPointGoal1-v0',
help='The name of test environment',
)
parser.add_argument(
'--parallel', default=1, type=int, help='Number of paralleled progress for calculations.'
'--parallel',
default=1,
type=int,
metavar='N',
help='Number of paralleled progress for calculations.',
)
args, unparsed_args = parser.parse_known_args()
keys = [k[2:] for k in unparsed_args[0::2]]
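
The last two context lines above show how the script keeps any flags argparse does not recognize. A hypothetical continuation is sketched below, folding those leftover `--key value` tokens into a dict of custom configuration overrides; only `args`, `unparsed_args`, and `keys` come from the diff, and the `values`/`custom_cfgs` names and example flags are assumptions.

```python
# Hypothetical continuation of the snippet above; only `args`, `unparsed_args`
# and `keys` come from the diff, everything else is illustrative.
values = unparsed_args[1::2]           # every second leftover token is a value
custom_cfgs = dict(zip(keys, values))  # e.g. {'seed': '0', 'epochs': '100'}

# A possible invocation (only --algo, --env-id and --parallel are defined above;
# extra flags such as --seed are treated as custom configs):
#   python examples/train_policy.py --algo PPOLag --env-id SafetyPointGoal1-v0 \
#       --parallel 1 --seed 0
```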
1 change: 1 addition & 0 deletions omnisafe/__init__.py
@@ -14,6 +14,7 @@
# ==============================================================================
"""OmniSafe: A comprehensive and reliable benchmark for safe reinforcement learning."""

from omnisafe.algorithms import ALGORITHMS
from omnisafe.algorithms.algo_wrapper import AlgoWrapper as Agent

# from omnisafe.algorithms.env_wrapper import EnvWrapper as Env
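
With `ALGORITHMS` now re-exported from the package root alongside `Agent`, both are reachable with a single import. A minimal usage sketch is below; the `Agent` constructor signature and the `learn()` call are assumptions inferred from `examples/train_policy.py`, not something shown in this diff.

```python
# Minimal usage sketch based on the names re-exported above; the Agent
# constructor arguments and learn() call are assumptions, not part of this diff.
import omnisafe

print(omnisafe.ALGORITHMS['all'])  # same registry that backs the --algo choices

agent = omnisafe.Agent('PPOLag', env_id='SafetyPointGoal1-v0')  # hypothetical signature
agent.learn()
```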