Merge 30c0bb1 into 25928f2

PKU-Alignment · May 26, 2023 · abf9a0b
2 parents 25928f2 + 30c0bb1
commit abf9a0b

Showing 109 changed files with 2,330 additions and 135 deletions.
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -11,6 +11,14 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 
 ### Features
 
+- Feat(off-policy): support off-policy pid and update performance for navigation by [@Jiayi Zhou](https://github.com/Gaiejj) in PR [#245](https://github.com/OmniSafeAI/omnisafe/pull/245).
+
+- Style(model-based): fix mypy and polish api docstring by [@Jiayi Zhou](https://github.com/Gaiejj) in PR [#244](https://github.com/OmniSafeAI/omnisafe/pull/244).
+
+- Feat: improve test coverage and clear redundant code by [@Jiayi Zhou](https://github.com/Gaiejj) in PR [#238](https://github.com/OmniSafeAI/omnisafe/pull/238).
+
+- Feat: update benchmarks and provide configs for reproducing results by [@Jiayi Zhou](https://github.com/Gaiejj) in PR [#236](https://github.com/OmniSafeAI/omnisafe/pull/236).
+
 - Feat: add CODEOWNERS and refine ISSUE TEMPLATE by [@Jiaming Ji](https://github.com/zmsn-2077) in PR [#233](https://github.com/PKU-Alignment/omnisafe/pull/233).
 
 - Style: support mypy checking and update docstring style by [@Jiayi Zhou](https://github.com/Gaiejj) in PR [#221](https://github.com/PKU-Alignment/omnisafe/pull/221).
@@ -27,13 +35,17 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 
 ### Documentations
 
+- Docs: polish algorithms tutorial by [@Jiayi Zhou](https://github.com/Gaiejj) in PR [#242](https://github.com/OmniSafeAI/omnisafe/pull/242).
+
+- Docs: change link to PKU-Alignment by [@Jiayi Zhou](https://github.com/Gaiejj) in PR [#239](https://github.com/OmniSafeAI/omnisafe/pull/239).
+
 - Docs: polish readme by [@Jiaming Ji](https://github.com/zmsn-2077) in PR [#231](https://github.com/PKU-Alignment/omnisafe/pull/231).
 
 - Docs: polish algorithm tutorial and update API docs by [@Jiayi Zhou](https://github.com/Gaiejj) in PR [#225](https://github.com/PKU-Alignment/omnisafe/pull/225).
 
 ### Fixes
 
-### Refactor
+- Fix: fix adapter device and exp grid by [@Jiayi Zhou](https://github.com/Gaiejj) in PR [#243](https://github.com/OmniSafeAI/omnisafe/pull/243).
 
 ## v0.4.0
 

diff --git a/README.md b/README.md
@@ -195,9 +195,9 @@ python train_policy.py --algo PPOLag --env-id SafetyPointGoal1-v0 --parallel 1 -
 </thead>
 <tbody>
   <tr>
-    <td rowspan="4">On Policy</td>
+    <td rowspan="5">On Policy</td>
     <td rowspan="2">Primal Dual</td>
-    <td>TRPOLag; PPOLag; PDO; RCPO; OnCRPO</td>
+    <td>TRPOLag; PPOLag; PDO; RCPO</td>
   </tr>
   <tr>
     <td>TRPOPID; CPPOPID</td>
@@ -210,13 +210,17 @@ python train_policy.py --algo PPOLag --env-id SafetyPointGoal1-v0 --parallel 1 -
     <td>Penalty Function</td>
     <td>IPO; P3O</td>
   </tr>
+  <tr>
+    <td>Primal</td>
+    <td>OnCRPO</td>
+  </tr>
   <tr>
     <td rowspan="2">Off Policy</td>
     <td rowspan="2">Primal-Dual</td>
     <td>SACLag; DDPGLag; TD3Lag</td>
   </tr>
   <tr>
-    <td><span style="font-weight:400;font-style:normal">SACPID; TD3PID; DDPGPID; CVPO</span></td>
+    <td><span style="font-weight:400;font-style:normal">SACPID; TD3PID; DDPGPID</span></td>
   </tr>
   <tr>
     <td rowspan="3">Model-based</td>

diff --git a/benchmarks/README.md b/benchmarks/README.md
@@ -52,6 +52,9 @@ The OmniSafe Safety-Gymnasium Benchmark evaluates the effectiveness of OmniSafe'
 - **[Preprint 2019]<sup>[[1]](#footnote1)</sup>** [The Lagrangian version of DDPG (DDPGLag)](https://cdn.openai.com/safexp-short.pdf)
 - **[Preprint 2019]<sup>[[1]](#footnote1)</sup>** [The Lagrangian version of TD3 (TD3Lag)](https://cdn.openai.com/safexp-short.pdf)
 - **[Preprint 2019]<sup>[[1]](#footnote1)</sup>** [The Lagrangian version of SAC (SACLag)](https://cdn.openai.com/safexp-short.pdf)
+- **[ICML 2020]** [Responsive Safety in Reinforcement Learning by PID Lagrangian Methods (DDPGPID)](https://arxiv.org/abs/2007.03964)
+- **[ICML 2020]** [Responsive Safety in Reinforcement Learning by PID Lagrangian Methods (TD3PID)](https://arxiv.org/abs/2007.03964)
+- **[ICML 2020]** [Responsive Safety in Reinforcement Learning by PID Lagrangian Methods (SACPID)](https://arxiv.org/abs/2007.03964)
 
 > **More details can be refer to [Off Policy Experiment](https://github.com/PKU-Alignment/omnisafe/tree/main/benchmarks/off-policy/README.md).**