[RLlib] APPO on new API stack (w/ EnvRunners). #46216
Conversation
module_id, config, mean_kl_loss_per_module[module_id]
)
@override(Learner)
def _after_gradient_based_update(self, *, timesteps: Dict[str, Any]) -> None:
With this PR, we get rid of additional_update_for_module and instead support customizing:
- Learner.before_gradient_based_update()
- Learner.after_gradient_based_update()

These get called along with the regular Learner.update() call, so we no longer have the problem of 2x metrics reduction, and we don't have to pass results from the update() call back into the additional_update call anymore (e.g. the KL values, which felt a little clumsy).

We will still have to streamline this API in the future (maybe make it per-module, give the methods better names, make them public, unify the timesteps arg format).
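For illustration, here is a minimal sketch of what hooking into the new callback could look like in a custom Learner. The class name MyCustomLearner is hypothetical, the import paths are assumptions, and the hook is still the private, underscore-prefixed _after_gradient_based_update in this PR (see the diff hunk above); this is a sketch, not the exact RLlib implementation.

```python
from typing import Any, Dict

from ray.rllib.core.learner.learner import Learner
from ray.rllib.utils.annotations import override


class MyCustomLearner(Learner):
    """Hypothetical Learner subclass using the new post-update hook."""

    @override(Learner)
    def _after_gradient_based_update(self, *, timesteps: Dict[str, Any]) -> None:
        # Call the base implementation first so built-in post-update logic
        # (e.g. KL-coeff adjustments, target net syncs) still runs.
        super()._after_gradient_based_update(timesteps=timesteps)
        # Custom logic that previously lived in `additional_update_for_module`
        # can go here; it runs right after each `update()` call, so any stats
        # logged during the update are available without passing results around.
```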
def _update_module_kl_coeff(
    self, module_id: ModuleID, config: APPOConfig, sampled_kl: float
) -> None:
def _update_module_kl_coeff(self, module_id: ModuleID, config: APPOConfig) -> None:
We'll take the KL directly from the metrics now.
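As a rough illustration of what that could look like (not the actual RLlib code): the class name MyAPPOLearner, the APPOLearner import path, the metrics.peek() accessor, and the "mean_kl_loss" key below are all assumptions; only the new two-argument signature is taken from the diff above.

```python
from ray.rllib.algorithms.appo.appo import APPOConfig
from ray.rllib.algorithms.appo.appo_learner import APPOLearner  # assumed import path
from ray.rllib.utils.typing import ModuleID


class MyAPPOLearner(APPOLearner):
    """Hypothetical subclass, shown only to illustrate the new-style hook."""

    def _update_module_kl_coeff(self, module_id: ModuleID, config: APPOConfig) -> None:
        # The KL is no longer passed in as an argument; instead, read the most
        # recently logged KL loss from the Learner's metrics. Both the `peek()`
        # accessor and the "mean_kl_loss" key are illustrative assumptions.
        sampled_kl = self.metrics.peek((module_id, "mean_kl_loss"))
        # ...then adjust this module's KL coefficient based on `sampled_kl`
        # and config.kl_target, as before.
```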
LGTM. Just a duplicate target network sync at the setup step of DQN Rainbow/SAC.
lambda mid, module: module.sync_target_networks(tau=1.0)
)
# Initially sync target networks (w/ tau=1.0 -> full overwrite).
self.module.sync_target_networks(tau=1.0)
We sync twice at the beginning now - the TorchDQNRainbowRLModule does sync in its setup().
Good catch! I was debating this with myself: should the RLModule perform the initial sync or the Learner?
Since the Learner also controls the regular syncs during training, I felt we should do it in the Learner, so it's all in one place. The RLModule itself (at least in its inference_only mode) doesn't really care about the target nets anyway.
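A minimal sketch of the "all syncing lives in the Learner" idea, under assumptions: the class name is hypothetical, foreach_module() and the soft-update tau value are illustrative, and only sync_target_networks(tau=...) is taken from the diff hunk above.

```python
from typing import Any, Dict

from ray.rllib.core.learner.learner import Learner
from ray.rllib.utils.annotations import override


class MyOffPolicyLearner(Learner):
    """Hypothetical Learner that owns both the initial and the regular syncs."""

    @override(Learner)
    def build(self) -> None:
        super().build()
        # Initial sync with tau=1.0 -> full overwrite of the target nets,
        # so the RLModule's setup() no longer needs to do this itself.
        self.module.foreach_module(
            lambda mid, module: module.sync_target_networks(tau=1.0)
        )

    @override(Learner)
    def _after_gradient_based_update(self, *, timesteps: Dict[str, Any]) -> None:
        super()._after_gradient_based_update(timesteps=timesteps)
        # Regular soft updates during training (tau value illustrative only).
        self.module.foreach_module(
            lambda mid, module: module.sync_target_networks(tau=0.005)
        )
```

Keeping both call sites in the Learner avoids the duplicate initial sync flagged above and keeps the sync schedule in one place.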
fixed
APPO on new API stack (w/ EnvRunners).
This PR:
- Target networks: the RLModule provides a get_target_net_pairs... method, then the Learner can call the sync method on the module to sync either with or without (1.0) a tau value.
- Removes additional_update entirely (replaced by a more flexible yet simpler API: before_gradient_based_update and after_gradient_based_update, which get called along with update, NOT in sequence anymore).

Why are these changes needed?
Related issue number
Checks
- I've signed off every commit (git commit -s) in this PR.
- I've run scripts/format.sh to lint the changes in this PR.
- If adding a new method in Tune, I've added it in doc/source/tune/api/ under the corresponding .rst file.