[RLlib] MARWIL RLModule #44970

simonsays1980 · 2024-04-25T14:28:58Z

Why are these changes needed?

This PR implements MARWIL on the new API stack using the Learner and RLModule APIs. It relates to the proposal for the new Offline Data API, which is to be used in MARWIL's training step.

Related issue number

#44969
Closes #37775

Checks

- I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
- I've run scripts/format.sh to lint the changes in this PR.
- I've included any doc changes needed for https://docs.ray.io/en/master/.
  - I've added any new APIs to the API Reference. For example, if I added a
    method in Tune, I've added it in doc/source/tune/api/ under the
    corresponding .rst file.
- I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
- Testing Strategy
  - Unit tests
  - Release tests
  - This PR is not tested :(

sven1977 · 2024-07-25T10:31:57Z

rllib/algorithms/bc/tests/test_bc.py

@@ -59,7 +63,7 @@ def test_bc_compilation_and_learning_from_offline_file(self):
            results = algo.train()
            print(results)

-            eval_results = results.get("evaluation", {})
+            eval_results = results.get(EVALUATION_RESULTS, {})


sven1977 · 2024-07-25T10:33:05Z

rllib/algorithms/bc/torch/bc_torch_learner.py

@@ -49,11 +50,13 @@ def possibly_masked_mean(t):
            mask = None
            possibly_masked_mean = torch.mean

-        action_dist_class_train = self.module[module_id].get_train_action_dist_cls()
+        action_dist_class_train = (
+            self.module[module_id].unwrapped().get_train_action_dist_cls()


Not for this PR, but I wonder whether we should have the MARLModule automatically return the unwrapped RLModule when a sub-module is accessed through __getitem__(). ...

Do we need the wrapped module anywhere? I guess not even in the DDP case.

Ah, hold on, good point. The most user-friendly way is probably to make sure the wrapper exposes:

- all RLModule base methods.
- all API methods that the wrapped module implements. That is, the wrapper should check which RLModule APIs (ValueFunctionAPI, TargetNetworkAPI, etc.) its wrapped module implements and then automatically expose all of those APIs' methods as well?

This is how it could work. It's not a very elegant design, but some weeks ago I tried to make it elegant by simply exposing every method of the wrapped module that the wrapper does not already define, and this turned out to be non-trivial. I have to check again what the reasons were.
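To make the idea in this thread concrete, here is a minimal plain-Python sketch of a wrapper that forwards any attribute it does not define itself to the wrapped module. All class names (`ToyValueFunctionAPI`, `ToyModule`, `ToyWrapper`) are hypothetical stand-ins, not RLlib classes, and this is only one naive way such forwarding could look, not what RLlib does.

```python
class ToyValueFunctionAPI:
    """Hypothetical stand-in for an optional module API (not RLlib's class)."""

    def compute_values(self, batch):
        raise NotImplementedError


class ToyModule(ToyValueFunctionAPI):
    """Hypothetical stand-in for the module that gets wrapped."""

    def get_train_action_dist_cls(self):
        return "ToyCategorical"

    def compute_values(self, batch):
        # Dummy value estimates: one zero per batch item.
        return [0.0 for _ in batch]


class ToyWrapper:
    """Hypothetical wrapper that forwards unknown attributes to its module."""

    def __init__(self, module):
        self._module = module

    def unwrapped(self):
        return self._module

    def __getattr__(self, name):
        # Python calls __getattr__ only after normal lookup on the wrapper
        # has failed. Private names are not forwarded, which also avoids
        # infinite recursion if `_module` has not been set yet.
        if name.startswith("_"):
            raise AttributeError(name)
        return getattr(self._module, name)


wrapper = ToyWrapper(ToyModule())
# Both calls work without going through `unwrapped()` first.
dist_cls = wrapper.get_train_action_dist_cls()
values = wrapper.compute_values([1, 2, 3])
```

One limitation of this naive forwarding is that `isinstance(wrapper, ToyValueFunctionAPI)` is still `False`, so code that checks for an API by type would not recognize the wrapper. That may be one reason a fully transparent wrapper is harder than it first looks.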

rllib/algorithms/marwil/marwil.py

rllib/offline/offline_data.py

sven1977 · 2024-07-25T11:01:20Z

rllib/algorithms/marwil/tests/test_marwil.py


        config = (
            marwil.MARWILConfig()
            .env_runners(num_env_runners=1)
+            .api_stack(
+                enable_rl_module_and_learner=True,


sven1977 · 2024-07-25T11:01:31Z

rllib/algorithms/marwil/tests/test_marwil.py


        config = (
            marwil.MARWILConfig()
+            .api_stack(
+                enable_rl_module_and_learner=True,


rllib/algorithms/marwil/tests/test_marwil_rl_module.py

rllib/tuned_examples/marwil/cartpole_marwil.py

rllib/algorithms/marwil/marwil_rl_module.py

… to 'OfflineData'. Set return to reach higher for tuned example. Signed-off-by: simonsays1980 <[email protected]>

Signed-off-by: simonsays1980 <[email protected]>

… in linting and building. Signed-off-by: simonsays1980 <[email protected]>

Signed-off-by: simonsays1980 <[email protected]>

sven1977

Looks great! Let's get this merged :) Thanks @simonsays1980 for this great PR!

Signed-off-by: simonsays1980 <[email protected]>

…nectors request finalized episodes. Signed-off-by: simonsays1980 <[email protected]>

…g as this was giving an error when 'MARWILOfflinePreLearner' tried to call a value function unneeded by BC. Deprecated hybrid stack. Signed-off-by: simonsays1980 <[email protected]>

…tting 'beta=0.0'. Signed-off-by: simonsays1980 <[email protected]>

…. BC depends now fully on MARWIL. Signed-off-by: simonsays1980 <[email protected]>
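As background on why BC can be expressed through MARWIL with 'beta=0.0': in the MARWIL formulation, each log-likelihood term is weighted by an exponentiated advantage, so a zero beta makes every weight equal to 1 and leaves the plain behavior-cloning objective. The sketch below illustrates that relationship in plain Python. It is not RLlib's learner code: the function name and the `normalizer` argument are hypothetical, and any further loss terms of the real learner are omitted.

```python
import math


def weighted_imitation_loss(logps, advantages, beta, normalizer=1.0):
    """Policy-loss sketch: mean of -exp(beta * A / normalizer) * log pi(a|s).

    `logps` are log-probabilities of the dataset actions under the policy,
    `advantages` are per-sample advantage estimates (assumed to be given).
    """
    weights = [math.exp(beta * adv / normalizer) for adv in advantages]
    return -sum(w * lp for w, lp in zip(weights, logps)) / len(logps)


logps = [-0.5, -1.0, -2.0]
advantages = [1.0, -1.0, 0.5]

# With beta=0.0 every weight is exp(0) = 1, so the advantages drop out
# and the loss is the plain negative log-likelihood used by BC.
bc_loss = weighted_imitation_loss(logps, advantages, beta=0.0)

# With beta>0, samples with higher advantages receive larger weights.
marwil_loss = weighted_imitation_loss(logps, advantages, beta=1.0)
```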

Signed-off-by: simonsays1980 <[email protected]>

…eprecated. Moved to old stack as it uses policies. Signed-off-by: simonsays1980 <[email protected]>

Signed-off-by: simonsays1980 <[email protected]>

…he learner from MARWIL. Signed-off-by: simonsays1980 <[email protected]>

Signed-off-by: simonsays1980 <[email protected]>

simonsays1980 added 11 commits September 8, 2023 15:22

Initiated MARWIL RL Module and added catalog, learner and tf_learner.

cbfd05f

Signed-off-by: Simon Zehnder <[email protected]>

Added MARWIL RL Module and started to write test.

c488da7

Signed-off-by: Simon Zehnder <[email protected]>

Merge branch 'master' into marwil-rl-module

9078af8

Signed-off-by: Simon Zehnder <[email protected]>

Implemented Torch version of MARWIL.

b4e1795

Signed-off-by: Simon Zehnder <[email protected]>

Added torch learner.

5eeb2e6

Signed-off-by: Simon Zehnder <[email protected]>

Merged master.

a1928bc

Signed-off-by: simonsays1980 <[email protected]>

Moved training step logic from BC to MARWIL.

3fcef32

Signed-off-by: simonsays1980 <[email protected]>

Setup MARWIL with the new stack using 'OfflineData'. This is an unfor…

e9abc27

…matted commit merely for securing the work. Signed-off-by: simonsays1980 <[email protected]>

Fixed multiple bugs in 'MARWILOfflinePreLearner', 'MARWILTorchLearner…

c930464

…' and 'MARWILTorchPolicy', fixed imports and tested MARWIL on non-recurrent policies. Signed-off-by: simonsays1980 <[email protected]>

LINTER.

8b575db

Signed-off-by: simonsays1980 <[email protected]>

Merged Master

1325beb

Signed-off-by: simonsays1980 <[email protected]>

sven1977 changed the title ~~[RLlib] - MARWIL RLModule~~ [RLlib] MARWIL RLModule Jul 24, 2024

sven1977 marked this pull request as ready for review July 24, 2024 15:36

sven1977 requested review from sven1977 and ArturNiederfahrenhorst as code owners July 24, 2024 15:36

simonsays1980 added 6 commits July 24, 2024 18:17

Removed tensorflow and fixed a small bug.

7785bda

Signed-off-by: simonsays1980 <[email protected]>

Readded 'input_read_schema' b/c it was accidentally removed.

3e680cc

Signed-off-by: simonsays1980 <[email protected]>

Readded further tests for MARWIL on continuous actions and its loss f…

90c8d03

…unction. Signed-off-by: simonsays1980 <[email protected]>

Added default 'prelearner_class' to 'MARWILConfig'.

8a8d5c5

Signed-off-by: simonsays1980 <[email protected]>

Added example to 'tuned_examples' for MARWIL.

5f051e0

Signed-off-by: simonsays1980 <[email protected]>

Added BC and MARWIL tuned_examples to learning tests.

f247e7c

Signed-off-by: simonsays1980 <[email protected]>