[RLlib] Add example: Pre-train an RLModule single-agent, then bring checkpoint into multi-agent setup and continue training. #44674
Conversation
…define the model config for 'RLModule' in a unified way without interfering with the old stack. Reconfigured DQN Rainbow with it. Signed-off-by: Simon Zehnder <[email protected]>
…d examples accordingly. Signed-off-by: Simon Zehnder <[email protected]>
…ordingly. In addition, fixed some typos. Signed-off-by: Simon Zehnder <[email protected]>
…g_dict' in 'AlgorithmConfig.rl_module' as they were failing. Something is still wrong with the VisionNet in 'connector_v2_frame_stacking' example. Signed-off-by: Simon Zehnder <[email protected]>
…emains b/c low priority. Signed-off-by: Simon Zehnder <[email protected]>
…rl_module_api' needed a 'False' for error - so only warning. Signed-off-by: Simon Zehnder <[email protected]>
…nstead of model_config. Signed-off-by: Simon Zehnder <[email protected]>
…odule()'. Signed-off-by: Simon Zehnder <[email protected]>
…ot using the corresponding default model configuration of the training algorithm. Also added a pre-training example for MARL. Signed-off-by: Simon Zehnder <[email protected]>
…in single module and load its checkpoint into a MARL setting for one policy. Signed-off-by: Simon Zehnder <[email protected]>
RLModule pre-training example for multi-agent setup.
… external module did not use the default model config of the algorithm. Signed-off-by: Simon Zehnder <[email protected]>
…l-config-for-new-api-stack Signed-off-by: sven1977 <[email protected]>
config = (
    PPOConfig()
    # Enable the new API stack (RLModule and Learner APIs).
    .experimental(_enable_new_api_stack=True)
This is done automatically by the run_rllib_example_script_experiment util.
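For reference, a minimal sketch of that utility-based pattern (hedged: helper names are taken from RLlib's example utilities around the time of this PR and may differ across versions):

from ray.rllib.algorithms.ppo import PPOConfig
from ray.rllib.utils.test_utils import (
    add_rllib_example_script_args,
    run_rllib_example_script_experiment,
)

# Parse the standard example-script command line args.
parser = add_rllib_example_script_args()
args = parser.parse_args()

# No explicit `.experimental(_enable_new_api_stack=True)` needed here:
# the utility toggles the new API stack based on the parsed args before
# building and running the experiment.
base_config = PPOConfig().environment("CartPole-v1")
run_rllib_example_script_experiment(base_config, args)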
marl_module_spec = MultiAgentRLModuleSpec(module_specs=module_specs)

# Register our environment with tune if we use multiple agents.
if args.num_agents > 0:
Is this if-block needed? We assert that this command line arg is >0 above.
Yeah, I guess we can remove this here. Good catch @sven1977!
Great catch! I removed this in the follow-up commit.
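As a hypothetical sketch, the simplified registration discussed above (the env class and names are illustrative, not necessarily what the example script uses; `args` comes from the script's arg parser):

from ray.tune.registry import register_env
from ray.rllib.examples.env.multi_agent import MultiAgentCartPole

# The script asserts a multi-agent setup earlier on, so the env
# registration no longer needs its own `if args.num_agents > 0:` guard.
assert args.num_agents > 0, "This example only supports multi-agent setups."
register_env(
    "multi_agent_env",
    lambda cfg: MultiAgentCartPole(config={"num_agents": args.num_agents}),
)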
Super nice example and PR! Thanks @simonsays1980!
Just a few nits and waiting for:
- We must add this great example to the BUILD!
- Can we rename the script into a more descriptive name? Something like pretraining_single_agent_training_multi_agent, which better describes the exact sequence of things we do here.
…agents. Signed-off-by: Simon Zehnder <[email protected]>
Ok, cool! Can we also add this example script to BUILD?
@@ -2873,7 +2873,14 @@ py_test(
    size = "small",
    srcs = ["examples/rl_modules/classes/mobilenet_rlm.py"],
)

py_test(
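The new BUILD entry is cut off above; a hypothetical completion following the pattern of the neighboring target (the actual target name, size, and tags in the PR may differ):

py_test(
    # Hypothetical target/script name; the final name was still being
    # discussed in this review.
    name = "examples/rl_modules/pretraining_single_agent_training_multi_agent",
    main = "examples/rl_modules/pretraining_single_agent_training_multi_agent.py",
    tags = ["team:rllib", "examples"],
    size = "medium",
    srcs = ["examples/rl_modules/pretraining_single_agent_training_multi_agent.py"],
)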
Awesome!
Why are these changes needed?
So far, there is no example that shows users how to pre-train certain policies and later load their checkpoints.
This PR adds an example that pre-trains a module in single-agent mode and then, in a second training run, loads its checkpoint into a MARL setup.
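A condensed, hypothetical sketch of that pattern (API names as they existed on RLlib's new API stack around this PR; the actual example script is more elaborate and splits the two steps across separate runs):

from ray.rllib.algorithms.ppo import PPOConfig
from ray.rllib.core.rl_module.rl_module import SingleAgentRLModuleSpec
from ray.rllib.core.rl_module.marl_module import MultiAgentRLModuleSpec

# 1) Pre-train a single-agent module and write a checkpoint.
algo = (
    PPOConfig()
    .experimental(_enable_new_api_stack=True)
    .environment("CartPole-v1")
    .build()
)
for _ in range(10):
    algo.train()
checkpoint = algo.save()  # Checkpoint location; return type varies by version.

# 2) In a second training run, load the pre-trained module's state into
#    one policy of a multi-agent spec; the other policy trains from
#    scratch. (`load_state_path` must point at the saved RLModule state
#    inside the checkpoint; the exact sub-path is version-dependent.)
marl_module_spec = MultiAgentRLModuleSpec(
    module_specs={
        "pretrained_policy": SingleAgentRLModuleSpec(
            load_state_path="/path/to/checkpoint/rl_module",
        ),
        "fresh_policy": SingleAgentRLModuleSpec(),
    },
)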
Related issue number
Related to #44263
Checks
- I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
- I've run scripts/format.sh to lint the changes in this PR.
- If I added a method in Tune, I've added it in doc/source/tune/api/ under the corresponding .rst file.