[RLlib]: Cleanup examples folder: Add example restoring 1 of n agents from a checkpoint. #45462
Signed-off-by: Simon Zehnder <[email protected]>
… multi-agent environment. Signed-off-by: Simon Zehnder <[email protected]>
rllib/examples/multi_agent/restore_1_of_n_agents_from_checkpoint.py
@@ -0,0 +1,151 @@
"""Simple example of loading module weights for 1 of n agents from checkpoint.
nit: Let's not use "simple".
"An example script showing how to load RLModule weights for 1 out of n agents from a checkpoint"?
Yup, not simple for everyone lol. I know what you mean: it actually takes quite a bit of complexity to make this possible in multi-agent (MA) scenarios, and it's so powerful.
@@ -0,0 +1,151 @@
"""Simple example of loading module weights for 1 of n agents from checkpoint.
Can we add a tiny paragraph here saying the usual:
This example:
- runs a multi-agent Pendulum experiment with ... policies ... blabla
- saves a checkpoint of the used MultiAgentRLModule every blabla iterations
- stops the experiment after the agents reach a combined return of ...
- picks the best of both trained policies (based on episode return) and restores only the corresponding RLModule.
- runs a second experiment with the restored RLModule (single-agent) .... blabla
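The stopping and selection steps in the list above boil down to simple bookkeeping over episode returns. A minimal sketch in plain Python, assuming the per-module episode returns have already been collected into a dict; the names `combined_return_reached`, `pick_best_module`, and the module IDs `p0`/`p1` are hypothetical and not part of RLlib's API:

```python
from statistics import mean
from typing import Dict, List

# Hypothetical input: per-module episode returns, keyed by module ID
# (for example "p0", "p1"). Each list is assumed to be non-empty.
Returns = Dict[str, List[float]]


def combined_return_reached(returns: Returns, threshold: float) -> bool:
    """True once the sum of all modules' mean episode returns reaches `threshold`."""
    return sum(mean(r) for r in returns.values()) >= threshold


def pick_best_module(returns: Returns) -> str:
    """Return the ID of the module with the highest mean episode return."""
    return max(returns, key=lambda module_id: mean(returns[module_id]))


if __name__ == "__main__":
    # Illustrative numbers only (Pendulum rewards are non-positive).
    episode_returns = {"p0": [-250.0, -150.0], "p1": [-900.0, -700.0]}
    if combined_return_reached(episode_returns, threshold=-1200.0):
        print("stop; best module:", pick_best_module(episode_returns))
```

Using the mean rather than the last episode's return keeps a single lucky episode from deciding which module gets restored.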
Looks good to me! Just a few nits on comments/docstrings.
Awesome example! One more down. :)
…he issues. In addition, added the 'no_main' tag to the test in BUILD because the linter errored out. Signed-off-by: Simon Zehnder <[email protected]>
#45462 adds a new test by changing a bazel rule instead of adding a new test file; this case can only be covered by our previous logic of computing new tests. Recover this logic (in addition to the logic of computing new tests by looking at changed test files). Test: CI. Signed-off-by: can <[email protected]>
#45462 adds a new test by changing a bazel rule instead of adding a new test file; this case can only be covered by our previous logic of computing new tests. Recover this logic (in addition to the logic of computing new tests by looking at changed test files). This is a redo of #45495, which got reverted. The difference now is that we run the bazel command in a container instead of in the current environment: bazel seems to have issues sharing the cache when bazel is called within bazel (https://buildkite.com/ray-project/microcheck/builds/444#018fa23a-6e31-435b-a0ea-412ca2d1017b/175-1476). Test: CI; full microcheck run: https://buildkite.com/ray-project/microcheck/builds/464 Signed-off-by: can <[email protected]>
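The commit message describes two complementary ways of detecting new tests. A minimal sketch of how they combine, assuming both sets of target labels have already been obtained (the actual CI queries `bazel` inside a container, which is not reproduced here); `compute_new_tests` and the `//pkg:...` labels are hypothetical:

```python
from typing import Iterable, Set


def compute_new_tests(
    targets_before: Set[str],
    targets_after: Set[str],
    targets_from_changed_files: Iterable[str],
) -> Set[str]:
    """Union of both ways of detecting new tests described in the commit message."""
    # Previous logic: a target that exists now but did not exist before,
    # e.g. because a BUILD rule was added without adding a new test file.
    new_from_rules = targets_after - targets_before
    # Newer logic: targets derived from test files changed in the PR
    # (mapping changed files to targets is assumed to happen upstream).
    return new_from_rules | set(targets_from_changed_files)


if __name__ == "__main__":
    before = {"//pkg:test_a"}
    after = {"//pkg:test_a", "//pkg:test_b"}  # test_b added via a BUILD rule only
    print(sorted(compute_new_tests(before, after, [])))
```

Relying only on changed test files would miss `//pkg:test_b` here, which is exactly the case #45462 triggered.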
Why are these changes needed?
Restoring certain agents from a checkpoint is a frequent use case, and we should provide examples for this scenario. This PR adds such an example on the new API stack. The example does the following:
- Runs an experiment with `n` agents on a `Pendulum-v1` `MultiEnv` and saves a checkpoint.
- Restores `policy 0` from this checkpoint.
- Runs a second experiment with `policy 0` restored from checkpoint.

This example shows that continuing training from a restored checkpoint, even for only a single agent, results in faster convergence.
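Conceptually, restoring 1 of n agents means taking one module's weights from the checkpoint while the remaining modules keep whatever weights they currently have (for example, freshly initialized ones). The actual example does this through RLlib's checkpointing APIs, which this sketch does not reproduce; it only illustrates the idea with plain dictionaries, and `restore_one_module` plus the `p0`/`p1` IDs are hypothetical:

```python
import copy
from typing import Any, Dict

# Hypothetical weight layout: module ID -> {parameter name -> value}.
MultiModuleWeights = Dict[str, Dict[str, Any]]


def restore_one_module(
    current: MultiModuleWeights,
    checkpoint: MultiModuleWeights,
    module_id: str,
) -> MultiModuleWeights:
    """Copy `current`, replacing only `module_id`'s weights with the checkpointed ones.

    All other modules keep the weights they have in `current`.
    """
    if module_id not in checkpoint:
        raise KeyError(f"module {module_id!r} not found in checkpoint")
    restored = copy.deepcopy(current)
    restored[module_id] = copy.deepcopy(checkpoint[module_id])
    return restored


if __name__ == "__main__":
    fresh = {"p0": {"w": 0.0}, "p1": {"w": 0.0}}
    saved = {"p0": {"w": 1.5}, "p1": {"w": -2.0}}
    print(restore_one_module(fresh, saved, "p0"))
```

Deep-copying both sides keeps the restored weights independent of the checkpoint object and leaves `current` untouched.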
Related issue number
Checks
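- I've signed off every commit (by using the -s flag, i.e., `git commit -s`) in this PR.
- I've run `scripts/format.sh` to lint the changes in this PR.
- If I added a new method in Tune, I've added it in `doc/source/tune/api/` under the corresponding `.rst` file.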