[RLlib] Cleanup examples/ folder 03. #44559
Conversation
Signed-off-by: sven1977 <[email protected]>
LGTM. I think that labeling some examples as "only-old-stack", or putting them into an "old-stack" folder, helps users distinguish which examples can be executed on the new stack.
class RestoreWeightsCallback(DefaultCallbacks):
    def on_algorithm_init(self, *, algorithm: "Algorithm", **kwargs) -> None:
Is this callback actually called before weight syncing?
Great question. I'll check this and fix, if necessary. ...
This callback is called after the initial weights initialization of all workers (from the randomly initialized learners).
So this should be fine.
- First we initialize the algo randomly (and sync all the workers).
- Then we override only(!) the 1st policy with the given weights.
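The two steps above can be sketched in plain Python. This is a schematic illustration of the merge logic only (hypothetical helper name, no actual RLlib API): all policies start from random initialization, then only the first policy's weights are overridden.

```python
# Schematic sketch of the pattern discussed above (NOT the real RLlib API).
# After all workers have been synced from the randomly initialized learners,
# an on_algorithm_init-style hook overrides ONLY one policy's weights.

def restore_single_policy(all_weights, policy_id, pretrained_weights):
    """Return a new weights dict in which only `policy_id` is overridden."""
    merged = dict(all_weights)  # keep every other policy's random init
    merged[policy_id] = pretrained_weights
    return merged

# Example: two randomly initialized policies; only "policy_1" is restored.
random_init = {"policy_1": [0.1, 0.2], "policy_2": [0.3, 0.4]}
pretrained = [9.0, 9.0]
result = restore_single_policy(random_init, "policy_1", pretrained)
```

In the real callback, the merged weights would then be pushed to all workers; the point here is only that the override happens after the initial random sync.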
class GuessTheNumberGame(MultiAgentEnv):
Do we have a multi-agent environment that uses __common__ infos?
Good question. We should make this a separate (new) example script, then.
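A minimal sketch of what such an example could demonstrate (plain Python, no ray/gymnasium imports; the env name and values are hypothetical): a multi-agent `step()` whose infos dict carries a `__common__` entry shared by all agents in addition to per-agent infos.

```python
# Toy stand-in for a MultiAgentEnv, illustrating "__common__" infos only.
class ToyCommonInfoEnv:
    def __init__(self):
        self.agents = ["agent_0", "agent_1"]

    def step(self, action_dict):
        obs = {a: 0 for a in self.agents}
        rewards = {a: 1.0 for a in self.agents}
        terminateds = {"__all__": False}
        truncateds = {"__all__": False}
        # Per-agent infos, plus one "__common__" entry for env-global data.
        infos = {
            "agent_0": {"own_action": action_dict["agent_0"]},
            "agent_1": {"own_action": action_dict["agent_1"]},
            "__common__": {"round": 1},
        }
        return obs, rewards, terminateds, truncateds, infos
```

A real example script would subclass MultiAgentEnv and show consumers reading the shared entry instead of duplicating it per agent.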
# TODO (sven): Move this example script into the new API stack.
As far as I can see, this example is only for the old stack (it uses the Exploration API). Can we put old-stack examples under a specific old-stack folder?
I see users trying to use exploration algorithms with our new stack and failing with this combination.
Well, we'll have to translate it into the new stack (using an RLModule instead of the old Exploration API).
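As a rough illustration of that translation (plain Python, hypothetical function name; a real RLModule would implement a `_forward_exploration`-style method on tensors), epsilon-greedy exploration can live directly in the module's exploration forward pass instead of a separate Exploration object:

```python
import random

def forward_exploration(q_values, epsilon, rng=random):
    """Epsilon-greedy action selection for one set of Q-values.

    With probability `epsilon`, pick a uniformly random action (explore);
    otherwise pick the argmax action (exploit).
    """
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))  # explore
    return max(range(len(q_values)), key=lambda i: q_values[i])  # exploit
```

With `epsilon=0.0` this is pure exploitation; a training loop would anneal epsilon over time, as the old Exploration API did internally.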
)
.framework(args.framework)
.resources(
    # How many GPUs does the local worker (driver) need? For most algos,
Can we also put an example here for multiple learner workers? It is still ambiguous how exactly to set multiple learner workers and assign them GPUs.
Added a TODO. Sorry, I'm trying to keep this PR completely free of actual code changes.
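For reference, a hedged config fragment of what such an example could look like (not run here; assumes ray[rllib] of this PR's era, and the exact argument names may differ between versions):

```python
# Hypothetical sketch: configuring multiple remote Learner workers and
# assigning each of them one GPU via AlgorithmConfig.resources().
from ray.rllib.algorithms.ppo import PPOConfig

config = (
    PPOConfig()
    .resources(
        num_gpus=0,                     # GPUs for the local worker (driver)
        num_learner_workers=2,          # two remote Learner actors
        num_gpus_per_learner_worker=1,  # one GPU for each Learner actor
    )
)
```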
# TODO (sven): Move this example script into the new API stack.
Here as well: Is this possible with the new stack, and can we otherwise put this example under a specific folder named old_stack?
There is a new folder called _old_api_stack, but I would like to only put scripts in there that don't need to be translated to the new API stack, because whatever is demo'd in those will no longer be supported (or relevant).
All the other scripts currently still on the old stack (that are NOT in that folder) will have to be translated.
I agree with you that we should mark all these differences more clearly.
Suggestion:
- Old-API stack scripts that will NOT be supported in the future (those in _old_api_stack/): add a deprecation_warning to the top, explaining that these scripts will be deprecated fully (w/o replacement).
- Old-API stack scripts that we will translate (those NOT in _old_api_stack/): add a warning to the top, explaining that we are about to translate these to the new stack (and will retire/remove their old-stack versions).
- New API stack scripts: Leave as-is (this should be the new normal).
Yes, it should be possible. This is not even new stack specific. We just have to use PPO/DQNRLModule instead of PPO/DQNPolicy.
Cleanup of the examples/ folder (vol. 03).

The planned, final structure for the examples folder of RLlib will be as follows:
- All scripts currently in the examples directory will move into one of the above sub-folders.
- A new folder (_old_api_stack) has been created to host only those scripts that show things that are relevant to the old API stack only. This folder will also eventually disappear (before 3.0).
- The folders old_api_stack, catalog, env, export, inference_and_serving, learner, policy, rl_module, and serving will all be removed soon, replaced by plural naming (e.g. env -> envs), a cleaner name (e.g. export -> checkpoints), or replaced altogether (e.g. policy).

TODOs (in several follow-up PRs):
- Some scripts remain in the rllib.examples folder for now until we figure out their final location (e.g. parametric_actions_cartpole.py). Most of these are related to action spaces or action manipulation and will most likely move into the connectors folder, but we'll see.

Why are these changes needed?

Related issue number

Checks
- I've signed off every commit (git commit -s) in this PR.
- I've run scripts/format.sh to lint the changes in this PR.
- If I've added/changed a method in Tune, I've added it in doc/source/tune/api/ under the corresponding .rst file.