-
Notifications
You must be signed in to change notification settings - Fork 5.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[RLlib] Trainer sub-class for APPO (instead of using build_trainer()
).
#20424
Conversation
…ner_sub_class_appo
…ner_sub_class_appo
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
awesome looking PR.
just a couple of minor questions.
thanks a lot.
@@ -33,16 +33,21 @@ def test_validate_config_idempotent(self): | |||
""" | |||
# Given: | |||
standard_config = copy.deepcopy(COMMON_CONFIG) | |||
trainer = pg.PGTrainer(env="CartPole-v0", config=standard_config) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this doesn't seem to be belong to this PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It does :( . We changed the validate_config
method from static
to normal, so now we do require an actual Trainer object to call this method, which we do here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why would changing it from static to normal require that the trainer call this with the standard config?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I think I explained in a misleading way: Not the standard config is the problem/important here. It could be any config for this test that is a ok PG config!
But you do have to call it on an instantiated object, not just call the static function like before.
|
||
Users should override this method to implement custom validation | ||
behavior. It is recommended to call `super().validate_config()` in | ||
this override. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wait, we are not calling super().validate_config() in this function?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the base method. Super (tune.Trainable) doesn't have a validate_config
.
When users sub-class Trainer
, they should call the super's (Trainer's) validate config to get all the logic that this default implementation already provides.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah stupid me ...
# `execution_plan` is provided, use it inside | ||
# `self.execution_plan()`. | ||
if execution_plan is not None: | ||
return execution_plan(workers, config, **kwargs) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wait, I don't get why we need this all of a sudden. is this really related to APPO trainer?
where does this execution_plan come from?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just keeping things backward compatible (with the soon-to-be-deprecated build_trainer
function).
New:
- Trainer now has a method:
execution_plan
with the default implementation (the one that e.g. PGTrainer uses). - Users can override it.
- However, it is still a static method, such that backward compatibility with
build_trainer
will not break. - It was necessary to introduce this in this PR as APPO is the first agent that we convert to sub-class that brings its own exec plan logic (PGTrainer used the default one; all other Trainers are still using
build_trainer
).
Hey @gjoliver , please take another look. I have addressed all your questions. Thanks! :) |
…ner_sub_class_appo
@@ -33,16 +33,21 @@ def test_validate_config_idempotent(self): | |||
""" | |||
# Given: | |||
standard_config = copy.deepcopy(COMMON_CONFIG) | |||
trainer = pg.PGTrainer(env="CartPole-v0", config=standard_config) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why would changing it from static to normal require that the trainer call this with the standard config?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice.
|
||
Users should override this method to implement custom validation | ||
behavior. It is recommended to call `super().validate_config()` in | ||
this override. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah stupid me ...
Trainer sub-class for APPO (instead of using
build_trainer()
).NOTE: We inherit APPOTrainer here directly from ImpalaTrainer (which is still built via
build_trainer()
). This shows nicely that the two APIs are compatible (despite the fact that we'll soon start deprecatingbuild_trainer
).Why are these changes needed?
Related issue number
Checks
scripts/format.sh
to lint the changes in this PR.