Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RLlib] Trainer sub-class for APPO (instead of using build_trainer()). #20424

Merged
merged 8 commits into from
Nov 22, 2021

Conversation

sven1977
Copy link
Contributor

@sven1977 sven1977 commented Nov 16, 2021

Trainer sub-class for APPO (instead of using build_trainer()).
NOTE: We inherit APPOTrainer here directly from ImpalaTrainer (which is still built via build_trainer()). This shows nicely that the two APIs are compatible (despite the fact that we'll soon start deprecating build_trainer).

Why are these changes needed?

Related issue number

Checks

  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

@sven1977 sven1977 added the tests-ok The tagger certifies test failures are unrelated and assumes personal liability. label Nov 20, 2021
Copy link
Member

@gjoliver gjoliver left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

awesome looking PR.
just a couple of minor questions.
thanks a lot.

@@ -33,16 +33,21 @@ def test_validate_config_idempotent(self):
"""
# Given:
standard_config = copy.deepcopy(COMMON_CONFIG)
trainer = pg.PGTrainer(env="CartPole-v0", config=standard_config)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this doesn't seem to be belong to this PR?

Copy link
Contributor Author

@sven1977 sven1977 Nov 22, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does :( . We changed the validate_config method from static to normal, so now we do require an actual Trainer object to call this method, which we do here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why would changing it from static to normal require that the trainer call this with the standard config?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I think I explained in a misleading way: Not the standard config is the problem/important here. It could be any config for this test that is a ok PG config!
But you do have to call it on an instantiated object, not just call the static function like before.


Users should override this method to implement custom validation
behavior. It is recommended to call `super().validate_config()` in
this override.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wait, we are not calling super().validate_config() in this function?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the base method. Super (tune.Trainable) doesn't have a validate_config.
When users sub-class Trainer, they should call the super's (Trainer's) validate config to get all the logic that this default implementation already provides.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah stupid me ...

# `execution_plan` is provided, use it inside
# `self.execution_plan()`.
if execution_plan is not None:
return execution_plan(workers, config, **kwargs)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wait, I don't get why we need this all of a sudden. is this really related to APPO trainer?
where does this execution_plan come from?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just keeping things backward compatible (with the soon-to-be-deprecated build_trainer function).

New:

  • Trainer now has a method: execution_plan with the default implementation (the one that e.g. PGTrainer uses).
  • Users can override it.
  • However, it is still a static method, such that backward compatibility with build_trainer will not break.
  • It was necessary to introduce this in this PR as APPO is the first agent that we convert to sub-class that brings its own exec plan logic (PGTrainer used the default one; all other Trainers are still using build_trainer).

@sven1977
Copy link
Contributor Author

Hey @gjoliver , please take another look. I have addressed all your questions. Thanks! :)

@sven1977 sven1977 assigned avnishn and unassigned gjoliver Nov 22, 2021
@@ -33,16 +33,21 @@ def test_validate_config_idempotent(self):
"""
# Given:
standard_config = copy.deepcopy(COMMON_CONFIG)
trainer = pg.PGTrainer(env="CartPole-v0", config=standard_config)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why would changing it from static to normal require that the trainer call this with the standard config?

@sven1977 sven1977 merged commit 9d2fe57 into ray-project:master Nov 22, 2021
Copy link
Member

@gjoliver gjoliver left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice.


Users should override this method to implement custom validation
behavior. It is recommended to call `super().validate_config()` in
this override.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah stupid me ...

@sven1977 sven1977 deleted the trainer_sub_class_appo branch June 2, 2023 20:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
tests-ok The tagger certifies test failures are unrelated and assumes personal liability.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants