
[RLlib] POC: PGTrainer class that works by sub-classing, not trainer_template.py. #20055

Merged: 25 commits merged into ray-project:master on Nov 11, 2021

Conversation

sven1977 (Contributor) commented Nov 4, 2021:

POC: The utility method rllib/agents/trainer_template.py::build_trainer should be deprecated. It's confusing to look for a certain Trainer functionality and not know whether to check the template-generated class or the Trainer class directly.
Instead of build_trainer(), custom Trainer classes should be created via sub-classing, as is done in this PR for the PGTrainer example (a minimal, hedged sketch of this pattern follows the list below).

This PR:

  • Sub-classes PGTrainer directly from Trainer.
  • Design enhancement: no longer overrides Trainable.train(); it shouldn't, since train() is fully defined by Trainer's super class, Trainable. Instead, only Trainable.setup() is overridden (as intended by Trainable).
  • Moves PG into a new rllib/trainer directory to move toward disentangling the ambiguity of the term "agent", which should be used purely for "an acting entity within a (possibly multi-agent) environment".
  • Makes minor adjustments to the Trainer class and build_trainer() to make sure that a) sub-classing from Trainer, b) using build_trainer(), and c) legacy sub-classing from Trainer (e.g. the ES and ARS algos) all still work correctly.
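
A minimal, hedged sketch of the sub-classing pattern (the hook names get_default_config / get_default_policy_class illustrate the approach; they are not necessarily the exact methods introduced by this PR):

# Hedged sketch: PG defined by sub-classing Trainer instead of calling
# build_trainer(). Hook names are illustrative of the pattern only.
from ray.rllib.agents.trainer import Trainer
from ray.rllib.agents.pg import DEFAULT_CONFIG
from ray.rllib.agents.pg.pg_tf_policy import PGTFPolicy
from ray.rllib.agents.pg.pg_torch_policy import PGTorchPolicy


class MyPGTrainer(Trainer):
    """PG Trainer built via sub-classing, not trainer_template.py."""

    @classmethod
    def get_default_config(cls):
        # Algo-specific default config (common Trainer config + PG keys).
        return DEFAULT_CONFIG

    def get_default_policy_class(self, config):
        # Pick the framework-specific default policy class.
        if config["framework"] == "torch":
            return PGTorchPolicy
        return PGTFPolicy

Everything else (worker-set creation, the default execution plan, step()) would be inherited from Trainer / Trainable.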

Why are these changes needed?

Related issue number

Checks

  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

@PublicAPI
def train(self) -> ResultDict:
"""Overrides super.train to synchronize global vars."""
def step(self):
sven1977 (author):

We don't override Trainable.train() anymore (which was never a good idea anyway, as sub-classes of Trainable should only override setup()).
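
A hedged sketch of the Trainable contract this refers to, assuming the standard Ray Tune Trainable API: train() is implemented by Trainable itself and calls step(), so sub-classes fill in setup() and step() only.

from ray.tune import Trainable


class MyTrainable(Trainable):
    def setup(self, config):
        # One-time initialization (e.g. build workers, models, buffers).
        self.lr = config.get("lr", 0.001)
        self.num_updates = 0

    def step(self):
        # One training iteration; Trainable.train() wraps this call and
        # takes care of bookkeeping and result processing.
        self.num_updates += 1
        return {"num_updates": self.num_updates, "lr": self.lr}


# Usage: call train(), never re-implement it in sub-classes.
result = MyTrainable(config={"lr": 0.01}).train()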

@@ -404,8 +404,9 @@ def _test(what, method_to_test, obs_space, full_fetch, explore, timestep,
if what is trainer:
# Get the obs-space from Workers.env (not Policy) due to possible
# pre-processor up front.
worker_set = getattr(trainer, "workers",
getattr(trainer, "_workers", None))
worker_set = getattr(trainer, "workers")
sven1977 (author):

This is for algos like ES and ARS that still use self._workers instead of self.workers.

gjoliver (Member):

add a TODO to migrate ES and ARS, so we can get rid of this weird if?

@@ -0,0 +1,86 @@
"""
sven1977 (author):

Simply moved here, no changes done.

gjoliver (Member):

are there still things in the agents/ directory if we move the policies etc. here?
personally, I feel like the existing structure of having all the custom overrides in agents/ is pretty natural. this also minimizes disruptive changes for users. basically:

rllib/
  - trainer/   # generic trainer definitions
  - policy/    # generic policy definitions
  - evaluation/   # worker definitions
  - agents/
     - pg/    # pg overrides of policy and trainer
     - other agents/

as written, having policy overrides under trainer/ feels slightly illogical.

sven1977 (author), Nov 5, 2021:

I see your point. However, we are already doing this illogically today: we define policy overrides inside the agents dir.
The reason for starting to rename agents into trainer with this PR is that the word "agent" should only be used for acting entities in a multi-agent/single-agent environment, not for the "thing that trains policies". Our Trainers - e.g. PPOTrainer - used to be called Agents - e.g. "PPOAgent" - and we renamed these classes some time ago without renaming the directory at the same time. I would like to start moving everything from "agents" into "trainer" and then remove the agents dir entirely. But yes, maybe we should do this all at once (and not in this PR), or one by one. Not sure.

On the policy overrides: Would it be better to move these into the policy dir? I'm not sure this would be a good idea.

My suggestion would therefore be:

rllib/
  trainer/  # all contents of `agents` will move into here, eventually
    trainer.py
    pg/
      pg_trainer.py
      pg_tf_policy.py
      pg_torch_policy.py
    ppo/
      ...
    a3c/
      ...
  agents/  # deprecate soon: b/c confusing terminology, which clashes with single-agent/multi-agent envs
  policy/  # generic policy defs (remain as-is)
  

sven1977 (author):

Let me know what you think.

@@ -0,0 +1,56 @@
"""
sven1977 (author):

Simply moved here, no changes done.

@@ -0,0 +1,16 @@
from ray.rllib.agents.trainer import with_common_config
sven1977 (author):

I would like to start separating the default config for each algo from the rest of the files.
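
A hedged sketch of what such a stand-alone config module could look like (the file name and the concrete values are illustrative, not necessarily PG's actual defaults):

# Hypothetical pg/default_config.py -- illustrative only.
from ray.rllib.agents.trainer import with_common_config

# Algo-specific keys layered on top of Trainer's common config.
DEFAULT_CONFIG = with_common_config({
    # No remote rollout workers by default.
    "num_workers": 0,
    # Learning rate.
    "lr": 0.0004,
})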

@@ -262,19 +225,6 @@ def _before_evaluate(self):
if before_evaluate_fn:
before_evaluate_fn(self)

@override(Trainer)
sven1977 (author):

moved into Trainer.

self.execution_plan = execution_plan
self.train_exec_impl = execution_plan(
self.workers, config, **self._kwargs_for_execution_plan())
if execution_plan is not None:
sven1977 (author):

default_execution_plan() was moved into Trainer, so a value of None is ok here.

gjoliver (Member):

actually can you just add this as a comment:

Override the default_execution_plan set in Trainer.

sven1977 (author):

done

gjoliver (Member) left a review:

ok, overall, I think this looks great, and matches exactly how I imagine a custom trainer should be implemented.
a couple of structural suggestions, but love this!

@@ -0,0 +1,14 @@
from ray.rllib.trainer.pg.pg_trainer import PGTrainer, DEFAULT_CONFIG
gjoliver (Member):

I guess we can avoid having to export custom loss functions if we keep trainer and policy overrides in existing directory structure?
see the other comment below.

sven1977 (author):

Undid the moving of PG into rllib/trainer. We can do this another time and maybe all Trainer classes at once.

sven1977 changed the title from "[RLlib] POC: Prove that trainer_template is not needed; simplify PGTrainer" to "[RLlib] POC: PGTrainer class that works by sub-classing, not trainer_template.py." on Nov 5, 2021.
sven1977 (author) commented Nov 9, 2021:

Hey @gjoliver, could you take another look? I think I addressed all requests and answered all remaining questions.

gjoliver (Member) left a review:

looks very good actually.
just a couple of really simple questions.

workers.reset(healthy_workers)
self.train_exec_impl = self.execution_plan(
workers, self.config, **self._kwargs_for_execution_plan())
if self.train_exec_impl is not None:
gjoliver (Member):

hmm I wonder why we need to check this here ...?

sven1977 (author):

We do. What if the user does not provide an execution_plan but implements step() herself?
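
A self-contained, schematic sketch of that case (minimal stand-in stubs, not the actual RLlib classes): when a sub-class implements step() itself and never provides an execution_plan, train_exec_impl stays None, so the worker-recovery path must guard against it.

class StubTrainer:
    def __init__(self, execution_plan=None):
        self.execution_plan = execution_plan
        # Stays None when the user brings their own step() implementation.
        self.train_exec_impl = (
            execution_plan(workers=None, config={}) if execution_plan else None)

    def try_recover_from_worker_failure(self, healthy_workers):
        # Mirrors the guarded logic in the diff above: only rebuild the
        # execution-plan iterator if one was ever created.
        if self.train_exec_impl is not None:
            self.train_exec_impl = self.execution_plan(
                workers=healthy_workers, config={})


class StepOnlyTrainer(StubTrainer):
    def step(self):
        # Custom training iteration; no execution plan involved.
        return {"custom_metric": 1.0}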

gjoliver (Member):

I see. So in this case the user does her own init() and step(), and we can't really recover anything for her here. That is quite interesting.
Can you please add a comment explaining this? Thanks.

# By default, `setup` should create both worker sets: "rollout workers"
# for collecting samples for training and - if applicable - "evaluation
# workers".
except NotImplementedError:
gjoliver (Member):

I am curious: instead of this try/except, couldn't we simply put all the logic in this except block into a default self._init() function? That seems to have the same effect.

sven1977 (author):

Not sure. We are trying to deprecate the _init() method (in favor of Trainable.setup(), as intended by the Tune Trainable API). Moving functionality into _init() now just to avoid the try/except would counter that effort and again make the code in setup() harder to read (users would have to jump into _init() to find the WorkerSet-generating code).
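
A schematic sketch of the setup() pattern under discussion (stand-in code, not the actual RLlib implementation): call the legacy _init() hook and fall back to the default worker-set creation when no sub-class overrides it.

class SchematicTrainer:
    def _init(self, config, env_creator):
        # Legacy hook, kept only for old-style sub-classes (e.g. ES/ARS).
        # The base class raises on purpose so setup() can detect that no
        # override exists.
        raise NotImplementedError

    def setup(self, config):
        try:
            # Old-style sub-classes that still override _init() keep working.
            self._init(config, env_creator=None)
        except NotImplementedError:
            # By default, `setup` creates both worker sets: "rollout workers"
            # for collecting samples and - if applicable - "evaluation workers".
            self.workers = self._make_worker_set(config)

    def _make_worker_set(self, config):
        # Placeholder for the real WorkerSet construction.
        return {"num_workers": config.get("num_workers", 0)}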

gjoliver (Member):

yeah, I figured it doesn't hurt to ask :) thanks for the explanation.

gjoliver (Member):
there also seems to be a relevant test failure:

File "/ray/python/ray/rllib/agents/trainer.py", line 2073, in setstate

  | (_RemoteSingleAgentEnv pid=18233) if self.local_replay_buffer is not None:
  | (_RemoteSingleAgentEnv pid=18233) AttributeError: 'PGTrainer' object has no attribute 'local_replay_buffer'

gjoliver (Member) left a review:

cool, thanks for your patience with my questions and comments. nice change!

sven1977 merged commit 6f85af4 into ray-project:master on Nov 11, 2021.
krfricke added commits that referenced this pull request on Nov 12, 2021:
  …sing, not `trainer_template.py`. (#20055)" (#20284)"
  This reverts commit 246787c.

sven1977 pushed a commit that referenced this pull request on Nov 16, 2021:
  …ing, not `trainer_template.py`." (#20285)
  * Revert "Revert "[RLlib] POC: `PGTrainer` class that works by sub-classing, not `trainer_template.py`. (#20055)" (#20284)"
  This reverts commit 246787c.
  Co-authored-by: sven1977 <[email protected]>

wuisawesome pushed the same commit (#20285) on Nov 20 and Nov 21, 2021.
sven1977 deleted the poc_deprecate_trainer_template branch on June 2, 2023.