[RLlib] POC: `PGTrainer` class that works by sub-classing, not `trainer_template.py`. #20055
Conversation
…deprecate_trainer_template
rllib/agents/trainer.py
Outdated
@PublicAPI
def train(self) -> ResultDict:
    """Overrides super.train to synchronize global vars."""
def step(self):
We don't override `Trainable.train()` anymore (which was never a good idea anyway, as sub-classes of `Trainable` should only override `setup()`).
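For illustration, a minimal, standalone sketch of the override pattern described here (a toy with simplified names, not the actual `ray.tune.Trainable` / `Trainer` code): sub-classes leave `train()` alone and only fill in `setup()` and `step()`.

```python
# Toy sketch of the pattern (simplified names; not the real Trainable/Trainer API).
class Trainable:
    def train(self):
        # Fully defined by the base class: bookkeeping around one step() call.
        self._iteration = getattr(self, "_iteration", 0) + 1
        result = self.step()
        result["training_iteration"] = self._iteration
        return result

    def setup(self, config):
        raise NotImplementedError

    def step(self):
        raise NotImplementedError


class MyTrainer(Trainable):
    # Sub-classes only override setup() and step(); train() stays untouched.
    def setup(self, config):
        self.lr = config.get("lr", 0.001)

    def step(self):
        return {"episode_reward_mean": 42.0, "lr": self.lr}


if __name__ == "__main__":
    trainer = MyTrainer()
    trainer.setup({"lr": 0.0004})
    print(trainer.train())
    # {'episode_reward_mean': 42.0, 'lr': 0.0004, 'training_iteration': 1}
```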
@@ -404,8 +404,9 @@ def _test(what, method_to_test, obs_space, full_fetch, explore, timestep,
if what is trainer:
    # Get the obs-space from Workers.env (not Policy) due to possible
    # pre-processor up front.
    worker_set = getattr(trainer, "workers",
                         getattr(trainer, "_workers", None))
    worker_set = getattr(trainer, "workers")
For algos like ES and ARS that still use `self._workers` instead of `self.workers`.
add a TODO to migrate ES and ARS, so we can get rid of this weird if?
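For reference, a small standalone sketch of the backwards-compatible lookup shown in the old lines above (toy classes; the real ES/ARS trainers are of course more involved):

```python
# Toy illustration of the `workers` vs. legacy `_workers` attribute lookup.
class ModernTrainer:
    def __init__(self):
        self.workers = "worker-set"


class LegacyTrainer:  # e.g. an algo that, like ES/ARS, still uses `_workers`
    def __init__(self):
        self._workers = "worker-set"


def get_worker_set(trainer):
    # Old test-helper behavior: fall back to the legacy attribute name.
    return getattr(trainer, "workers", getattr(trainer, "_workers", None))


print(get_worker_set(ModernTrainer()))  # worker-set
print(get_worker_set(LegacyTrainer()))  # worker-set
```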
rllib/trainer/pg/pg_torch_policy.py
Outdated
@@ -0,0 +1,86 @@
"""
Simply moved here; no changes made.
are there still things in the agents/ directory if we move the policies etc. here?
personally, I feel like the existing structure of having all the custom overrides in agents/ felt pretty natural. this also minimizes disruptive changes for users. basically:
rllib/
- trainer/ # generic trainer definitions
- policy/ # generic policy definitions
- evaluation/ # worker definitions
- agents/
- pg/ # pg overrides of policy and trainer
- other agents/
as written, having policy overrides under trainer/ feels slightly illogical.
I see your point. However, we are already doing this illogically today: we define policy overrides inside the `agents` dir.
The reason for starting to rename `agents` to `trainer` with this PR is that the word "agent" should only be used for acting entities in a multi-agent/single-agent environment, not for the "thing that trains policies". Our Trainers - e.g. `PPOTrainer` - used to be called Agents - e.g. `PPOAgent` - and we renamed these classes some time ago w/o renaming the directory at the same time. I would like to start moving everything from `agents` into `trainer` and then remove the `agents` dir entirely. But yes, maybe we should do this all at once (and not in this PR) or one by one. Not sure.
On the policy overrides: would it be better to move these into the `policy` dir? I'm not sure this would be a good idea.
My suggestion would therefore be:
rllib/
trainer/ # all contents of `agents` will move into here, eventually
trainer.py
pg/
pg_trainer.py
pg_tf_policy.py
pg_torch_policy.py
ppo/
...
a3c/
...
agents/ # deprecate soon: b/c confusing terminology, which clashes with single-agent/multi-agent envs
policy/ # generic policy defs (remain as-is)
Let me know what you think.
rllib/trainer/pg/pg_tf_policy.py
Outdated
@@ -0,0 +1,56 @@
"""
Simply moved here; no changes made.
rllib/trainer/pg/default_config.py
Outdated
@@ -0,0 +1,16 @@
from ray.rllib.agents.trainer import with_common_config
Would like to start separating the default config for each algo from the rest of the files.
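As a hedged sketch (assuming the import shown in the hunk above; the concrete keys and values are illustrative, not necessarily PG's actual defaults), such a stand-alone default_config.py could look roughly like this:

```python
from ray.rllib.agents.trainer import with_common_config

# Algo-specific overrides merged on top of RLlib's common Trainer config.
DEFAULT_CONFIG = with_common_config({
    # No remote rollout workers by default (illustrative value).
    "num_workers": 0,
    # Learning rate (illustrative value).
    "lr": 0.0004,
})
```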
@@ -262,19 +225,6 @@ def _before_evaluate(self):
    if before_evaluate_fn:
        before_evaluate_fn(self)

@override(Trainer)
Moved into `Trainer`.
self.execution_plan = execution_plan
self.train_exec_impl = execution_plan(
    self.workers, config, **self._kwargs_for_execution_plan())
if execution_plan is not None:
`default_execution_plan()` was moved into `Trainer`, so a value of None is ok here.
actually can you just add this as a comment:
Override the default_execution_plan set in Trainer.
done
ok, overall, I think this looks great, and matches exactly how I imagine a custom trainer should be implemented.
a couple of structural suggestions, but love this!
rllib/trainer/pg/__init__.py
Outdated
@@ -0,0 +1,14 @@
from ray.rllib.trainer.pg.pg_trainer import PGTrainer, DEFAULT_CONFIG
I guess we can avoid having to export custom loss functions if we keep trainer and policy overrides in the existing directory structure?
see the other comment below.
Undid the moving of PG into `rllib/trainer`. We can do this another time, maybe for all Trainer classes at once.
…deprecate_trainer_template
…deprecate_trainer_template # Conflicts: # rllib/agents/trainer.py
Hey @gjoliver, could you take another look? I think I fixed all requests and answered all remaining questions.
looks very good actually.
just a couple of really simple questions.
workers.reset(healthy_workers)
self.train_exec_impl = self.execution_plan(
    workers, self.config, **self._kwargs_for_execution_plan())
if self.train_exec_impl is not None:
hmm I wonder why we need to check this here ...?
We do. What if the user does not provide an `execution_plan` but implements `step()` herself?
I see. so in this case the user does her own init() and step(), and we can't really recover anything for her here. that is quite interesting.
could you please add a comment explaining this? thanks.
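To illustrate the scenario being discussed, a standalone toy of this guard (simplified and hypothetical; not the actual `Trainer` recovery code):

```python
# Toy: only rebuild the training iterator if this Trainer actually uses an
# execution plan. A user sub-class that implements setup()/step() itself may
# never create one, so there is nothing to recover here in that case.
class ToyTrainer:
    def __init__(self, execution_plan=None):
        self.config = {}
        self.workers = ["w0", "w1"]
        self.execution_plan = execution_plan
        self.train_exec_impl = (
            execution_plan(self.workers, self.config) if execution_plan else None)

    def try_recover(self, healthy_workers):
        self.workers = healthy_workers
        if self.train_exec_impl is not None:
            # Re-create the plan only when one was provided in the first place.
            self.train_exec_impl = self.execution_plan(self.workers, self.config)


def simple_plan(workers, config):
    return iter([{"num_healthy_workers": len(workers)}])


with_plan = ToyTrainer(execution_plan=simple_plan)
with_plan.try_recover(["w0"])
print(next(with_plan.train_exec_impl))  # {'num_healthy_workers': 1}

custom = ToyTrainer()  # user implements step() herself, no execution plan
custom.try_recover(["w0"])
print(custom.train_exec_impl)  # None -> nothing to rebuild
```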
# By default, `setup` should create both worker sets: "rollout workers"
# for collecting samples for training and - if applicable - "evaluation
# workers".
except NotImplementedError:
I am curious: instead of this try/except, couldn't we simply put all the logic in this except block into a default self._init() function?
that seems to have the same effect as this.
Not sure. We are trying to deprecate the `_init()` method (in favor of `Trainable.setup()`, as intended by the Tune Trainable API). Moving functionality into `_init` now just to avoid the try/except would counter that effort and again make it harder to read the code in `setup()` (users would have to jump into `_init` to find the WorkerSet-generating code).
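A standalone toy of the control flow being discussed (simplified, with made-up helper names; not the actual `Trainer.setup()` code):

```python
# Toy: prefer creating worker sets directly in setup(), but fall back to a
# legacy _init() if an old-style sub-class still implements it.
class BaseTrainer:
    def setup(self, config):
        try:
            # Legacy path: old-style sub-classes override _init().
            self._init(config)
        except NotImplementedError:
            # Default path: build the rollout (and, if configured, evaluation)
            # worker sets right here, so readers don't have to jump into _init().
            self.workers = self._make_worker_set(config)

    def _init(self, config):
        # Not implemented by default -> the except branch above runs.
        raise NotImplementedError

    def _make_worker_set(self, config):
        return ["rollout-worker"] * config.get("num_workers", 1)


class LegacySubclass(BaseTrainer):
    def _init(self, config):
        # Old-style algo that still builds its own workers.
        self.workers = ["custom-worker"]


new_style = BaseTrainer()
new_style.setup({"num_workers": 2})
print(new_style.workers)  # ['rollout-worker', 'rollout-worker']

legacy = LegacySubclass()
legacy.setup({})
print(legacy.workers)  # ['custom-worker']
```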
yeah, I figured it doesn't hurt to ask :) thanks for the explanation.
there also seems to be a relevant test failure:
File "/ray/python/ray/rllib/agents/trainer.py", line 2073, in `__setstate__`
(_RemoteSingleAgentEnv pid=18233) if self.local_replay_buffer is not None:
…deprecate_trainer_template
cool, thanks for your patience with my questions and comments. nice change!
…deprecate_trainer_template
…ing, not `trainer_template.py`." (#20285) * Revert "Revert "[RLlib] POC: `PGTrainer` class that works by sub-classing, not `trainer_template.py`. (#20055)" (#20284)" This reverts commit 246787c. Co-authored-by: sven1977 <[email protected]>
POC: The utility method `rllib/agents/trainer_template.py::build_trainer` should be deprecated. It's confusing to look for a certain Trainer functionality and not know whether to check the template class or the `Trainer` class directly. Instead of `build_trainer()`, custom Trainer classes should be created via sub-classing, as is done in this PR for the `PGTrainer` example.

This PR:
- Sub-classes `PGTrainer` directly from `Trainer`.
- No longer overrides `Trainable.train()` (it shouldn't, as it's fully defined by `Trainer`'s super class: `Trainable`). Instead, only overrides `Trainable.setup()` (as is intended by `Trainable`).
- Adds tests for the `Trainer` class and `build_trainer()` to make sure a) sub-classing from Trainer, b) using `build_trainer`, and c) legacy sub-classing from Trainer (e.g. the ES and ARS algos) all still work ok.

Why are these changes needed?
Related issue number

Checks
I've run `scripts/format.sh` to lint the changes in this PR.