
[RLlib] POC: PGTrainer class that works by sub-classing, not trainer_template.py. #20055

Merged: 25 commits merged into ray-project:master on Nov 11, 2021

Conversation

sven1977 (Contributor) commented Nov 4, 2021:

POC: The utility method rllib/agents/trainer_template.py::build_trainer should be deprecated. It's confusing to look for a certain Trainer functionality and not know whether to check the template-generated class or the Trainer class directly.
Instead of build_trainer(), custom Trainer classes should be created via sub-classing, as is done in this PR for the PGTrainer example (a minimal, hedged sketch of this pattern follows the list below).

This PR:

  • Sub-classes PGTrainer directly from Trainer.
  • Design enhancement: no longer overrides Trainable.train(); it shouldn't, since train() is fully defined by Trainer's super class, Trainable. Instead, only Trainable.setup() is overridden (as intended by Trainable).
  • Moves PG into a new rllib/trainer directory to move toward disentangling the ambiguity of the term "agent", which should be used purely for "an acting entity within a (possibly multi-agent) environment".
  • Makes minor adjustments to the Trainer class and build_trainer() to make sure that a) sub-classing from Trainer, b) using build_trainer(), and c) legacy sub-classing from Trainer (e.g. the ES and ARS algos) all still work correctly.
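
A minimal, hedged sketch of the sub-classing pattern (the hook names get_default_config / get_default_policy_class illustrate the approach; they are not necessarily the exact methods introduced by this PR):

# Hedged sketch: PG defined by sub-classing Trainer instead of calling
# build_trainer(). Hook names are illustrative of the pattern only.
from ray.rllib.agents.trainer import Trainer
from ray.rllib.agents.pg import DEFAULT_CONFIG
from ray.rllib.agents.pg.pg_tf_policy import PGTFPolicy
from ray.rllib.agents.pg.pg_torch_policy import PGTorchPolicy


class MyPGTrainer(Trainer):
    """PG Trainer built via sub-classing, not trainer_template.py."""

    @classmethod
    def get_default_config(cls):
        # Algo-specific default config (common Trainer config + PG keys).
        return DEFAULT_CONFIG

    def get_default_policy_class(self, config):
        # Pick the framework-specific default policy class.
        if config["framework"] == "torch":
            return PGTorchPolicy
        return PGTFPolicy

Everything else (worker-set creation, the default execution plan, step()) would be inherited from Trainer / Trainable.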

Why are these changes needed?

Related issue number

Checks

  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

@PublicAPI
def train(self) -> ResultDict:
"""Overrides super.train to synchronize global vars."""
def step(self):
sven1977 (author):

We don't override Trainable.train() anymore (which was never a good idea anyway, as sub-classes of Trainable should only override setup()).
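
A hedged sketch of the Trainable contract this refers to, assuming the standard Ray Tune Trainable API: train() is implemented by Trainable itself and calls step(), so sub-classes fill in setup() and step() only.

from ray.tune import Trainable


class MyTrainable(Trainable):
    def setup(self, config):
        # One-time initialization (e.g. build workers, models, buffers).
        self.lr = config.get("lr", 0.001)
        self.num_updates = 0

    def step(self):
        # One training iteration; Trainable.train() wraps this call and
        # takes care of bookkeeping and result processing.
        self.num_updates += 1
        return {"num_updates": self.num_updates, "lr": self.lr}


# Usage: call train(), never re-implement it in sub-classes.
result = MyTrainable(config={"lr": 0.01}).train()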

@@ -404,8 +404,9 @@ def _test(what, method_to_test, obs_space, full_fetch, explore, timestep,
if what is trainer:
# Get the obs-space from Workers.env (not Policy) due to possible
# pre-processor up front.
worker_set = getattr(trainer, "workers",
getattr(trainer, "_workers", None))
worker_set = getattr(trainer, "workers")
sven1977 (author):

This is for algos like ES and ARS that still use self._workers instead of self.workers.

gjoliver (Member):

add a TODO to migrate ES and ARS, so we can get rid of this weird if?

@@ -0,0 +1,86 @@
"""
sven1977 (author):

Simply moved here, no changes done.

gjoliver (Member):

are there still things in the agents/ directory if we move the policies etc. here?
personally, I feel like the existing structure of having all the custom overrides in agents/ is pretty natural. this also minimizes disruptive changes for users. basically:

rllib/
  - trainer/   # generic trainer definitions
  - policy/    # generic policy definitions
  - evaluation/   # worker definitions
  - agents/
     - pg/    # pg overrides of policy and trainer
     - other agents/

as written, having policy overrides under trainer/ feels slightly illogical.

sven1977 (author), Nov 5, 2021:

I see your point. However, we are already doing this illogically today: we define policy overrides inside the agents dir.
The reason for starting to rename agents into trainer with this PR is that the word "agent" should only be used for acting entities in a multi-agent/single-agent environment, not for the "thing that trains policies". Our Trainers - e.g. PPOTrainer - used to be called Agents - e.g. "PPOAgent" - and we renamed these classes some time ago without renaming the directory at the same time. I would like to start moving everything from "agents" into "trainer" and then remove the agents dir entirely. But yes, maybe we should do this all at once (and not in this PR), or one by one. Not sure.

On the policy overrides: Would it be better to move these into the policy dir? I'm not sure this would be a good idea.

My suggestion would therefore be:

rllib/
  trainer/  # all contents of `agents` will move into here, eventually
    trainer.py
    pg/
      pg_trainer.py
      pg_tf_policy.py
      pg_torch_policy.py
    ppo/
      ...
    a3c/
      ...
  agents/  # deprecate soon: b/c confusing terminology, which clashes with single-agent/multi-agent envs
  policy/  # generic policy defs (remain as-is)
  

sven1977 (author):

Let me know what you think.

@@ -0,0 +1,56 @@
"""
sven1977 (author):

Simply moved here, no changes done.

@@ -0,0 +1,16 @@
from ray.rllib.agents.trainer import with_common_config
sven1977 (author):

I would like to start separating the default config for each algo from the rest of the files.
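
A hedged sketch of what such a stand-alone config module could look like (the file name and the concrete values are illustrative, not necessarily PG's actual defaults):

# Hypothetical pg/default_config.py -- illustrative only.
from ray.rllib.agents.trainer import with_common_config

# Algo-specific keys layered on top of Trainer's common config.
DEFAULT_CONFIG = with_common_config({
    # No remote rollout workers by default.
    "num_workers": 0,
    # Learning rate.
    "lr": 0.0004,
})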

@@ -262,19 +225,6 @@ def _before_evaluate(self):
if before_evaluate_fn:
before_evaluate_fn(self)

@override(Trainer)
sven1977 (author):

moved into Trainer.

self.execution_plan = execution_plan
self.train_exec_impl = execution_plan(
self.workers, config, **self._kwargs_for_execution_plan())
if execution_plan is not None:
sven1977 (author):

default_execution_plan() was moved into Trainer, so a value of None is ok here.

gjoliver (Member):

actually can you just add this as a comment:

Override the default_execution_plan set in Trainer.

sven1977 (author):

done

gjoliver (Member) left a review:

ok, overall, I think this looks great, and matches exactly how I imagine a custom trainer should be implemented.
a couple of structural suggestions, but love this!

@@ -0,0 +1,14 @@
from ray.rllib.trainer.pg.pg_trainer import PGTrainer, DEFAULT_CONFIG
gjoliver (Member):

I guess we can avoid having to export custom loss functions if we keep trainer and policy overrides in existing directory structure?
see the other comment below.

sven1977 (author):

Undid the moving of PG into rllib/trainer. We can do this another time and maybe all Trainer classes at once.

sven1977 changed the title from "[RLlib] POC: Prove that trainer_template is not needed; simplify PGTrainer" to "[RLlib] POC: PGTrainer class that works by sub-classing, not trainer_template.py." on Nov 5, 2021.
sven1977 (author) commented Nov 9, 2021:

Hey @gjoliver, could you take another look? I think I addressed all requests and answered all remaining questions.

gjoliver (Member) left a review:

looks very good actually.
just a couple of really simple questions.

workers.reset(healthy_workers)
self.train_exec_impl = self.execution_plan(
workers, self.config, **self._kwargs_for_execution_plan())
if self.train_exec_impl is not None:
gjoliver (Member):

hmm I wonder why we need to check this here ...?

sven1977 (author):

We do. What if the user does not provide an execution_plan but implements step() herself?
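
A self-contained, schematic sketch of that case (minimal stand-in stubs, not the actual RLlib classes): when a sub-class implements step() itself and never provides an execution_plan, train_exec_impl stays None, so the worker-recovery path must guard against it.

class StubTrainer:
    def __init__(self, execution_plan=None):
        self.execution_plan = execution_plan
        # Stays None when the user brings their own step() implementation.
        self.train_exec_impl = (
            execution_plan(workers=None, config={}) if execution_plan else None)

    def try_recover_from_worker_failure(self, healthy_workers):
        # Mirrors the guarded logic in the diff above: only rebuild the
        # execution-plan iterator if one was ever created.
        if self.train_exec_impl is not None:
            self.train_exec_impl = self.execution_plan(
                workers=healthy_workers, config={})


class StepOnlyTrainer(StubTrainer):
    def step(self):
        # Custom training iteration; no execution plan involved.
        return {"custom_metric": 1.0}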

gjoliver (Member):

I see. So in this case the user does her own init() and step(), and we can't really recover anything for her here. That is quite interesting.
Can you please add a comment explaining this? Thanks.

# By default, `setup` should create both worker sets: "rollout workers"
# for collecting samples for training and - if applicable - "evaluation
# workers".
except NotImplementedError:
gjoliver (Member):

I am curious: instead of this try/except, couldn't we simply put all the logic in this except block into a default self._init() function? That seems to have the same effect.

sven1977 (author):

Not sure. We are trying to deprecate the _init() method (in favor of Trainable.setup(), as intended by the Tune Trainable API). Moving functionality into _init() now just to avoid the try/except would counter that effort and again make the code in setup() harder to read (users would have to jump into _init() to find the WorkerSet-generating code).
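
A schematic sketch of the setup() pattern under discussion (stand-in code, not the actual RLlib implementation): call the legacy _init() hook and fall back to the default worker-set creation when no sub-class overrides it.

class SchematicTrainer:
    def _init(self, config, env_creator):
        # Legacy hook, kept only for old-style sub-classes (e.g. ES/ARS).
        # The base class raises on purpose so setup() can detect that no
        # override exists.
        raise NotImplementedError

    def setup(self, config):
        try:
            # Old-style sub-classes that still override _init() keep working.
            self._init(config, env_creator=None)
        except NotImplementedError:
            # By default, `setup` creates both worker sets: "rollout workers"
            # for collecting samples and - if applicable - "evaluation workers".
            self.workers = self._make_worker_set(config)

    def _make_worker_set(self, config):
        # Placeholder for the real WorkerSet construction.
        return {"num_workers": config.get("num_workers", 0)}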

gjoliver (Member):

yeah, I figured it doesn't hurt to ask :) thanks for the explanation.

gjoliver (Member):
there also seems to be a relevant test failure:

File "/ray/python/ray/rllib/agents/trainer.py", line 2073, in setstate

  | (_RemoteSingleAgentEnv pid=18233) if self.local_replay_buffer is not None:
  | (_RemoteSingleAgentEnv pid=18233) AttributeError: 'PGTrainer' object has no attribute 'local_replay_buffer'

gjoliver (Member) left a review:

cool, thanks for your patience with my questions and comments. nice change!

sven1977 merged commit 6f85af4 into ray-project:master on Nov 11, 2021.
krfricke added commits that referenced this pull request on Nov 12, 2021:
  …sing, not `trainer_template.py`. (#20055)" (#20284)"
  This reverts commit 246787c.

sven1977 pushed a commit that referenced this pull request on Nov 16, 2021:
  …ing, not `trainer_template.py`." (#20285)
  * Revert "Revert "[RLlib] POC: `PGTrainer` class that works by sub-classing, not `trainer_template.py`. (#20055)" (#20284)"
  This reverts commit 246787c.
  Co-authored-by: sven1977 <[email protected]>

wuisawesome pushed the same commit (#20285) on Nov 20 and Nov 21, 2021.
sven1977 deleted the poc_deprecate_trainer_template branch on June 2, 2023.