[RLlib] Examples folder: All training_iteration translations. #23712
Conversation
Looks good to me, have some questions.
@@ -922,8 +922,9 @@ def learn_on_batch(self, samples: SampleBatchType) -> Dict:
                summarize(samples)
            )
        )

+       info_out = {}
Why make this change?
This was a bug. There are some cases where info_out would not be defined later in the code. I wanted to make sure it's at least an empty dict :)
yeah, it's a good idea to define info_out here.
but now that we are doing this, we should try to make sure we always and only do info_out.update(...) in the subsequent code, like line 945.
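For illustration, a minimal sketch of the init-then-update pattern being suggested here (the method body and stats values are assumptions, not the actual learn_on_batch code):

    def learn_on_batch(self, samples):
        # Always define info_out up front so no code path can leave it unset.
        info_out = {}

        # Hypothetical branches; in each one, only merge into info_out via
        # update() instead of re-assigning it.
        if isinstance(samples, dict):
            for policy_id, batch in samples.items():
                info_out.update({policy_id: {"num_samples": len(batch)}})
        else:
            info_out.update({"default_policy": {"num_samples": len(samples)}})

        return info_out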
@@ -36,7 +36,7 @@

 @ExperimentalAPI
-def train_one_step(trainer, train_batch) -> Dict:
+def train_one_step(trainer, train_batch, policies_to_train=None) -> Dict:
Where is this change implemented in the code base?
Yeah, actually, let me add a nice docstring for this function. ...
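Not the docstring that was actually added, but a rough sketch of what it could cover (the argument descriptions and return structure are assumptions):

    from typing import Dict

    from ray.rllib.utils.annotations import ExperimentalAPI


    @ExperimentalAPI
    def train_one_step(trainer, train_batch, policies_to_train=None) -> Dict:
        """Runs a single (non-distributed) training step on a train batch.

        Args:
            trainer: The Trainer whose local worker's policies get updated.
            train_batch: The (possibly multi-agent) batch to learn on.
            policies_to_train: Optional list of policy IDs to restrict the
                update to. If None, all policies marked as trainable are
                updated.

        Returns:
            A dict of learner stats, keyed by policy ID.
        """
        ...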
    dqn_train_results = {}
    dqn_train_batch = self.local_replay_buffer.replay()
    if dqn_train_batch is not None:
        dqn_train_results = train_one_step(self, dqn_train_batch, ["dqn_policy"])
Oh it's used here.
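For completeness, a hedged sketch of the PPO-side counterpart in the same training_iteration (the sampling call is an assumption made for illustration; only the policies_to_train restriction is the point):

    # synchronous_parallel_sample lives in ray.rllib.execution.rollout_ops;
    # its exact usage in this example is assumed here.
    ppo_train_batch = synchronous_parallel_sample(worker_set=self.workers)
    # Restrict the update to the PPO policy only.
    ppo_train_results = train_one_step(self, ppo_train_batch, ["ppo_policy"])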
nice pr. I know it's already merged, but I have a few comments ...
    # Create a new Trainer using the Policy and config defined above and a new
    # execution plan.
    # Backward compatibility, just in case users want to use the erroneous old name.
    RandomParametriclPolicy = RandomParametricPolicy
is this really necessary?? :) :)
this is an example script ... folks probably shouldn't import an example script and use it as a library?
-    # Return training metrics.
-    return StandardMetricsReporting(rollouts, workers, config)
+    # Return (empty) training metrics.
+    return {}
why not collect rollout related metrics here?
B/c it's done automatically by RLlib after this. So we always just return the learner stats here.
But yes, we should start thinking about a way to customize this bit of the iteration.
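A minimal sketch of the resulting shape of a training_iteration under this convention (helper imports and sampling details are assumptions; the point is only that learner stats are returned and rollout metrics are attached by RLlib afterwards):

    from ray.rllib.execution.rollout_ops import synchronous_parallel_sample
    from ray.rllib.execution.train_ops import train_one_step


    # Would live as a method on a Trainer subclass.
    def training_iteration(self) -> dict:
        # Sample a train batch from the rollout workers (details assumed).
        train_batch = synchronous_parallel_sample(worker_set=self.workers)

        # Update the policies; train_one_step returns learner stats only.
        train_results = train_one_step(self, train_batch)

        # Rollout metrics (episode rewards, lengths, etc.) are collected and
        # merged by RLlib after this method returns, so nothing else is
        # reported here.
        return train_results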
-        return StandardMetricsReporting(train_op, workers, config)
+        # Combine results for PPO and DQN into one results dict.
+        results = dict(ppo_train_results, **dqn_train_results)
this small example is actually really cool!! 2 trainers train 2 policies for different agents in the same multi-agent env. nice.
it doesn't feel exactly right here though: shouldn't we be properly combining the result dicts? for example, not overwriting ppo steps with dqn steps, but summing them up?
I wonder how Concurrently does it.
No, they are not overridden. The PPO stats and DQN stats will reside under their policy ID keys:

    results = {
        "ppo_policy": [some stats],
        "dqn_policy": [some other stats],
    }
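A small self-contained illustration of the merge semantics in question (the stats values are made up; only the dict(a, **b) behavior matters):

    ppo_train_results = {"ppo_policy": {"learner_stats": {"total_loss": 0.12}}}
    dqn_train_results = {"dqn_policy": {"learner_stats": {"total_loss": 0.34}}}

    # dict(a, **b) copies a and then overlays b's keys. Because the top-level
    # keys are distinct policy IDs, nothing gets overwritten here.
    results = dict(ppo_train_results, **dqn_train_results)
    assert set(results) == {"ppo_policy", "dqn_policy"}

    # Caveat: if both dicts carried a shared top-level counter (e.g. a
    # trained-steps count), the DQN value would silently win, so such
    # counters would need to be summed explicitly.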
All scripts in rllib/examples that use execution_plan are translated to using the new training_iteration API.

Why are these changes needed?

Related issue number

Checks
I've run scripts/format.sh to lint the changes in this PR.