[RLlib] Unify all RLlib Trainer.train() -> results[info][learner][policy ID][learner_stats] and add structure tests. #18879
Conversation
…i-gpu-metric-fix
Waiting for tests to pass. This PR does not address the multi-GPU torch race condition yet (@mvindiola1).
…y_info_learner_stats_and_add_stats_structure_tests
logger = logging.getLogger(__name__)
def averaged(kv, axis=None):
no longer needed
wow, I feel like we are making great progress in cleaning up the APIs :)
awesome change, just a few simple questions.
    offset=0, buffer_index=buffer_idx)
learner_info_builder.add_learn_on_batch_results(
maybe just add a comment here pointing out that "multi-gpu for multi-agent" is not supported yet.
Great point, will do!
Added a comment to each location where we use the new LearnerInfoBuilder, explaining that it unifies the results dict structure across the different setups (multi-GPU, multi-agent, tf/torch, etc.).
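For illustration, a rough sketch of how the builder unifies per-policy learn_on_batch() results. LearnerInfoBuilder and add_learn_on_batch_results appear in this PR; the constructor argument and the finalize() call shown here are assumptions, and the stat names/values are made up:

```python
from ray.rllib.utils.metrics.learner_info import (LEARNER_STATS_KEY,
                                                  LearnerInfoBuilder)

# Pretend these came from Policy.learn_on_batch() for two policies.
fake_results = {
    "policy_0": {LEARNER_STATS_KEY: {"policy_loss": 0.12, "vf_loss": 0.34}},
    "policy_1": {LEARNER_STATS_KEY: {"policy_loss": 0.56, "vf_loss": 0.78}},
}

# Assumed constructor arg: a single device (CPU or one GPU).
builder = LearnerInfoBuilder(num_devices=1)
for policy_id, res in fake_results.items():
    # The builder produces the same nested dict structure regardless of the
    # setup (multi-GPU, multi-agent, tf/torch, ...).
    builder.add_learn_on_batch_results(res, policy_id)

# finalize() (assumed here) compiles the unified per-policy info dict that the
# Trainer then places under results["info"]["learner"].
learner_info = builder.finalize()
```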
from ray.rllib.policy.sample_batch import DEFAULT_POLICY_ID
from ray.rllib.utils.metrics.learner_info import LEARNER_INFO, \
    LEARNER_STATS_KEY
is there a reason not to import these at the top of the file?
not important, just a style nit.
Yeah, unfortunately, adding it to the top would lead to a circular import.
hmm, that may be a sign that we should break up files / restructure things a little better.
we can do it later though, no need to block this PR.
You are right. This typically happens when we have a structure like:
base_class_a.py -> imports something from utils
base_class_b.py -> imports something from utils
utils.py -> some simple utils, BUT it also contains a useful function (e.g. for testing/debugging) that itself needs one of a or b
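A minimal sketch of that layout (all file and function names here are made up for illustration):

```python
# utils.py (hypothetical, mirroring the structure described above)

def simple_helper(x):
    """Plain util; base_class_a.py and base_class_b.py import this at the top."""
    return x * 2


def debug_dump(obj):
    """Testing/debugging helper that itself needs base_class_a.

    A top-level `from base_class_a import A` would close the cycle
    base_class_a -> utils -> base_class_a, so the import is deferred into
    the function body -- the same workaround used in the diff above.
    """
    from base_class_a import A  # deferred to avoid the circular import
    assert isinstance(obj, A)
    print(obj)
```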
oh ok, get it.
sounds like we just need to break utils.py into base_utils.py and utils.py. Terrible at naming, but you get the idea.
rllib/utils/test_utils.py
    LEARNER_STATS_KEY

# Assert that all the keys are where we would expect them.
for key in ["info", "hist_stats", "timers", "perf", "episode_reward_mean"]:
should we define this list somewhere as a constant too?
also, should we make sure train_results doesn't contain any keys that we don't know here?
so next time someone adds a new key to the result set, this will make sure they update the known key list.
Actually, this is just a pretty random list of keys that "should" be there, but it's not exhaustive and users can add any other keys to their custom Trainer's results dict.
For this test only, we could try adding more keys to this list to make sure our built-in algos are guaranteeing that at least these keys are always there.
ok, I see.
I added more keys, but again, users may add even more in their custom Trainer classes.
So this structure is not a very strict requirement. The focus of this test is only on the "info" key, though.
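As a rough sketch of what such a (deliberately non-strict) top-level key check can look like, based on the loop shown in the diff above; the constant name and exact key list here are illustrative:

```python
# Illustrative only: custom Trainers may add arbitrary extra keys, so this is
# a subset check rather than an exhaustive structure validation.
EXPECTED_TOP_LEVEL_KEYS = [
    "info",
    "hist_stats",
    "timers",
    "perf",
    "episode_reward_mean",
]


def check_top_level_keys(train_results: dict) -> None:
    for key in EXPECTED_TOP_LEVEL_KEYS:
        assert key in train_results, (
            f"'{key}' not found in `train_results` ({train_results})!")
```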
rllib/utils/test_utils.py
assert key in train_results, \
    f"'{key}' not found in `train_results` ({train_results})!"

is_multi_agent = len(train_results["policy_reward_min"]) > 0
now that everything is consistent, can we check the size of the policy dict or something?
seems like a more direct condition than reward_min.
Yeah, you are totally right! We even have a util function that returns is_multiagent
based on the config. I'll use that here instead.
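That util isn't shown in this excerpt; as a purely illustrative stand-in (not the actual RLlib helper), a config-based check could look like:

```python
# Illustrative only: treat the setup as multi-agent if the user configured
# any policies under config["multiagent"]["policies"].
def is_multi_agent(config: dict) -> bool:
    return bool(config.get("multiagent", {}).get("policies"))
```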
done
if "td_error" in policy_stats: | ||
configured_b = train_results["config"]["train_batch_size"] | ||
actual_b = policy_stats["td_error"].shape[0] | ||
assert (configured_b - actual_b) / actual_b <= 0.1 |
nice!!! :) 👍
rllib/utils/metrics/learner_info.py
def add_learn_on_batch_results(
        self,
        results: dict,
typing.Dict instead of dict?
done
…y_info_learner_stats_and_add_stats_structure_tests
Hey @gjoliver, could you give this another go? I made a couple of changes as suggested and tests are passing now.
Thanks, nice change.
Addressing issue #18116
Unify all RLlib Trainer.train() -> results structures:
The current return values (dict) of Trainer.train() are not consistent between the different setups (multi-GPU vs. single-GPU, multi-agent vs. single-agent, tf vs. torch).
Most importantly, the returned dict should have an "info" key under which there is a "learner" key (LEARNER_INFO) that holds a (multi-agent) policy dict ("default_policy" as the only key in case we don't have a multi-agent setup). Under each policy key, the policy can store its stats_fn output (key: LEARNER_STATS_KEY) and the extra batch fetches (e.g. td-errors). For example:
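An illustrative sketch of that structure (stat names and values are made up):

```python
# result = trainer.train()
# result["info"]["learner"] then looks roughly like this:
{
    "default_policy": {          # one key per policy ID in multi-agent setups
        "learner_stats": {       # LEARNER_STATS_KEY: the policy's stats_fn output
            "policy_loss": 0.025,
            "vf_loss": 0.31,
            "entropy": 1.42,
        },
        # Extra batch fetches (e.g. td-errors) sit next to learner_stats.
        "td_error": [0.1, -0.03, 0.07],  # one value per sample in the train batch
    },
}
```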
Also added a new structure test to each algorithm's simple compilation test that confirms this structure is adhered to. These tests make sure all algos return the same structure across any of the above cases and combinations thereof.
Note that multi-GPU for multi-agent is not yet supported.
Why are these changes needed?
Related issue number
#18116
Checks
I've run scripts/format.sh to lint the changes in this PR.