
[RLlib] Unify all RLlib Trainer.train() -> results[info][learner][policy ID][learner_stats] and add structure tests. #18879

Conversation


@sven1977 sven1977 commented Sep 24, 2021

Addressing issue #18116

Unify all RLlib Trainer.train() -> results structures:

The current return values (dict) of Trainer.train() are not consistent between the following different setups:

  • multi-agent
  • multi-GPU
  • algos that use more than one SGD iteration (e.g. PPO)
  • tf and torch

Most importantly, the returned dict should have an "info" key, under which there is a "learner" key (LEARNER_INFO) that holds a (multi-agent) policy dict (with "default_policy" as the only key in case we don't have a multi-agent setup).
Under each policy ID key, the policy can store its stats_fn output (under LEARNER_STATS_KEY) and any extra batch fetches (e.g. td-errors).

For example:

result = trainer.train()
result = {
    "info": {
        "learner": {
            "default_policy": {
                "learner_stats": {
                    "loss": ...,
                    "cur_lr": ...,
                },
                "td_error": np.array(...),
            },
        },
    },
}

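With this unified structure, downstream code can reliably pull out per-policy learner stats. A minimal sketch (assuming `trainer` is any built RLlib Trainer):

from ray.rllib.policy.sample_batch import DEFAULT_POLICY_ID
from ray.rllib.utils.metrics.learner_info import LEARNER_INFO, LEARNER_STATS_KEY

result = trainer.train()
# LEARNER_INFO == "learner"; LEARNER_STATS_KEY == "learner_stats".
policy_info = result["info"][LEARNER_INFO][DEFAULT_POLICY_ID]
learner_stats = policy_info[LEARNER_STATS_KEY]
print(learner_stats["cur_lr"])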
Also added a new structure test to each algorithm's simple compilation test that confirms this structure is adhered to. These tests make sure all algos return the same structure across any of the above cases and combinations thereof; a sketch of the kind of check they perform follows below.
Note that multi-GPU for multi-agent is not yet supported.
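A hypothetical helper illustrating the assertions these structure tests perform (the function name here is illustrative; the actual test code lives in RLlib's test utilities):

from ray.rllib.policy.sample_batch import DEFAULT_POLICY_ID
from ray.rllib.utils.metrics.learner_info import LEARNER_INFO, LEARNER_STATS_KEY

def check_learner_info_structure(train_results: dict, is_multi_agent: bool) -> None:
    learner_info = train_results["info"][LEARNER_INFO]
    for policy_id, policy_stats in learner_info.items():
        if not is_multi_agent:
            # Single-agent setups use the one default policy ID.
            assert policy_id == DEFAULT_POLICY_ID
        # Each policy's stats_fn output lives under LEARNER_STATS_KEY.
        assert LEARNER_STATS_KEY in policy_stats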

Why are these changes needed?

Related issue number

#18116

Checks

  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

@sven1977 (Contributor Author) commented:

@Bam4d @mvindiola1 ^

@sven1977 (Contributor Author) commented:

Waiting for tests to pass.

This PR does not address the multi-GPU torch race condition yet (@mvindiola1).


logger = logging.getLogger(__name__)


def averaged(kv, axis=None):
sven1977 (Contributor Author):

no longer needed

@sven1977 sven1977 added the tests-ok The tagger certifies test failures are unrelated and assumes personal liability. label Sep 24, 2021
@gjoliver gjoliver (Member) left a comment:

wow, I feel like we are making great progress in cleaning up the APIs :)
awesome change, just a few simple questions.

offset=0, buffer_index=buffer_idx)
learner_info_builder.add_learn_on_batch_results(
gjoliver (Member):

maybe just add a comment here pointing out that "multi-gpu for multi-agent" is not supported yet.

sven1977 (Contributor Author):

Great point, will do!

sven1977 (Contributor Author):

Added a comment to each location where we use the new LearnerInfoBuilder, explaining that it unifies the results dict structure across the different setups (multi-GPU, multi-agent, tf/torch, etc.).
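For reference, usage of the builder roughly follows this pattern (a sketch pieced together from the snippets in this review; `results_by_policy` is a stand-in for the per-policy learn-on-batch outputs):

from ray.rllib.utils.metrics.learner_info import LearnerInfoBuilder

learner_info_builder = LearnerInfoBuilder(num_devices=1)
for policy_id, policy_results in results_by_policy.items():
    # Stats from each policy's learn-on-batch call get merged/reduced
    # into the unified info[LEARNER_INFO] structure.
    learner_info_builder.add_learn_on_batch_results(policy_results, policy_id)
learner_info = learner_info_builder.finalize()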


from ray.rllib.policy.sample_batch import DEFAULT_POLICY_ID
from ray.rllib.utils.metrics.learner_info import LEARNER_INFO, \
LEARNER_STATS_KEY
gjoliver (Member):

is there a reason not to import these at the top of the file?
not important, just a style nit.

sven1977 (Contributor Author):

Yeah, unfortunately, adding it to the top would lead to a circular import.

gjoliver (Member):

hmm, that may be a sign that we should break up files / restructure things a little better.
we can do it later though, no need to block this PR.

sven1977 (Contributor Author):

You are right. This typically happens when we have a structure like:

base_class_a.py -> imports something from utils
base_class_b.py -> imports something from utils

utils.py -> some simple utils, BUT also contains a useful function (e.g. for testing/debugging) that does need one of a or b

gjoliver (Member):

oh ok, got it.
Sounds like we just need to break utils.py into base_utils.py and utils.py. Terrible at naming, but you get the idea.
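A sketch of the proposed split (file, class, and function names are purely illustrative):

# base_utils.py: leaf module; simple helpers, imports neither base class.
def merge_dicts(a: dict, b: dict) -> dict:
    return {**a, **b}

# utils.py: debug/test helpers that may import the base classes, which in
# turn only ever import base_utils.py, so the cycle is broken.
from base_class_a import BaseClassA
from base_utils import merge_dicts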

LEARNER_STATS_KEY

# Assert that all the keys are where we would expect them.
for key in ["info", "hist_stats", "timers", "perf", "episode_reward_mean"]:
gjoliver (Member):

should we define this list somewhere as a constant too?
also, should we make sure train_results doesn't contain any keys that we don't know here?
so next time someone adds a new key to the result set, this will make sure they update the known key list.

sven1977 (Contributor Author):

Actually, this is just a pretty random list of keys that "should" be there, but it's not exhaustive and users can add any other keys to their custom Trainer's results dict.

For this test only, we could try adding more keys to this list to make sure our built-in algos guarantee that at least these keys are always present.

gjoliver (Member):

ok, I see.

sven1977 (Contributor Author):

I added more keys, but again, users may add even more in their custom Trainer classes, so this structure is not a very strict requirement. The focus of this test is only on the "info" key, though.

    assert key in train_results, \
        f"'{key}' not found in `train_results` ({train_results})!"

is_multi_agent = len(train_results["policy_reward_min"]) > 0
gjoliver (Member):

now that everything is consistent, can we check the size of the policy dict or something?
seems like a more direct condition than reward_min.

sven1977 (Contributor Author):

Yeah, you are totally right! We even have a util function that returns is_multiagent based on the config. I'll use that here instead.

sven1977 (Contributor Author):

done
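The config-based check mentioned above amounts to something like this (a sketch; RLlib's actual util may differ in name and signature):

from ray.rllib.policy.sample_batch import DEFAULT_POLICY_ID

def is_multi_agent(config: dict) -> bool:
    # Multi-agent iff any policies besides the single default one are defined.
    policies = config.get("multiagent", {}).get("policies", {})
    return bool(policies) and set(policies) != {DEFAULT_POLICY_ID}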

if "td_error" in policy_stats:
configured_b = train_results["config"]["train_batch_size"]
actual_b = policy_stats["td_error"].shape[0]
assert (configured_b - actual_b) / actual_b <= 0.1
gjoliver (Member):

nice!!! :) 👍


def add_learn_on_batch_results(
        self,
        results: dict,
gjoliver (Member):

typing.Dict instead of dict?

sven1977 (Contributor Author):

done
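The resolved signature presumably ended up along these lines (a sketch; the policy_id parameter and its default are assumed from the surrounding discussion):

from typing import Any, Dict

from ray.rllib.policy.sample_batch import DEFAULT_POLICY_ID
from ray.rllib.utils.typing import PolicyID

def add_learn_on_batch_results(
        self,
        results: Dict[str, Any],
        policy_id: PolicyID = DEFAULT_POLICY_ID,
) -> None:
    ...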

@sven1977 (Contributor Author) commented:

Hey @gjoliver, could you give this another go? I made a couple of changes as suggested, and tests are passing now.

@gjoliver gjoliver (Member) left a comment:

Thanks, nice change.

@sven1977 sven1977 merged commit ed85f59 into ray-project:master Sep 30, 2021
@sven1977 sven1977 deleted the unify_info_learner_stats_and_add_stats_structure_tests branch June 2, 2023 20:16