[RLlib] Unify all RLlib Trainer.train() -> results[info][learner][policy ID][learner_stats] and add structure tests. #18879
Conversation
…i-gpu-metric-fix
Waiting for tests to pass. This PR does not address the multi-GPU torch race condition yet (@mvindiola1).
…y_info_learner_stats_and_add_stats_structure_tests
logger = logging.getLogger(__name__)
def averaged(kv, axis=None):
no longer needed
wow, I feel like we are making great progress in cleaning up the APIs :)
awesome change, just a few simple questions.
    offset=0, buffer_index=buffer_idx)
learner_info_builder.add_learn_on_batch_results(
maybe just add a comment here pointing out that "multi-gpu for multi-agent" is not supported yet.
Great point, will do!
Added a comment to each location where we use the new LearnerInfoBuilder, explaining that it unifies the results dict structure across the different setups (multi-GPU, multi-agent, tf/torch, etc.).
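For illustration, a rough sketch of how the builder unifies per-policy learn_on_batch() results. LearnerInfoBuilder and add_learn_on_batch_results appear in this PR; the constructor argument and the finalize() call shown here are assumptions, and the stat names/values are made up:

```python
from ray.rllib.utils.metrics.learner_info import (LEARNER_STATS_KEY,
                                                  LearnerInfoBuilder)

# Pretend these came from Policy.learn_on_batch() for two policies.
fake_results = {
    "policy_0": {LEARNER_STATS_KEY: {"policy_loss": 0.12, "vf_loss": 0.34}},
    "policy_1": {LEARNER_STATS_KEY: {"policy_loss": 0.56, "vf_loss": 0.78}},
}

# Assumed constructor arg: a single device (CPU or one GPU).
builder = LearnerInfoBuilder(num_devices=1)
for policy_id, res in fake_results.items():
    # The builder produces the same nested dict structure regardless of the
    # setup (multi-GPU, multi-agent, tf/torch, ...).
    builder.add_learn_on_batch_results(res, policy_id)

# finalize() (assumed here) compiles the unified per-policy info dict that the
# Trainer then places under results["info"]["learner"].
learner_info = builder.finalize()
```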
from ray.rllib.policy.sample_batch import DEFAULT_POLICY_ID
from ray.rllib.utils.metrics.learner_info import LEARNER_INFO, \
    LEARNER_STATS_KEY
is there a reason not to import these at the top of the file?
not important, just a style nit.
Yeah, unfortunately, adding it to the top would lead to a circular import.
hmm, that may be a sign that we should break up files / restructure things a little better.
we can do it later though, no need to block this PR.
You are right. This typically happens when we have a structure like:
base_class_a.py -> imports something from utils
base_class_b.py -> imports something from utils
utils.py -> some simple utils, BUT it also contains a useful function (e.g. for testing/debugging) that itself needs one of a or b
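A minimal sketch of that layout (all file and function names here are made up for illustration):

```python
# utils.py (hypothetical, mirroring the structure described above)

def simple_helper(x):
    """Plain util; base_class_a.py and base_class_b.py import this at the top."""
    return x * 2


def debug_dump(obj):
    """Testing/debugging helper that itself needs base_class_a.

    A top-level `from base_class_a import A` would close the cycle
    base_class_a -> utils -> base_class_a, so the import is deferred into
    the function body -- the same workaround used in the diff above.
    """
    from base_class_a import A  # deferred to avoid the circular import
    assert isinstance(obj, A)
    print(obj)
```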
oh ok, get it.
sounds like we just need to break utils.py into base_utils.py and utils.py. Terrible at naming, but you get the idea.
rllib/utils/test_utils.py
    LEARNER_STATS_KEY

# Assert that all the keys are where we would expect them.
for key in ["info", "hist_stats", "timers", "perf", "episode_reward_mean"]:
should we define this list somewhere as a constant too?
also, should we make sure train_results doesn't contain any keys that we don't know here?
so next time someone adds a new key to the result set, this will make sure they update the known key list.
Actually, this is just a pretty random list of keys that "should" be there, but it's not exhaustive and users can add any other keys to their custom Trainer's results dict.
For this test only, we could try adding more keys to this list to make sure our built-in algos are guaranteeing that at least these keys are always there.
ok, I see.
I added more keys, but again, users may add even more in their custom Trainer classes.
So this structure is not a very strict requirement. The focus of this test is only on the "info" key, though.
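As a rough sketch of what such a (deliberately non-strict) top-level key check can look like, based on the loop shown in the diff above; the constant name and exact key list here are illustrative:

```python
# Illustrative only: custom Trainers may add arbitrary extra keys, so this is
# a subset check rather than an exhaustive structure validation.
EXPECTED_TOP_LEVEL_KEYS = [
    "info",
    "hist_stats",
    "timers",
    "perf",
    "episode_reward_mean",
]


def check_top_level_keys(train_results: dict) -> None:
    for key in EXPECTED_TOP_LEVEL_KEYS:
        assert key in train_results, (
            f"'{key}' not found in `train_results` ({train_results})!")
```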
rllib/utils/test_utils.py
assert key in train_results, \
    f"'{key}' not found in `train_results` ({train_results})!"

is_multi_agent = len(train_results["policy_reward_min"]) > 0
now that everything is consistent, can we check the size of the policy dict or something?
seems like a more direct condition than reward_min.
Yeah, you are totally right! We even have a util function that returns is_multiagent
based on the config. I'll use that here instead.
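That util isn't shown in this excerpt; as a purely illustrative stand-in (not the actual RLlib helper), a config-based check could look like:

```python
# Illustrative only: treat the setup as multi-agent if the user configured
# any policies under config["multiagent"]["policies"].
def is_multi_agent(config: dict) -> bool:
    return bool(config.get("multiagent", {}).get("policies"))
```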
done
if "td_error" in policy_stats: | ||
configured_b = train_results["config"]["train_batch_size"] | ||
actual_b = policy_stats["td_error"].shape[0] | ||
assert (configured_b - actual_b) / actual_b <= 0.1 |
nice!!! :) 👍
rllib/utils/metrics/learner_info.py
def add_learn_on_batch_results(
        self,
        results: dict,
typing.Dict instead of dict?
done
…y_info_learner_stats_and_add_stats_structure_tests
Hey @gjoliver, could you give this another go? I made a couple of changes as suggested and tests are passing now.
Thanks, nice change.
Addressing issue #18116
Unify all RLlib Trainer.train() -> results structures:
The current return values (dict) of Trainer.train() are not consistent between the different setups (multi-GPU vs. single-GPU, multi-agent vs. single-agent, tf vs. torch).
Most importantly, the returned dict should have an "info" key under which there is a "learner" key (LEARNER_INFO) that holds a (multi-agent) policy dict ("default_policy" as the only key in case we don't have a multi-agent setup). Under each policy key, the policy can store its stats_fn output (key: LEARNER_STATS_KEY) and the extra batch fetches (e.g. td-errors). For example:
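An illustrative sketch of that structure (stat names and values are made up):

```python
# result = trainer.train()
# result["info"]["learner"] then looks roughly like this:
{
    "default_policy": {          # one key per policy ID in multi-agent setups
        "learner_stats": {       # LEARNER_STATS_KEY: the policy's stats_fn output
            "policy_loss": 0.025,
            "vf_loss": 0.31,
            "entropy": 1.42,
        },
        # Extra batch fetches (e.g. td-errors) sit next to learner_stats.
        "td_error": [0.1, -0.03, 0.07],  # one value per sample in the train batch
    },
}
```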
Also added a new structure test to each algorithm's simple compilation test that confirms this structure is adhered to. These tests make sure all algos return the same structure across any of the above cases and combinations thereof.
Note that multi-GPU for multi-agent is not yet supported.
Why are these changes needed?
Related issue number
#18116
Checks
I've run scripts/format.sh to lint the changes in this PR.