[RLlib]: Rename input_evaluation to off_policy_estimation_methods #25107
Conversation
-        if isinstance(config["input_evaluation"], tuple):
-            config["input_evaluation"] = list(config["input_evaluation"])
-        elif not isinstance(config["input_evaluation"], list):
+        input_evaluation = config.get("input_evaluation")
Awesome!
rllib/agents/trainer.py
Outdated
        if isinstance(config["off_policy_estimation_methods"], tuple):
            config["off_policy_estimation_methods"] = list(
                config["off_policy_estimation_methods"]
            )
from ray.rllib.utils import force_list # <- beginning of file
...
config["off_policy_estimation_methods"] = force_list(config["off_policy_estimation_methods"])
This even works if the user only provides a single class (no list).
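For illustration, a minimal sketch of what force_list gives you here (MyEstimator is a placeholder class for the example, not an RLlib name):

from ray.rllib.utils import force_list

class MyEstimator:
    pass  # stands in for an off-policy estimator class

force_list(MyEstimator)                 # -> [MyEstimator]
force_list((MyEstimator, MyEstimator))  # -> [MyEstimator, MyEstimator]
force_list([MyEstimator])               # -> [MyEstimator] (already a list, unchanged)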
-            ImportanceSampling,
-            WeightedImportanceSampling,
-        ]
+        self.off_policy_estimation_methods = []
Any reason why we want to change this default?
I'm worried that some users may rely on this being in their results dict and suddenly wonder where this data went and how to switch it back on.
Got it. It's not needed here as we do this on the eval worker track. Please ignore my comment above.
Actually, I'm not sure. We can keep it for backwards compatibility, but if the user doesn't explicitly ask for it, should it be enabled? We can discuss this further in the offline eval worker PR.
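For reference, a hedged sketch of how a user could explicitly switch the estimators back on once the default is an empty list (the import path and the offline-data path below are assumptions for illustration and may differ across Ray versions):

from ray.rllib.offline.estimators import (  # assumed location of the estimator classes
    ImportanceSampling,
    WeightedImportanceSampling,
)

config = {"input": "path/to/offline/data.json"}  # hypothetical offline input
config["off_policy_estimation_methods"] = [
    ImportanceSampling,
    WeightedImportanceSampling,
]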
@@ -945,7 +941,16 @@ def offline_data(
         if actions_in_input_normalized is not None:
             self.actions_in_input_normalized = actions_in_input_normalized
         if input_evaluation is not None:
-            self.input_evaluation = input_evaluation
+            deprecation_warning(
Very nice!
rllib/agents/trainer_config.py
Outdated
                new="offline_data(off_policy_estimation_methods={})".format(
                    input_evaluation
                ),
                error=False,
Actually, TrainerConfig objects are relatively new, so let's make this error=True.
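A rough sketch of what the requested change would look like (the old= message is my guess at the matching counterpart and is not copied from the PR):

from ray.rllib.utils.deprecation import deprecation_warning

# `input_evaluation` is the deprecated argument passed to offline_data().
deprecation_warning(
    old="offline_data(input_evaluation={})".format(input_evaluation),
    new="offline_data(off_policy_estimation_methods={})".format(
        input_evaluation
    ),
    error=True,  # raise right away instead of only warning
)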
@@ -134,9 +134,6 @@ def on_train_result(self, *, trainer, result, **kwargs):
         # Evaluate every other training iteration (together
         # with every other call to Trainer.train()).
         "evaluation_interval": args.evaluation_interval,
-        "evaluation_config": {
Any reason you removed this here?
Commenting for future reference: that line of code wasn't actually doing anything, since eval workers don't read from offline data. This will be fixed in another PR soon.
Yes, makes sense. Sampler -> no evaluation. Cool, I think that also answers the question about the default value of off_policy_estimation_methods being an empty list (I think we can leave it as is; by default (input=sampler), no results are generated anyway).
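To make that concrete, a rough sketch of the defaults being discussed (illustrative only; the offline-data path is hypothetical):

config = {
    "input": "sampler",                   # default: collect experiences on-policy
    "off_policy_estimation_methods": [],  # new default: no OPE, so no extra result entries
}
# Estimators only make sense when reading offline data, e.g.:
# config["input"] = "path/to/offline/data.json"
# config["off_policy_estimation_methods"] = [ImportanceSampling]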
Awesome PR @rapotdar! Just a few tiny nits and one or two questions before we can merge.
Thanks a ton for cleaning this up. A nice example of why we should always use names that match the terms widely used in the literature. :)
Looks good now. Thanks for answering the questions @rapotdar! Awesome work.
Ah, sorry, I have to ask for one more thing: could you quickly check our docs in [...]? Then we can merge. Thanks a ton!
Actually, I think the docs are already fixed. Should be ready to merge. Thank you!
Why are these changes needed?
Related issue number
Checks
I've run scripts/format.sh to lint the changes in this PR.