
[RLlib]: Rename input_evaluation to off_policy_estimation_methods #25107

Merged
6 commits merged into ray-project:master on May 27, 2022

Conversation


@ghost ghost commented May 23, 2022

Why are these changes needed?

Related issue number

Checks

  • [x] I've run scripts/format.sh to lint the changes in this PR.
  • [x] I've included any doc changes needed for https://docs.ray.io/en/master/.
  • I've made sure the tests are passing. Note that there might be a few flaky tests; see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • [ ] Unit tests
    • Release tests
    • This PR is not tested :(

@ghost ghost changed the title Rename input evaluation to off_policy_estimation_methods [RLlib]: Rename input evaluation to off_policy_estimation_methods May 23, 2022
@ghost ghost changed the title [RLlib]: Rename input evaluation to off_policy_estimation_methods [RLlib]: Rename input_evaluation to off_policy_estimation_methods May 23, 2022
if isinstance(config["input_evaluation"], tuple):
config["input_evaluation"] = list(config["input_evaluation"])
elif not isinstance(config["input_evaluation"], list):
input_evaluation = config.get("input_evaluation")
Contributor

Awesome!

Comment on lines 1876 to 1879
if isinstance(config["off_policy_estimation_methods"], tuple):
config["off_policy_estimation_methods"] = list(
config["off_policy_estimation_methods"]
)
Contributor

from ray.rllib.utils import force_list  # <- beginning of file
...
config["off_policy_estimation_methods"] = force_list(config["off_policy_estimation_methods"])

This even works if the user only provides a single class (no list).
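For reference, a minimal sketch of the behavior this suggestion relies on (the string values below are illustrative placeholders; force_list itself is imported exactly as in the suggestion above):

```python
from ray.rllib.utils import force_list

# force_list normalizes a single element, a tuple, or a list into a plain
# Python list, so all of the user inputs below end up in the same shape.
assert force_list("is") == ["is"]                    # single entry, no list
assert force_list(("is", "wis")) == ["is", "wis"]    # tuple
assert force_list(["is", "wis"]) == ["is", "wis"]    # already a list
assert force_list(None) == []                        # nothing configured
```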

    ImportanceSampling,
    WeightedImportanceSampling,
]
self.off_policy_estimation_methods = []
Contributor

Any reason why we want to change this default?
I'm worried that some users may rely on this being in their results dict and will suddenly wonder where this data went and how to switch it back on.

Contributor

Got it. It's not needed here as we do this on the eval worker track. Please ignore my comment above.

Author

@ghost ghost May 24, 2022

Actually, I'm not sure. We can keep it for backwards compatibility, but if the user doesn't explicitly ask for it, should it be enabled? We can discuss this further in the offline eval worker PR.

@@ -945,7 +941,16 @@ def offline_data(
if actions_in_input_normalized is not None:
    self.actions_in_input_normalized = actions_in_input_normalized
if input_evaluation is not None:
    self.input_evaluation = input_evaluation
    deprecation_warning(
Contributor

Very nice!

new="offline_data(off_policy_estimation_methods={})".format(
input_evaluation
),
error=False,
Contributor

Actually, TrainerConfig objects are relatively new, so let's make this error=True.
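As a rough, hypothetical sketch of what this request amounts to (not the exact PR diff; the surrounding offline_data() signature is simplified here): with error=True, deprecation_warning raises a ValueError instead of only logging, so the old argument name fails fast.

```python
from ray.rllib.utils.deprecation import deprecation_warning


class TrainerConfig:
    def offline_data(self, *, input_evaluation=None, off_policy_estimation_methods=None):
        # Simplified sketch: reject the deprecated `input_evaluation` argument
        # outright (error=True raises instead of logging a warning).
        if input_evaluation is not None:
            deprecation_warning(
                old="offline_data(input_evaluation={})".format(input_evaluation),
                new="offline_data(off_policy_estimation_methods={})".format(
                    input_evaluation
                ),
                error=True,
            )
        if off_policy_estimation_methods is not None:
            self.off_policy_estimation_methods = off_policy_estimation_methods
        return self
```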

@@ -134,9 +134,6 @@ def on_train_result(self, *, trainer, result, **kwargs):
# Evaluate every other training iteration (together
# with every other call to Trainer.train()).
"evaluation_interval": args.evaluation_interval,
"evaluation_config": {
Contributor

Any reason you removed this here?

Author

Commenting for future reference: that line of code wasn't actually doing anything, since eval workers don't read from offline data. This will be fixed in another PR soon.

Contributor

Yes, makes sense. Sampler -> no evaluation. Cool, I think that also answers the question about the default value of off_policy_estimation_methods being an empty list (I think we can leave this as is; with the default input=sampler, no results are generated anyway).
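To make the resolution concrete, a hypothetical config sketch (the import path for the estimator classes is an assumption based on the class names appearing in this PR, and the offline-data path is a placeholder): with the default input="sampler" the new empty default changes nothing, while offline-data users opt in to OPE explicitly.

```python
# Hypothetical sketch only; import path and config keys assumed from this PR.
from ray.rllib.offline.estimators import ImportanceSampling, WeightedImportanceSampling

# Default case: sampling from a live environment -> OPE never produced results
# here anyway, so the new empty default does not change these users' output.
online_config = {
    "input": "sampler",
    "off_policy_estimation_methods": [],
}

# Offline-data case: users now enable off-policy estimation explicitly.
offline_config = {
    "input": "/path/to/offline/data",  # placeholder path
    "off_policy_estimation_methods": [
        ImportanceSampling,
        WeightedImportanceSampling,
    ],
}
```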

Contributor

@sven1977 sven1977 left a comment

Awesome PR @rapotdar! Just a few tiny nits and one or two questions before we can merge.
Thanks a ton for cleaning this up. A nice example of why we should always use names that match the terms widely used in the literature. :)

Contributor

@sven1977 sven1977 left a comment

Looks good now. Thanks for answering the questions @rapotdar ! Awesome work.

@sven1977
Contributor

sven1977 commented May 25, 2022

Ah, sorry, I have to ask for one more thing: Could you quickly check our docs in ray.docs.source.rllib and search for the section on offline RL training? In there, please also change the names and make sure everything is still described correctly.

Then we can merge.

Thanks a ton!

@ghost
Author

ghost commented May 25, 2022

Actually, I think the docs are already fixed. Should be ready to merge.

Thank you!

@sven1977 sven1977 merged commit ab81c8e into ray-project:master May 27, 2022