[RLlib]: Rename input_evaluation to off_policy_estimation_methods #25107
Conversation
-        if isinstance(config["input_evaluation"], tuple):
-            config["input_evaluation"] = list(config["input_evaluation"])
-        elif not isinstance(config["input_evaluation"], list):
+        input_evaluation = config.get("input_evaluation")
Awesome!
rllib/agents/trainer.py
Outdated
        if isinstance(config["off_policy_estimation_methods"], tuple):
            config["off_policy_estimation_methods"] = list(
                config["off_policy_estimation_methods"]
            )
from ray.rllib.utils import force_list # <- beginning of file
...
config["off_policy_estimation_methods"] = force_list(config["off_policy_estimation_methods"])
This even works if the user only provides a single class (no list).
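For illustration, a minimal sketch of what force_list gives you here (MyEstimator is a placeholder class for the example, not an RLlib name):

from ray.rllib.utils import force_list

class MyEstimator:
    pass  # stands in for an off-policy estimator class

force_list(MyEstimator)                 # -> [MyEstimator]
force_list((MyEstimator, MyEstimator))  # -> [MyEstimator, MyEstimator]
force_list([MyEstimator])               # -> [MyEstimator] (already a list, unchanged)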
-            ImportanceSampling,
-            WeightedImportanceSampling,
-        ]
+        self.off_policy_estimation_methods = []
Any reason why we want to change this default?
I'm worried that some users may rely on this being in their results dict and suddenly wonder where this data went and how to switch it back on.
Got it. It's not needed here as we do this on the eval worker track. Please ignore my comment above.
Actually, I'm not sure. We can keep it for backwards compatibility, but if the user doesn't explicitly ask for it, should it be enabled? We can discuss this further in the offline eval worker PR.
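For reference, a hedged sketch of how a user could explicitly switch the estimators back on once the default is an empty list (the import path and the offline-data path below are assumptions for illustration and may differ across Ray versions):

from ray.rllib.offline.estimators import (  # assumed location of the estimator classes
    ImportanceSampling,
    WeightedImportanceSampling,
)

config = {"input": "path/to/offline/data.json"}  # hypothetical offline input
config["off_policy_estimation_methods"] = [
    ImportanceSampling,
    WeightedImportanceSampling,
]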
@@ -945,7 +941,16 @@ def offline_data(
         if actions_in_input_normalized is not None:
             self.actions_in_input_normalized = actions_in_input_normalized
         if input_evaluation is not None:
-            self.input_evaluation = input_evaluation
+            deprecation_warning(
Very nice!
rllib/agents/trainer_config.py
Outdated
                new="offline_data(off_policy_estimation_methods={})".format(
                    input_evaluation
                ),
                error=False,
Actually, TrainerConfig objects are relatively new, so let's make this error=True.
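A rough sketch of what the requested change would look like (the old= message is my guess at the matching counterpart and is not copied from the PR):

from ray.rllib.utils.deprecation import deprecation_warning

# `input_evaluation` is the deprecated argument passed to offline_data().
deprecation_warning(
    old="offline_data(input_evaluation={})".format(input_evaluation),
    new="offline_data(off_policy_estimation_methods={})".format(
        input_evaluation
    ),
    error=True,  # raise right away instead of only warning
)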
@@ -134,9 +134,6 @@ def on_train_result(self, *, trainer, result, **kwargs):
         # Evaluate every other training iteration (together
         # with every other call to Trainer.train()).
         "evaluation_interval": args.evaluation_interval,
-        "evaluation_config": {
Any reason you removed this here?
Commenting for future reference: that line of code wasn't actually doing anything, since eval workers don't read from offline data. This will be fixed in another PR soon.
Yes, makes sense. Sampler -> no evaluation. Cool, I think that also answers the question about the default value of off_policy_estimation_methods being an empty list (I think we can leave it as is; by default (input=sampler), no results are generated anyway).
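To make that concrete, a rough sketch of the defaults being discussed (illustrative only; the offline-data path is hypothetical):

config = {
    "input": "sampler",                   # default: collect experiences on-policy
    "off_policy_estimation_methods": [],  # new default: no OPE, so no extra result entries
}
# Estimators only make sense when reading offline data, e.g.:
# config["input"] = "path/to/offline/data.json"
# config["off_policy_estimation_methods"] = [ImportanceSampling]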
Awesome PR @rapotdar! Just a few tiny nits and one or two questions before we can merge.
Thanks a ton for cleaning this up. A nice example of why we should always use names that match the terms widely used in the literature. :)
Looks good now. Thanks for answering the questions @rapotdar! Awesome work.
Ah, sorry, I have to ask for one more thing: could you quickly check our docs in [...]? Then we can merge. Thanks a ton!
Actually, I think the docs are already fixed. Should be ready to merge. Thank you!
Why are these changes needed?
Related issue number
Checks
I've run scripts/format.sh to lint the changes in this PR.