[RLlib]: Move OPE to evaluation config #25911
Conversation
@@ -841,6 +842,18 @@ def evaluation(
IMPORTANT NOTE: Policy gradient algorithms are able to find the optimal
policy, even if this is a stochastic one. Setting "explore=False" here
will result in the evaluation workers not using this optimal policy!
off_policy_estimation_methods: Specify how to evaluate the current policy,
Could we produce a deprecation error if a user still uses either the old `input_evaluation` OR the new `off_policy_estimation_methods` inside the `offline_data()` method?
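For reference, a minimal sketch of what such a guard could look like, using RLlib's deprecation_warning helper (the method signature and message texts below are illustrative only, not taken from this PR):

from ray.rllib.utils.deprecation import DEPRECATED_VALUE, deprecation_warning


def offline_data(self, *, input_evaluation=DEPRECATED_VALUE, **kwargs):
    # Hypothetical check: hard-error if the old (or misplaced) key is still passed here.
    if input_evaluation != DEPRECATED_VALUE:
        deprecation_warning(
            old="offline_data(input_evaluation=...)",
            new="evaluation(off_policy_estimation_methods=...)",
            error=True,  # raise instead of only logging a warning
        )
    ...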
@@ -45,14 +44,12 @@ def test_cql_compilation(self):
env="Pendulum-v1",
)
.offline_data(
    input_=[data_file],
    input_=data_file,
Just making sure: Is this properly described in the docs?
Which options do the users have here (and did this change recently)?
List of str, str, "datasets", "sampler", etc.?
We currently support str, List of str, Dict of different inputs, dataset, sampler, and a few others. Str and list of str are both backed by JsonReader, so there's no difference for this example, but we should migrate all of these to DatasetReader eventually to avoid the multiple-input-reader bug I mentioned earlier.
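To illustrate the variants mentioned here (the paths, weights, and the `config` builder object below are placeholders, not from this PR):

config.offline_data(input_="/tmp/cartpole-out")  # single dir/path string (JSON files)
config.offline_data(input_=["/tmp/out/a.json", "/tmp/out/b.json"])  # list of str
config.offline_data(input_={"sampler": 0.5, "/tmp/cartpole-out": 0.5})  # dict of inputs with sampling weights
config.offline_data(input_="sampler")  # collect fresh samples from the env instead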
rllib/evaluation/metrics.py
Outdated
@@ -147,8 +147,8 @@ def summarize_episodes(
if new_episodes is None:
    new_episodes = episodes

episodes, estimates = _partition(episodes)
new_episodes, _ = _partition(new_episodes)
episodes, _ = _partition(episodes)
Was this a bug? Can we add a one-line comment on why we are ignoring the estimates on `episodes`, but not on `new_episodes`?
Yes, we should be running OPE on the new episodes from the current training epoch only, not all of the episodes so far. (One of the next PRs removes this entire section of code, though, so I don't think we need to put a comment)
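For context, `_partition()` separates plain rollout metrics from OPE estimates, roughly like this (simplified sketch of the helper in metrics.py, not the exact source; the import path may differ across versions):

from ray.rllib.offline.off_policy_estimator import OffPolicyEstimate


def _partition(episodes):
    # Split a mixed list into rollout metrics and off-policy estimates.
    rollouts, estimates = [], []
    for e in episodes:
        if isinstance(e, OffPolicyEstimate):
            estimates.append(e)
        else:
            rollouts.append(e)
    return rollouts, estimates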
logger.warning(
    "Requested 'simulation' input evaluation method: "
    "will discard all sampler outputs and keep only metrics."
if method_type == "simulation":
Nice!
@@ -761,7 +759,7 @@ def wrap(env):
else:
    raise ValueError(
        f"Unknown off_policy_estimation type: {method_type}! Must be "
        "either `simulation|is|wis|dm|dr` or a sub-class of ray.rllib."
        "either a class path or a sub-class of ray.rllib."
Nice. Better to force explicitness and not allow too many shortcut options for the user.
I still have the class path as a TODO, will fix that after the Trainer PR, but agree
soft_horizon=soft_horizon,
no_done_at_end=no_done_at_end,
observation_fn=observation_fn,
sample_collector_class=policy_config.get("sample_collector"),
render=render,
blackhole_outputs="simulation" in off_policy_estimation_methods,
Add TODO to deprecate this once we completely disallow "simulation".
Already handled in the next PR, since that moves OPE out to the Algorithm from the eval rollout worker.
What is `blackhole_outputs`?
train_test_split_val: float = 0.0,
k: int = 0,
) -> Generator[Tuple[List[SampleBatch]], None, None]:
"""Utility function that returns either a train/test split or
Nit: Our docstrings should always have just a single one-line sentence at the top, then an empty line, then any other information. E.g.:

def xyz(..):
    """Some single sentence description.

    Anything else blabla

    Args:
        ...
    """
`train_test_split_val * n_episodes` episodes and an evaluation batch
with `(1 - train_test_split_val) * n_episodes` episodes. If not
specified, use `k` for k-fold cross validation instead.
k: k-fold cross validation for training model and evaluating OPE.
Returns:
Nit: Empty line before `Returns:`.
return


@DeveloperAPI
@ExperimentalAPI
Actually, sorry, but let's not use ExperimentalAPI anymore. We are being urged by other library teams to move RLlib to the generic Ray API annotations (which don't include `ExperimentalAPI`). Let's leave these as DeveloperAPI for now.
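I.e., the decorator stack on these classes would then be just (sketch):

from ray.rllib.utils.annotations import DeveloperAPI


@DeveloperAPI
class OffPolicyEstimator:
    """Interface for an off policy reward estimator."""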
from ray.rllib.utils.typing import SampleBatchType
from ray.rllib.policy.sample_batch import SampleBatch
from ray.rllib.utils.numpy import convert_to_numpy
import numpy as np


@DeveloperAPI
@ExperimentalAPI
Same here (see above).
Looks great. Thanks for the cleanup, @Rohan138! Just a few questions and nits to fix.
from ray.rllib.utils.framework import try_import_torch
from ray.rllib.utils.typing import ModelConfigDict, TensorType

torch, nn = try_import_torch()


@DeveloperAPI
@ExperimentalAPI
same
from ray.rllib.utils.typing import SampleBatchType
from typing import List
import numpy as np


@DeveloperAPI
@ExperimentalAPI
same
namedtuple("OffPolicyEstimate", ["estimator_name", "metrics"])
)


@DeveloperAPI
@ExperimentalAPI
same
class OffPolicyEstimator:
    """Interface for an off policy reward estimator."""

@DeveloperAPI
@ExperimentalAPI
same
rllib/tests/test_io.py
Outdated
def test_agent_input_eval_sim(self):
    for fw in framework_iterator():
def test_agent_input_eval_sampler(self):
    for fw in ["torch"]:
Please use
config = ...
for fw in framework_iterator(config, frameworks="torch"):
instead for consistency.
That was supposed to have TF too, fixed now
"off_policy_estimation_methods": { | ||
"simulation": {"type": "simulation"} | ||
}, | ||
"input": self.test_dir + fw, |
Quick question: We still support both a) a dir string (read all JSON files in the dir) and b) a list of filenames, correct?
Yes, we support both.
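I.e., both of the following keep working (illustrative paths):

config = {
    # a) A dir string: every JSON file in the dir is read.
    "input": "/tmp/test-dir/torch",
}
config = {
    # b) An explicit list of filenames.
    "input": ["/tmp/test-dir/torch/output-0.json", "/tmp/test-dir/torch/output-1.json"],
}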
OK, this pretty much looks good to me.
I left quite a few questions; if you could answer those first, then let's move forward.
# Offline RL settings.
input_evaluation = config.get("input_evaluation")
if input_evaluation is not None and input_evaluation is not DEPRECATED_VALUE:
    ope_dict = {str(ope): {"type": ope} for ope in input_evaluation}
I maintain that this is a weird format and we should change it at some point.
Agree, I had to use the same format for the `q_model` as well in the other PR; it's super awkward.
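For reference, the translation the snippet above performs (the method names here are just the usual short-hands from the old API):

# Old, deprecated style: a flat list of estimation methods.
config["input_evaluation"] = ["is", "wis"]

# New style produced by the dict comprehension above: name -> {"type": ...}.
config["off_policy_estimation_methods"] = {
    "is": {"type": "is"},
    "wis": {"type": "wis"},
}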
old='off_policy_estimation_methods={"simulation"}',
new='input="sampler"',
help="The `simulation` estimation method has been deprecated."
"If you want to run online evaluation on your data, use"
If you say simulation or sampler, doesn't the RW just rely on the sampler, which could be backed by a dataset or an env?
`simulation` was backed by AsyncSampler with `blackhole_outputs=True`, but yes. `blackhole_outputs` seems to ignore the samples collected by the RW for everything except computing episode reward metrics in metrics.py.
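In config terms, the substitution the deprecation message asks for looks roughly like this (sketch):

# Old: "simulation" as a pseudo-OPE method on the evaluation workers.
old_eval_config = {
    "off_policy_estimation_methods": {"simulation": {"type": "simulation"}},
}

# New: just run online evaluation through the sampler instead.
new_eval_config = {"input": "sampler"}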
soft_horizon=soft_horizon,
no_done_at_end=no_done_at_end,
observation_fn=observation_fn,
sample_collector_class=policy_config.get("sample_collector"),
render=render,
blackhole_outputs="simulation" in off_policy_estimation_methods,
What is `blackhole_outputs`?
@@ -682,7 +677,7 @@ def valid_module(class_path):
log_level=config["log_level"],
callbacks=config["callbacks"],
input_creator=input_creator,
off_policy_estimation_methods=off_policy_estimation_methods,
off_policy_estimation_methods=config["off_policy_estimation_methods"],
Makes sense to me -- we should be able to use OPE even if the input is a sampler (for debugging purposes, or if working with a separate eval dataset).
Yup, the only thing that probably doesn't work currently is mixed input, e.g. ("sampler" + JSON).
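A sketch of that combination with the config keys shown in the surrounding diffs (the exact nesting may still change in the follow-up Trainer PR):

config = {
    # Evaluate online via the env sampler...
    "input": "sampler",
    # ...while still computing OPE estimates (e.g. importance sampling).
    "off_policy_estimation_methods": {
        "is": {"type": "is"},
    },
}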
@@ -20,23 +20,27 @@
logger = logging.getLogger()


@ExperimentalAPI
@DeveloperAPI
def train_test_split(
So where does this get invoked, now that we will require train and test datasets to be specified ahead of time?
The whole function gets deleted in the next PR; the user should split them ahead of time, or we can make this a utility in Ray Train/Datasets.
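For example, a user could pre-split their offline data with Ray Datasets before training (a sketch assuming JSON offline data and an 80/20 split; not part of this PR):

import ray

# Read the recorded experiences and split them ahead of time.
ds = ray.data.read_json("/tmp/cartpole-out")
n = ds.count()
train_ds, eval_ds = ds.random_shuffle().split_at_indices([int(0.8 * n)])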
Looks like the failing tests are unrelated. This is ok to merge.
This PR moves the OPE methods to the evaluation config (and thus the evaluation workers).
Note that I had to remove the OPE from some of the CQL and MARWIL tests because they use evaluation input = "sampler", so using OPE is redundant. However, user code using `config.input_evaluation` should still work with a deprecation warning.
Previous PR: #25899 (already merged)
Next PR: #26279
Related issue number
Checks
I've run scripts/format.sh to lint the changes in this PR.