
[RLlib] before_sub_environment_reset() callback enhancements (add next_episode arg). #28600

Merged (22 commits) on Sep 23, 2022

Conversation

@sven1977 (Contributor) commented Sep 19, 2022

Why are these changes needed?

Related issue number

Checks

  • I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

…re_sub_env_reset_callback


# Conflicts:
#	rllib/algorithms/callbacks.py
#	rllib/algorithms/tests/test_callbacks.py
#	rllib/evaluation/env_runner_v2.py
#	rllib/evaluation/sampler.py
@@ -791,6 +792,8 @@ def _handle_done_episode(
env_id
],
env_index=env_id,
# Create new episode under this env_id.
next_episode=self._active_episodes[env_id],
Member (gjoliver):

OK, I chatted with Wes. If this is what they need, we don't really need to introduce a new callback, since doing self._active_episodes[env_id] here would already trigger the on_episode_start() call.
Essentially, what we are doing here is moving the on_episode_start() call to happen before reset().
So all we have to do is make the self._active_episodes[env_id] call here, with a comment saying "Assign the policy mapping for the next episode before the env.reset() call."
Does this make sense?

Contributor Author (sven1977):

As per our discussion with Wes yesterday:

  • We add a new callback, on_episode_created, which is triggered right after the Episode (or EpisodeV2) instance has been instantiated, but before(!) the sub-environment is reset.
  • Then we reset the sub-environment.
  • Then we trigger on_episode_start (like we did before). This way, the on_episode_start behavior remains unaltered (no API changes). A toy illustration of the resulting sequence follows below.
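The following is a toy, self-contained illustration of that ordering (not RLlib code; the env, episode, and callback objects here are stand-ins):

class Callbacks:
    def on_episode_created(self, episode):
        # New hook: the episode object exists, but the env is NOT reset yet.
        print("on_episode_created, length =", episode["length"])

    def on_episode_start(self, episode):
        # Unchanged hook: fires only after the env has been reset.
        print("on_episode_start, initial obs is available")


class DummyEnv:
    def reset(self):
        return 0  # initial observation


def start_episode(env, callbacks):
    episode = {"length": -1}               # freshly created episode
    callbacks.on_episode_created(episode)  # 1) new callback, before reset
    obs = env.reset()                      # 2) reset the sub-environment
    callbacks.on_episode_start(episode)    # 3) unchanged, after reset
    return episode, obs


start_episode(DummyEnv(), Callbacks())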

@@ -108,26 +108,36 @@ def on_sub_environment_created(
pass

@OverrideToImplementCustomLogic
def before_sub_environment_reset(
def on_episode_created(
Contributor Author (sven1977):

Renamed the callback as discussed.

self,
*,
worker: "RolloutWorker",
sub_environment: EnvType,
base_env: BaseEnv,
Contributor Author (sven1977):

Same signature as on_episode_start
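
For reference, a rough sketch of a user-side override, assuming DefaultCallbacks from rllib/algorithms/callbacks.py as the base class and accepting all keyword arguments via **kwargs (the exact parameter list follows the diff above and may differ in the merged version):

from ray.rllib.algorithms.callbacks import DefaultCallbacks


class MyCallbacks(DefaultCallbacks):
    def on_episode_created(self, **kwargs):
        # Fires right after the Episode(V2) object has been created and
        # BEFORE the sub-environment is reset.
        episode = kwargs.get("episode")
        print("episode created, env not reset yet:", episode)

    def on_episode_start(self, **kwargs):
        # Unchanged semantics: fires only AFTER base_env.try_reset() is done.
        episode = kwargs.get("episode")
        print("episode started, env has been reset:", episode)

This would be wired up the same way as in the test diff that follows, e.g. via config.callbacks(MyCallbacks).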

@@ -131,15 +141,17 @@ def test_before_sub_environment_reset(self):
.callbacks(BeforeSubEnvironmentResetCallback)
)

for _ in framework_iterator(config, frameworks=("tf", "torch")):
# Test with and without Connectors.
Contributor Author (sven1977):

Testing with and without connectors is more important here than testing across frameworks (the env-runners are framework-agnostic).
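
A rough sketch of the resulting test loop, assuming the enable_connectors flag on AlgorithmConfig.rollouts() available at the time of this PR (flag name assumed, rest of the test elided):

# Iterate over both connector modes instead of over frameworks.
for use_connectors in (True, False):
    test_config = config.rollouts(enable_connectors=use_connectors)
    algo = test_config.build()
    algo.train()
    algo.stop()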

@@ -394,15 +396,12 @@ def run(self) -> Iterator[SampleBatchType]:
and other fields as dictated by `policy`.
"""
# Before the very first poll (this will reset all vector sub-environments):
# Call custom `before_sub_environment_reset` callbacks for all sub-environments.
# Create all upcoming episodes and call `on_episode_created` callbacks for
Contributor Author (sven1977):

Create initial episodes and do callbacks.

@@ -854,10 +850,35 @@ def _handle_done_episode(
# Step after adding initial obs. This will give us 0 env and agent step.
new_episode.step()

def create_episode(self, env_id: EnvID) -> EpisodeV2:
Contributor Author (sven1977):

Helper for creating a new episode and firing its on_episode_created callback(s).

@@ -856,7 +831,11 @@ def _process_observations(
# This will be filled with dummy observations below.
all_agents_obs = {}

if not is_new_episode:
# If this episode is brand-new, call the episode start callback(s).
Contributor Author (sven1977):

Needed to add this flag here to the old Episode class. Otherwise, we would never start increasing the length property. This is not a problem for EpisodeV2.

"""
# Create a new episode under the same `env_id` and call the
# `on_episode_created` callbacks.
new_episode = self._active_episodes[env_id]
Contributor Author (sven1977):

Ideally, I would like to get rid of active_episodes being a defaultdict. I think it adds a lot of confusion here, making things happen under the hood without tight control by the env_runner itself.
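
To illustrate the concern, with a defaultdict merely reading a missing env_id silently creates a brand-new entry, whereas a plain dict forces the env_runner to create episodes explicitly (standard-library behavior, not RLlib code):

from collections import defaultdict

# defaultdict: looking up a missing key creates the episode as a side effect.
active_episodes = defaultdict(lambda: {"length": -1})
episode = active_episodes["env_0"]  # new episode appears "under the hood"

# plain dict: creation has to be explicit and is visible to the env_runner.
active_episodes = {}
if "env_0" not in active_episodes:
    active_episodes["env_0"] = {"length": -1}  # e.g. via create_episode()
episode = active_episodes["env_0"]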

@gjoliver (Member) left a comment:

Cool, I still have some minor questions/comments.
Looking solid.

This method gets called before every `try_reset()` is called by RLlib
on a sub-environment (usually a gym.Env). This includes the very first (initial)
reset performed on each sub-environment.
episode
Member (gjoliver):

Is this a typo? Should it be removed?

Member (gjoliver):

Can I suggest we also update the docstring for on_episode_start() to point out the difference between it and on_episode_created()? Basically, on_episode_start() gets called only after base_env.try_reset() is done.
Users may be reading about on_episode_start() without noticing the details discussed here.

Contributor Author (sven1977):

Yeah, sorry, that was a leftover. :) Removed.

Cleaned up both docstrings and added the exact sequence of events to both.

@@ -530,6 +516,10 @@ def _process_observations(
continue

episode: EpisodeV2 = self._active_episodes[env_id]
# If this episode is brand-new, call the episode start callback(s).
# Note: EpisodeV2s are initialized with length=-1 (before the reset).
if episode.length == -1:
Member (gjoliver):

Can I suggest we don't rely on the internals of EpisodeV2 directly?
It would be better if we created an API on EpisodeV2 similar to Episode:

@property
def started(self) -> bool:
    return bool(self._has_init_obs)

Then we can do:

if not episode.started:
    self._call_on_episode_start(episode, env_id)

Isn't it likely safer if we rely on self._has_init_obs rather than on the initial value of length?

Member (gjoliver):

Another quick question: why do we do this here?
Why don't we call self._call_on_episode_start(episode, env_id) right after the reset() op is done? Although we may need to do this in a couple of places, I feel it's mentally easier if the two happen one right after the other.

Contributor Author (sven1977):

I think we have to do this here due to the different behavior of ray-remote envs, which return an ASYNC_RESET_RETURN upon reset and only publish those reset results via the next poll call, unlike "normal", non-remote envs.

Contributor Author (sven1977):

We do next_episode_length = episode.length + 1 right after this, though :) That also accesses the episode's internal properties.

Either way: fixed it and added the suggested property.

Member (gjoliver):

Huh, remote envs ... 😢
Thanks for the fix though. It feels a bit better that we don't rely on that -1.

@gjoliver (Member) left a comment:

Thanks for powering through all the changes :)

@gjoliver (Member):

A lot of tests are failing though, and they look related.
