[RLlib] Metrics do-over 04: New env rendering/video example script (through custom callbacks using MetricsLogger). #45073
Conversation
Signed-off-by: sven1977 <[email protected]>
Co-authored-by: angelinalg <[email protected]> Signed-off-by: Sven Mika <[email protected]>
does not seem to require doc owners' approval. I am adding @angelinalg explicitly (in case this needs review on the example's wording).
LGTM. A couple of questions and some nits.
done = self.cur_pos >= self.end_pos or truncated
return [self.cur_pos], 10.0 if done else -0.1, done, truncated, {}

def render(self, mode="rgb"):
Is there any particular reason why we diverge here from the standard gymnasium API? The API:
- no longer receives any parameters, i.e. def render(self)
- has return type RenderFrame | list[RenderFrame] | None, but not bool.
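For illustration, a minimal sketch of a gymnasium-style render method (hypothetical CorridorEnv class, not the script's actual env; the frame contents are dummy data). The key points are the parameterless signature and the RGB-array return value:

```python
import numpy as np


class CorridorEnv:
    """Hypothetical env illustrating the gymnasium-style render signature."""

    def __init__(self):
        self.cur_pos = 0
        self.end_pos = 10

    def render(self):
        # Gymnasium-style: no `mode` argument. Return an RGB frame as an
        # (H, W, 3) uint8 array (or None if there is nothing to render).
        frame = np.zeros((300, 400, 3), dtype=np.uint8)
        # Paint a white bar proportional to the agent's progress.
        frame[:, : min(self.cur_pos * 40, 400)] = 255
        return frame
```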
So sorry, this was a left-over from the next PR. This entire script has been re-done and packed into a separate PR. Removed from this one.
""" | ||
# If we have a vector env, only render the sub-env at index 0. | ||
if isinstance(env.unwrapped, gym.vector.VectorEnv): | ||
image = env.envs[0].render() |
Dumb question: imagine a user needs to return a couple of images in array form. Does she then return an np.ndarray of shape (K, H, W, 3), with K the number of images?
And if we have a vector environment, how does this look in Tensorboard?
I like, however, that we use the standard gymnasium API here, and that the user just needs to override this.
Yes, me too. This was a total mess in the old RLlib, with the render_env option.
Dumb question: imagine a user needs to return a couple of images in array form. Does she then return an np.ndarray of shape (K, H, W, 3), with K the number of images?

Correct (I think this is even from your own PR a while back, where you enabled the WandB logger to actually log images and videos):
- shape=3D -> a single image, e.g. [c, h, w]
- shape=4D -> N images (all of the same size), e.g. [N, c, h, w]
- shape=5D -> one video, e.g. [1, L, c, h, w]
(c=channels, L=length of video, w=width, h=height)

And if we have a vector environment, how does this look in Tensorboard?

No idea. If you choose reduce=None, then all logged videos/images will be placed in a list(!), not batched, and WandB will thus display them as separate images/videos. Having a list (as opposed to a batched array) is a must for videos, as they may have different lengths.
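The shape convention described above can be sketched with plain NumPy (the array contents are dummy data; the rank of the array is what the logger keys off):

```python
import numpy as np

c, h, w = 3, 64, 64  # channels, height, width

# 3D -> a single image [c, h, w].
single_image = np.zeros((c, h, w), dtype=np.uint8)

# 4D -> N images, all of the same size: [N, c, h, w].
image_batch = np.zeros((5, c, h, w), dtype=np.uint8)

# 5D -> one video of length L: [1, L, c, h, w].
video = np.zeros((1, 20, c, h, w), dtype=np.uint8)

# Videos of different lengths can NOT be batched into one array,
# which is why reduce=None keeps them in a plain Python list instead.
videos = [np.zeros((1, length, c, h, w), dtype=np.uint8) for length in (10, 20)]
```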
# Create a video from the images by simply stacking them.
video = np.expand_dims(
    np.stack(images, axis=0), axis=0
)  # TODO: test to make sure WandB properly logs videos.
This TODO is important. And we should ping the Tune team to take a look at the WandB logger when using schedulers or large checkpoints.
Removed the confusing TODO.
# an example of how to use a Viewer object.
return np.random.randint(0, 256, size=(300, 400, 3), dtype=np.uint8)

# Create a video from the images by simply stacking them.
video = np.expand_dims(
Can we add a small note describing the shape of the array?
Done.
# Best video.
metrics_logger.log_value(
    "episode_videos_best",
    self.best_episode_and_return[0],
It makes more sense to me to log the best of the best and the worst of the worst.
Images alone are already large, videos are larger, and multiples of them ... my machine would probably not handle it well.
Yeah, that's why I crunch them down to 64 x 86. True, we should maybe further reduce them on the Algorithm side after the EnvRunners each return their best, but that would require another callback that goes through these videos, compares, filters, etc.
I wanted to avoid this complexity. One could also simply log the videos from only one hard-coded EnvRunner (e.g. index=1) and reduce the number of videos this way. Or only log every nth iteration, etc.
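The two workarounds mentioned (hard-coding one EnvRunner and logging only every nth iteration) could be sketched as a small gate function inside the callback (names and defaults are hypothetical):

```python
def should_log_video(
    env_runner_index: int,
    iteration: int,
    only_runner: int = 1,
    every_nth: int = 10,
) -> bool:
    """Gate video logging to one hard-coded EnvRunner and every nth iteration."""
    return env_runner_index == only_runner and iteration % every_nth == 0
```

A callback would then simply skip its metrics_logger.log_value() call whenever this returns False, bounding the number of videos shipped per training iteration.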
    clear_on_reduce=True,
)
# Worst video.
metrics_logger.log_value(
The log_value at first confused me, as I was expecting a single scalar value to be logged. But in comparison to the other logging methods this makes sense, as we have only a single "variable".
Yeah, our Tune logging API does NOT allow for specifying any data types (videos, images, etc.) this early. We have to "communicate" with it via the tensor format/shape. So there is really only log_value, nothing else. log_dict and log_n_dicts are convenience methods to avoid having to call log_value a dozen times or so.
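The relationship between the methods can be sketched with a toy logger (a simplified stand-in only; RLlib's actual MetricsLogger also handles reduction, windows, parallel merging, etc.):

```python
from collections import defaultdict


class ToyMetricsLogger:
    """Simplified stand-in: log_dict is sugar over repeated log_value calls."""

    def __init__(self):
        self.stats = defaultdict(list)

    def log_value(self, key, value, clear_on_reduce=False):
        # A "value" may be a scalar OR a tensor (image/video); its data
        # type is communicated purely via the array's shape, not the API.
        self.stats[key].append(value)

    def log_dict(self, metrics: dict):
        # Convenience wrapper: avoids calling log_value once per key.
        for key, value in metrics.items():
            self.log_value(key, value)
```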
@@ -278,6 +278,10 @@ def __init__(
        [] if render_images is None else render_images
    )

    # Caches for temporary per-timestep data. May be used to store custom metrics
    # from within a callback for the ongoing episode (e.g. render images).
    self._temporary_timestep_data = defaultdict(list)
Awesome, so we have our old episode.media somehow back :)
Haha, yeah, there had to be some temporary data cache available to the users. One possible way to avoid having this in the episode at all would be to tell users to store these in their custom callbacks instance directly and keep them clean themselves, but I'm not sure yet. We might deprecate this again if it turns out that that's the more transparent solution. What I like about the episode storage is that it auto-clears itself once the episode is finalized, ensuring the user cannot dump unbounded data into an episode and cause memory leaks.
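The auto-clearing behavior described here could be sketched like this (a toy class with a hypothetical finalize method; the real episode class differs):

```python
from collections import defaultdict


class ToyEpisode:
    """Sketch of a per-timestep scratch cache that empties itself on finalize."""

    def __init__(self):
        self._temporary_timestep_data = defaultdict(list)

    def add_temporary_timestep_data(self, key, value):
        self._temporary_timestep_data[key].append(value)

    def finalize(self):
        # Hand the cached data to the caller, then drop the cache so the
        # episode cannot accumulate unbounded data (e.g. render images).
        data = dict(self._temporary_timestep_data)
        self._temporary_timestep_data = defaultdict(list)
        return data
```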
Metrics do-over 04: New env rendering/video example script (through custom callbacks using MetricsLogger).

Why are these changes needed?
This PR introduces a new example script (and adds it to the CI), which demonstrates env rendering and video logging through custom callbacks, using the MetricsLogger available on the EnvRunners.

Related issue number

Checks
- I've signed off every commit (git commit -s) in this PR.
- I've run scripts/format.sh to lint the changes in this PR.
- If I added a new method in Tune, I've added it in doc/source/tune/api/ under the corresponding .rst file.