
[RLlib] Metrics do-over 04: New env rendering/video example script (through custom callbacks using MetricsLogger). #45073

Conversation

sven1977 (Contributor) commented May 1, 2024

Metrics do-over 04: New env rendering/video example script (through custom callbacks using MetricsLogger).

This PR introduces a new example script (and adds it to the CI), which

  • defines a custom callback,
  • renders the env on all EnvRunners and stores the frames temporarily inside the episode,
  • compiles a video of the finished episode,
  • logs the videos of the best- and worst-performing episodes (per iteration) via the MetricsLogger available on the EnvRunners, and
  • shows how the videos can be viewed through a simple WandB (Tune) setup (see the sketch below).
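Below is a minimal, non-authoritative sketch of the approach, assuming the callback hooks and the episode/MetricsLogger methods that appear in this PR's diff and review comments (add_temporary_timestep_data, get_temporary_timestep_data, log_value); hook signatures and details are abridged:

```python
import numpy as np
from ray.rllib.algorithms.callbacks import DefaultCallbacks


class EnvRenderCallback(DefaultCallbacks):
    """Sketch: render each env step, compile a video per episode, log extremes."""

    def __init__(self):
        super().__init__()
        # (video, return) of the best/worst episode seen this iteration.
        self.best_episode_and_return = (None, float("-inf"))
        self.worst_episode_and_return = (None, float("inf"))

    def on_episode_step(self, *, episode, env, **kwargs):
        # Render the first sub-env of the vector env and stash the frame
        # temporarily inside the episode.
        image = env.envs[0].render()
        episode.add_temporary_timestep_data("render_images", image)

    def on_episode_end(self, *, episode, metrics_logger, **kwargs):
        images = episode.get_temporary_timestep_data("render_images")
        # Stack the frames into a single 5D video tensor: [1, L, h, w, c].
        video = np.expand_dims(np.stack(images, axis=0), axis=0)
        ret = episode.get_return()
        if ret > self.best_episode_and_return[1]:
            self.best_episode_and_return = (video, ret)
        if ret < self.worst_episode_and_return[1]:
            self.worst_episode_and_return = (video, ret)

    def on_sample_end(self, *, metrics_logger, **kwargs):
        # reduce=None -> keep logged videos as a list (not batched);
        # clear_on_reduce=True -> reset the buffer after each iteration.
        if self.best_episode_and_return[0] is not None:
            metrics_logger.log_value(
                "episode_videos_best",
                self.best_episode_and_return[0],
                reduce=None,
                clear_on_reduce=True,
            )
        if self.worst_episode_and_return[0] is not None:
            metrics_logger.log_value(
                "episode_videos_worst",
                self.worst_episode_and_return[0],
                reduce=None,
                clear_on_reduce=True,
            )
```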

Why are these changes needed?

Related issue number

Checks

  • I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
    • I've added any new APIs to the API Reference. For example, if I added a
      method in Tune, I've added it in doc/source/tune/api/ under the
      corresponding .rst file.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

sven1977 and others added 30 commits March 13, 2024 12:50
Signed-off-by: sven1977 <[email protected]>
…nup_examples_folder_03

# Conflicts:
#	rllib/examples/multi_agent/multi_agent_pendulum.py

and wip

Signed-off-by: sven1977 <[email protected]>
Co-authored-by: angelinalg <[email protected]>
Signed-off-by: Sven Mika <[email protected]>
Signed-off-by: sven1977 <[email protected]>
@sven1977 sven1977 assigned simonsays1980 and sven1977 and unassigned sven1977 May 1, 2024
@sven1977 sven1977 marked this pull request as ready for review May 1, 2024 16:56
Signed-off-by: sven1977 <[email protected]>
…ics_do_over_04_env_rendering_example_script

Signed-off-by: sven1977 <[email protected]>

# Conflicts:
#	doc/source/rllib/package_ref/learner.rst
#	rllib/utils/metrics/metrics_logger.py
#	rllib/utils/metrics/stats.py
@aslonnie aslonnie requested review from angelinalg and removed request for a team May 2, 2024 01:43
aslonnie (Collaborator) commented May 2, 2024

This does not seem to require doc owners' approval. I am adding @angelinalg explicitly (in case this needs a review of the example's wording).

Signed-off-by: sven1977 <[email protected]>
Signed-off-by: sven1977 <[email protected]>
@simonsays1980 (Collaborator) left a comment:

LGTM. A couple of questions. Some nits and nuts.

done = self.cur_pos >= self.end_pos or truncated
return [self.cur_pos], 10.0 if done else -0.1, done, truncated, {}

def render(self, mode="rgb"):
Collaborator:

Is there any particular reason why we diverge here from the standard gymnasium API? The standard API

  1. no longer receives any parameters, i.e., def render(self), and
  2. returns RenderFrame | list[RenderFrame] | None, not bool.

(A minimal conformant signature is sketched below.)
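For reference, a sketch of the gymnasium-conformant render signature being referenced (class name and shapes are illustrative, not from this PR):

```python
import gymnasium as gym
import numpy as np


class MyEnv(gym.Env):
    """Illustrative env; only the render-related parts are shown."""

    metadata = {"render_modes": ["rgb_array"]}

    def __init__(self, render_mode="rgb_array"):
        # In gymnasium, the render mode is fixed at construction time.
        self.render_mode = render_mode
        self.observation_space = gym.spaces.Discrete(2)
        self.action_space = gym.spaces.Discrete(2)

    def render(self):
        # No parameters; returns RenderFrame | list[RenderFrame] | None.
        return np.zeros((300, 400, 3), dtype=np.uint8)
```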

Author (sven1977):

So sorry, this was a leftover from the next PR. This entire script has been redone and packed into a separate PR. Removed from this one.

"""
# If we have a vector env, only render the sub-env at index 0.
if isinstance(env.unwrapped, gym.vector.VectorEnv):
image = env.envs[0].render()
Collaborator:

Dumb question: Imagine a user needs to return a couple of images in array form. Does she then return an np.ndarray of shape (K, H, W, 3), with K the number of images?

And if we have a vector environment, how does this look in TensorBoard?

Collaborator:

I like, however, that we use the standard gymnasium API here and the user just needs to override this.

Author (sven1977):

Yes, me, too. This was a total mess in the old RLlib, with the render_env option.

Author (sven1977) commented May 2, 2024:

> Dumb question: Imagine a user needs to return a couple of images in array form. Does she then return an np.ndarray of shape (K, H, W, 3), with K the number of images?

Correct (I think this is even from your own PR a while back, where you enabled the WandB logger to actually log images and videos):

  • shape=3D -> a single image, e.g. [c, h, w]
  • shape=4D -> N images (all of the same size), e.g. [N, c, h, w]
  • shape=5D -> video(s) of shape [N, L, c, h, w] (N=1 for a single video)

N=number of images/videos, L=video length (frames), c=channels, h=height, w=width

> And if we have a vector environment, how does this look in TensorBoard?

No idea. If you choose reduce=None, then all logged videos/images will be placed in a list(!), not batched, and thus WandB will display them as separate images/videos. Having a list (as opposed to a batched array) is a must for videos, as they may have different lengths.
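For concreteness, a tiny numpy illustration of these shape conventions (all sizes are dummies):

```python
import numpy as np

image = np.random.randint(0, 256, size=(3, 64, 86), dtype=np.uint8)  # 3D: one image [c, h, w]
images = np.stack([image] * 10, axis=0)  # 4D: N=10 images, [N, c, h, w]
video = np.expand_dims(images, axis=0)   # 5D: one video, [N=1, L=10, c, h, w]
```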

# Create a video from the images by simply stacking them.
video = np.expand_dims(
    np.stack(images, axis=0), axis=0
)  # TODO: test to make sure WandB properly logs videos.
Collaborator:

This TODO is important. And we should ping the Tune team to take a look at the WandB logger when using schedulers or large checkpoints.

Author (sven1977):

Ah, no, this already works totally fine. Let me remove this ...

(screenshot attached)

Author (sven1977):

Removed the confusing TODO.

# An example of how to use a Viewer object.
return np.random.randint(0, 256, size=(300, 400, 3), dtype=np.uint8)

# Create a video from the images by simply stacking them.
video = np.expand_dims(
Collaborator:

Can we add a small note describing the shape of the array?

Author (sven1977):

Done.

# Best video.
metrics_logger.log_value(
    "episode_videos_best",
    self.best_episode_and_return[0],
Collaborator:

It makes more sense to me to log the best of the best and the worst of the worst.

Collaborator:

Images alone are already large, videos are larger, and then multiple of them ... my machine would probably not handle it well.

Author (sven1977) commented May 2, 2024:

Yeah, that's why I crunch them down to 64 x 86. True, we should maybe further reduce them on the Algorithm side after the EnvRunners each return their best, but that would require another callback that goes through these videos, compares, filters, etc. I wanted to avoid this complexity. One could also simply log only the videos from one hard-coded EnvRunner (e.g., index=1) and thus reduce the number of videos, or only log every nth iteration, etc. (a sketch of the latter follows below).
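As an illustration of the "only log every nth iteration" idea, here is a hedged sketch extending the hypothetical EnvRenderCallback from the PR description above (the counter logic is illustrative, not part of this PR):

```python
class ThrottledRenderCallback(EnvRenderCallback):
    """Only forward the best/worst videos every LOG_EVERY_N sample cycles."""

    LOG_EVERY_N = 10

    def __init__(self):
        super().__init__()
        self._sample_count = 0

    def on_sample_end(self, *, metrics_logger, **kwargs):
        self._sample_count += 1
        if self._sample_count % self.LOG_EVERY_N == 0:
            super().on_sample_end(metrics_logger=metrics_logger, **kwargs)
        else:
            # Drop the cached videos to keep memory bounded.
            self.best_episode_and_return = (None, float("-inf"))
            self.worst_episode_and_return = (None, float("inf"))
```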

    clear_on_reduce=True,
)
# Worst video.
metrics_logger.log_value(
Collaborator:

The log_value at first confused me, as I was expecting a single scalar value to be logged; but in comparison with the other logging methods this makes sense, as we have only a single "variable".

Author (sven1977):

Yeah, our Tune logging API does NOT allow for specifying any data types (videos, images, etc.) this early. We have to "communicate" with it via the tensor format/shape. So there is really only log_value, nothing else. log_dict and log_n_dicts are convenience methods to avoid having to call log_value a dozen times or so.
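A quick sketch of that convenience relationship; the import path matches the metrics_logger.py file touched by this PR, and the keyword arguments are assumed from the calls visible in this thread:

```python
from ray.rllib.utils.metrics.metrics_logger import MetricsLogger


def log_extreme_videos(metrics_logger: MetricsLogger, best_video, worst_video):
    # Either two single-key calls ...
    metrics_logger.log_value("episode_videos_best", best_video, reduce=None, clear_on_reduce=True)
    metrics_logger.log_value("episode_videos_worst", worst_video, reduce=None, clear_on_reduce=True)

    # ... or the equivalent convenience call logging both keys at once.
    metrics_logger.log_dict(
        {"episode_videos_best": best_video, "episode_videos_worst": worst_video},
        reduce=None,
        clear_on_reduce=True,
    )
```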

@@ -278,6 +278,10 @@ def __init__(
            [] if render_images is None else render_images
        )

        # Caches for temporary per-timestep data. May be used to store custom metrics
        # from within a callback for the ongoing episode (e.g. render images).
        self._temporary_timestep_data = defaultdict(list)
Collaborator:

Awesome, so we have our old episode.media somehow back :)

Author (sven1977):

Haha, yeah, there had to be some temporary data cache available to the users. One possible way to avoid having this in the episode at all would be to tell users to store such data directly in their custom callbacks instance and take care of keeping it clean themselves, but I'm not sure yet. We might deprecate this again if it turns out that that's the more transparent solution. What I like about the episode storage is that it auto-clears itself once the episode is finalized, making sure the user cannot dump infinite data into an episode and cause leaks. (A toy sketch of this behavior follows below.)
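A standalone toy sketch of this auto-clearing behavior (not RLlib's actual episode class; the attribute and method names mirror this PR's diff):

```python
from collections import defaultdict


class EpisodeTempDataSketch:
    def __init__(self):
        # Caches for temporary per-timestep data (cf. the diff above).
        self._temporary_timestep_data = defaultdict(list)

    def add_temporary_timestep_data(self, key, value):
        self._temporary_timestep_data[key].append(value)

    def get_temporary_timestep_data(self, key):
        return self._temporary_timestep_data[key]

    def finalize(self):
        # Auto-clear on finalize -> users cannot leak unbounded data
        # into a finished episode.
        self._temporary_timestep_data.clear()
```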

Signed-off-by: sven1977 <[email protected]>
@sven1977 sven1977 merged commit afba35d into ray-project:master May 2, 2024
5 checks passed
@sven1977 sven1977 deleted the metrics_do_over_04_env_rendering_example_script branch May 2, 2024 15:35