[RLlib] Metrics do-over 04: New env rendering/video example script (through custom callbacks using MetricsLogger). #45073
Conversation
Signed-off-by: sven1977 <[email protected]>
Co-authored-by: angelinalg <[email protected]> Signed-off-by: Sven Mika <[email protected]>
does not seem to require doc owners' approval. I am adding @angelinalg explicitly (in case this needs review on the example's wording).
LGTM. A couple of questions and some nits.
done = self.cur_pos >= self.end_pos or truncated
return [self.cur_pos], 10.0 if done else -0.1, done, truncated, {}

def render(self, mode="rgb"):
Is there any particular reason why we diverge here from the standard gymnasium API? The API:
- no longer receives any parameters, i.e. def render(self)
- has return type RenderFrame | list[RenderFrame] | None, but not bool.
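For illustration, a minimal sketch of a gymnasium-style render method (hypothetical CorridorEnv class, not the script's actual env; the frame contents are dummy data). The key points are the parameterless signature and the RGB-array return value:

```python
import numpy as np


class CorridorEnv:
    """Hypothetical env illustrating the gymnasium-style render signature."""

    def __init__(self):
        self.cur_pos = 0
        self.end_pos = 10

    def render(self):
        # Gymnasium-style: no `mode` argument. Return an RGB frame as an
        # (H, W, 3) uint8 array (or None if there is nothing to render).
        frame = np.zeros((300, 400, 3), dtype=np.uint8)
        # Paint a white bar proportional to the agent's progress.
        frame[:, : min(self.cur_pos * 40, 400)] = 255
        return frame
```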
So sorry, this was a left-over from the next PR. This entire script has been re-done and packed into a separate PR. Removed from this one.
""" | ||
# If we have a vector env, only render the sub-env at index 0. | ||
if isinstance(env.unwrapped, gym.vector.VectorEnv): | ||
image = env.envs[0].render() |
Dumb question: imagine a user needs to return a couple of images in array form. Does she then return an np.ndarray of shape (K, H, W, 3), with K the number of images?
And if we have a vector environment, how does this look in Tensorboard?
I like, however, that we use the standard gymnasium API here, and that the user just needs to override this.
Yes, me too. This was a total mess in the old RLlib, with the render_env option.
Dumb question: imagine a user needs to return a couple of images in array form. Does she then return an np.ndarray of shape (K, H, W, 3), with K the number of images?

Correct (I think this is even from your own PR a while back, where you enabled the WandB logger to actually log images and videos):
- shape=3D -> a single image, e.g. [c, h, w]
- shape=4D -> N images (all of the same size), e.g. [N, c, h, w]
- shape=5D -> one video, e.g. [1, L, c, h, w]
(c=channels, L=length of video, w=width, h=height)

And if we have a vector environment, how does this look in Tensorboard?

No idea. If you choose reduce=None, then all logged videos/images will be placed in a list(!), not batched, and WandB will thus display them as separate images/videos. Having a list (as opposed to a batched array) is a must for videos, as they may have different lengths.
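The shape convention described above can be sketched with plain NumPy (the array contents are dummy data; the rank of the array is what the logger keys off):

```python
import numpy as np

c, h, w = 3, 64, 64  # channels, height, width

# 3D -> a single image [c, h, w].
single_image = np.zeros((c, h, w), dtype=np.uint8)

# 4D -> N images, all of the same size: [N, c, h, w].
image_batch = np.zeros((5, c, h, w), dtype=np.uint8)

# 5D -> one video of length L: [1, L, c, h, w].
video = np.zeros((1, 20, c, h, w), dtype=np.uint8)

# Videos of different lengths can NOT be batched into one array,
# which is why reduce=None keeps them in a plain Python list instead.
videos = [np.zeros((1, length, c, h, w), dtype=np.uint8) for length in (10, 20)]
```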
# Create a video from the images by simply stacking them.
video = np.expand_dims(
    np.stack(images, axis=0), axis=0
)  # TODO: test to make sure WandB properly logs videos.
This TODO is important. And we should ping the Tune team to take a look at the WandB logger when using schedulers or large checkpoints.
Removed the confusing TODO.
# an example of how to use a Viewer object.
return np.random.randint(0, 256, size=(300, 400, 3), dtype=np.uint8)

# Create a video from the images by simply stacking them.
video = np.expand_dims(
Can we add a small note describing the shape of the array?
Done.
# Best video.
metrics_logger.log_value(
    "episode_videos_best",
    self.best_episode_and_return[0],
It makes more sense to me to log the best of the best and the worst of the worst.
Images alone are already large, videos are larger, and multiples of them ... my machine would probably not handle it well.
Yeah, that's why I crunch them down to 64 x 86. True, we should maybe further reduce them on the Algorithm side after the EnvRunners each return their best, but that would require another callback that goes through these videos, compares, filters, etc.
I wanted to avoid this complexity. One could also simply log the videos from only one hard-coded EnvRunner (e.g. index=1) and reduce the number of videos this way. Or only log every nth iteration, etc.
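The two workarounds mentioned (hard-coding one EnvRunner and logging only every nth iteration) could be sketched as a small gate function inside the callback (names and defaults are hypothetical):

```python
def should_log_video(
    env_runner_index: int,
    iteration: int,
    only_runner: int = 1,
    every_nth: int = 10,
) -> bool:
    """Gate video logging to one hard-coded EnvRunner and every nth iteration."""
    return env_runner_index == only_runner and iteration % every_nth == 0
```

A callback would then simply skip its metrics_logger.log_value() call whenever this returns False, bounding the number of videos shipped per training iteration.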
    clear_on_reduce=True,
)
# Worst video.
metrics_logger.log_value(
The log_value at first confused me, as I was expecting a single scalar value to be logged. But in comparison to the other logging methods this makes sense, as we have only a single "variable".
Yeah, our Tune logging API does NOT allow for specifying any data types (videos, images, etc.) this early. We have to "communicate" with it via the tensor format/shape. So there is really only log_value, nothing else. log_dict and log_n_dicts are convenience methods to avoid having to call log_value a dozen times or so.
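The relationship between the methods can be sketched with a toy logger (a simplified stand-in only; RLlib's actual MetricsLogger also handles reduction, windows, parallel merging, etc.):

```python
from collections import defaultdict


class ToyMetricsLogger:
    """Simplified stand-in: log_dict is sugar over repeated log_value calls."""

    def __init__(self):
        self.stats = defaultdict(list)

    def log_value(self, key, value, clear_on_reduce=False):
        # A "value" may be a scalar OR a tensor (image/video); its data
        # type is communicated purely via the array's shape, not the API.
        self.stats[key].append(value)

    def log_dict(self, metrics: dict):
        # Convenience wrapper: avoids calling log_value once per key.
        for key, value in metrics.items():
            self.log_value(key, value)
```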
@@ -278,6 +278,10 @@ def __init__(
        [] if render_images is None else render_images
    )

    # Caches for temporary per-timestep data. May be used to store custom metrics
    # from within a callback for the ongoing episode (e.g. render images).
    self._temporary_timestep_data = defaultdict(list)
Awesome, so we have our old episode.media somehow back :)
Haha, yeah, there had to be some temporary data cache available to the users. One possible way to avoid having this in the episode at all would be to tell users to store these in their custom callbacks instance directly and keep them clean themselves, but I'm not sure yet. We might deprecate this again if it turns out that that's the more transparent solution. What I like about the episode storage is that it auto-clears itself once the episode is finalized, ensuring the user cannot dump unbounded data into an episode and cause memory leaks.
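The auto-clearing behavior described here could be sketched like this (a toy class with a hypothetical finalize method; the real episode class differs):

```python
from collections import defaultdict


class ToyEpisode:
    """Sketch of a per-timestep scratch cache that empties itself on finalize."""

    def __init__(self):
        self._temporary_timestep_data = defaultdict(list)

    def add_temporary_timestep_data(self, key, value):
        self._temporary_timestep_data[key].append(value)

    def finalize(self):
        # Hand the cached data to the caller, then drop the cache so the
        # episode cannot accumulate unbounded data (e.g. render images).
        data = dict(self._temporary_timestep_data)
        self._temporary_timestep_data = defaultdict(list)
        return data
```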
Metrics do-over 04: New env rendering/video example script (through custom callbacks using MetricsLogger).

Why are these changes needed?
This PR introduces a new example script (and adds it to the CI), which demonstrates env rendering and video logging through custom callbacks, using the MetricsLogger available on the EnvRunners.

Related issue number

Checks
- I've signed off every commit (git commit -s) in this PR.
- I've run scripts/format.sh to lint the changes in this PR.
- If I added a new method in Tune, I've added it in doc/source/tune/api/ under the corresponding .rst file.