Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tune tbxlogger add images #37822

Merged
merged 16 commits into from
Feb 20, 2024

Conversation

simonsays1980
Copy link
Collaborator

@simonsays1980 simonsays1980 commented Jul 26, 2023

Why are these changes needed?

This PR enables users to provide in the result dicitonaries also image arrays that can be presented on TensorBoard like in the following example:

tensorboard

Images can be provided either as singleton in form of an np.ndarray with dimensions (3, H, W) or in form of a 4-d np.ndarray with dimensions (N, 3, H, W) (in this case images get concatenated horizontally).

Related issue number

#21954

As this issue is a P1 Issue that should be fixed within a few weeks rllib RLlib related issues the corresponding solution involves storing the images to the episode.media as this attribute of the episode is not summarized or appended in the metrics.collect_episodes() function.

Checks

  • I've signed off every commit(by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
    • I've added any new APIs to the API Reference. For example, if I added a
      method in Tune, I've added it in doc/source/tune/api/ under the
      corresponding .rst file.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

@simonsays1980
Copy link
Collaborator Author

@Yard1 @krfricke My PR fails because of a flakey test. How can I re-run? Should I just mrge the master when it updates?

@yyyuhan
Copy link

yyyuhan commented Dec 11, 2023

I applied a similar fix with the TensorboardX logger. In my case, the media metrics are being compiled into a list within the summarize_episodes function, as shown here. Consequently, in TBXLoggerCallback, I had to retrieve the last element from this list prior to invoking the add_image method.
Could you recommend a more efficient approach for this situation, similar to how you directly incorporated the image numpy array in your solution? I would appreciate your advice. Thank you.

@JulianLorenz
Copy link

JulianLorenz commented Dec 21, 2023

I also tried to apply a similar modification to TBXLoggerCallback.
I discovered the following issue with image handling:

When the image in episode.media gets processed by the JSONLoggerCallback, significant delays (minutes!) are introduced by the JSON logger. This is because the JSON logger needs to rewrite the log file at each logging call (see #21416) . I worked around this by disabling other Logger Callbacks by setting TUNE_DISABLE_AUTO_CALLBACK_LOGGERS to 1.

Also I think it needs to be decided how to handle the case when images are logged by multiple episodes at one time in summarize_episodes. In my opinion, all images should be kept as it is currently implemented. The user can then decide how to log images across multiple episodes in his algorithm class by adding a on_train_result callback:

        def on_train_result(
                self,
                *,
                algorithm: "Algorithm",
                result: dict,
                **kwargs,
        ) -> None:
            """Called at the end of Algorithm.train().

            Args:
                algorithm: Current Algorithm instance.
                result: Dict of results returned from Algorithm.train() call.
                    You can mutate this object to add additional metrics.
                kwargs: Forward compatibility placeholder.
            """
            if 'trajectory' in result['episode_media'].keys():
                result['episode_media']['myimage'] = result['episode_media']['myimage'][0]

My modification of the TBXLoggerCallback.log_trial_result() function looks like this. I decided to make it more strict with the numpy array, to not automatically interpret any 3D array as image:

            ...
            elif (isinstance(value, list) and len(value) > 0) or (
                    isinstance(value, np.ndarray) and value.size > 0
            ):
                valid_result[full_attr] = value

                # Check for list of images:
                if all(isinstance(v, np.ndarray) and v.ndim == 3 and v.shape[0] in [1, 3] for v in value):
                    if len(value) == 1:
                        # only one image
                        self._trial_writer[trial].add_image(full_attr, value[0], global_step=step)
                    else:
                        # Multiple images - stack them as tensorboard requires
                        imgs = np.stack(value)
                        self._trial_writer[trial].add_images(full_attr, imgs, global_step=step)
                    continue

                # Check for list of videos:
                if all(isinstance(v, np.ndarray) and v.ndim == 5 and v.shape[2] in [1,3] for v in value):
                    video = np.concatenate(value, axis=1)
                    self._trial_writer[trial].add_video(
                        full_attr, video, global_step=step, fps=20)
                    continue

                # Cover either a single video or a single image
                if isinstance(value, np.ndarray) and value.size > 0:
                    # Video - Must have 5 dimensions in NTCHW format:
                    # C must be either 1 for grayscale of 3 for RGB
                    if value.ndim == 5 and value.shape[2] in [1, 3]:
                        self._trial_writer[trial].add_video(
                            full_attr, value, global_step=step, fps=20
                        )
                        continue

                    # Image - Must have 3 dimensions in CHW format.
                    # C must be either 1 for grayscale of 3 for RGB
                    if value.ndim == 3 and value.shape[0] in [1, 3]:
                        self._trial_writer[trial].add_image(
                            full_attr, value, global_step=step
                        )
                        continue

                try:
                ...

@sven1977 sven1977 self-assigned this Jan 4, 2024
)
continue

# Must be multi-image
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dumb question: What if this is a single video (t, w, h, c)?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Following the definition of add_video() only 5-dimensional inputs are accepted for this function.

Do we anywhere pass data in a lower dimensional array into this function?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sven1977 Do you see any cases where this setup of separating arrays by dimension could fall on our feet?

Copy link
Contributor

@sven1977 sven1977 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@sven1977 sven1977 merged commit 60fc906 into ray-project:master Feb 20, 2024
9 checks passed
khluu pushed a commit that referenced this pull request Feb 21, 2024
@simonsays1980 simonsays1980 deleted the tune-tbxlogger-add-images branch February 21, 2024 10:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants