Tune tbxlogger add images #37822

simonsays1980 · 2023-07-26T17:43:06Z

Why are these changes needed?

This PR lets users include image arrays in the result dictionaries so that they can be displayed in TensorBoard.

Images can be provided either as a single np.ndarray with dimensions (3, H, W) or as a 4-d np.ndarray with dimensions (N, 3, H, W); in the latter case the images are concatenated horizontally.
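A minimal sketch of the two accepted layouts, using NumPy only. The helper name `to_single_image` is hypothetical; it only illustrates what horizontal concatenation of an (N, 3, H, W) batch means and is not taken from the PR's implementation.

```python
import numpy as np


def to_single_image(images: np.ndarray) -> np.ndarray:
    """Return one (3, H, W') image from a (3, H, W) or (N, 3, H, W) array."""
    if images.ndim == 3:
        # Already a single image in CHW layout.
        return images
    if images.ndim == 4:
        # Join the N images side by side along the width axis.
        return np.concatenate(list(images), axis=-1)
    raise ValueError(f"expected 3 or 4 dimensions, got {images.ndim}")


single = np.zeros((3, 64, 64), dtype=np.uint8)
batch = np.zeros((4, 3, 64, 64), dtype=np.uint8)
print(to_single_image(single).shape)  # (3, 64, 64)
print(to_single_image(batch).shape)   # (3, 64, 256)
```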

Related issue number

#21954

As this is a P1 RLlib issue, the corresponding solution involves storing the images in episode.media, because this attribute of the episode is not summarized or appended in the metrics.collect_episodes() function.

Checks

- I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
- I've run scripts/format.sh to lint the changes in this PR.
- I've included any doc changes needed for https://docs.ray.io/en/master/.
  - I've added any new APIs to the API Reference. For example, if I added a
    method in Tune, I've added it in doc/source/tune/api/ under the
    corresponding .rst file.
- I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
- Testing Strategy
  - Unit tests
  - Release tests
  - This PR is not tested :(

Signed-off-by: Simon Zehnder <[email protected]>

simonsays1980 · 2023-07-31T15:21:18Z

@Yard1 @krfricke My PR fails because of a flaky test. How can I re-run it? Should I just merge master when it updates?


yyyuhan · 2023-12-11T15:43:19Z

I applied a similar fix with the TensorboardX logger. In my case, the media metrics are being compiled into a list within the summarize_episodes function, as shown here. Consequently, in TBXLoggerCallback, I had to retrieve the last element from this list prior to invoking the add_image method.
Could you recommend a more efficient approach for this situation, similar to how you directly incorporated the image numpy array in your solution? I would appreciate your advice. Thank you.

JulianLorenz · 2023-12-21T11:16:14Z

I also tried to apply a similar modification to TBXLoggerCallback.
I discovered the following issue with image handling:

When the image in episode.media gets processed by the JSONLoggerCallback, the JSON logger introduces significant delays (minutes!). This is because the JSON logger needs to rewrite the log file at each logging call (see #21416). I worked around this by disabling the other logger callbacks by setting TUNE_DISABLE_AUTO_CALLBACK_LOGGERS to 1.
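The workaround above could look like the following sketch. It assumes the variable is read when the Tune run is set up, so it has to be set earlier in the same process (or exported in the shell that launches the script).

```python
import os

# Set this before starting the Tune run (assumption: the variable is read
# when Tune decides which default logger callbacks to create).
os.environ["TUNE_DISABLE_AUTO_CALLBACK_LOGGERS"] = "1"

print(os.environ["TUNE_DISABLE_AUTO_CALLBACK_LOGGERS"])  # 1
```

With the automatic loggers switched off, any logger that is still wanted (for example the TensorBoardX one) presumably has to be passed explicitly as a callback.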

Also, I think it needs to be decided how to handle the case when images are logged by multiple episodes at the same time in summarize_episodes. In my opinion, all images should be kept, as is currently implemented. Users can then decide how to log images across multiple episodes in their algorithm class by adding an on_train_result callback:

        def on_train_result(
                self,
                *,
                algorithm: "Algorithm",
                result: dict,
                **kwargs,
        ) -> None:
            """Called at the end of Algorithm.train().

            Args:
                algorithm: Current Algorithm instance.
                result: Dict of results returned from Algorithm.train() call.
                    You can mutate this object to add additional metrics.
                kwargs: Forward compatibility placeholder.
            """
            # Keep only the first image when several episodes logged one.
            # The key checked must be the same key that is overwritten.
            media = result.get('episode_media', {})
            if 'myimage' in media:
                media['myimage'] = media['myimage'][0]
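The same idea can be sketched for all media keys at once, independent of RLlib. The helper name `keep_first_media` is hypothetical, and the strings stand in for image arrays; the only assumption taken from the discussion is that each entry under `episode_media` may be a list with one item per episode.

```python
from typing import Any, Dict


def keep_first_media(result: Dict[str, Any]) -> Dict[str, Any]:
    """Replace every non-empty list under result["episode_media"] by its first item."""
    media = result.get("episode_media", {})
    for key, value in media.items():
        if isinstance(value, list) and value:
            # Overwriting an existing key is safe while iterating.
            media[key] = value[0]
    return result


result = {"episode_media": {"myimage": ["img_ep0", "img_ep1"], "other": []}}
keep_first_media(result)
print(result["episode_media"])  # {'myimage': 'img_ep0', 'other': []}
```

Taking the last item instead (`value[-1]`) would match the approach described earlier in the thread; which episode's image to keep is a design choice left to the user.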

My modification of the TBXLoggerCallback.log_trial_result() function looks like this. I made the NumPy array checks stricter, so that not every 3D array is automatically interpreted as an image:

            ...
            elif (isinstance(value, list) and len(value) > 0) or (
                    isinstance(value, np.ndarray) and value.size > 0
            ):
                valid_result[full_attr] = value

                # Check for a list of images (each in CHW format, C in {1, 3}):
                if all(
                    isinstance(v, np.ndarray) and v.ndim == 3 and v.shape[0] in [1, 3]
                    for v in value
                ):
                    if len(value) == 1:
                        # Only one image
                        self._trial_writer[trial].add_image(full_attr, value[0], global_step=step)
                    else:
                        # Multiple images - stack them to NCHW as TensorBoard requires
                        imgs = np.stack(value)
                        self._trial_writer[trial].add_images(full_attr, imgs, global_step=step)
                    continue

                # Check for a list of videos (each in NTCHW format, C in {1, 3}):
                if all(
                    isinstance(v, np.ndarray) and v.ndim == 5 and v.shape[2] in [1, 3]
                    for v in value
                ):
                    # Join the videos along the time axis (axis 1)
                    video = np.concatenate(value, axis=1)
                    self._trial_writer[trial].add_video(
                        full_attr, video, global_step=step, fps=20)
                    continue

                # Cover either a single video or a single image
                if isinstance(value, np.ndarray) and value.size > 0:
                    # Video - Must have 5 dimensions in NTCHW format.
                    # C must be either 1 for grayscale or 3 for RGB
                    if value.ndim == 5 and value.shape[2] in [1, 3]:
                        self._trial_writer[trial].add_video(
                            full_attr, value, global_step=step, fps=20
                        )
                        continue

                    # Image - Must have 3 dimensions in CHW format.
                    # C must be either 1 for grayscale or 3 for RGB
                    if value.ndim == 3 and value.shape[0] in [1, 3]:
                        self._trial_writer[trial].add_image(
                            full_attr, value, global_step=step
                        )
                        continue

                try:
                ...
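To make the dispatch rules above easier to check in isolation, here is a sketch that only names the writer method each value would be routed to, without touching tensorboardX. The function `media_kind` is hypothetical and mirrors the shape checks from the snippet; it is not part of the logger.

```python
from typing import Optional

import numpy as np


def media_kind(value) -> Optional[str]:
    """Name the writer method the shape rules would select, or None."""

    def is_image(v) -> bool:
        # CHW with 1 (grayscale) or 3 (RGB) channels.
        return isinstance(v, np.ndarray) and v.ndim == 3 and v.shape[0] in (1, 3)

    def is_video(v) -> bool:
        # NTCHW with 1 (grayscale) or 3 (RGB) channels.
        return isinstance(v, np.ndarray) and v.ndim == 5 and v.shape[2] in (1, 3)

    is_sequence = isinstance(value, list) or (
        isinstance(value, np.ndarray) and value.ndim > 0
    )
    if is_sequence and len(value) > 0:
        if all(is_image(v) for v in value):
            return "add_image" if len(value) == 1 else "add_images"
        if all(is_video(v) for v in value):
            return "add_video"
    if is_video(value):
        return "add_video"
    if is_image(value):
        return "add_image"
    return None


img = np.zeros((3, 8, 8))
vid = np.zeros((1, 4, 3, 8, 8))
print(media_kind(img))         # add_image
print(media_kind([img, img]))  # add_images
print(media_kind(vid))         # add_video
print(media_kind([1.0, 2.0]))  # None
```

Note that a 4-d array of shape (N, 3, H, W) also passes the list-of-images check, because iterating over it yields N arrays in CHW format.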


sven1977 · 2024-01-04T17:39:54Z

python/ray/tune/logger/tensorboardx.py

+                    )
+                    continue
+
+                # Must be multi-image


Dumb question: What if this is a single video (t, w, h, c)?

According to the definition of add_video(), this function only accepts 5-dimensional inputs.

Do we pass lower-dimensional arrays into this function anywhere?

@sven1977 Do you see any cases where this setup of separating arrays by dimension could backfire on us?
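For the single-video case raised in this thread, a caller could reshape the array before logging. The sketch below assumes the (t, w, h, c) axis order from the question and the (N, T, C, H, W) layout that tensorboardX documents for add_video(); the helper name `to_ntchw` is hypothetical and not part of this PR.

```python
import numpy as np


def to_ntchw(video_twhc: np.ndarray) -> np.ndarray:
    """Convert one (T, W, H, C) video into a (1, T, C, H, W) batch."""
    if video_twhc.ndim != 4:
        raise ValueError(f"expected 4 dimensions, got {video_twhc.ndim}")
    # (T, W, H, C) -> (T, C, H, W), then prepend a batch axis of size 1.
    return np.transpose(video_twhc, (0, 3, 2, 1))[np.newaxis, ...]


clip = np.zeros((20, 64, 48, 3), dtype=np.uint8)  # T=20, W=64, H=48, C=3
print(to_ntchw(clip).shape)  # (1, 20, 3, 48, 64)
```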


sven1977

LGTM!

simonsays1980 added 5 commits July 26, 2023 19:16

Added image support to TBXLogger and TBXLoggerCallback.

0aa04de

Signed-off-by: Simon Zehnder <[email protected]>

Reformatted code.

622cea2

Signed-off-by: Simon Zehnder <[email protected]>

Merge branch 'master' into tune-tbxlogger-add-images

aac64db

Merge branch 'master' into tune-tbxlogger-add-images

ae97366

Merge branch 'master' into tune-tbxlogger-add-images

d17d112

simonsays1980 added 3 commits August 2, 2023 15:41

Merge branch 'master' into tune-tbxlogger-add-images

97e3472

Merge branch 'master' into tune-tbxlogger-add-images

3e845ac

Merge branch 'master' into tune-tbxlogger-add-images

115be6f

simonsays1980 mentioned this pull request Aug 6, 2023

[Air|Tune] Allow Image logging in WandB #38157

Closed

simonsays1980 added 4 commits August 16, 2023 18:46

Merge branch 'master' into tune-tbxlogger-add-images

0b464ca

Merge branch 'master' into tune-tbxlogger-add-images

80cd7ce

Signed-off-by: Simon Zehnder <[email protected]>

Merge branch 'master' into tune-tbxlogger-add-images

6951d80

Signed-off-by: Simon Zehnder <[email protected]>

Merge branch 'master' into tune-tbxlogger-add-images

df035d0

Signed-off-by: Simon Zehnder <[email protected]>

Merge branch 'master' into tune-tbxlogger-add-images

378aac4

Signed-off-by: Simon Zehnder <[email protected]>

simonsays1980 force-pushed the tune-tbxlogger-add-images branch from e10cf2c to 378aac4 January 3, 2024 13:43

sven1977 self-assigned this Jan 4, 2024

sven1977 reviewed Jan 4, 2024 (four reviews, each on python/ray/tune/logger/tensorboardx.py, now outdated)

sven1977 and others added 3 commits January 4, 2024 18:40

Apply suggestions from code review

41ae26d

Signed-off-by: Sven Mika <[email protected]>

Merge branch 'master' into tune-tbxlogger-add-images

f6a174e

Signed-off-by: Simon Zehnder <[email protected]>

Merge branch 'tune-tbxlogger-add-images' of github.com:simonsays1980/…

c916f3b

…ray into tune-tbxlogger-add-images Signed-off-by: Simon Zehnder <[email protected]>

sven1977 approved these changes Feb 20, 2024


sven1977 merged commit 60fc906 into ray-project:master Feb 20, 2024
9 checks passed

khluu pushed a commit that referenced this pull request Feb 21, 2024

[RLlib, Tune] TBXLogger: Add images. (#37822)

64aa186

simonsays1980 deleted the tune-tbxlogger-add-images branch February 21, 2024 10:03


sven1977 Jan 4, 2024

simonsays1980 Jan 21, 2024

simonsays1980 Jan 27, 2024
