[Data] Truncate progress bar description #46801

Merged
5 commits merged into ray-project:master on Jul 30, 2024

@scottjlee scottjlee commented Jul 25, 2024

Why are these changes needed?

For the ProgressBars that Ray Data's executor uses to display operator completion progress, the bar's description is currently the full operator name. This becomes long and unwieldy for operators with long names or for datasets with many operators (e.g., consecutive MapBatches operators are fused into one operator whose name concatenates all of them).

This PR adds logic to truncate the ProgressBar's description if it exceeds 100 characters. There is also a parameter to disable this truncation, and always show the full progress bar description.

Related issue number

For the following script:

import ray
import time
import os

paths = ["s3://anonymous@air-example-data/iris.csv"]
ds = ray.data.read_csv(paths, override_num_blocks=20)
num_map_ops = 100

def f_with_really_long_name(batch):
    time.sleep(1)
    return batch

for _ in range(num_map_ops):
    ds = ds.map_batches(f_with_really_long_name)

ds.materialize()

We can compare the output before and after:

Before
Running 0: 0 bundle [00:00, ? bundle/s]lly_long_name)->MapBatches(f_with_really_long_name)->MapBatches(f_with_really_long_name)->MapBatch
- ReadCSV->SplitBlocks(20): 1 active, 0 queued, [cpu: 1.0, objects: 5.6KB]: : 18 bundle [00:01, 16.37 bundle/s]es(f_with_really_long_name
- MapBatches(f_with_really_long_name)->MapBatches(f_with_really_long_name)->MapBatches(f_with_really_long_name)->MapBatches(f_with_really_long_name)->MapBatches(f_with_really_long_name)->MapBatches(f_with_really_long_name)->MapBatches(f_with_really_long_name)->MapBatches(f_with_really_long_name)->MapBatches(f_with_really_long_name)->MapBatches(f_with_really_long_name)->MapBatches(f_with_really_long_name)->MapBatches(f_with_really_long_name)->MapBatches(f_with_really_long_name)->MapBatches(f_with_really_long_name)->MapBatches(f_with_really_long_name)->MapBatches(f_with_really_long_name)->MapBatches(f_with_really_long_name)->MapBatches(f_with_really_long_name)->MapBatches(f_with_really_long_name)->MapBatches(f_with_really_long_name)->MapBatches(f_with_really_long_name)->MapBatches(f_with_really_long_name)->MapBatches(f_with_really_long_name)->MapBatches(f_with_really_long_name)->MapBatches(f_with_really_long_name)->MapBatches(f_with_really_long_name)->MapBatches(f_with_really_long_name)->MapBatches(f_with_really_long_name)->MapBatches(f_with_really_long_name)->MapBatches(f_with_really_long_name)->MapBatches(f_with_really_long_name)->MapBatches(f_with_really_long_name)->MapBatches(f_with_really_long_name)->MapBatches(f_with_really_long_name)->MapBatches(f_with_really_long_name)->MapBatches(f_with_really_long_name)->MapBatches(f_with_really_long_name)->MapBatches(f_with_really_long_name)->MapBatches(f_with_really_long_name)->MapBatches(f_with_really_long_name)->MapBatches(f_with_really_long_name)->MapBatches(f_with_really_long_name)->MapBatches(f_with_really_long_name)->MapBatches(f_with_really_long_name)->MapBatches(f_with_really_long_name)->MapBatches(f_with_really_long_name)->MapBatches(f_with_really_long_name)->MapBatches(f_with_really_long_name)->MapBatches(f_with_really_long_name)->MapBatches(f_with_really_long_name)->MapBatches(f_with_really_long_name)->MapBatches(f_with_really_long_name)->MapBatches(f_with_really_long_name)->MapBatches(f_with_really_long_name)->
MapBatches(f_with_really_long_name)->MapBatches(f_with_really_long_name)->MapBatches(f_with_really_long_name)->MapBatches(f_with_really_long_name)->MapBatches(f_with_really_long_name)->MapBatches(f_with_really_long_name)->MapBatches(f_with_really_long_name)->MapBatches(f_with_really_long_name)->MapBatches(f_with_really_long_name)->MapBatches(f_with_really_long_name)->MapBatches(f_with_really_long_name)->MapBatches(f_with_really_long_name)->MapBatches(f_with_really_long_name)->MapBatches(f_with_really_long_name)->MapBatches(f_with_really_long_name)->MapBatches(f_with_really_long_name)->MapBatches(f_with_really_long_name)->MapBatches(f_with_really_long_name)->MapBatches(f_with_really_long_name)->MapBatches(f_with_really_long_name)->MapBatches(f_with_really_long_name)->MapBatches(f_with_really_long_name)->MapBatches(f_with_really_long_name)->MapBatches(f_with_really_long_name)->MapBatches(f_with_really_long_name)->MapBatches(f_with_really_long_name)->MapBatches(f_with_really_long_name)->MapBatches(f_with_really_long_name)->MapBatches(f_with_really_long_name)->MapBatches(f_with_really_long_name)->MapBatches(f_with_really_long_name)->MapBatches(f_with_really_long_name)->MapBatches(f_with_really_long_name)->MapBatches(f_with_really_long_name)->MapBatches(f_with_really_long_name)->MapBatches(f_with_really_long_name)->MapBatches(f_with_really_long_name)->MapBatches(f_with_really_long_name)->MapBatches(f_with_rRunning: 2/10 CPU, 0/0 GPU, 512.0MB/1.0GB object_store_memory: : 0 bundle [00:01, ? bundle/s]apBatches(f_with_really_long_name)->MapBatches(f_with_really_long_name)->MapBatches(f_with_really_long_name)->MapBatches(f_with_really_long_name)->MapBatches(f_with_really_long_name): 2 active, 17 queued, [cpu: 2.0, objects: 512.0MB]: : 0 bundle [00:01, ? bundle/s]
After (note the `...` in the last line)
✔️  Dataset execution finished in 32.55 seconds: 100%|█████████████████████████████████████████████████████████████████████████| 20/20 [00:32<00:00,  1.63s/ bundle]]
- ReadCSV->SplitBlocks(20): 0 active, 0 queued, [cpu: 0.0, objects: 0.0B]: : 20 bundle [00:32,  1.63s/ bundle]]name): 3 active, 17 queued, [cpu: 3.0, objects: 768.0
- MapBatches(f_with_really_long_name)->MapBatches(f_with_really_long_name)->...->MapBatches(f_with_really_long_name): 0 active, 0 queued, [cpu: 0.0, objects: 840.0B

Checks

  • I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
    • I've added any new APIs to the API Reference. For example, if I added a
      method in Tune, I've added it in doc/source/tune/api/ under the
      corresponding .rst file.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

@scottjlee scottjlee marked this pull request as ready for review July 25, 2024 20:12

@omatthew98 omatthew98 left a comment

One non-blocking question, otherwise lgtm.

Comment on lines 60 to 62
# If True, disables name truncating.
self._display_full_name = display_full_name
self._desc = self._truncate_name(name)
Contributor

How would this be set to True by the user? Is the expectation that they would not want that or should we have something in data context or execution options to allow for this configuration?

Contributor Author

ah yeah good point, i will need to expose it from DataContext
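
For readers who want the full name back, a minimal sketch of what the opt-out might look like once the flag is exposed from DataContext. The attribute name `enable_progress_bar_name_truncation` is taken from the `_truncate_name` snippet later in this thread; that it is a plain settable attribute on the current context is an assumption here, not something this conversation confirms.

```python
import ray

# Fetch the process-wide Ray Data context.
ctx = ray.data.DataContext.get_current()

# Assumption: the flag read by `_truncate_name` is a settable attribute.
# Setting it to False should make progress bars show full operator names.
ctx.enable_progress_bar_name_truncation = False
```

Set this before executing the dataset (e.g. before `ds.materialize()`), since the progress bar description is computed when the bar is created.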

Signed-off-by: Scott Lee <[email protected]>
@@ -83,6 +87,12 @@ def __init__(
needs_warning = False
self._bar = None

def _truncate_name(self, name: str) -> str:
Member

Should we add a warn-once that the name is getting truncated and that the behavior can be disabled with DEFAULT_ENABLE_PROGRESS_BAR_NAME_TRUNCATION? Not sure if users will know how to disable it otherwise

def _truncate_name(self, name: str) -> str:
    ctx = ray.data.context.DataContext.get_current()
    if ctx.enable_progress_bar_name_truncation and len(name) > self.MAX_NAME_LENGTH:
        return name[: self.MAX_NAME_LENGTH - 3] + "..."
Member

Nit: (Don't need to do this, just throwing it out as an idea) I'm wondering if we should include some of the text at the end. So, for example, instead of:

Map(spam)->Map(ham)->...

We could do something like

Map(spam)->...->Map(ham)

Might make it clearer which operator it ends on.
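
To make the two options concrete, here is a small standalone sketch (not the PR's code) comparing tail truncation, as in the snippet under review, with eliding the middle of the operator chain. The function names `truncate_tail` and `elide_middle` and the `Map(eggs)` operator are hypothetical, chosen only for illustration.

```python
def truncate_tail(name: str, max_len: int) -> str:
    """Cut the name at max_len characters, ending with '...'."""
    if len(name) > max_len:
        return name[: max_len - 3] + "..."
    return name


def elide_middle(name: str) -> str:
    """Keep only the first and last operators, replacing the rest with '...'."""
    ops = name.split("->")
    if len(ops) <= 2:
        return name  # Nothing in the middle to elide.
    return "->".join([ops[0], "...", ops[-1]])


name = "Map(spam)->Map(eggs)->Map(ham)"
print(truncate_tail(name, 20))  # Map(spam)->Map(eg...
print(elide_middle(name))       # Map(spam)->...->Map(ham)
```

Tail truncation can cut an operator name in half and hides where the chain ends; eliding the middle keeps both endpoints readable.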

Contributor Author

yeah sounds good. now the updated output looks like:

✔️  Dataset execution finished in 32.55 seconds: 100%|█████████████████████████████████████████████████████████████████████████| 20/20 [00:32<00:00,  1.63s/ bundle]]
- ReadCSV->SplitBlocks(20): 0 active, 0 queued, [cpu: 0.0, objects: 0.0B]: : 20 bundle [00:32,  1.63s/ bundle]]name): 3 active, 17 queued, [cpu: 3.0, objects: 768.0
- MapBatches(f_with_really_long_name)->MapBatches(f_with_really_long_name)->...->MapBatches(f_with_really_long_name): 0 active, 0 queued, [cpu: 0.0, objects: 840.0B

Signed-off-by: Scott Lee <[email protected]>
Signed-off-by: Scott Lee <[email protected]>
Comment on lines 108 to 120
op_names = name.split("->")
# Include as many operators as possible without exceeding `MAX_NAME_LENGTH`.
# Always include the first and last operator names so
# it is easy to identify the DAG.
truncated_op_names = [op_names[0]]
for i, op_name in enumerate(op_names[1:-1]):
    if len("->".join(truncated_op_names)) + len(op_name) > self.MAX_NAME_LENGTH:
        truncated_op_names.append("...")
        break
    truncated_op_names.append(op_name)
if len(op_names) > 1:
    truncated_op_names.append(op_names[-1])
return "->".join(truncated_op_names)
Member

Nit: Not a big deal, but I think there are some edge cases where the truncated name can exceed MAX_NAME_LENGTH because we don't account for the last name or the additional "->"s.

Suggested change
- op_names = name.split("->")
- # Include as many operators as possible without exceeding `MAX_NAME_LENGTH`.
- # Always include the first and last operator names so
- # it is easy to identify the DAG.
- truncated_op_names = [op_names[0]]
- for i, op_name in enumerate(op_names[1:-1]):
-     if len("->".join(truncated_op_names)) + len(op_name) > self.MAX_NAME_LENGTH:
-         truncated_op_names.append("...")
-         break
-     truncated_op_names.append(op_name)
- if len(op_names) > 1:
-     truncated_op_names.append(op_names[-1])
- return "->".join(truncated_op_names)
+ op_names = name.split("->")
+ if len(op_names) == 1:
+     return op_names[0]
+ else:
+     # Include as many operators as possible without exceeding `MAX_NAME_LENGTH`.
+     # Always include the first and last operator names so
+     # it is easy to identify the DAG.
+     truncated_op_names = [op_names[0]]
+     for op_name in op_names[1:-1]:
+         if len("->".join(truncated_op_names)) + len("->") + len(op_name) + len("->") + len(op_names[-1]) > self.MAX_NAME_LENGTH:
+             truncated_op_names.append("...")
+             break
+         truncated_op_names.append(op_name)
+     truncated_op_names.append(op_names[-1])
+     return "->".join(truncated_op_names)
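
For anyone who wants to try the suggested logic outside of Ray, here is a standalone sketch that mirrors it as a free function. The name `truncate_operator_name` and the `max_len` parameter are hypothetical; the default of 100 is assumed from the PR description rather than read from the class constant. One thing the sketch makes visible: the check budgets for the last operator and the separators but not for the `->...` marker itself, so the result can still exceed the limit by a few characters (and by more if the first and last names alone are too long).

```python
MAX_NAME_LENGTH = 100  # Assumed from the PR description's 100-character limit.


def truncate_operator_name(name: str, max_len: int = MAX_NAME_LENGTH) -> str:
    """Shorten an 'A->B->...->Z' operator chain, keeping the first and last ops."""
    op_names = name.split("->")
    if len(op_names) == 1:
        return op_names[0]

    last = op_names[-1]
    kept = [op_names[0]]
    for op_name in op_names[1:-1]:
        # Length if this operator and the final operator were both appended.
        projected = (
            len("->".join(kept)) + len("->") + len(op_name) + len("->") + len(last)
        )
        if projected > max_len:
            kept.append("...")
            break
        kept.append(op_name)
    kept.append(last)
    return "->".join(kept)


fused = "->".join(["MapBatches(f_with_really_long_name)"] * 100)
print(truncate_operator_name(fused))
# MapBatches(f_with_really_long_name)->...->MapBatches(f_with_really_long_name)

# The '->...' marker is not budgeted, so the limit can be overshot slightly.
print(truncate_operator_name("A->B->C->D", max_len=7))  # A->B->...->D
```

Names that already fit within the limit pass through unchanged, because every intermediate operator passes the length check.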

Signed-off-by: Scott Lee <[email protected]>
@bveeramani bveeramani enabled auto-merge (squash) July 30, 2024 18:44
@github-actions github-actions bot added the go add ONLY when ready to merge, run all tests label Jul 30, 2024
@bveeramani bveeramani merged commit 727139c into ray-project:master Jul 30, 2024
7 checks passed