Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[data] [streaming] Improve progress bar rendering on large scale jobs #33952

Merged
merged 1 commit into from
Apr 4, 2023

Conversation

ericl
Copy link
Contributor

@ericl ericl commented Mar 30, 2023

Why are these changes needed?

For large jobs, the progress bar can be jerky since we do not limit the number of events processed per progress bar update.

@@ -405,17 +405,6 @@ def map_batches(
To learn more about writing functions for :meth:`~Dataset.map_batches`, read
:ref:`writing user-defined functions <transform_datasets_writing_udfs>`.

.. tip::
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove overly spammy / incorrect tips

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why it's incorrect though? We no longer want to move usage to higher-level APIs like preprocessor ?

Copy link
Contributor Author

@ericl ericl Mar 30, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's kind of the opposite of ray-project/enhancements#25.

I don't think this was ever correct in the first place. It's not really good for Data users to be told to use preprocessor when they may not need it.

@@ -405,17 +405,6 @@ def map_batches(
To learn more about writing functions for :meth:`~Dataset.map_batches`, read
:ref:`writing user-defined functions <transform_datasets_writing_udfs>`.

.. tip::
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why it's incorrect though? We no longer want to move usage to higher-level APIs like preprocessor ?

@ericl
Copy link
Contributor Author

ericl commented Mar 30, 2023

@zhe-thoughts we could consider merging this for 2.4, don't have a strong opinion.

@ericl ericl added tests-ok The tagger certifies test failures are unrelated and assumes personal liability. Ray 2.4 labels Mar 30, 2023
@zhe-thoughts
Copy link
Collaborator

OK then let's wait until we unfreeze master.

@ericl ericl added Ray 2.5 and removed Ray 2.4 labels Mar 31, 2023
@ericl ericl merged commit f7c856b into ray-project:master Apr 4, 2023
bveeramani added a commit that referenced this pull request Aug 29, 2024
…47393)

In each Ray Data scheduling step, Ray Data launches tasks until it can't launch any more. To ensure that the progress bar updates frequently, #33952 made it such that Ray Data can only launch 50 tasks per scheduling step. However, this cause performance issues for large workloads.

This PR fixes the issue by removing the limit on number of tasks launched while still updating the progress bar frequently.
ujjawal-khare pushed a commit to ujjawal-khare-27/ray that referenced this pull request Oct 12, 2024
…ay-project#47393)

In each Ray Data scheduling step, Ray Data launches tasks until it can't launch any more. To ensure that the progress bar updates frequently, ray-project#33952 made it such that Ray Data can only launch 50 tasks per scheduling step. However, this cause performance issues for large workloads.

This PR fixes the issue by removing the limit on number of tasks launched while still updating the progress bar frequently.

Signed-off-by: ujjawal-khare <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Ray 2.5 tests-ok The tagger certifies test failures are unrelated and assumes personal liability.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants