[data] [streaming] Improve progress bar rendering on large scale jobs #33952

ericl · 2023-03-30T22:39:41Z

Why are these changes needed?

For large jobs, the progress bar can be jerky since we do not limit the number of events processed per progress bar update.
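A minimal sketch of the idea, not the actual Ray Data code: handle only a bounded number of events before each progress bar refresh, so that a large backlog cannot delay the redraw. The names `drain_with_updates`, `refresh`, and `max_events_per_update` are hypothetical, and the default of 50 is illustrative.

```python
from collections import deque
from typing import Callable, Deque, List


def drain_with_updates(
    events: Deque[int],
    refresh: Callable[[int], None],
    max_events_per_update: int = 50,  # illustrative cap, an assumption
) -> int:
    """Process every queued event, refreshing progress after each bounded batch.

    Returns the number of refreshes performed.
    """
    processed = 0
    refreshes = 0
    while events:
        # Handle at most `max_events_per_update` events before redrawing.
        for _ in range(min(max_events_per_update, len(events))):
            events.popleft()
            processed += 1
        refresh(processed)
        refreshes += 1
    return refreshes


# Usage: 120 queued events produce three refreshes instead of one at the end.
seen: List[int] = []
num_refreshes = drain_with_updates(deque(range(120)), seen.append)
```

Without the cap, all 120 events would be handled before the first redraw, which is what makes the bar look jerky on large jobs.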

ericl · 2023-03-30T22:40:05Z

python/ray/data/dataset.py

@@ -405,17 +405,6 @@ def map_batches(
        To learn more about writing functions for :meth:`~Dataset.map_batches`, read
        :ref:`writing user-defined functions <transform_datasets_writing_udfs>`.

-        .. tip::


Remove overly spammy / incorrect tips

Why is it incorrect, though? Do we no longer want to move usage to higher-level APIs like preprocessors?

It's kind of the opposite of ray-project/enhancements#25.

I don't think this was ever correct in the first place. It's not really good for Data users to be told to use a preprocessor when they may not need one.

jianoaix · 2023-03-30T23:21:23Z

python/ray/data/dataset.py

@@ -405,17 +405,6 @@ def map_batches(
        To learn more about writing functions for :meth:`~Dataset.map_batches`, read
        :ref:`writing user-defined functions <transform_datasets_writing_udfs>`.

-        .. tip::


Why is it incorrect, though? Do we no longer want to move usage to higher-level APIs like preprocessors?

ericl · 2023-03-30T23:40:27Z

@zhe-thoughts we could consider merging this for 2.4, don't have a strong opinion.

zhe-thoughts · 2023-03-31T03:54:43Z

OK then let's wait until we unfreeze master.

…47393) In each Ray Data scheduling step, Ray Data launches tasks until it can't launch any more. To ensure that the progress bar updates frequently, #33952 limited Ray Data to launching 50 tasks per scheduling step. However, this caused performance issues for large workloads. This PR fixes the issue by removing the limit on the number of tasks launched while still updating the progress bar frequently.

…ay-project#47393) In each Ray Data scheduling step, Ray Data launches tasks until it can't launch any more. To ensure that the progress bar updates frequently, ray-project#33952 limited Ray Data to launching 50 tasks per scheduling step. However, this caused performance issues for large workloads. This PR fixes the issue by removing the limit on the number of tasks launched while still updating the progress bar frequently. Signed-off-by: ujjawal-khare <[email protected]>
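One way to get both properties described above, sketched under assumptions and not necessarily how #47393 implements it: keep launching tasks until no more can be launched, and refresh the progress bar on a time interval instead of capping the number of launches. The names `scheduling_step`, `can_launch`, `launch`, `refresh`, `min_refresh_interval_s`, and `clock` are hypothetical.

```python
import time
from collections import deque
from typing import Callable, Deque, List


def scheduling_step(
    pending: Deque[str],
    can_launch: Callable[[], bool],
    launch: Callable[[str], None],
    refresh: Callable[[], None],
    min_refresh_interval_s: float = 0.1,  # assumed refresh period
    clock: Callable[[], float] = time.monotonic,
) -> int:
    """Launch tasks with no per-step cap, refreshing progress on a timer.

    Returns the number of tasks launched in this step.
    """
    launched = 0
    last_refresh = clock()
    while pending and can_launch():
        launch(pending.popleft())
        launched += 1
        now = clock()
        # Redraw only when enough time has passed, independent of task count.
        if now - last_refresh >= min_refresh_interval_s:
            refresh()
            last_refresh = now
    refresh()  # final redraw so the bar reflects the end of the step
    return launched


# Usage with a fake clock that advances one second per call (deterministic).
ticks = iter(range(1000))
launched_tasks: List[str] = []
refresh_calls: List[int] = []
n = scheduling_step(
    deque(f"task-{i}" for i in range(200)),
    can_launch=lambda: True,
    launch=launched_tasks.append,
    refresh=lambda: refresh_calls.append(len(launched_tasks)),
    min_refresh_interval_s=10.0,
    clock=lambda: float(next(ticks)),
)
```

All 200 tasks are launched in a single step, while the refresh rate depends only on elapsed time.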

smooth pb

4778309

ericl assigned c21 Mar 30, 2023

ericl requested review from scv119, clarkzinzow and jjyao as code owners March 30, 2023 22:39

ericl assigned jianoaix Mar 30, 2023

ericl requested review from jianoaix and c21 as code owners March 30, 2023 22:39

ericl commented Mar 30, 2023

jianoaix approved these changes Mar 30, 2023

ericl added the tests-ok ("The tagger certifies test failures are unrelated and assumes personal liability.") and Ray 2.4 labels Mar 30, 2023

c21 approved these changes Mar 31, 2023

ericl added Ray 2.5 and removed Ray 2.4 labels Mar 31, 2023

ericl merged commit f7c856b into ray-project:master Apr 4, 2023

bveeramani mentioned this pull request Aug 28, 2024

[Data] Remove limit on number of tasks launched per scheduling step #47393

Merged

8 tasks
