-
Notifications
You must be signed in to change notification settings - Fork 5.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[data] [streaming] Improve progress bar rendering on large scale jobs #33952
Conversation
@@ -405,17 +405,6 @@ def map_batches( | |||
To learn more about writing functions for :meth:`~Dataset.map_batches`, read | |||
:ref:`writing user-defined functions <transform_datasets_writing_udfs>`. | |||
|
|||
.. tip:: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove overly spammy / incorrect tips
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why it's incorrect though? We no longer want to move usage to higher-level APIs like preprocessor ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's kind of the opposite of ray-project/enhancements#25.
I don't think this was ever correct in the first place. It's not really good for Data users to be told to use preprocessor when they may not need it.
@@ -405,17 +405,6 @@ def map_batches( | |||
To learn more about writing functions for :meth:`~Dataset.map_batches`, read | |||
:ref:`writing user-defined functions <transform_datasets_writing_udfs>`. | |||
|
|||
.. tip:: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why it's incorrect though? We no longer want to move usage to higher-level APIs like preprocessor ?
@zhe-thoughts we could consider merging this for 2.4, don't have a strong opinion. |
OK then let's wait until we unfreeze master. |
…47393) In each Ray Data scheduling step, Ray Data launches tasks until it can't launch any more. To ensure that the progress bar updates frequently, #33952 made it such that Ray Data can only launch 50 tasks per scheduling step. However, this cause performance issues for large workloads. This PR fixes the issue by removing the limit on number of tasks launched while still updating the progress bar frequently.
…ay-project#47393) In each Ray Data scheduling step, Ray Data launches tasks until it can't launch any more. To ensure that the progress bar updates frequently, ray-project#33952 made it such that Ray Data can only launch 50 tasks per scheduling step. However, this cause performance issues for large workloads. This PR fixes the issue by removing the limit on number of tasks launched while still updating the progress bar frequently. Signed-off-by: ujjawal-khare <[email protected]>
Why are these changes needed?
For large jobs, the progress bar can be jerky since we do not limit the number of events processed per progress bar update.