[Data] AlltoAll OP, Update Data progress bars to use row as the iteration unit #46924
Signed-off-by: zhilong <[email protected]>
Could you also show an example screenshot of a progress bar output with a mix of OneToOne and AllToAll operators?
if self._num_outputs:
    return sum(bundle.num_rows() for bundle in self._output_buffer)
return self.input_dependencies[0].num_output_rows_total()
I think here we can simply return the input dependency's `num_output_rows_total()`. Because we don't know when (or whether) `self._output_buffer` is complete, summing it may undercount the total rows while the AllToAllOperator is still executing and more bundles are being added to `self._output_buffer`.
Suggested change:
-if self._num_outputs:
-    return sum(bundle.num_rows() for bundle in self._output_buffer)
-return self.input_dependencies[0].num_output_rows_total()
+return self.input_dependencies[0].num_output_rows_total()
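To make the undercounting concern concrete, here is a minimal sketch. `Bundle`, `UpstreamOp`, and `AllToAllSketch` are hypothetical stand-ins, not Ray Data classes, and the sketch assumes the all-to-all operator preserves the row count (as a shuffle or sort would; an aggregation would not).

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Bundle:
    """Hypothetical stand-in for an output bundle; only tracks a row count."""

    rows: int

    def num_rows(self) -> int:
        return self.rows


@dataclass
class UpstreamOp:
    """Hypothetical upstream operator that already knows its total output rows."""

    total_rows: int

    def num_output_rows_total(self) -> int:
        return self.total_rows


@dataclass
class AllToAllSketch:
    """Toy all-to-all operator whose output buffer fills up while it runs."""

    upstream: UpstreamOp
    output_buffer: List[Bundle] = field(default_factory=list)

    def rows_from_buffer(self) -> int:
        # Original approach: sum whatever happens to be buffered right now.
        return sum(b.num_rows() for b in self.output_buffer)

    def num_output_rows_total(self) -> int:
        # Suggested approach: report the upstream total. Assumes the
        # operator preserves the row count (e.g. a shuffle or sort).
        return self.upstream.num_output_rows_total()


op = AllToAllSketch(UpstreamOp(total_rows=1000))
op.output_buffer.append(Bundle(250))  # only part of the output is ready
print(op.rows_from_buffer())  # 250
print(op.num_output_rows_total())  # 1000
```

Mid-execution, the buffer-based count reports 250 while the upstream-based total already reports the full 1000, which is what a progress bar's total should show.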
fixed!
python/ray/data/_internal/execution/operators/base_physical_operator.py
…erator.py Co-authored-by: Scott Lee <[email protected]> Signed-off-by: zhilong <[email protected]>
Also, not sure if this is related to the fix I mentioned above, but the global progress bar still shows progress in bundles (…)
I think the above is related to the code here.
I think that one is related to the bar with …
num_rows = (
    result.num_rows if hasattr(result, "num_rows") else 1
)  # Default to 1 if no row count is available
total_rows_processed += num_rows
# TODO(zhilong): Change the total to total_row when init progress bar
self.update(total_rows_processed)
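A minimal sketch of the row-based counting idea, outside of Ray. `RowProgressBar`, `rows_in`, `consume_batch`, and `FakeBlock` are hypothetical names; the sketch assumes `update(n)` increments the bar by `n`, as tqdm's `update()` does, and reuses the snippet's fallback of counting 1 when a result has no `num_rows`.

```python
from typing import Any, Iterable


class RowProgressBar:
    """Hypothetical minimal bar; update(n) adds n, as tqdm's update() does."""

    def __init__(self, total: int) -> None:
        self.total = total
        self.progress = 0

    def update(self, n: int) -> None:
        self.progress += n


def rows_in(result: Any) -> int:
    # Same fallback as the snippet above: count 1 if no row count exists.
    return result.num_rows if hasattr(result, "num_rows") else 1


def consume_batch(results: Iterable[Any], bar: RowProgressBar) -> None:
    batch_rows = 0
    for result in results:
        batch_rows += rows_in(result)
    # One incremental update per batch, measured in rows, not results.
    bar.update(batch_rows)


class FakeBlock:
    """Hypothetical result object exposing a num_rows attribute."""

    def __init__(self, num_rows: int) -> None:
        self.num_rows = num_rows


bar = RowProgressBar(total=300)
consume_batch([FakeBlock(100), FakeBlock(150), "no row count"], bar)
print(bar.progress)  # 251
```

Because `update(n)` is incremental in this sketch, the batch total is applied once after the loop; passing a cumulative value on every iteration would count earlier rows more than once.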
Nice, thanks for the fix here. For consistency, can we also apply the same logic in `block_until_complete()` here? https://github.com/ray-project/ray/pull/46924/files/affadbab90fdb47b8838f0dc0f40d917ccbd45c3..e231e41c1d436434b7c20b2b8203bc254e3828d2#diff-968882a8882d3516ef0f814415b69f55edd379743ee4b8a9a1c849fc1afede04R137
fixed
Thanks for the improvement, @Bye-legumes!
Some questions, but LGTM otherwise!
return (
    self._output_rows
    if self._output_rows
    else self.input_dependencies[0].num_output_rows_total()
)
Is `self.input_dependencies[0].num_output_rows_total()` static? If so, should we cache it with something like `self._output_rows = self.input_dependencies[0].num_output_rows_total()`?
If it's a live total that is updated as execution continues, it makes sense to leave it as is.
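To illustrate the trade-off in this question, here is a minimal sketch contrasting a cached snapshot with a live read. `Upstream`, `CachedTotal`, and `LiveTotal` are hypothetical classes; they do not describe how Ray Data's operators actually compute their totals.

```python
class Upstream:
    """Hypothetical upstream operator whose row total grows as it executes."""

    def __init__(self) -> None:
        self.rows_so_far = 0

    def num_output_rows_total(self) -> int:
        return self.rows_so_far


class CachedTotal:
    def __init__(self, upstream: Upstream) -> None:
        # Snapshot taken once; goes stale if upstream keeps producing rows.
        self._output_rows = upstream.num_output_rows_total()

    def num_output_rows_total(self) -> int:
        return self._output_rows


class LiveTotal:
    def __init__(self, upstream: Upstream) -> None:
        self._upstream = upstream

    def num_output_rows_total(self) -> int:
        # Re-read on every call, so it follows upstream progress.
        return self._upstream.num_output_rows_total()


up = Upstream()
up.rows_so_far = 100
cached, live = CachedTotal(up), LiveTotal(up)
up.rows_so_far = 400  # upstream is still executing
print(cached.num_output_rows_total())  # 100
print(live.num_output_rows_total())  # 400
```

Caching is only safe once the upstream value is final; while upstream is still running, a cached snapshot under-reports and the live read is the better choice.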
python/ray/data/_internal/execution/operators/base_physical_operator.py
Why are these changes needed?
close #46579
Related issue number
Checks
- Signed off every commit (`git commit -s`) in this PR.
- Ran `scripts/format.sh` to lint the changes in this PR.
- For a new method in Tune, added it in `doc/source/tune/api/` under the corresponding `.rst` file.