[Data] AlltoAll OP, Update Data progress bars to use row as the iteration unit #46924

Bye-legumes · 2024-08-01T21:06:56Z

Why are these changes needed?

Closes #46579. Extends the row-based progress bars from #46699 to AllToAll operators.

Related issue number

Checks

[√] I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
[√] I've run scripts/format.sh to lint the changes in this PR.
I've included any doc changes needed for https://docs.ray.io/en/master/.
- I've added any new APIs to the API Reference. For example, if I added a
  method in Tune, I've added it in doc/source/tune/api/ under the
  corresponding .rst file.
[√] I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
Testing Strategy
- [√] Unit tests
- Release tests
- This PR is not tested :(

Signed-off-by: zhilong <[email protected]>

scottjlee

Could you also show an example screenshot of a progress bar output with a mix of OneToOne and AllToAll operators?

scottjlee · 2024-08-05T17:17:10Z

python/ray/data/_internal/execution/operators/base_physical_operator.py

+        if self._num_outputs:
+            return sum(bundle.num_rows() for bundle in self._output_buffer)
+        return self.input_dependencies[0].num_output_rows_total()


I think here we can simply return the input dependency's num_output_rows_total(). Because we don't know when (or whether) self._output_buffer is complete, summing it may report a lower total row count while the AllToAllOperator is still executing and more items are being added to self._output_buffer.

Suggested change

-        if self._num_outputs:
-            return sum(bundle.num_rows() for bundle in self._output_buffer)
-        return self.input_dependencies[0].num_output_rows_total()
+        return self.input_dependencies[0].num_output_rows_total()
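A toy model of the reviewer's point, under stated assumptions (the `Bundle`, `ToyUpstream`, and `ToyAllToAll` classes are hypothetical stand-ins, not Ray's operators): while an all-to-all operator is still executing, its output buffer holds only part of the result, so summing it undercounts. For row-preserving operations such as sort or random shuffle, the upstream operator's total is already the final row count; for row-changing ones such as aggregations it is only an estimate.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Bundle:
    """Hypothetical stand-in for an output bundle: just a row count."""
    rows: int

    def num_rows(self) -> int:
        return self.rows


@dataclass
class ToyUpstream:
    """Hypothetical upstream operator whose row total is already known."""
    total_rows: int

    def num_output_rows_total(self) -> Optional[int]:
        return self.total_rows


@dataclass
class ToyAllToAll:
    """Hypothetical row-preserving all-to-all operator (e.g. a sort)."""
    upstream: ToyUpstream
    output_buffer: List[Bundle] = field(default_factory=list)

    def rows_from_buffer(self) -> int:
        # Original approach: sum whatever has been produced so far.
        return sum(b.num_rows() for b in self.output_buffer)

    def rows_from_input(self) -> Optional[int]:
        # Suggested approach: reuse the upstream operator's total.
        return self.upstream.num_output_rows_total()


op = ToyAllToAll(ToyUpstream(total_rows=1000))
op.output_buffer.append(Bundle(250))  # only part of the output is ready
print(op.rows_from_buffer())  # 250: undercounts while execution continues
print(op.rows_from_input())   # 1000: final total for a row-preserving op
```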


Bye-legumes · 2024-08-06T14:19:20Z

Could you also show an example screenshot of a progress bar output with a mix of OneToOne and AllToAll operators?
This is an example with a sort, showing what it looks like.

scottjlee · 2024-08-06T17:24:53Z

Could you also show an example screenshot of a progress bar output with a mix of OneToOne and AllToAll operators?
This is an example with a sort, showing what it looks like.

For the Sort and Sort Sample progress bars, I would expect these bars to also report row/s throughput, but they show it as unknown (?). Is it possible to also correct the bar behavior for this case? Thanks!


Bye-legumes · 2024-08-06T18:04:02Z

Could you also show an example screenshot of a progress bar output with a mix of OneToOne and AllToAll operators?
This is an example with a sort, showing what it looks like.

For the Sort and Sort Sample progress bars, I would expect these bars to also report row/s throughput, but they show it as unknown (?). Is it possible to also correct the bar behavior for this case? Thanks!

Fixed now. I just needed to change it to the following, similar to how the number of bundles is computed:

def num_output_rows_total(self) -> Optional[int]:
    return (
        self._output_rows
        if self._output_rows
        else self.input_dependencies[0].num_outputs_total()
    )

python/ray/data/_internal/execution/operators/base_physical_operator.py


scottjlee · 2024-08-07T00:24:45Z

Also, not sure if this is related to the fix I mentioned above, but the global progress bar still shows progress in bundles (197/197). Could you confirm whether this is still an issue after updating to use num_output_rows_total()?

Bye-legumes · 2024-08-07T00:36:44Z

Also, not sure if this is related to the fix I mentioned above, but the global progress bar still shows progress in bundles (197/197). Could you confirm whether this is still an issue after updating to use num_output_rows_total()?

Here is the current screenshot. You are right; this doesn't fix that yet. Let me check whether there are other places I need to modify.

Here is the code I used for testing. Let me check how the parallelism affects the row counts.

import time

import numpy as np
import pandas as pd
import ray

# Connect to an existing cluster (address from the original test setup).
ray.init(address="10.193.182.83:6274")
ctx = ray.data.DataContext.get_current()

use_push_based_shuffle = False
num_items = 30001
parallelism = 200

t1 = time.time()
original = ctx.use_push_based_shuffle
ctx.use_push_based_shuffle = use_push_based_shuffle

# Keys in descending order, so the sort has real work to do.
a = list(reversed(range(num_items)))
shard = int(np.ceil(num_items / parallelism))
b = [1]

# Split the data into about `parallelism` pandas DataFrames (one block each).
# The loop covers every item, so no leftover tail needs handling afterwards.
dfs = []
offset = 0
while offset < num_items:
    chunk = a[offset : offset + shard]
    dfs.append(pd.DataFrame({"a": chunk, "b": [b] * len(chunk)}))
    offset += shard

ds = ray.data.from_pandas(dfs)
sorted_ds = ds.sort(key="a")
res = [tuple(row.values()) for row in sorted_ds.iter_rows()]
print(f"time used:\n{time.time() - t1}")

# Restore the original shuffle setting.
ctx.use_push_based_shuffle = original

Bye-legumes · 2024-08-07T00:39:20Z

I think the above is related to this:

ray/python/ray/data/_internal/planner/exchange/sort_task_spec.py

Line 165 in 85eaffd

sample_bar = ProgressBar(

I may need to modify the task specification.

scottjlee · 2024-08-07T17:00:19Z

I think the above is related to this:

ray/python/ray/data/_internal/planner/exchange/sort_task_spec.py

Line 165 in 85eaffd

sample_bar = ProgressBar(

I may need to modify the task specification.

I think that one is related to the Sort Sample bar. Since the task specification there is more involved, we can handle it in a follow-up. But the global bar (Dataset execution finished in ...) still shows its output unit as blocks or bundles, which I think we should change to rows.


Bye-legumes · 2024-08-07T17:04:15Z

I think the above is related to this:

ray/python/ray/data/_internal/planner/exchange/sort_task_spec.py

Line 165 in 85eaffd

sample_bar = ProgressBar(

I may need to modify the task specification.

I think that one is related to the Sort Sample bar. Since the task specification there is more involved, we can handle it in a follow-up. But the global bar (Dataset execution finished in ...) still shows its output unit as blocks or bundles, which I think we should change to rows.

Oh, OK, I see! I'm still trying to change the Sort Sample bar, but if that can be a follow-up for all the other shuffle ops, I think this is OK now; here is what it looks like. I think the modification to fetch_until_complete will work for all shuffle ops.


scottjlee · 2024-08-08T18:16:25Z

python/ray/data/_internal/progress_bar.py

+                num_rows = (
+                    result.num_rows if hasattr(result, "num_rows") else 1
+                )  # Default to 1 if no row count is available
+                total_rows_processed += num_rows
+            # TODO(zhilong): Change the total to total_row when init progress bar
+            self.update(total_rows_processed)


Nice, thanks for the fix here. For consistency, can we also apply the same logic in block_until_complete()? https://github.com/ray-project/ray/pull/46924/files/affadbab90fdb47b8838f0dc0f40d917ccbd45c3..e231e41c1d436434b7c20b2b8203bc254e3828d2#diff-968882a8882d3516ef0f814415b69f55edd379743ee4b8a9a1c849fc1afede04R137
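The per-result row counting in the diff above can be sketched independently of Ray as follows. `ToyProgressBar` and `drain` are hypothetical names; the real fetch_until_complete() and block_until_complete() also wait on Ray object refs, which this sketch omits. Sharing one counting helper between both methods is one way to keep the two code paths reporting the same unit.

```python
from types import SimpleNamespace
from typing import Any, Iterable, List


class ToyProgressBar:
    """Hypothetical stand-in for a progress bar that advances by rows."""

    def __init__(self) -> None:
        self.completed = 0

    def update(self, n: int) -> None:
        # Increment the completed count by n.
        self.completed += n


def count_rows(result: Any) -> int:
    # Same fallback as the diff: use num_rows if present, otherwise count 1.
    return result.num_rows if hasattr(result, "num_rows") else 1


def drain(results: Iterable[Any], bar: ToyProgressBar) -> List[Any]:
    """Collect finished results, advancing the bar by rows instead of tasks."""
    collected = []
    for result in results:
        collected.append(result)
        bar.update(count_rows(result))
    return collected


bar = ToyProgressBar()
out = drain(
    [SimpleNamespace(num_rows=100), SimpleNamespace(num_rows=50), "no metadata"],
    bar,
)
print(len(out), bar.completed)  # 3 151
```

The fallback of 1 keeps results without row metadata advancing the bar, at the cost of mixing rows and items in the same count.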


scottjlee

Thanks for the improvement @Bye-legumes !

omatthew98

Some questions, but LGTM otherwise!

omatthew98 · 2024-08-09T18:36:30Z

python/ray/data/_internal/execution/operators/base_physical_operator.py

+        return (
+            self._output_rows
+            if self._output_rows
+            else self.input_dependencies[0].num_output_rows_total()


Is self.input_dependencies[0].num_output_rows_total() something that is static? Should we cache this value with some call like self._output_rows = self.input_dependencies[0].num_output_rows_total()?

If this total is a live value that updates as execution continues, it makes sense to leave it as is.

Right! self._output_rows here is not static, but it is our primary option, since it gets updated here:
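To make the static-versus-live question concrete, here is a minimal toy sketch of the merged fallback. `ToyAllToAllRows`, `finish`, and the `upstream` dict are hypothetical names, not Ray code, and the sketch assumes `_output_rows` is filled in once the all-to-all step has produced its full output. Because the upstream total is queried on every call rather than cached at construction, a changing upstream estimate shows up immediately.

```python
from typing import Callable, List, Optional


class ToyAllToAllRows:
    """Hypothetical sketch of the merged fallback logic, not Ray's class."""

    def __init__(self, upstream_total: Callable[[], Optional[int]]) -> None:
        # Queried on every call, so a live upstream estimate stays live.
        self._upstream_total = upstream_total
        self._output_rows = 0

    def finish(self, bundle_rows: List[int]) -> None:
        # Assumption: set once, after the all-to-all step has produced
        # its complete output.
        self._output_rows = sum(bundle_rows)

    def num_output_rows_total(self) -> Optional[int]:
        # Prefer this operator's own count; while it is still 0 (falsy),
        # fall back to the upstream operator's current total.
        return self._output_rows if self._output_rows else self._upstream_total()


upstream = {"rows": 800}
op = ToyAllToAllRows(lambda: upstream["rows"])
print(op.num_output_rows_total())  # 800
upstream["rows"] = 1000            # upstream estimate changes during execution
print(op.num_output_rows_total())  # 1000, not a stale cached value
op.finish([10, 5])                 # e.g. an aggregation that shrinks the data
print(op.num_output_rows_total())  # 15
```

One consequence of the truthiness check in this sketch: an operator that genuinely outputs 0 rows falls back to the upstream total. Using `None` as the "not yet known" sentinel with an `is not None` check would avoid that.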

python/ray/data/_internal/execution/operators/base_physical_operator.py


alltoall

e3bde4b

Signed-off-by: zhilong <[email protected]>

Bye-legumes requested review from ericl, scv119, c21, amogkam, scottjlee, bveeramani, raulchen, stephanie-wang and omatthew98 as code owners August 1, 2024 21:06

scottjlee self-assigned this Aug 1, 2024

Bye-legumes changed the title ~~[Data] AlltoAll OP, Update Data progress bars to use row as the iteration unit~~ [WIP][Data] AlltoAll OP, Update Data progress bars to use row as the iteration unit Aug 2, 2024

Bye-legumes mentioned this pull request Aug 2, 2024

[Data]Update Data progress bars to use row as the iteration unit #46699

Merged


Merge branch 'master' into row4alltoallop

436a06b

scottjlee reviewed Aug 5, 2024


Bye-legumes and others added 2 commits August 6, 2024 10:11

Merge branch 'master' into row4alltoallop

469dadd

fix

2278193

Signed-off-by: zhilong <[email protected]>

Bye-legumes changed the title ~~[WIP][Data] AlltoAll OP, Update Data progress bars to use row as the iteration unit~~ [Data] AlltoAll OP, Update Data progress bars to use row as the iteration unit Aug 6, 2024

fix

affadba

Signed-off-by: zhilong <[email protected]>

scottjlee reviewed Aug 7, 2024


python/ray/data/_internal/execution/operators/base_physical_operator.py (outdated)

Bye-legumes and others added 2 commits August 6, 2024 20:23

Update python/ray/data/_internal/execution/operators/base_physical_op…

2e43235

…erator.py Co-authored-by: Scott Lee <[email protected]> Signed-off-by: zhilong <[email protected]>

Merge branch 'master' into row4alltoallop

2ab1358

fix

fa03501

Signed-off-by: zhilong <[email protected]>

fix

ccd786f

Signed-off-by: zhilong <[email protected]>

fix

e231e41

Signed-off-by: zhilong <[email protected]>

scottjlee reviewed Aug 8, 2024


Bye-legumes and others added 3 commits August 8, 2024 14:22

fix

dbe70b9

Signed-off-by: zhilong <[email protected]>

Merge branch 'master' into row4alltoallop

ee0a105

fix

96d0032

Signed-off-by: zhilong <[email protected]>

scottjlee approved these changes Aug 8, 2024


scottjlee assigned omatthew98 Aug 8, 2024

Merge branch 'master' into row4alltoallop

cd75b7c

omatthew98 approved these changes Aug 9, 2024


fix

f901228

Signed-off-by: zhilong <[email protected]>

scottjlee added the go add ONLY when ready to merge, run all tests label Aug 10, 2024

anyscalesam added P1 Issue that should be fixed within a few weeks data Ray Data-related issues labels Aug 12, 2024

scottjlee merged commit 872ce54 into ray-project:master Aug 12, 2024
6 checks passed
