[Data] combine_chunks before chunking pyarrow.Table block into batches #34352
Conversation
Signed-off-by: Jiajun Yao <[email protected]>
python/ray/data/_internal/batcher.py
Outdated
# pyarrow.Table.slice is slow when the table has many chunks
# so we combine chunks into a single one to make slice faster
# with the cost of an extra copy.
# See https://github.com/ray-project/ray/issues/31108 for more details.
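For readers unfamiliar with the trade-off described in this comment, a minimal standalone illustration (not the batcher code itself; the table sizes and batch size are arbitrary assumptions):

# Illustration only: slice a heavily chunked table vs. the same data after
# combine_chunks(). Combining costs one extra copy but makes slice cheap.
import time
import pyarrow as pa

# Build a table with many small chunks by concatenating many small tables.
pieces = [pa.table({"x": list(range(i * 100, (i + 1) * 100))}) for i in range(1000)]
chunked = pa.concat_tables(pieces)    # ~1000 chunks per column
combined = chunked.combine_chunks()   # single chunk per column, extra copy

def slice_into_batches(table: pa.Table, batch_size: int = 256) -> None:
    for start in range(0, table.num_rows, batch_size):
        table.slice(start, batch_size)

for name, tbl in [("chunked", chunked), ("combined", combined)]:
    start_time = time.perf_counter()
    slice_into_batches(tbl)
    elapsed = time.perf_counter() - start_time
    print(f"{name}: {tbl.column(0).num_chunks} chunks, {elapsed:.4f}s to slice")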
Can we file an upstream bug with Arrow here? It would be nice to be able to avoid doing this in the future, since presumably this significantly increases our peak memory usage.
I reached out to them via the mailing list. I can also file a GH issue.
Signed-off-by: Jiajun Yao <[email protected]>
map_batches_benchmark_single_node: https://buildkite.com/ray-project/release-tests-pr/builds/34832#01877910-398e-4751-ba0c-f94fbacd1306
python/ray/data/_internal/batcher.py
Outdated
and block.num_columns > 0
and block.column(0).num_chunks > 1
):
    block = transform_pyarrow.combine_chunks(block)
Can we structure the code to only do combine_chunks when necessary?
- In next_batch() below, we call slice() to get one batch from the block. Can we call combine_chunks there instead? We don't need to call combine_chunks if the block is not sliced. (See the sketch after this list.)
- How many num_chunks did we see in the benchmark? Can we have a minimal threshold? I feel num_chunks > 1 is a bit too aggressive.
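A rough sketch of the first suggestion, combining lazily inside next_batch() only when a slice is actually needed; the class and attribute names here are illustrative, not the actual Batcher code in this PR:

import pyarrow as pa

class LazyCombineBatcher:  # hypothetical name, not Ray's Batcher
    def __init__(self, block: pa.Table, batch_size: int):
        self._block = block
        self._batch_size = batch_size
        self._offset = 0

    def has_next(self) -> bool:
        return self._offset < self._block.num_rows

    def next_batch(self) -> pa.Table:
        remaining = self._block.num_rows - self._offset
        if self._offset == 0 and remaining <= self._batch_size:
            # The whole block fits in one batch: no slice, so no combine needed.
            batch = self._block
        else:
            if self._block.num_columns > 0 and self._block.column(0).num_chunks > 1:
                # Pay the extra copy only right before the first slice.
                self._block = self._block.combine_chunks()
            batch = self._block.slice(self._offset, self._batch_size)
        self._offset += batch.num_rows
        return batch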
In the benchmark we have 8k+ chunks. But in my test, even with 10 big chunks, combining chunks first is still faster.
What's the size of the big chunks? Shall we decide whether to combine chunks based on:
- the number of chunks
- the size of each chunk
Discussed offline with @jjyao: we decided to go with the current approach and expose a constant MIN_NUM_CHUNKS_TO_TRIGGER_COMBINE_CHUNKS, so we can change the constant easily when debugging the issue later.
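A minimal sketch of how such a threshold could gate the combine; aside from MIN_NUM_CHUNKS_TO_TRIGGER_COMBINE_CHUNKS, which is named above, the helper name and default value are made up, and pyarrow's built-in Table.combine_chunks() stands in for the Ray transform_pyarrow.combine_chunks helper used in the diff:

import pyarrow as pa

# Exposed as a module-level constant so it is easy to tweak while debugging.
MIN_NUM_CHUNKS_TO_TRIGGER_COMBINE_CHUNKS = 2  # assumed default value

def maybe_combine_chunks(block: pa.Table) -> pa.Table:
    """Combine chunks only when the block is fragmented enough to slow down slice."""
    if (
        block.num_columns > 0
        and block.column(0).num_chunks >= MIN_NUM_CHUNKS_TO_TRIGGER_COMBINE_CHUNKS
    ):
        # The PR calls transform_pyarrow.combine_chunks(block) here; this sketch
        # uses pyarrow's own Table.combine_chunks() instead.
        block = block.combine_chunks()
    return block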
thanks @jjyao! can we also make the same change inside
Cool, let's make sure to add a TODO with the linked Arrow issue.
One way to do this is to add some asserts on num chunks on all the main consumption paths.
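A hedged sketch of that assert idea; the helper name and the expectation that consumed blocks carry at most one chunk per column are assumptions, not part of this PR:

import pyarrow as pa

def assert_block_is_combined(block: pa.Table) -> None:
    # TODO: link the upstream Arrow issue about slow Table.slice on chunked tables.
    if block.num_columns == 0:
        return
    num_chunks = block.column(0).num_chunks
    assert num_chunks <= 1, (
        f"Expected a combined block on this consumption path, got {num_chunks} chunks"
    )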
Signed-off-by: Jiajun Yao <[email protected]>
thanks @jjyao!
ray-project#34352) pyarrow.Table.slice is slow when the table has many chunks, which makes batching a pyarrow block slow. The fix is to combine chunks into a single one, making slice faster at the cost of an extra copy. Signed-off-by: Jiajun Yao <[email protected]>
Why are these changes needed?
pyarrow.Table.slice is slow when the table has many chunks, which makes batching a pyarrow block slow. The fix is to combine chunks into a single one, making slice faster at the cost of an extra copy.
Related issue number
Closes #31108
Checks
- I've signed off every commit (git commit -s) in this PR.
- I've run scripts/format.sh to lint the changes in this PR.
- If I added a method in Tune, I've added it in doc/source/tune/api/ under the corresponding .rst file.