Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

P2P shuffle: pickle tiny buffers into monolithic bytes objects #8321

Merged
merged 1 commit into from
Nov 7, 2023

Conversation

crusaderky
Copy link
Collaborator

@crusaderky crusaderky commented Nov 2, 2023

  • In dataframe shuffle with shards >= 128 kiB each, avoid one deep copy of the buffers by letting the network stack transfer the shards as individual buffers

  • In array shuffle with shards < 128 kiB each, avoid the overhead of sending the buffers separately through the network stack by pickling everything into a single opaque bytes object ahead of time

  • Blocked by and incorporates Zero-copy array shuffle #8282
    See only the last commit for review.

  • Fixes performance regression with small chunks introduced by Zero-copy array shuffle #8282

  • See performance measures in Speed up network transfer for small buffers #8318

@crusaderky crusaderky self-assigned this Nov 2, 2023
Copy link
Contributor

github-actions bot commented Nov 2, 2023

Unit Test Results

See test report for an extended history of previous test failures. This is useful for diagnosing flaky tests.

       27 files  ±  0         27 suites  ±0   14h 0m 14s ⏱️ - 50m 52s
  3 970 tests +  7    3 849 ✔️ +11     117 💤 ±0  4  - 4 
49 875 runs  +89  47 457 ✔️ +90  2 412 💤 +3  6  - 4 

For more details on these failures, see this check.

Results for commit 99b27c3. ± Comparison against base commit c91a735.

♻️ This comment has been updated with latest results.

@crusaderky crusaderky force-pushed the p2p_monolithic_bytes branch 3 times, most recently from cc58db6 to e02e945 Compare November 2, 2023 20:32
@crusaderky crusaderky changed the title [WIP] p2p shuffle: pickle tiny buffers into monolithic bytes objects P2P shuffle: pickle tiny buffers into monolithic bytes objects Nov 2, 2023
@crusaderky
Copy link
Collaborator Author

This PR fixes the performance regressions introduced by #8282 as expected. There are outstanding mild memory usage regressions which however I don't believe should be cause for concern (+1 GiB RAM usage cluster wide, or 100 MiB per worker).

There is no material impact on dataframe shuffle and tpch tests. I suspect this may have to do with the partition sizes in our test suite being always reasonable, so you never hit the pickle.dumps code path in shuffle._core. I'll try producing a use case where partitions are oversplit to prove my point.

image
image
image

@crusaderky crusaderky marked this pull request as ready for review November 3, 2023 13:03
Copy link
Member

@hendrikmakait hendrikmakait left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, @crusaderky!

@hendrikmakait hendrikmakait merged commit 0dc9e88 into dask:main Nov 7, 2023
28 of 33 checks passed
@crusaderky crusaderky deleted the p2p_monolithic_bytes branch November 7, 2023 16:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants