Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Don't use bytearray().join #8312

Merged
merged 1 commit into from
Oct 30, 2023
Merged

Conversation

crusaderky
Copy link
Collaborator

When merging two uncontiguous frames, don't use bytearray().join(...). Instead use a much faster numpy-based buffer.

This can reduce the runtimes of from_frames from 60ms to 20ms when receiving a single 128 MiB numpy array sharded in two:

  • tcp: unaffected (frames are contiguous to begin with)
  • asyncio_tcp: affected (sharded frames are separate)
  • ucx: untested (can't cause a regression)
  • ws: untested (can't cause a regression)
  • inproc: N/A (no serialization)

@github-actions
Copy link
Contributor

github-actions bot commented Oct 29, 2023

Unit Test Results

See test report for an extended history of previous test failures. This is useful for diagnosing flaky tests.

       27 files  ±  0         27 suites  ±0   15h 5m 29s ⏱️ - 4m 6s
  3 941 tests +  3    3 821 ✔️ +  5     117 💤 ±0  3  - 2 
49 504 runs  +39  47 133 ✔️ +39  2 368 💤 +2  3  - 2 

For more details on these failures, see this check.

Results for commit 9c1e3a8. ± Comparison against base commit 6f9e491.

♻️ This comment has been updated with latest results.

Copy link
Contributor

@milesgranger milesgranger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! 👍

@fjetter
Copy link
Member

fjetter commented Oct 30, 2023

tcp: unaffected (frames are contiguous to begin with)

If TCP is unaffected, why do we care? The asyncio_tcp module is not used anywhere.

@crusaderky
Copy link
Collaborator Author

tcp: unaffected (frames are contiguous to begin with)

If TCP is unaffected, why do we care? The asyncio_tcp module is not used anywhere.

Because the tests in distribute/comm/tests/test_comms.py use the tcp fixture which is parameterized for tcp and asyncio_tcp. I decided it was quicker to fix the issue than to figure out a way to exclude asyncio_tcp for a use case that was failing because of this.

@crusaderky crusaderky merged commit 6f5109c into dask:main Oct 30, 2023
30 of 34 checks passed
@crusaderky crusaderky deleted the host_array_from_buffers branch October 30, 2023 17:12
@mrocklin
Copy link
Member

Note for the future, we should probably get more into the practice of considering nuking things as a viable option.

@crusaderky
Copy link
Collaborator Author

I agree that it's silly to maintain two tcp backends and we should either nuke asyncio_tcp and commit to tornado or invest time in fixing asyncio_tcp and nuke tornado. Something to discuss during our next meeting.

pentschev added a commit to pentschev/ucxx that referenced this pull request Oct 30, 2023
dask/distributed#8312 has changed the
`host_array` submodule where it belongs.
@dhirschfeld
Copy link
Contributor

If work is being done on asyncio_tcp, maybe porting to anyio could be on the cards?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants