Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Deserialization: zero-copy merge subframes when possible #5208

Merged

Conversation

gjoseph92
Copy link
Collaborator

Better implementation of #5112.

The point of this change (copied from original PR; note I tested and the screencap behavior works with this PR too):

When merging sub-frames before deserializing, if all the sub-frames are memoryviews backed by the same underlying buffer, "merge" them without copying by making a new memoryview pointing to the same buffer, instead of always copying.

Running @crusaderky's reproducer from #5107 on main:
before
With this PR:
after

I'd always wanted to merge the memoryviews by comparing their pointers, but couldn't find an API to access that information, hence the gnarly previous PR passing around byte offsets.

Thanks to @jakirkham on #5112 (comment), I wrote the pointer arithmetic using NumPy to access the buffer interface. Then I found @mattgwwalker's great post explaining how to do exactly what we needed just using ctypes: https://mattgwwalker.wordpress.com/2020/10/15/address-of-a-buffer-in-python/.

Use NumPy to get the pointers of the memoryviews and do pointer arithmetic. This is way more reliable and self-contained.

TODO unit test for `merge_memoryviews` and deal with NumPy situation
It's very annoying: `from_buffer` requires the buffer to be writeable, because ctypes is inconsistently strict about const correctness (see https://bugs.python.org/issue11427#msg148586). I don't think there's any clever pointer-casting we could do as mentioned in that thread, since we need to access through the buffer protocol—we can't just cast `mv.obj` to `c_void_p`, since we need the pointer into the current location within the buffer, not just at the start of the buffer.

So for read-only buffers, we try to use NumPy, and if it's not available, we'll end up silently(!) falling back on copying.
distributed/protocol/utils.py Outdated Show resolved Hide resolved
distributed/protocol/utils.py Show resolved Hide resolved
distributed/protocol/utils.py Show resolved Hide resolved
distributed/protocol/utils.py Outdated Show resolved Hide resolved
distributed/protocol/serialize.py Outdated Show resolved Hide resolved
@jakirkham
Copy link
Member

Neat that's a clever way to access pointers in Python 😄 Will take a closer look (just got back and am catching up 😅)

cc @madsbk (as you may have thoughts on zero-copy frame combination)

cc @quasiben (for general awareness)

@gjoseph92
Copy link
Collaborator Author

@jakirkham do you have any more thoughts on this? @jcrist mentioned he thought frame splitting itself might be unnecessary except for websockets, and should be handled at the transport level instead.

I think this PR has become somewhat low priority because there's discussion around significantly reworking the protocol, but I do have a feeling it would make a difference for users right now.

@jakirkham
Copy link
Member

I think this is still useful. Have some WIP review comments that I've fallen behind. Will try to catch back up later this week or next week.

Yeah at least on the read side we are allocating one big buffer, which we read into. Agree there's some legacy splitting in serialization that would be good to revisit. IIRC compression is one of the use cases for splitting during serialization (not sure if that is affecting us in deserialization as well).

To Jim's point, I think only frame splitting we do on for communication is baked into the transport layers already. One case is websockets. Another is working around a bug in OpenSSL 1.0.2. In either case neither of these should be in serialization. Jim if you see other issues please let us know.

@fjetter
Copy link
Member

fjetter commented Sep 30, 2021

I think this is a high impact issue. Are there any major concerns left before we can merge? I get that some parts of the protocol are currently in review and might change but until then this should deliver a significant improvement. The changes are well enough isolated that we can easily revert if necessary.

friendly ping @jakirkham @madsbk @jcrist

If there are no objections within the next 24h I will go ahead and merge.

@jcrist
Copy link
Member

jcrist commented Sep 30, 2021

I've given this a read through and it seems fine to me to merge as is. I still think that splitting large buffers should be done at the transport level instead of here, but for now this seems fine. Test failure is unrelated, should should be good-to-go.

@jakirkham
Copy link
Member

Have some draft review comments above that I haven't had a chance to finish. Can try to finish them, but unfortunately it won't happen for a few more days. That said, feel free to merge anyways if you need this soon and can try to follow up later

@gjoseph92
Copy link
Collaborator Author

@fjetter shall we merge this then? I'd love to see it in the next release.

Copy link
Member

@jakirkham jakirkham left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apologies for the delay. Had a few minor comments below

distributed/protocol/utils.py Outdated Show resolved Hide resolved
distributed/protocol/utils.py Outdated Show resolved Hide resolved
@fjetter
Copy link
Member

fjetter commented Oct 7, 2021

I merged main and fixed a linting issue. will merge once green-ish

@fjetter
Copy link
Member

fjetter commented Oct 7, 2021

17 tests broke on the win py3.7 no ci1 build. I'll retrigger. if this persists we might need to look into it

@gjoseph92
Copy link
Collaborator Author

@fjetter the tests you just added in #5160 are failing on 3.7 everywhere, and ubuntu 3.8, with odd TypeErrors related to the mocks

>       assert fut2.key in mocked_gather.call_args.kwargs["to_gather"]
E       TypeError: tuple indices must be integers or slices, not str

Then there's known flaky #3574 and one distributed/tests/test_asyncprocess.py::test_exit_callback failure that must just be flaky.

@gjoseph92
Copy link
Collaborator Author

This is still open. I've merged main and the test failures feel flaky to me, but I'm not sure.

@jakirkham
Copy link
Member

FWIW merged @pentschev's PR ( #5484 ), which included some test fixes for UCX that should clear out the gpuCI failures seen here. May need to merge again to get those

Did see a couple unrelated test failures over there as well. Maybe @jrbourbeau will be able to let us know if these are flaky tests or if they need further investigation

Copy link
Contributor

@madsbk madsbk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work @gjoseph92, clever approach that is useful even if we move the splitting/merging to the transport layer!

@pentschev
Copy link
Member

rerun tests

@pentschev
Copy link
Member

gpuCI now passed, thanks for checking this @jakirkham !

@gjoseph92
Copy link
Collaborator Author

gjoseph92 commented Nov 4, 2021

This is green besides test_reschedule_concurrent_requests_deadlock failing on all Windows machines. Also test_aliases_2 failed on macOS, which seems odd to me to be flaky, definitely doesn't seem related.

xref #5494

@jrbourbeau
Copy link
Member

Apologies for the delay here. A few folks have approved this PR, so I'll plan to merge this in later today if no further comments

@jrbourbeau jrbourbeau changed the title Deserialization: zero-copy merge subframes when possible [better than last PR] Deserialization: zero-copy merge subframes when possible Nov 12, 2021
@jrbourbeau jrbourbeau merged commit 5a75023 into dask:main Nov 12, 2021
@gjoseph92 gjoseph92 deleted the serialization/zero-copy-merge-memoryviews branch November 12, 2021 23:37
@gjoseph92
Copy link
Collaborator Author

Thanks @jrbourbeau and everyone for reviewing! Very excited to have this in.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Unnecessary deep copy causes memory flare on network comms
8 participants