
Serialize and split #4541

Merged — 12 commits merged into dask:master on Feb 26, 2021
Conversation

@madsbk madsbk (Contributor) commented Feb 23, 2021

Simplify the serialization, splitting, and writability of objects.

This work is a precursor to #4531 that makes it possible to have msgpack extract serializable objects while supporting splitting and maintaining the writability of objects.

  • Tests added / passed
  • Passes black distributed / flake8 distributed
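
For illustration, a rough sketch of the idea (none of this is the code in this PR; sketch_dumps, sketch_loads, and the "writeable" key are made up for the example): an object is split into a small msgpack-friendly header plus a list of frames, and the header records which frames must stay writeable so that writability can be restored on the receiving side.

import pickle

def sketch_dumps(obj):
    # Split obj into a small pickled payload plus out-of-band buffers (frames).
    buffers = []
    data = pickle.dumps(obj, protocol=5, buffer_callback=buffers.append)
    frames = [data] + [b.raw() for b in buffers]
    header = {
        "serializer": "pickle",
        # Record per-frame writability so the receiver can restore it.
        "writeable": tuple(not memoryview(f).readonly for f in frames[1:]),
    }
    return header, frames

def sketch_loads(header, frames):
    # Copy any frame that must be writeable but arrived read-only.
    restored = [
        bytearray(f) if needed and memoryview(f).readonly else f
        for needed, f in zip(header["writeable"], frames[1:])
    ]
    return pickle.loads(frames[0], buffers=restored)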

@madsbk madsbk marked this pull request as ready for review February 23, 2021 17:21
@jakirkham jakirkham (Member) left a comment

Thanks Mads! 😄

Had a few comments below

Comment on lines 64 to 67
header = {
"serializer": "pickle",
"pickle-writeable": tuple(not f.readonly for f in frames[1:]),
}
jakirkham (Member)

Should we do something similar in dask_dumps and cuda_dumps?

madsbk (Contributor, PR author)

I am not sure whether we want to do that here or in the individual registered dumps/loads functions, like the NumPy serialization does.
Anyway, I don't think it should block this PR.

jakirkham (Member)

Yeah, it's a good question. I think support for NumPy arrays is a bit older, since it is a primary use case, so that function may just be a bit unusual because of that.

We should be OK pulling this out of the NumPy case and handling it generally. I would think that should yield simpler, easier-to-understand code, but I could be wrong about that.

For context, tracking writeable frames was needed to solve some gnarly issues (#1978, #3943). So if there is a general way to solve this, that would be ideal to ensure they don't resurface.
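
For illustration only, the generalization being suggested might look roughly like this; record_writeable is a hypothetical helper, not a function added by this PR:

def record_writeable(header, frames, dont_care=False):
    # Hypothetical helper each *_dumps path (pickle_dumps, dask_dumps,
    # cuda_dumps, ...) could call, so the writability bookkeeping is no
    # longer special-cased inside the NumPy serializer.
    if dont_care:
        # e.g. device-bound buffers, where writability of the host copy is irrelevant
        header["writeable"] = (None,) * len(frames)
    else:
        header["writeable"] = tuple(not memoryview(f).readonly for f in frames)
    return header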

madsbk (Contributor, PR author)

I agree, but let's do that in a follow-up PR.
It assumes that dask_dumps returns a memoryview-compatible object, is that right?
Also, we apparently allow additional frames when deserializing: https://github.com/dask/distributed/blob/master/distributed/protocol/tests/test_serialize.py#L82

jakirkham (Member)

Sure, sounds good 🙂

Yeah, though I think that is pretty closely enforced today.

I think that is just showing we ignore empty frames, but I could be missing something.

distributed/protocol/numpy.py (review thread outdated, resolved)
distributed/protocol/serialize.py (review thread outdated, resolved)
@@ -31,7 +30,6 @@ def cuda_deserialize_rmm_device_buffer(header, frames):
@dask_serialize.register(rmm.DeviceBuffer)
def dask_serialize_rmm_device_buffer(x):
    header, frames = cuda_serialize_rmm_device_buffer(x)
    header["writeable"] = (None,) * len(frames)
jakirkham (Member)

Just wanted to note that None has a special meaning here. It basically means it doesn't matter whether this frame is read-only or writeable; in other words, skip trying to copy it. The reason we include this (in particular on the Dask serialization path) is to avoid an extra copy of buffers we plan to move to device later.

That said, I think the changes here may already capture this use case. Just wanted to surface the logic to hopefully clarify what is going on currently and catch any remaining things not yet addressed.
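
As a small sketch of the flag semantics described above (not the actual code in serialize.py):

def restore_frame(flag, frame):
    # True  -> the frame must end up writeable, so copy it if it arrived read-only
    # False -> the frame does not need to be writeable, leave it alone
    # None  -> writability is irrelevant (e.g. bytes about to be moved to device
    #          memory), so never spend an extra copy on it
    if flag and memoryview(frame).readonly:
        return bytearray(frame)
    return frame

In this simplified form False and None happen to behave the same; the point of None on the Dask serialization path above is to state explicitly that no host-side copy should ever be made for buffers that are headed to the device.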

@jakirkham (Member)

Have we tried running the CUDA tests locally as well?

@madsbk madsbk (Contributor, PR author) commented Feb 26, 2021

Have we tried running the CUDA tests locally as well?

Yes, they are all passing on my laptop :)

@jakirkham jakirkham merged commit 7f8bb81 into dask:master Feb 26, 2021
@jakirkham (Member)

Thanks Mads! 😄

@madsbk madsbk deleted the serialize_and_split branch March 1, 2021 08:03
@jakirkham jakirkham mentioned this pull request Feb 1, 2022