-
Notifications
You must be signed in to change notification settings - Fork 5.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Draft] [Core] [Xlang] Support Protobuf as a native serialization type #21383
Conversation
@@ -201,6 +206,29 @@ def _deserialize_object(self, data, metadata, object_ref): | |||
0] == ray_constants.OBJECT_METADATA_TYPE_ACTOR_HANDLE: | |||
obj = self._deserialize_msgpack_data(data, metadata_fields) | |||
return _actor_handle_deserializer(obj) | |||
elif metadata_fields[ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we need a new metadata type, it can be embedded within msgpack like pickle5 data is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@iycheng can chime in more. The reason here is to avoid double serialization from protobuf -> bytes -> msgpack wrapped bytes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this can avoid double serialization and also is useful to support zero-copy semantic in the future with flatbuffer
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ericl is this a blocker issue for you? I can push it into msgpack as well (in fact that's the previous implementation 😅)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The pickle5 implementation actually avoids double serialization, by storing the pickle5 serialized data separately to enable zero-copy.
I'd like us to settle on a common implementation for these types of extensions, as we are seeing ArrowTable / other serialization extensions being proposed by @kira-lin @chaokunyang : #20242
@jovany-wang is leading the x-lang serialization improvements for 2.0; I've started a thread in Ray slack at #ray-serialization-2-point-0
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good. I have updated the PR to have the data wrapped by msgpack instead. We can't do exactly separately serialization like the way pickle5 did it because currently messagepack serializer defaults to pickle5 and there is no way to specify a mapping of type=>serializer for the msgpack extension types. This part needs more discussion.
pending rounds of design review... |
|
Pending Serialization V2 effort and alignment from @jovany-wang |
Closing this as a prototype. I will re-open this once serialization V2 API completes. |
Why are these changes needed?
This PR adds Protobuf as a native serialization type wrapped by message pack serializer. It cleans up the prototype in #21147. It enables protobuf to be send as values between Python and Java.
It solves two problem:
Checks
scripts/format.sh
to lint the changes in this PR.