Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Draft] [Core] [Xlang] Support Protobuf as a native serialization type #21383

Closed
wants to merge 17 commits into from

Conversation

simon-mo
Copy link
Contributor

@simon-mo simon-mo commented Jan 4, 2022

Why are these changes needed?

This PR adds Protobuf as a native serialization type wrapped by message pack serializer. It cleans up the prototype in #21147. It enables protobuf to be send as values between Python and Java.

It solves two problem:

  • Protobuf is not serializable in Python by cloudpickle.
  • Cross language communication type is limited to primitives and collections. Protobuf adds schema-enforced complex data type to be sent between Python and Java. Note this is compatible with [RFC][Feature] Enhance xlang usability #20372 and any additional serializer to be added on top.

Checks

  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

@@ -201,6 +206,29 @@ def _deserialize_object(self, data, metadata, object_ref):
0] == ray_constants.OBJECT_METADATA_TYPE_ACTOR_HANDLE:
obj = self._deserialize_msgpack_data(data, metadata_fields)
return _actor_handle_deserializer(obj)
elif metadata_fields[
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we need a new metadata type, it can be embedded within msgpack like pickle5 data is.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@iycheng can chime in more. The reason here is to avoid double serialization from protobuf -> bytes -> msgpack wrapped bytes.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this can avoid double serialization and also is useful to support zero-copy semantic in the future with flatbuffer

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ericl is this a blocker issue for you? I can push it into msgpack as well (in fact that's the previous implementation 😅)

Copy link
Contributor

@ericl ericl Jan 4, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The pickle5 implementation actually avoids double serialization, by storing the pickle5 serialized data separately to enable zero-copy.

I'd like us to settle on a common implementation for these types of extensions, as we are seeing ArrowTable / other serialization extensions being proposed by @kira-lin @chaokunyang : #20242

@jovany-wang is leading the x-lang serialization improvements for 2.0; I've started a thread in Ray slack at #ray-serialization-2-point-0

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good. I have updated the PR to have the data wrapped by msgpack instead. We can't do exactly separately serialization like the way pickle5 did it because currently messagepack serializer defaults to pickle5 and there is no way to specify a mapping of type=>serializer for the msgpack extension types. This part needs more discussion.

@ericl ericl added @author-action-required The PR author is responsible for the next step. Remove tag to send back to the reviewer. core-interface-change-approval-required This changes the Ray core behavior / API and requires broader approvals. labels Jan 4, 2022
@simon-mo simon-mo added @author-action-required The PR author is responsible for the next step. Remove tag to send back to the reviewer. and removed @author-action-required The PR author is responsible for the next step. Remove tag to send back to the reviewer. labels Jan 4, 2022
@simon-mo simon-mo changed the title [Core] [Xlang] Support Protobuf as a native serialization type [WIP] [Core] [Xlang] Support Protobuf as a native serialization type Jan 4, 2022
@simon-mo
Copy link
Contributor Author

simon-mo commented Jan 4, 2022

pending rounds of design review...

@bveeramani
Copy link
Member

‼️ ACTION REQUIRED ‼️

We've switched our code formatter from YAPF to Black (see #21311).

To prevent issues with merging your code, here's what you'll need to do:

  1. Install Black
pip install -I black==21.12b0
  1. Format changed files with Black
curl -o format-changed.sh https://gist.githubusercontent.com/bveeramani/42ef0e9e387b755a8a735b084af976f2/raw/7631276790765d555c423b8db2b679fd957b984a/format-changed.sh
chmod +x ./format-changed.sh
./format-changed.sh
rm format-changed.sh
  1. Commit your changes.
git add --all
git commit -m "Format Python code with Black"
  1. Merge master into your branch.
git pull upstream master
  1. Resolve merge conflicts (if necessary).

After running these steps, you'll have the updated format.sh.

@simon-mo
Copy link
Contributor Author

simon-mo commented Feb 1, 2022

Pending Serialization V2 effort and alignment from @jovany-wang

@simon-mo simon-mo changed the title [WIP] [Core] [Xlang] Support Protobuf as a native serialization type [Draft] [Core] [Xlang] Support Protobuf as a native serialization type Feb 1, 2022
@simon-mo
Copy link
Contributor Author

simon-mo commented Feb 1, 2022

Closing this as a prototype. I will re-open this once serialization V2 API completes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
@author-action-required The PR author is responsible for the next step. Remove tag to send back to the reviewer. core-interface-change-approval-required This changes the Ray core behavior / API and requires broader approvals.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants