Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: add pickling support to proto messages #280

Merged
merged 2 commits into from
Dec 9, 2021
Merged

Conversation

tseaver
Copy link
Contributor

@tseaver tseaver commented Dec 8, 2021

FBO of frameworks such as Apache Beam, which use them for sharing
state between trusted hosts.

Closes #260.

FBO of frameworks such as Apache Beam, which use them for sharing
state between trusted hosts.

Closes #260.
@codecov
Copy link

codecov bot commented Dec 8, 2021

Codecov Report

❗ No coverage uploaded for pull request base (main@437fb9f). Click here to learn what that means.
The diff coverage is n/a.

Impacted file tree graph

@@           Coverage Diff            @@
##             main      #280   +/-   ##
========================================
  Coverage        ?   100.00%           
========================================
  Files           ?        22           
  Lines           ?       968           
  Branches        ?       214           
========================================
  Hits            ?       968           
  Misses          ?         0           
  Partials        ?         0           

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 437fb9f...1a6c742. Read the comment docs.

@tswast
Copy link

tswast commented Dec 8, 2021

I don't know enough about the details of pickling or what the "serious consequences" were when objects were pickled before, but this would greatly simplify implementations of connectors in tools like Dask and Beam.

Connectors such as https://beam.apache.org/releases/pydoc/2.33.0/apache_beam.io.gcp.bigtableio.html#apache_beam.io.gcp.bigtableio.WriteToBigTable are already pickling objects that contain proto-plus properties, so whatever "serious consequences" we are worried about are already happening over there.

@tseaver
Copy link
Contributor Author

tseaver commented Dec 8, 2021

@tswast There is a lot of fear in the Python community around pickles, because unpickling an object can import code: if an attacker can get hostile (or compromised) code onto the system, and cause a process to unpickle an object which loads a class from the "evil" code, Bad Things(TM) happen. So, loading pickles from untrusted sources can widen an attack surface. Dask and Beam connectors don't really fall into that scenario, IMNSHO.

@software-dov
Copy link
Contributor

I'm not very familiar with Beam, and it could be that this falls outside the scope of Beam's use case, but one additional complication is compatibility between different versions of the same message, e.g.

# Old software version
from mollusc_v1 import Squid as SquidV1

s_v1 = SquidV1(mass_kg=10, length_cm=20)
s_pickle = pickle.dumps(s_v1)
with open("/tmp/squid.txt", "w") as f:
    f.write(s_pickle)

# Updated software version
from mollusc_v2 import Squid as SquidV2

with open("/tmp/squid.txt", "w") as f:
    s = f.read()

s_v2 = pickle.loads(s)

assert isinstance(s_v2, SquidV2)

What happens in the above scenario? Does the unpickling fail because the exact class isn't available? Is pickle smart/flexible enough to detect that SquidV2 is the class we have in mind? Are there subtle errors? Is the message class description serialized and deserialized as part of the pickle protocol?

@tseaver
Copy link
Contributor Author

tseaver commented Dec 9, 2021

@software-dov

Does the unpickling fail because the exact class isn't available? Is pickle smart/flexible enough to detect that SquidV2 is the class we have in mind? Are there subtle errors? Is the message class description serialized and deserialized as part of the pickle protocol?

A pickle of a class instance contains the dotted-importable name of the instance's type: it ignores any import foo as bar aliasing. The class is found via the unpickler's find_class method.

Another consequence is that instances of "local" classes (defined inside another function / method) aren't picklable, because their class can't be imported: that is why I had to define the Squid class in tests/test_message_pickling.py outside of the method.

@tseaver tseaver merged commit 2b7be35 into main Dec 9, 2021
@tseaver tseaver deleted the 260-add-pickling-support branch December 9, 2021 18:12
@tseaver tseaver mentioned this pull request Dec 9, 2021
@software-dov
Copy link
Contributor

software-dov commented Dec 9, 2021

@tseaver Okay, that makes sense, and that answers my questions about old/new message interchange. The importable path will be the same assuming no breaking changes. It is then protobuf's responsibility to handle unknown/missing fields, which is well tested.

This does imply that it is not possible to pickle messages across a major version update.

@software-dov software-dov linked an issue Jan 4, 2022 that may be closed by this pull request
gcf-merge-on-green bot pushed a commit that referenced this pull request Jan 25, 2022
🤖 I have created a release *beep* *boop*
---


### [1.19.9](v1.19.8...v1.19.9) (2022-01-25)


### Bug Fixes

* add pickling support to proto messages ([#280](#280)) ([2b7be35](2b7be35))

---
This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Proto Plus messages are not pickleable
3 participants