Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add pluggable compression serde #407

Merged
merged 5 commits into from
Sep 12, 2022
Merged

Conversation

sontek
Copy link
Contributor

@sontek sontek commented Jul 12, 2022

This adds a new class CompressedSerde which allows you to use it as an entrypoint to compression. It defaults to using zlib and has immediate benefits on the size of messages stored in memcached but it also has compress and decompress as pluggable points if you want to use a stronger compression algorithm like lz4.

.gitignore Outdated Show resolved Hide resolved
pymemcache/serde.py Outdated Show resolved Hide resolved

FLAG_BYTES = 0
FLAG_PICKLE = 1 << 0
FLAG_INTEGER = 1 << 1
FLAG_LONG = 1 << 2
FLAG_COMPRESSED = 1 << 3 # unused, to main compatibility with python-memcached
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any insight into how this flag still maps back to python-memcached. Also would this flag cause any issues since two folks can use the same flag but different compression algorithms?

Also I wonder what compatibility we still have with python-memcached before this change and now.

Copy link
Collaborator

@jparise jparise Jul 12, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Relatedly, a way to handle a generally compressed values with different compressors is to introduce a prefix.

serde = CompressedSerde(..., prefix="z")

... and then prepend that prefix on serialization (e.g. value = prefix + "|" + value). On deserialization, validate and strip the prefix.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

python-memcached used zlib by default but I don't think at this point there is any chance of compatibility since we are supporting multiple versions of pickle and we wouldn't be able to guarantee versions across the serialization.

I like the idea of adding a prefix on there so we identify which compressor was used. In the end the client is only going to have one (via serde) so its not going to be able to handle decompressing/compressing multiple versions but we could at least raise an error that says "Found keys compressed with a different serializer".

I don't know if we do this today though. For example, we don't track pickle versions so we can select the correct one

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jparise @jogo How strongly are you feeling that you want to be able to identify the compression algorithm between clients that are configured with different serde? I have a feeling its not that high of a priority since we don't do that today with the pickle serializer but wanted to check

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think to do it properly we'd have to do what we discussed before, which I'm not really a fan of:

#53 (comment)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree it's not a high priority now. I mainly wanted to explicitly make sure we weren't trying to keep any python-memcached compatibility.

pymemcache/test/test_serde.py Show resolved Hide resolved
@sontek
Copy link
Contributor Author

sontek commented Jul 15, 2022

Here is how I chose the minimum_length to use as a default:

serializer=zlib9	 start=380	 end=356
serializer=zstd 	 start=380	 end=375
serializer=zlib 	 start=381	 end=358
serializer=zlib9	 start=381	 end=358
serializer=zstd 	 start=381	 end=377
serializer=zlib 	 start=382	 end=359
serializer=zlib9	 start=382	 end=359
serializer=zstd 	 start=382	 end=379
serializer=zlib 	 start=383	 end=359
serializer=zlib9	 start=383	 end=359
serializer=zstd 	 start=383	 end=380
serializer=zlib 	 start=384	 end=360
serializer=zlib9	 start=384	 end=360
serializer=zstd 	 start=384	 end=378
serializer=zlib 	 start=385	 end=362
serializer=zlib9	 start=385	 end=362
serializer=zstd 	 start=385	 end=381
serializer=zlib 	 start=386	 end=364
serializer=zlib9	 start=386	 end=364
serializer=zstd 	 start=386	 end=383
serializer=zlib 	 start=387	 end=363
serializer=zlib9	 start=387	 end=363
serializer=zstd 	 start=387	 end=380
serializer=zlib 	 start=388	 end=364
serializer=zlib9	 start=388	 end=364
serializer=zstd 	 start=388	 end=384
serializer=zlib 	 start=389	 end=364
serializer=zlib9	 start=389	 end=364
serializer=zstd 	 start=389	 end=382
serializer=zlib 	 start=390	 end=365
serializer=zlib9	 start=390	 end=365
serializer=zstd 	 start=390	 end=383
serializer=zlib 	 start=391	 end=366
serializer=zlib9	 start=391	 end=366
serializer=zstd 	 start=391	 end=386
serializer=zlib 	 start=392	 end=367
serializer=zlib9	 start=392	 end=367
serializer=zstd 	 start=392	 end=385
serializer=zlib 	 start=393	 end=365
serializer=zlib9	 start=393	 end=365
serializer=zstd 	 start=393	 end=384
serializer=zlib 	 start=394	 end=368
serializer=zlib9	 start=394	 end=368
serializer=zstd 	 start=394	 end=387
serializer=zlib 	 start=395	 end=369
serializer=zlib9	 start=395	 end=369
serializer=zstd 	 start=395	 end=388
serializer=zlib 	 start=396	 end=374
serializer=zlib9	 start=396	 end=374
serializer=zstd 	 start=396	 end=390
serializer=zlib 	 start=397	 end=373
serializer=zlib9	 start=397	 end=373
serializer=zstd 	 start=397	 end=392
serializer=zlib 	 start=398	 end=373
serializer=zlib9	 start=398	 end=373
serializer=zstd 	 start=398	 end=389
serializer=zlib 	 start=399	 end=374
serializer=zlib9	 start=399	 end=374
serializer=zstd 	 start=399	 end=393
serializer=zlib 	 start=400	 end=374
serializer=zlib9	 start=400	 end=374
serializer=zstd 	 start=400	 end=390
serializer=zlib 	 start=401	 end=380
serializer=zlib9	 start=401	 end=380
serializer=zstd 	 start=401	 end=394
serializer=zlib 	 start=402	 end=381
serializer=zlib9	 start=402	 end=381
serializer=zstd 	 start=402	 end=393
serializer=zlib 	 start=403	 end=378
serializer=zlib9	 start=403	 end=378
serializer=zstd 	 start=403	 end=396
serializer=zlib 	 start=404	 end=380
serializer=zlib9	 start=404	 end=380
serializer=zstd 	 start=404	 end=394
serializer=zlib 	 start=405	 end=379
serializer=zlib9	 start=405	 end=379
serializer=zstd 	 start=405	 end=397
serializer=zlib 	 start=406	 end=384
serializer=zlib9	 start=406	 end=384
serializer=zstd 	 start=406	 end=397

@sontek sontek requested a review from jogo July 19, 2022 03:33
Copy link
Contributor

@jogo jogo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

Comment on lines +56 to +80
@pytest.fixture(scope="session")
def names():
names = []
for _ in range(15):
names.append(fake.name())

return names


@pytest.fixture(scope="session")
def paragraphs():
paragraphs = []
for _ in range(15):
paragraphs.append(fake.text())

return paragraphs


@pytest.fixture(scope="session")
def objects():
objects = []
for _ in range(15):
objects.append(CustomObject())

return objects
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: these could be list comprehensions.

@jogo jogo merged commit 6b85dea into pinterest:master Sep 12, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants