add pluggable compression serde #407

sontek · 2022-07-12T05:13:19Z

This adds a new class CompressedSerde which allows you to use it as an entrypoint to compression. It defaults to using zlib and has immediate benefits on the size of messages stored in memcached but it also has compress and decompress as pluggable points if you want to use a stronger compression algorithm like lz4.

.gitignore

pymemcache/serde.py

jogo · 2022-07-12T18:28:25Z

pymemcache/serde.py


 FLAG_BYTES = 0
 FLAG_PICKLE = 1 << 0
 FLAG_INTEGER = 1 << 1
 FLAG_LONG = 1 << 2
-FLAG_COMPRESSED = 1 << 3  # unused, to main compatibility with python-memcached


Any insight into how this flag still maps back to python-memcached. Also would this flag cause any issues since two folks can use the same flag but different compression algorithms?

Also I wonder what compatibility we still have with python-memcached before this change and now.

Relatedly, a way to handle a generally compressed values with different compressors is to introduce a prefix.

serde = CompressedSerde(..., prefix="z")

... and then prepend that prefix on serialization (e.g. value = prefix + "|" + value). On deserialization, validate and strip the prefix.

python-memcached used zlib by default but I don't think at this point there is any chance of compatibility since we are supporting multiple versions of pickle and we wouldn't be able to guarantee versions across the serialization.

I like the idea of adding a prefix on there so we identify which compressor was used. In the end the client is only going to have one (via serde) so its not going to be able to handle decompressing/compressing multiple versions but we could at least raise an error that says "Found keys compressed with a different serializer".

I don't know if we do this today though. For example, we don't track pickle versions so we can select the correct one

@jparise @jogo How strongly are you feeling that you want to be able to identify the compression algorithm between clients that are configured with different serde? I have a feeling its not that high of a priority since we don't do that today with the pickle serializer but wanted to check

I think to do it properly we'd have to do what we discussed before, which I'm not really a fan of:

#53 (comment)

I agree it's not a high priority now. I mainly wanted to explicitly make sure we weren't trying to keep any python-memcached compatibility.

pymemcache/test/test_serde.py

sontek · 2022-07-15T22:33:19Z

Here is how I chose the minimum_length to use as a default:

serializer=zlib9	 start=380	 end=356
serializer=zstd 	 start=380	 end=375
serializer=zlib 	 start=381	 end=358
serializer=zlib9	 start=381	 end=358
serializer=zstd 	 start=381	 end=377
serializer=zlib 	 start=382	 end=359
serializer=zlib9	 start=382	 end=359
serializer=zstd 	 start=382	 end=379
serializer=zlib 	 start=383	 end=359
serializer=zlib9	 start=383	 end=359
serializer=zstd 	 start=383	 end=380
serializer=zlib 	 start=384	 end=360
serializer=zlib9	 start=384	 end=360
serializer=zstd 	 start=384	 end=378
serializer=zlib 	 start=385	 end=362
serializer=zlib9	 start=385	 end=362
serializer=zstd 	 start=385	 end=381
serializer=zlib 	 start=386	 end=364
serializer=zlib9	 start=386	 end=364
serializer=zstd 	 start=386	 end=383
serializer=zlib 	 start=387	 end=363
serializer=zlib9	 start=387	 end=363
serializer=zstd 	 start=387	 end=380
serializer=zlib 	 start=388	 end=364
serializer=zlib9	 start=388	 end=364
serializer=zstd 	 start=388	 end=384
serializer=zlib 	 start=389	 end=364
serializer=zlib9	 start=389	 end=364
serializer=zstd 	 start=389	 end=382
serializer=zlib 	 start=390	 end=365
serializer=zlib9	 start=390	 end=365
serializer=zstd 	 start=390	 end=383
serializer=zlib 	 start=391	 end=366
serializer=zlib9	 start=391	 end=366
serializer=zstd 	 start=391	 end=386
serializer=zlib 	 start=392	 end=367
serializer=zlib9	 start=392	 end=367
serializer=zstd 	 start=392	 end=385
serializer=zlib 	 start=393	 end=365
serializer=zlib9	 start=393	 end=365
serializer=zstd 	 start=393	 end=384
serializer=zlib 	 start=394	 end=368
serializer=zlib9	 start=394	 end=368
serializer=zstd 	 start=394	 end=387
serializer=zlib 	 start=395	 end=369
serializer=zlib9	 start=395	 end=369
serializer=zstd 	 start=395	 end=388
serializer=zlib 	 start=396	 end=374
serializer=zlib9	 start=396	 end=374
serializer=zstd 	 start=396	 end=390
serializer=zlib 	 start=397	 end=373
serializer=zlib9	 start=397	 end=373
serializer=zstd 	 start=397	 end=392
serializer=zlib 	 start=398	 end=373
serializer=zlib9	 start=398	 end=373
serializer=zstd 	 start=398	 end=389
serializer=zlib 	 start=399	 end=374
serializer=zlib9	 start=399	 end=374
serializer=zstd 	 start=399	 end=393
serializer=zlib 	 start=400	 end=374
serializer=zlib9	 start=400	 end=374
serializer=zstd 	 start=400	 end=390
serializer=zlib 	 start=401	 end=380
serializer=zlib9	 start=401	 end=380
serializer=zstd 	 start=401	 end=394
serializer=zlib 	 start=402	 end=381
serializer=zlib9	 start=402	 end=381
serializer=zstd 	 start=402	 end=393
serializer=zlib 	 start=403	 end=378
serializer=zlib9	 start=403	 end=378
serializer=zstd 	 start=403	 end=396
serializer=zlib 	 start=404	 end=380
serializer=zlib9	 start=404	 end=380
serializer=zstd 	 start=404	 end=394
serializer=zlib 	 start=405	 end=379
serializer=zlib9	 start=405	 end=379
serializer=zstd 	 start=405	 end=397
serializer=zlib 	 start=406	 end=384
serializer=zlib9	 start=406	 end=384
serializer=zstd 	 start=406	 end=397

jogo

Thanks!

jparise · 2022-07-22T16:27:12Z

pymemcache/test/test_compression.py

+@pytest.fixture(scope="session")
+def names():
+    names = []
+    for _ in range(15):
+        names.append(fake.name())
+
+    return names
+
+
+@pytest.fixture(scope="session")
+def paragraphs():
+    paragraphs = []
+    for _ in range(15):
+        paragraphs.append(fake.text())
+
+    return paragraphs
+
+
+@pytest.fixture(scope="session")
+def objects():
+    objects = []
+    for _ in range(15):
+        objects.append(CustomObject())
+
+    return objects


nit: these could be list comprehensions.

add pluggable compression serde

fd7a6de

sontek requested review from cgordon, jogo and jparise as code owners July 12, 2022 05:13

jparise assigned jogo Jul 12, 2022

jogo reviewed Jul 12, 2022

View reviewed changes

sontek added 3 commits July 15, 2022 18:14

add some compression benchmarks

0fd419f

flake8

34407cf

fix mypy

431e7fa

set min_compress_len

dc325b2

sontek requested a review from jogo July 19, 2022 03:33

jogo approved these changes Jul 22, 2022

View reviewed changes

jparise approved these changes Jul 22, 2022

View reviewed changes

jogo merged commit 6b85dea into pinterest:master Sep 12, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

add pluggable compression serde #407

add pluggable compression serde #407

sontek commented Jul 12, 2022

jogo Jul 12, 2022

jparise Jul 12, 2022 •

edited

Loading

sontek Jul 13, 2022

sontek Jul 15, 2022

sontek Jul 15, 2022

jogo Jul 22, 2022

sontek commented Jul 15, 2022

jogo left a comment

jparise Jul 22, 2022

add pluggable compression serde #407

add pluggable compression serde #407

Conversation

sontek commented Jul 12, 2022

jogo Jul 12, 2022

Choose a reason for hiding this comment

jparise Jul 12, 2022 • edited Loading

Choose a reason for hiding this comment

sontek Jul 13, 2022

Choose a reason for hiding this comment

sontek Jul 15, 2022

Choose a reason for hiding this comment

sontek Jul 15, 2022

Choose a reason for hiding this comment

jogo Jul 22, 2022

Choose a reason for hiding this comment

sontek commented Jul 15, 2022

jogo left a comment

Choose a reason for hiding this comment

jparise Jul 22, 2022

Choose a reason for hiding this comment

jparise Jul 12, 2022 •

edited

Loading