The serialization layer is unexpectedly processed before the producer's partitioning logic #112

Open
wbarnha opened this issue Mar 8, 2024 · 0 comments

wbarnha commented Mar 8, 2024

Sample code pulled from one of our internal applications:

# kafka_producer is configured with:
#    "key_serializer": json.dumps,
#    "value_serializer": json.dumps,

key = None  # None produces round-robin
if Const.FIELD_USER in message:
    key = message[Const.FIELD_USER]
kafka_producer.send(topic, key=key, value=message)

Unsurprisingly, using json.dumps will serialize key=None to 'null'.

Surprisingly, a message sent with key=None then behaves like a keyed message: it always lands on the same partition instead of being distributed round-robin.

This is because the serialization layer is processed before the partitioning logic. So by the time https://github.com/dpkp/kafka-python/blob/1.4.4/kafka/partitioner/default.py#L24 is hit, the key is already the string 'null'.
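To make the ordering concrete, here is a minimal sketch that needs no broker. `toy_partition` and `send` are hypothetical stand-ins, not kafka-python code, and `zlib.crc32` is used only to get a deterministic hash (it is not the hash the real partitioner uses). The only point is that the partitioner sees the serializer's output, never the original `None`.

```python
import json
import zlib

NUM_PARTITIONS = 6  # assumed partition count, for illustration only


def json_bytes(obj):
    """Serializer similar to json.dumps, but encoded to bytes."""
    return json.dumps(obj).encode("utf-8")


def toy_partition(serialized_key):
    """Hypothetical partitioner: None means 'no key', anything else is hashed."""
    if serialized_key is None:
        return None  # a real producer would spread these across partitions
    return zlib.crc32(serialized_key) % NUM_PARTITIONS


def send(key, key_serializer):
    """Hypothetical send path: serialize first, then partition."""
    serialized_key = key_serializer(key)
    return toy_partition(serialized_key)


# The unkeyed message is serialized to b"null" before partitioning,
# so every send resolves to the same fixed partition.
partitions_seen = {send(None, json_bytes) for _ in range(100)}
print(partitions_seen)
```

Every one of the 100 sends resolves to the same partition, because `b"null"` hashes to the same value each time.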

I found this extremely surprising... at a minimum we need to call this out in the docs.

Alternatively, we could offer default helpers that handle null keys/values (for deleting messages in compacted topics) in a less surprising way.
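One possible shape for such a helper, as a sketch only: `none_safe_json` is a hypothetical name, not something kafka-python ships. It passes `None` through untouched and JSON-encodes everything else to bytes, so an unkeyed message stays unkeyed and a `None` value stays a tombstone.

```python
import json


def none_safe_json(obj):
    """Hypothetical helper: leave None alone, JSON-encode everything else."""
    if obj is None:
        return None  # keep "no key" / tombstone semantics intact
    return json.dumps(obj).encode("utf-8")


# Assumed usage, matching the configuration shown in the sample above:
# producer = KafkaProducer(
#     key_serializer=none_safe_json,
#     value_serializer=none_safe_json,
# )

print(none_safe_json(None))
print(none_safe_json("alice"))
```

The trade-off is that a JSON `null` can no longer be sent as an actual key or value through this serializer, which seems acceptable given how rarely that is intended.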

Related: dpkp#913.
