
Support Kafka record headers #1574

Merged: 3 commits into dpkp:master on Sep 27, 2018

Conversation

@hnousiainen (Contributor) commented Aug 14, 2018:

Support Kafka record headers. Kafka 0.11.0 introduced the concept of record headers. The kafka-python record module already handles both decoding and encoding of headers; this PR implements the last remaining bits to support those headers in both consumer and producer operations.
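
A minimal sketch of the resulting interface (the broker address and topic name below are hypothetical, not from this PR):

from kafka import KafkaProducer, KafkaConsumer

producer = KafkaProducer(bootstrap_servers='localhost:9092')
# headers is a list of (str key, bytes value) tuples
producer.send('topic-with-headers', value=b'payload',
              headers=[('content-type', b'application/json')])
producer.flush()

consumer = KafkaConsumer('topic-with-headers',
                         bootstrap_servers='localhost:9092',
                         auto_offset_reset='earliest')
for message in consumer:
    print(message.headers)  # [('content-type', b'application/json')]
    break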



@hnousiainen force-pushed the htn_kafka_record_headers branch 3 times, most recently from bef4f06 to 47e66d6 on August 15, 2018 at 09:14.
@@ -116,6 +122,8 @@ def test_kafka_producer_proper_record_metadata(kafka_broker, compression):

assert record.serialized_key_size == 10
assert record.serialized_value_size == 12
if headers:
    assert record.serialized_header_size == 22
Collaborator commented on the diff:

Can we check the exact header data here? I want to be sure we have str keys and bytes values.

@hnousiainen (Contributor, Author) replied:

The record here refers to the FutureRecordMetadata / RecordMetadata, which doesn't carry the actual stored values.

I've verified this manually for both Python 3.6 and 2.7 with a producer and consumer running against an actual Kafka service instance. The types are str keys and bytes values on 3.6, and unicode keys and str/bytes values on 2.7.
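
As an illustration only (the header contents here are made up), the property being verified on Python 3:

headers = [('checksum', b'\x00\x01')]  # e.g., a consumed record's .headers
for key, value in headers:
    assert isinstance(key, str)      # header keys decode to str
    assert isinstance(value, bytes)  # header values remain bytes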

@tvoinarovskyi (Collaborator):

Seems good to me. Restarted the hung Travis builds.

@tvoinarovskyi (Collaborator):

One thing to be sure of is that the code handles the bytes vs. str differences between Python 2.7 and 3.6. The parser should handle those properly; I think I made it do so.

@tvoinarovskyi (Collaborator):

And thanks for working on this!

@hnousiainen (Contributor, Author):

Thanks!

I've tested both Python 3.6 and Python 2.7 manually. I'll add one test on the parsing side in test/records/test_records.py.

@hnousiainen (Contributor, Author):

I added a commit with a pair of positive tests covering the record append and record decode paths.

@dpkp (Owner) left a review comment:

radness!

@@ -530,6 +530,8 @@ def send(self, topic, value=None, key=None, partition=None, timestamp_ms=None):
partition (but if key is None, partition is chosen randomly).
Must be type bytes, or be serializable to bytes via configured
key_serializer.
headers (optional): a list of header key value pairs. List items
are tuples of str key and bytes value.
@dpkp (Owner) commented on the diff:

Why not use a simple dict instead of a list of tuples? I think this would make the user interface much nicer! Do you think it would cause any problems to do it that way instead?

Collaborator replied:

Probably because I defined it that way at the parser level. Do you happen to know if header structures support multiple entries with the same key, like HTTP headers do? I'm fairly convinced they do.

@hnousiainen (Contributor, Author) replied:

https://cwiki.apache.org/confluence/display/KAFKA/KIP-82+-+Add+Record+Headers insists that 1) "duplicate headers with the same key must be supported" and 2) "the order of headers must be retained throughout a record's end-to-end lifetime: from producer to consumer".

I agree a dict would be a nicer interface, but it cannot easily satisfy the original KIP-82 requirements (see the sketch below).
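
A short sketch of why a plain dict falls short of those requirements (header names made up for illustration):

# KIP-82 allows duplicate keys and requires stable ordering; collapsing
# to a dict silently drops the earlier duplicate (and, before Python
# 3.7, dict insertion order is not guaranteed at all):
headers = [('trace-id', b'abc'), ('retry', b'1'), ('retry', b'2')]
assert dict(headers) == {'trace-id': b'abc', 'retry': b'2'}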

@@ -456,10 +456,12 @@ def _unpack_message_set(self, tp, records):
value = self._deserialize(
self.config['value_deserializer'],
tp.topic, record.value)
headers = record.headers
@dpkp (Owner) commented on the diff:

Is this also a list of tuples? If so, same question as above re: using dict. Also, where can we document this for users?

@hnousiainen (Contributor, Author):

Rebased and added simple examples in README.rst.

@tvoinarovskyi (Collaborator) commented Sep 5, 2018:

@dpkp I would recommend merging this with the list-of-tuples concept. It's easy to transform into the desired format by applying list(headers.items()) or dict(headers). Once we've done that, we can go further and extend serialization and deserialization to include headers; see https://kafka.apache.org/11/javadoc/org/apache/kafka/common/serialization/ExtendedSerializer.html
It will remain backward compatible, following the same concept as values and keys defaulting to bytes.
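
For instance, when duplicate keys aren't in play, the two shapes convert trivially (header contents made up for illustration):

headers_list = [('content-type', b'application/json')]
headers_dict = dict(headers_list)                   # list of tuples -> dict
assert list(headers_dict.items()) == headers_list   # and back again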

@jeffwidman (Collaborator):

After reading through the comment thread, it seems this is ready to be merged.

Any objections or further input @dpkp / @tvoinarovskyi ?

@dpkp (Owner) commented Sep 27, 2018 via email.

@jeffwidman merged commit 08c7749 into dpkp:master on Sep 27, 2018.
@jeffwidman (Collaborator):

Thanks for all your hard work @hnousiainen!

@dpkp (Owner) commented Sep 27, 2018 via email.
