
Should Kafka offset be used as messaging.message.id? #2971

Closed
mateuszrzeszutek opened this issue Nov 22, 2022 · 3 comments · Fixed by #2982
Labels
area:semantic-conventions Related to semantic conventions semconv:messaging spec:trace Related to the specification/trace directory

Comments

@mateuszrzeszutek
Member

What are you trying to achieve?

Currently, the Go Kafka instrumentation sets the record offset as the value of the messaging.message_id attribute:

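```go
func finishProducerSpan(span trace.Span, partition int32, offset int64, err error) {
	span.SetAttributes(
		// The record offset is formatted as a decimal string and reported as the message ID.
		semconv.MessagingMessageIDKey.String(strconv.FormatInt(offset, 10)),
		semconv.MessagingKafkaPartitionKey.Int64(int64(partition)),
	)
	// ... (rest of the function omitted in this excerpt)
}
```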

I was asked to introduce a similar change to the Java instrumentation, but I'm not sure whether this is valid. An offset is only unique within a partition (which has a dedicated messaging.kafka.message.partition attribute), so I don't think it is

A value used by the messaging system as an identifier for the message, represented as a string

as the spec describes this attribute: you also need the partition ID to identify the message, so the offset is really only half of an identifier.
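A small sketch of the "half of an identifier" problem. The `record` type and both helpers are hypothetical stand-ins, not types from any Kafka client: formatting only the offset, as the Go snippet above does, gives two records in different partitions of the same topic the same ID.

```go
package main

import "strconv"

// record is a hypothetical, minimal stand-in for a Kafka record's
// coordinates; it is not a type from any Kafka client library.
type record struct {
	topic     string
	partition int32
	offset    int64
}

// offsetAsMessageID mirrors the Go instrumentation quoted above: it
// formats only the offset and ignores the partition.
func offsetAsMessageID(r record) string {
	return strconv.FormatInt(r.offset, 10)
}

// collides reports whether two distinct records would receive the same
// messaging.message.id value under the offset-only scheme.
func collides(a, b record) bool {
	return a != b && offsetAsMessageID(a) == offsetAsMessageID(b)
}
```

For example, offset 42 in partition 0 and offset 42 in partition 1 of the same topic both map to `"42"`.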

Should the Kafka offset be set as the value of the messaging.message.id attribute? Or, alternatively, should we introduce a messaging.kafka.message.offset attribute?

Additional context.

@mateuszrzeszutek mateuszrzeszutek added the spec:trace Related to the specification/trace directory label Nov 22, 2022
@pyohannes pyohannes added area:semantic-conventions Related to semantic conventions semconv:messaging labels Nov 22, 2022
@pyohannes
Contributor

That's a good point; we'll discuss it in the next messaging workgroup meeting (which will take place next week).

@lmolkova
Contributor

Message ID is a common JMS and AMQP term, and AFAIK it's different from an offset. For example, based on kafka-connect-jms, Kafka can have a message-id (but I don't know what it would mean for a Kafka broker). Also, a message ID is a string, while a Kafka offset is a number.

For JMS/AMQP systems, a message ID can be used for deduplication and can be assigned by the client. A Kafka offset is assigned by the broker and is available only on the consumer.
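The difference between a client-assigned message ID and a broker-assigned offset can be illustrated with a toy model. Everything below is hypothetical (`deduplicator` and `appendLog` are illustrative names, not broker or client APIs, and this is not how any particular broker works): a retried send reuses the client's message ID, so it can be recognized as a duplicate, whereas an append-only log simply assigns the retry the next offset.

```go
package main

// deduplicator is a hypothetical, in-memory sketch of ID-based
// deduplication keyed on a client-assigned message ID.
type deduplicator struct {
	seen map[string]bool
}

func newDeduplicator() *deduplicator {
	return &deduplicator{seen: make(map[string]bool)}
}

// accept returns true the first time a message ID is seen and false for
// any retry that reuses the same ID.
func (d *deduplicator) accept(messageID string) bool {
	if d.seen[messageID] {
		return false
	}
	d.seen[messageID] = true
	return true
}

// appendLog is a toy append-only log: every append gets the next offset,
// so a retried send lands at a new offset and cannot be matched by offset.
type appendLog struct {
	next int64
}

func (l *appendLog) append() int64 {
	off := l.next
	l.next++
	return off
}
```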

Azure messaging systems define a message ID, a sequence number, and an offset. The message ID is used for deduplication on the producer side, while the sequence number and offset are assigned by the broker.

It seems SQS and SNS support both sequence numbers and message IDs.

Given this, I believe mixing message.id and offset in one attribute would be confusing, and I prefer messaging.kafka.message.offset. If we ever need message.id, we can recommend constructing one for Kafka from the partition and offset, so that it becomes unique within the broker.
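A minimal sketch of the composite ID suggested above, assuming a hypothetical helper `kafkaMessageID` and an illustrative `topic/partition/offset` layout that no specification defines. It also includes the topic, because Kafka partition numbers are scoped to a topic, so partition and offset alone can still repeat across topics. `/` is used as the separator because it is not a legal character in a Kafka topic name, so the parts can be split back apart unambiguously.

```go
package main

import "fmt"

// kafkaMessageID builds a hypothetical composite identifier from a record's
// coordinates. The "topic/partition/offset" layout is an illustrative
// choice, not a format defined by the semantic conventions.
func kafkaMessageID(topic string, partition int32, offset int64) string {
	return fmt.Sprintf("%s/%d/%d", topic, partition, offset)
}
```

For example, `kafkaMessageID("orders", 3, 42)` returns `"orders/3/42"`, and the same offset in partition 0 and partition 1 now yields two different IDs.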

Since offsets and sequence numbers are popular properties, we might also consider creating generic attributes for them in the messaging.message namespace, but I don't think that should block messaging.kafka.message.offset.

@pyohannes
Contributor

We discussed this in the messaging workgroup, and we agree with what @lmolkova wrote above: the offset should not be used as message.id, because:

  • It suggests a unique identifier for a message, but an offset only uniquely identifies a message when it is combined with the partition.
  • It creates confusion with other messaging systems that support both a unique message ID and an offset (and possibly an additional sequence number).
  • Even in certain Kafka usage scenarios (e.g. Kafka combined with JMS), a message ID can be populated from message attributes, which would conflict with using the offset as message.id.
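One way an instrumentation might follow this conclusion, sketched under loudly stated assumptions: `consumedRecord` is a simplified, hypothetical view of a consumed record, the header name `message-id` is invented for illustration (a JMS bridge may use a different key), and a plain map stands in for span attributes. messaging.message.id is set only from an application-supplied ID, while the offset always goes to messaging.kafka.message.offset, so the two never compete for the same attribute.

```go
package main

// consumedRecord is a hypothetical, simplified view of a consumed Kafka
// record; headers are modelled as a plain string map.
type consumedRecord struct {
	partition int32
	offset    int64
	headers   map[string]string
}

// messageIDHeader is an assumed header name used only for illustration.
const messageIDHeader = "message-id"

// recordAttributes always reports partition and offset under their own
// Kafka-specific keys, and sets messaging.message.id only when the record
// carries a non-empty application-level message ID header.
func recordAttributes(r consumedRecord) map[string]any {
	attrs := map[string]any{
		"messaging.kafka.message.partition": int64(r.partition),
		"messaging.kafka.message.offset":    r.offset,
	}
	// Reading from a nil headers map is safe and simply reports !ok.
	if id, ok := r.headers[messageIDHeader]; ok && id != "" {
		attrs["messaging.message.id"] = id
	}
	return attrs
}
```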

4 participants