Questions about span relationships in messaging semantic convention #1085
Comments
@iNikem This is related to issue #958. @anuraaga please have a look here.
I don't think the spec contradicts itself here in the examples. It simply states that in the batch receive scenario laid out it does not have access to the parent span context propagated along with the message in order to make it a parent of that span. In the batch processing but individual receive scenario, you might have access to it and if so, you can use it as the parent. Same for the first example with individual processing without separate receive spans.
Backends will still be able to determine the kind of relationship between the parents by looking at their span kinds and the messaging operation attribute, if any.
I don't see an issue with that. If a back end can still correlate it with the initial producing span, it should be fine, but it might cause issues in the UI when it tries to display the entire trace at once. If the back end does not store traces long enough or does not look back in time far enough, it will not know that parent and will treat the span accordingly. Would there be any alternative?
I'd say yes.
Child spans can be created after their parent has already ended - this is supported by our data model and API. A message consumer span can also be the child of a message producer span, for example, despite the producer span likely being ended already at the time the message is consumed.
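To illustrate the point about parents that have already ended: a span refers to its parent only by ID, so nothing requires the parent to still be open when the child starts. The following is a toy model in plain Python, not the OpenTelemetry API; the `Span` class, its fields, and the `carrier` dictionary are hypothetical names made up for this sketch.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import Optional

_ids = count(1)  # toy ID generator; real span IDs are 8-byte values


@dataclass
class Span:
    name: str
    kind: str
    trace_id: int
    parent_id: Optional[int] = None
    span_id: int = field(default_factory=lambda: next(_ids))
    ended: bool = False

    def end(self) -> None:
        self.ended = True


# Producer side: the span ends as soon as the message has been sent.
producer = Span("orders send", kind="PRODUCER", trace_id=1)
# Only the IDs travel with the message (hypothetical carrier format).
carrier = {"trace_id": producer.trace_id, "span_id": producer.span_id}
producer.end()

# Consumer side, possibly much later: the parent is referenced by ID only,
# so it does not matter that the producer span has already ended.
consumer = Span(
    "orders process",
    kind="CONSUMER",
    trace_id=carrier["trace_id"],
    parent_id=carrier["span_id"],
)
```

In this model the consumer span joins the producer's trace purely through the propagated IDs, which is why the time gap between the two spans does not prevent the parent-child relationship from being recorded.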
Apologies if it's not, but this gets to concerns I have had around messaging systems like Kafka and how a span should represent the work. From my understanding of spans in such a scenario, the link between a PRODUCER and CONSUMER is more a "Follows From" and not a "Parent-Child" relationship. Whatever action in a PRODUCER causes a message to be sent can be more of an indirect consequence and not a direct one. For me, "Parent-Child" should only apply when there is a direct relationship. For instance, if a message is produced to indicate 400 customers have gone overdrawn on their bank account in a day, it's not a direct cause of the 400th customer that went overdrawn, it's an indirect one.
True. Should the semantic convention tell us which of the two options to choose:
That's unfortunate @Oberon00, as I believe it helps resolve the ambiguity about what is and isn't a parent span.
I agree with @arminru; I don't see a contradiction here, though I think there's not enough clarity about the scenarios and use cases being covered. Here are my thoughts:
I'm with you so far. Especially if we're expecting that a "receive" operation pulls
Probably... But in your example, I think the currently active span might make more sense than the producer. There are some use cases that would benefit from producer-parenting, and other use cases that would be clearer with active-span-parenting.
This may be the right way to distinguish the use cases I mentioned just above, but I'd feel more comfortable if we listed those use cases out explicitly.
This sounds good in the case that the "process" span does not already parent to the receive span. The remaining questions (That I see. Have I missed some?):
Finally, I'd like to add some nuance to the proposal, but some of the detail I think is pending a more complete analysis of use cases.
Hey @kenfinnigan, looking at https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/trace/semantic_conventions/messaging.md#apache-kafka-example, I'm confused as well. I would expect "Span Prod2" in that example to parent "Span Proc1" (and not have any links). And I would expect the desired picture for your example above to be:
Sounds like a bug in the instrumentation.
Problem description
I noticed an inconsistency in how the messaging semantic conventions describe parent-child relationships between producers and consumers.
The first example says that consumers should have the message producer as their parent. But the second example says that
and thus we don't have a parent-child relation between consumer spans and producers. Also, there is no indication of which parent the "Receive" span should have.
And then the third example contradicts it again and tells receiving spans to have message producers as their parent. Why can't the second example have parents when this one can? And how should backends handle a convention which essentially says "everything can happen"?
Analysis
Problem with old remote parent
First, I would like to start a discussion about using the message producer as the parent span for message consumers in the async case. Although this certainly makes sense in the majority of streaming or event-based platforms, I wonder whether it makes sense in situations where we can reasonably expect to process old messages, such as message reprocessing in event-sourced systems or lambda (or kappa) architectures. If there is a gap of several days (or weeks) between the consumer span and the producer span, does it still make sense to have the producer span as the parent?
Pull vs push consumption
I see two different ways to consume messages. Push-based consumption uses some kind of listener mechanism and focuses on message processing. Message retrieval is done by the framework and is in general of no interest to us. Pull-based consumption has an explicit step of "requesting" or "receiving" the message and then processing it.
Despite the problem with old remote parents, I think that in push-based systems the only sensible relationship between producing a message and processing it is the parent-child one. Message processing is done in the context of some retrieve/process cycle of the framework, which we don't want to see as the parent.
In a pull-based system, the question of which span should be the parent of which is less clear to me. Somewhere in my application's code there is something similar to the following pseudo-code:
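A minimal runnable sketch of such a pull loop, under stated assumptions: `poll_next` stands in for `pollNext`, the in-memory `queue` stands in for a broker, and `process` just records what it handled. None of these names come from a real client library.

```python
from collections import deque
from typing import Deque, List

# Hypothetical in-memory queue standing in for a broker.
queue: Deque[str] = deque(["m1", "m2", "m3"])
processed: List[str] = []


def poll_next(max_messages: int = 2) -> List[str]:
    """Return up to max_messages messages; a real client may block here."""
    batch: List[str] = []
    while queue and len(batch) < max_messages:
        batch.append(queue.popleft())
    return batch


def process(message: str) -> None:
    """Handle one message; this is where a "process" span would live."""
    processed.append(message.upper())


# The pull loop: one "receive" step followed by zero or more "process" steps.
while True:
    messages = poll_next()
    if not messages:
        break
    for message in messages:
        process(message)
```

Here each `poll_next` call returns a collection, so a single "receive" step can be followed by several `process` calls whose messages may come from different producers.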
I want to see a span for the `pollNext` execution, because it may block while waiting for messages. This span, following the semantic convention, will be the "receive" span. Depending on the API, `pollNext` can always return a single message, or it may return a collection as in the example above. In the latter case it cannot have a remote span as the parent. In the former it may. The convention's argument that "the propagated trace and span IDs are not known when the receiving span is started" may or may not be relevant, because we can delay starting the span until the message is received and the remote parent is known. If we don't use the remote span as a parent for any reason, should we use the currently active span (at the moment of the `pollNext` invocation) as the parent? Probably yes?
Then we have zero, one or more `process` method executions. Each of them should result in a "process" span. What span should be the parent of those? I argue that `pollNext` cannot be their parent, because it has already ended (but the convention tells us to do exactly this). Should they have the remote producer span as a parent? Should they have the currently active span as a parent? Although the former seems sensible, it will result in a big chunk of unaccounted time in the currently active span. Or is it enough to link from the process spans to the current span?
Proposal
Spans corresponding to a "receive" operation SHOULD always use the implicitly selected, currently active span as their parent.
Spans corresponding to a "process" operation on a single message SHOULD use the producer span as their parent. When a "process" span corresponds to the processing of several messages at once (batch processing), producer spans SHOULD NOT be used as parents and the implicitly selected parent MAY be used.
If possible, "process" spans SHOULD link to the corresponding "receive" span(s).
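The three rules can be sketched as a toy model in plain Python. This is not the OpenTelemetry API: `Span`, `receive_span`, and `process_span` are hypothetical names, and the choice to use the active span as the batch parent is just one reading of "MAY be used". The proposal does not say whether batch "process" spans should also link to the producers, so this sketch leaves that out.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import List, Optional

_ids = count(1)  # toy ID generator


@dataclass
class Span:
    name: str
    parent_id: Optional[int] = None
    links: List[int] = field(default_factory=list)
    span_id: int = field(default_factory=lambda: next(_ids))


def receive_span(active: Optional[Span]) -> Span:
    # Rule 1: "receive" is parented to the currently active span.
    return Span("receive", parent_id=active.span_id if active else None)


def process_span(
    producer_ids: List[int], receive: Span, active: Optional[Span]
) -> Span:
    if len(producer_ids) == 1:
        # Rule 2a: single message, so the producer span is the parent.
        parent = producer_ids[0]
    else:
        # Rule 2b: batch, so no producer parent; the implicit parent MAY be used.
        parent = active.span_id if active else None
    # Rule 3: link back to the corresponding "receive" span.
    return Span("process", parent_id=parent, links=[receive.span_id])


poll_loop = Span("poll loop")  # currently active span in the consumer
producer_a, producer_b = Span("send A"), Span("send B")  # remote producers

recv = receive_span(poll_loop)
single = process_span([producer_a.span_id], recv, poll_loop)
batch = process_span([producer_a.span_id, producer_b.span_id], recv, poll_loop)
```

In this model the single-message "process" span lands in the producer's tree, while the batch "process" span stays under the consumer's active span; both keep a link to the "receive" span that delivered their messages.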