Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reworked Distributed Tracing Extension #607

Merged
merged 4 commits into from
May 9, 2020
Merged
Changes from 2 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
44 changes: 25 additions & 19 deletions extensions/distributed-tracing.md
Original file line number Diff line number Diff line change
Expand Up @@ -2,9 +2,9 @@

This extension embeds context from
[Distributed Tracing](https://w3c.github.io/trace-context/) so that distributed
systems can include traces that span an event-driven system. This is the
foundation of many other systems, such as Open Tracing, on which platforms like
Prometheus are built.
systems can include traces that span an event-driven system. This extensions is meant
to contain historical data of the parent trace, in order to diagnose eventual failures of the
systems through tracing platforms like Prometheus.
slinkydeveloper marked this conversation as resolved.
Show resolved Hide resolved

## Attributes

Expand All @@ -24,25 +24,31 @@ Prometheus are built.
- Constraints
- OPTIONAL

## HTTP encoding
## Using the Distributed Tracing Extension

To integrate with existing tracing libraries, the Distributed Tracing attributes

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My 2p and take it for what its worth. I don't see the update of this section bringing clarity the linked specification doesn't already cover. Maybe just underscore that the headers themselves are only the context of the trace.

Other event attributes may be helpful to add to trace data (or even some as log correlations or metrics dimensions), but specifically how to do log correlation, metrics or tracing is out of scope for this spec.

I'm aware that mentioning Zipkin may not be ok here, but if you did there's a lot of existing art with messaging tracing. Your call.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tracing is out of scope for this spec.

That's what we're trying to establish here 😄 The reason of this PR is exactly because somebody thought that the goal of this spec extension is to enable tracing.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. If the goal is to enable tracing, it seems limiting to only discuss the tracecontext spec. many years of tools, thousands of users are familiar with B3. You might consider linking to this also, if adoption is a concern. https://github.com/openzipkin/b3-propagation

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is another reason why i feel we should not touch the distributed tracing argument at all, otherwise we have a clear overlap with what openzipkin/opentelemetry/w3c is doing

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that specifically how to do log correlation, metrics, and tracing are out of scope of this extension. That does not imply, however, that trace context propagation is out of scope. Since this extension is modeled after the W3C trace propagation format it seems that trace context propagation must be in-scope here. Nor do I think that adding a trace propagation mechanism for events conflicts with those other projects especially given that they are unlikely to define trace propagation standards either for cloudevents or for all possible eventing protocols.

MUST be encoded over HTTP(S) as headers. E.g.
The Distributed Tracing Extension is not intended to replace the protocol specific headers for tracing,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's more accurate to say that it is intended to function like the HTTP tracecontext headers in non-HTTP protocols or for event storage.

like the ones described in [W3C Trace Context](https://w3c.github.io/trace-context/) for HTTP.

Given a single hop event transmission (from sink to source directly), the Distributed Tracing Extension,
if used, MUST carry the same trace information contained in protocol specific tracing headers.

Given a multi hop event transmission, the Distributed Tracing Extension, if used, MUST
carry the trace information of the starting trace of the transmission.
In other words, it MUST NOT carry trace information of the actual hop, since these information are usually
slinkydeveloper marked this conversation as resolved.
Show resolved Hide resolved
carried using protocol specific defined headers, understood by tools like [OpenTelemetry](https://opentelemetry.io/).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Many protocols do not define a trace propagation mechanism, hence the need for this extension as the eventing trace propagation standard. If this extension is changed to specify that it should not track the actual trace context, then we would need a new protocol agnostic standard for trace propagation.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure cloudevents is the right place where this standardization should happen

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In that case where do you propose that this happens?


Middleware between the source and the sink of the event could eventually add a Distributed Tracing Extension
if the source didn't included any, in order to provide to the sink the starting trace of the transmission.

An example with HTTP:

```bash
CURL -X POST example/webhook.json \
-H 'ce-id: 1' \
-H 'ce-specversion: 1.0' \
-H 'ce-type: example' \
-H 'ce-source: http://localhost' \
-H 'ce-traceparent: 00-0af7651916cd43dd8448eb211c80319c-b9c7c989f97918e1-01' \
-H 'traceparent: 00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01' \
-H 'tracestate: rojo=00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01,congo=lZWRzIHRoNhcm5hbCBwbGVhc3VyZS4` \
-H 'content-type: application/cloudevents+json' \
-d '@sample-event.json'
-H 'tracestate: rojo=00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01,congo=lZWRzIHRoNhcm5hbCBwbGVhc3VyZS4`
```

## Conflicts

Since this extension defines secondary, special, serialization that differs
from other CloudEvents attributes, it is possible that the values of these two
could differ by the time the event is received at a destination. In those
cases, the serialization that followed the "general CloudEvents serialization
rules" MUST be used as the CloudEvents attribute. The other, secondary,
mapping MAY be picked-up and offered to the receiving application as
"additional" metadata.