This repository has been archived by the owner on May 23, 2023. It is now read-only.

Standard Interchange Format for OpenTracing #64

Closed
tedsuo opened this issue Apr 6, 2017 · 33 comments

@tedsuo
Member

tedsuo commented Apr 6, 2017

There is some debate about whether there should be a standard or "official" OpenTracing wire format. Sidestepping that debate, a smaller and hopefully less contentious starting point would be an out-of-band format for sharing trace data.

This has come up repeatedly in the following areas:

  • allow OT-compliant tracers to import/export data to and from each other.
  • enable more specialized analysis tools that do not amount to complete tracing solutions.
  • enable tooling for development and testing.

Ideal qualities for this interchange format:

  • Human readable.
  • Easy to parse in many languages.
  • Compresses reasonably well.

As a straw-man to kick it off, I propose a comma-separated list of JSON-formatted spans.

What say you?

@devinsba

devinsba commented Apr 6, 2017

I think a problem we will run into with this is that the surface of a span is not fully defined by the interfaces that OT provides. Because of this, there might be data that is key to a specific tracer but does not translate into the data defined in the OT span, making it nearly impossible to work with data from that tracer in a generic manner.

@jpkrohling

jpkrohling commented Apr 6, 2017

Just to bring some feedback: I heard about this more than once during CloudNativeCon. Even pcap has been used as an example of why this is needed, in the sense that there are tools specialized in reading and making sense of pcap files, no matter which tool generated them.

In any case, I like the JSON idea, but I'd rather either have a valid JSON array, or a valid JSON stream format with line delimiters (which seems to be the common case in other stacks):

https://en.wikipedia.org/wiki/JSON_Streaming

A JSON array has the obvious disadvantage of requiring the final ], which might not always be convenient to add.
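The line-delimited variant mentioned above can be sketched as follows. This is a minimal Python sketch; the span field names ("name", "duration") are invented for the example, not part of any OT spec.

```python
import io
import json

def read_span_stream(fp):
    # One JSON object per line (see the JSON streaming link above).
    # Unlike a JSON array, no closing "]" is needed, so the stream
    # stays parseable even if the writer stops mid-way.
    for line in fp:
        line = line.strip()
        if line:  # tolerate blank lines
            yield json.loads(line)

# Hypothetical span records, for illustration only.
stream = io.StringIO(
    '{"name": "fetch-user", "duration": 12}\n'
    '{"name": "render", "duration": 3}\n'
)
spans = list(read_span_stream(stream))
```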

@tedsuo
Member Author

tedsuo commented Apr 6, 2017

@devinsba I think it's reasonable to reserve a keyspace for tracer-specific data. The key thing is for the information entered via the OT API to be standardized in the exchange format.

In other words, since the API for OT is standardized, there should be a way for tools to inject or "replay" OT information as entered by an application. To that extent, I don't think there's anything missing or tracer-dependent.

@tedsuo
Member Author

tedsuo commented Apr 6, 2017

@jpkrohling for JSON, the streaming standard is "comma separated objects", where the bounding array is implicit. So, something as simple as JSON.parse("["+data+"]") would work.

I don't want to fall into JSON as a solution just because it's easy. For example, JSON is not particularly efficient to parse. But it sets the bar at a certain level.
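The "implicit bounding array" idea above can be sketched in Python (the field names here are invented for the example):

```python
import json

# Comma-separated JSON objects with the bounding array left implicit.
data = '{"name": "a", "duration": 5}, {"name": "b", "duration": 7}'

# Wrapping the payload in brackets restores a valid JSON array:
# the Python analogue of JSON.parse("[" + data + "]").
spans = json.loads("[" + data + "]")
```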

@yurishkuro
Member

The most critical part of tracing data is the causal relationships between spans. These are expressed via span references, which are abstract in the OT APIs, leaving tracers to implement things like span/trace identity as they wish. Without that piece, the only non-controversial parts of the spans are the operation name, tags, and logs (duration is technically also implementation-specific due to varied precisions).

@tedsuo
Member Author

tedsuo commented Apr 6, 2017

Mmmmmm, okay I see that now.

Cutting scope even more, rather than a universal interchange format, adding a utility to all of the basic tracers that outputs in the same format would be sufficient for most of the use cases I've seen so far. From a clutter perspective, I would prefer the basic tracers add a logging facility over having a separate pile of "logtracers".

@gkumar7

gkumar7 commented Apr 6, 2017

+100 on opening this issue. Some essential aspects of trace state are not clearly defined by the API. Naming conventions such as traceId and spanId will differ among tracers.

@tedsuo creating a utility which correlates the trace state of a given tracer to a standardized naming scheme/format would greatly help.

@yurishkuro
Member

@tedsuo if the issue is only about basic tracers, then this repo is not the best place for it, since that is a discussion of a particular implementation, not the specification.

There is a proposal for a standard Trace-Context HTTP header, which indirectly implies a standard for trace & span identity: https://github.com/TraceContext/tracecontext-spec/pull/1/files

If the trace/span identity issue is addressed, then the common JSON OpenTracing format becomes very possible.

@tedsuo
Member Author

tedsuo commented Apr 7, 2017

Ok thanks, I see that. A common JSON format for exchanging traces is what this issue was originally created to address. I would like to pursue that as long as there is hope that it can be attained. I would hold off on adding anything to basic tracers as long as the identity issue is actively being investigated.

Also, from the point of view of establishing the keys for a JSON Span object, are there any fields other than span_id, trace_id, and meta needed for identity?

{
  "span_id": undefined,
  "trace_id": undefined,
  "meta": undefined,
  "name": "",
  "timestamp": 0,
  "duration": 0,
  "tags": {"": [{}]},
  "logs": [{}],
  "baggage": {"": [{}]}
}

Even if the values of span_id, trace_id, and meta were only consistent within a tracer implementation, something like this would still be a very useful level of standardization.
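As a rough sanity check, the straw-man keyspace above could be exercised like this. The key names come from the proposal; every concrete value is fabricated for illustration, and span_id/trace_id are opaque, tracer-specific strings.

```python
import json

# Key names taken from the straw-man schema above.
EXPECTED_KEYS = {"span_id", "trace_id", "meta", "name",
                 "timestamp", "duration", "tags", "logs", "baggage"}

span = {
    "span_id": "0001",              # opaque, consistent only within one tracer
    "trace_id": "abcd",
    "meta": None,
    "name": "http_request",
    "timestamp": 1491500000000000,  # microseconds since the Unix epoch
    "duration": 300,                # microseconds
    "tags": {"http.status_code": 200},
    "logs": [{"event": "retry"}],
    "baggage": {},
}

missing = EXPECTED_KEYS - span.keys()   # empty when the record is complete
serialized = json.dumps(span)           # the span round-trips through JSON
```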

@gkumar7

gkumar7 commented Apr 7, 2017

childOf and followsFrom would be needed to establish relationships. I am unfamiliar with the meta field; what is it used for? Possibly tags and baggage should simply be a dictionary of k,v pairs ({})?

Baggage may be used to propagate encrypted information which should not be written in a textual format.

Something like:

{
  "span_id": undefined,
  "trace_id": undefined,
  "meta": undefined,
  "childof": "",
  "followsfrom": "",
  "name": "",
  "timestamp": 0,
  "duration": 0,
  "tags": {},
  "logs": [{}],
  "baggage": undefined
}

@wu-sheng
Member

wu-sheng commented Apr 7, 2017

@tedsuo, this is the format in my open-source tracer implementation: https://github.com/wu-sheng/sky-walking/wiki/3.0-Collector-TraceSegment-in-JSON-format .

We use more fields to make analysis easier, e.g. the topological graph of application clusters.

Also, we use a totally different data structure for propagation in my commercial product, with no JSON and no keys, for higher network performance.

After all, my implementation details are not very important. My point is that a standard interchange format is nearly impossible given the different requirements, especially at the spec level.

IMO, the spec should focus on the tracer-API-user level. The backend is much more complex; trace data can give an APM more information than just a span DAG (tree).

@gonewest818

I tend to agree with @wu-sheng. Without expanding the standard, I would argue that if you're after testing, it would be sufficient to have a "MockTracer" available in each language, where the tracer's state can be queried before and after the tests run. That mock tracer could certainly output JSON if desired. For integration testing, it would be extremely helpful (but not strictly required) if the backend servers offered a consistent method of querying what trace data has been received.

@pavolloffay
Member

I had the same idea about BasicTracers: they should use the same data and wire format. Tracing systems would then be able to consume that format and declare themselves basic-tracer compatible.

@devinsba

devinsba commented Apr 7, 2017

Assuming Trace-Context gets added to the spec, the most data you could guarantee as things stand right now would be:

{
    "trace_context":"",
    "name":"",
    "tags":[],
    "logs":[]
}

@wu-sheng
Member

wu-sheng commented Apr 8, 2017

Trace-Context is an attempt by the Zipkin team. It is a good path for every tracer implementation team, but at the specification level I highly doubt it.

And I agree that we can set a data structure format for MockTracer. That way we can discuss a JSON-like (or JSON-stream) format, because it is only for mock/test or demonstration.

Can we do that? If so, we should start a new issue, since this issue's title and context are not right for that.

@tedsuo
Member Author

tedsuo commented Apr 11, 2017

I originally had meta as a place for tracers to stick any additional tracer-specific information, but I'm dropping it.

I believe the following keyspace should work:

{
  "trace_id": undefined, // values for trace_id and span_id are not currently standardized
  "span_id": undefined,
  "references": [
    {"type": "child_of", "span_id": undefined} // currently types are "child_of" and "follows_from"
  ],
  "name": "operation type",
  "timestamp": 1234567890123456789, // UTC epoch Unix timestamp in microseconds
  "duration": 300, // integer representing microseconds
  "tags": {
    "tag_key": [{"a":"b"},{"c":"d"}] // can tags have multiple values in some systems?
  },
  "logs": [
    {"timestamp": 1234567890123456789, "a": "b"} // logs optionally have a timestamp
  ],
  "baggage": {"a": "b"}
}

Again, to clarify the purpose of this exercise: it should be possible to have a data format that captures what a user inputs via the OT API, which in turn allows another system to replay that input without having been present during instrumentation. This is why values such as trace_id and span_id are left undefined: the user does not input this information. Other values, such as timestamp and duration, count as implicit input.

I agree that tracers may collect information above and beyond what is present in the OT API, but to the degree to which the OT API is standard across languages, it should be possible to represent it in a loggable format.
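To sketch how a consumer might use the references field in the format above to recover causal relationships: the span records below are fabricated, and only the key names come from the proposed schema.

```python
# Rebuild parent -> child edges from the "references" arrays.
spans = [
    {"span_id": "a", "references": []},
    {"span_id": "b", "references": [{"type": "child_of", "span_id": "a"}]},
    {"span_id": "c", "references": [{"type": "follows_from", "span_id": "b"}]},
]

children = {}  # referenced span_id -> [(reference type, referring span_id)]
for span in spans:
    for ref in span["references"]:
        children.setdefault(ref["span_id"], []).append(
            (ref["type"], span["span_id"]))
```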

@yurishkuro
Member

Also, baggage is meant to propagate with the span context across services, but there's no requirement for baggage to be captured in the span, so I don't think it needs to be in the JSON

@davidB

davidB commented Apr 12, 2017

(I'm new to OT.)
I'm targeting the same goal (a simple local tracer) with the implementation (spec) of the logger tracer in https://github.com/opentracing-contrib/java-span-reporter, so if a "base" format emerges, I'll use it.
Today the current implementation reports JSON in the log:

19:48:36.841 [main] TRACE tracer - {"ts":1490982516841,"elapsed":13,"spanId":"c4bf987f-2d0c-49f1-87bd-a23930570b96","action":"finish","operation":"span-2","tags":{"description":"the second inner span in the original process"},"baggage":{},"references":{"child_of":"da7d843a-4564-4480-a6cb-19550c24a344"}}

@yurishkuro,

  • if tags allow duplicate keys (or multi-value strings), it is unclear and could generate a lot of confusion.
  • trace_id is not part of OT, or did I miss it?

@wu-sheng
Member

@davidB trace_id is not part of OT because it is determined by the tracer implementation's code. It is usually an encoded string, generated according to some rules or at random.

@yurishkuro
Member

if tags allow duplicate keys (or multi-value strings), it is unclear and could generate a lot of confusion.

    The less a man makes declarative statements 
    the less apt he is to look foolish in retrospect.
        Quentin Tarantino, from the film 'Four Rooms'

In other words, don't say that it's a map / dictionary, especially a multi-map, and there is no confusion. In our Thrift data model we define tags as a list

   10: optional list<Tag>     tags

None of my users have ever been confused by that, nor was it an issue for the backend.

trace_id is not part of OT, or I miss it ?

Correct, but as I mentioned in earlier comments: (a) you cannot represent a SpanReference without it, and (b) as an opaque string value it is not such a far-fetched concept, and the ongoing Trace-Context standardization effort gives it an even more concrete shape.
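The list-of-tags model described above can be sketched as follows. This is a lossless representation; the helper function names are invented for the example.

```python
# Tags as an ordered list of {key, value} pairs, mirroring the Thrift
# `list<Tag>` model quoted above. Duplicate keys are preserved; how to
# collapse them (first wins, last wins, keep all) is left to the consumer.
tags = [
    {"key": "http.url", "value": "/old"},
    {"key": "http.url", "value": "/new"},
    {"key": "debug", "value": True},
]

def tag_values(tags, key):
    # All values recorded for a key, in insertion order.
    return [t["value"] for t in tags if t["key"] == key]

def last_tag(tags, key):
    # SetTag-style "last write wins" view; None if the key is absent.
    values = tag_values(tags, key)
    return values[-1] if values else None
```

This defers the duplicate-key decision to the reader of the data rather than baking it into the interchange format.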

@davidB

davidB commented Apr 12, 2017

the API is setTag (or withTag), not addTag:

Span.setTag(tag, value)

I expect that at the end of

span.setTag("tagA", "v0")
span.setTag("tagA", "v1")

the value of tagA to be "v1", not "v0" or ["v0", "v1"].

With

[
{"tagA", "v0"},
{"tagA", "v1"}
]

I don't know how to interpret/find tagA.

@yurishkuro
Member

This is not a matter of the API, but of the data model. I ran an internal poll among our JS developers, and they were strongly in favor of using arrays of values instead of dictionaries when developing a JSON API for traces. And with logs, the API (e.g. in Go) does allow multiple fields with the same key.

In terms of data quality, if you simply need to search for spans by a tag, then it doesn't matter whether the span has one or more values for the same key. If you want to infer certain information about the span based on a tag, say "http.url", then you already have the same problem: even though the API is called SetTag (as in "set once"), instrumentation can still call it multiple times, so how do you know whether the last value is better than the first? I'd argue it's a bug in the implementation and it should only set meaningful tags once - but that's not a question of the backend data model anymore.

In short, I prefer the data model to be lossless, rather than forcing some decisions that can be deferred.

@gkumar7

gkumar7 commented Apr 12, 2017

@yurishkuro It is interesting that the Go implementation has the function SetTag(key string, value interface{}), essentially allowing any type of tag value. On the other hand, the Java API is more restrictive, only allowing String, Boolean, and Number. I agree that providing the datatype in the tag would help parsing at the other end.

On the other hand, shouldn't language implementations follow the spec? The spec defines tags as a mapping of k,v pairs, which could be taken to mean that tags are a hashmap, yet tracing libraries may keep a list of tags. Maybe the wording in the spec could be more specific?

Nevertheless, if we standardize an interchange format in this issue, it wouldn't be difficult for each language-specific library to provide a utility to output this format.

@davidB

davidB commented Apr 13, 2017

You ran a poll, which is itself a symptom of ambiguity.
An ambiguous (or deferred) answer leads to variation among implementations, and to incompatibility.
I'm fine with any solution (last value, first value, multi-value), but not with ambiguity.

setTag("debug", true)
setTag("debug", false)

Today, the main issue I have with OT is that there are only setters and no getters, so:

  • we have ambiguities like tags (it is not possible to define properties of the system, in the sense of https://en.wikipedia.org/wiki/QuickCheck)
  • we have to implement workarounds to read the context data in the app, or to duplicate the propagation of context (into the app or an OT-complementary lib).

So if tags are an append-only list, that should be in OT, and reflected in the interchange format.

@bhs
Contributor

bhs commented Apr 14, 2017

(meta: I wish I had more time and have been meaning to say something on this thread)

In general, my biggest question is about the goal(s) for the work described in this issue. I have always been dissatisfied by the imprecision / colloquialism of the "formal" OpenTracing spec.md, and part of the implicit motivation here seems to be a more explicit definition of the data model.

That said, the specification is really describing a bunch of state transitions... if we wanted to make something more formal, it would need different "structs" for each method call. But it's problematic to define a single data model for the Span itself, since it's completely valid (in fact, recently encouraged) to build Tracer implementations that intentionally drop some of the data: e.g., the discussions about taking the Tracer and sending a summary of tagged latencies to a metrics subsystem like Prometheus or StatsD.

One other motivation might be to have a simplified mental model of the OT data model that might be slightly imprecise but easier to grok than a multi-page specification document. That's a fine thing to want, and arguably the BasicTracer RawSpan structs provide just that sort of approximate mental model.

If we're really talking about what this issue wants, by name, I am concerned because I don't understand the actual use cases for such an interchange format yet, and I would want the format to be driven by those use cases. The most compelling thing I am aware of would be an integration point for "simple" / straightforward pieces of infrastructure like linkerd, Envoy, haproxy, nginx, etc: they are awkward to integrate with from an instrumentation standpoint, but very desirable to integrate from an operational standpoint since they have a lot of visibility and a common surface area across large distributed systems. True interop between, say, Zipkin and another tracing system seems way further-fetched but would have its own complex requirements.

Anyway, TL;DR: what are we designing for here? A more formal spec? An informal mental model? Linkerd and friends? Something else?

@cwe1ss
Member

cwe1ss commented Feb 1, 2018

fyi: The mentioned Trace-Context project has recently been moved and is now a W3C project: https://github.com/w3c/distributed-tracing

@wu-sheng
Member

wu-sheng commented Feb 5, 2018

@cwe1ss Just to be clear: these days, Trace-Context is about the propagation format, rather than a Span or Trace format.

@huntc

huntc commented Feb 24, 2019

@tedsuo I realise it has been a while, but where did your thinking land? I would like to report spans as json for the easy consumption by a JS browser app (a different goal to the ones you listed). Your proposal, with @yurishkuro’s feedback, represents what I want. Losing the fidelity of a span is acceptable for my use case. Thanks for raising this issue.

@huntc

huntc commented Feb 25, 2019

As a follow-up, here's a sample of my implementation inspired by this thread:

{
  "baggage": [],
  "duration": 11726,
  "logs": [
    {
      "fields": {},
      "message": "hello-world",
      "time": 0
    }
  ],
  "operationName": "some-span",
  "references": [],
  "spanId": 4866369132225493238,
  "start": 1551079498409000,
  "tags": {
    "sampler.type": "const",
    "sampler.param": "true"
  },
  "traceId": 4866369132225493238
}

@ghostsquad

Any movement on this?

@yurishkuro
Member

There is not going to be a resolution to this thread given that most development efforts have moved to OpenTelemetry, which does define a canonical data format https://github.com/open-telemetry/opentelemetry-proto/.

@ghostsquad

@yurishkuro thanks!
