Skip to content
This repository has been archived by the owner on May 23, 2023. It is now read-only.

RFC: far fewer types (but comparable portability) #40

Merged
merged 11 commits into from
Jan 27, 2016
Merged

RFC: far fewer types (but comparable portability) #40

merged 11 commits into from
Jan 27, 2016

Conversation

bhs
Copy link
Contributor

@bhs bhs commented Jan 24, 2016

This is not for merging as-is – the standardtracer package is a mess, for example – but is definitely worth looking at and thinking about.

The goal here was to experiment with whether some of the formality around TraceContext could be removed from the API entirely. In my eyes, TraceContext (or SpanContext if we were to call it that -- whatever) is a worthwhile concept for an OpenTracing implementation, but I am not so sure it needs to leak out as an abstraction at the top level (as it has to date).

Also, this should be obvious: if we decide to go with this approach, it will affect all languages. I'm starting with Go because (a) Go makes it easy to think about interfaces without getting bogged down in boilerplate, and (b) there are working examples I could compile against as a sanity check.

I'll make some other comments inline.

Some people who might care about this: @yurishkuro @adriancole @michaelsembwever @slimsag @dankosaur @bcronin

func xyz(goCtx context.Context, ...) {
...
sp, goCtx := opentracing.JoinTrace("span_name", goCtx).AddToGoContext(goCtx)
goCtx, sp := opentracing.ContextWithSpan(
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW, I was just tinkering with whether Span really needed that AddToGoContext helper method... the change here is not related to the main event in this PR.

PropagateSpanAsBinary(
sp Span,
) (
traceContextID []byte,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to bury the notion of "trace context" completely, we may want to refer to this argument as spanID

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh yeah, that was just an oversight. for sure.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

requiredAttrs? bc if the intent is for these not to be dropped, but also fixed length, at least zipkin expects the same few attrs propagated. trace id, span id, parent id, and sampled. While we can argue about whether parent id can be sent, in reality the collectors make no attempt to look up a parent id when missing, and lack of parent id assumes the span is a root span. if we don't propagate sampled, we can royally muck up span trees as downstream might choose to not retain when the parent does, or visa versa.

Since we describe this as "core identifying information", implementations should be able to decide what that is. Ex. "user-id" certainly isn't, and even if some may not like what zipkin has as core propagation tags, they are crucial for bug prevention.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@adriancole @yurishkuro it's more than requiredAttrs... there are other "required" parts of a Span that don't get sent in-band during propagation (the operation name, the timing info, etc).

Maybe we should use language consistent with whatever we end up calling JoinTraceFrom(Binary|Text). It is borderline tautological, but the truest way to describe this field is that it's the minimum amount of state needed to join back to the trace on the other side of the propagation boundary.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(I went with

    contextSnapshot []byte,
    traceAttrs []byte,

for now... happy to take suggestions!

@yurishkuro
Copy link
Member

@bensigelman good job! I like this direction, makes the API simpler to use, and solves the "contains" vs. "identified by" conundrum.

There is one edge case that now won't work (the "debug" trace), but it wasn't very elegant in the previous version either, so we can think about it separately. Maybe once we extend the API to allow externally provided timestamps, the debug use case will be solved there as well.

+1

@bhs
Copy link
Contributor Author

bhs commented Jan 24, 2016

@yurishkuro, would the debug case be satisfied by a tag map made available at Span initialization time? Agreed that such a thing could probably be folded in to the forthcoming API for after-the-fact span recording...

@yurishkuro
Copy link
Member

would the debug case be satisfied by a tag map made available at Span initialization time?

It could be, but as we've seen it also makes for an ugly api (in some languages), and the use case for pre-construction tags is pretty weak, so I wouldn't rush there.

// Make sure that global trace tag propagation works.
span.TraceContext().SetTraceAttribute("User", os.Getenv("USER"))
span.SetTraceAttribute("User", os.Getenv("USER"))

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't mean to distract the issue, but I will 📦

I really like Tag Sets where propagation is simply a property of a tag. The "Trace" part of the name "TraceAttribute" has resulted in heaps of discussion, notably as no-one can guess by its name that it a propagation tag, but also the common "is it a part of the trace?" Ex. in zipkin trace id, id, parent id, and sampled decision are "trace attributes", ie only one field has anything to do with the trace.

I really would rather we be able to deconflate this whole thing by saying Span.setPropagationTag("User", os.Getenv("USER")), or worse but still better Span.setPropagationAttribute("User", os.Getenv("USER")). I have consistently found TraceAttribute an unnecessarily distracting term.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. "TraceAttribute" implies something that applies to the root, or to all parent Spans as well as child Spans.

Another suggestion for naming:

Span.setCascadedTag("User", os.Getenv("USER"))

It's an attribute that child tags will also inherit, regardless of network/process boundaries.

I find the whole "Distributed Context Propagation" at the API level, has grown into an unneeded (psychological) schema.

To the user of the API it can be simplified just as an attribute the casades, even over the network/process boundaries.

( The "Distributed Context Propagation" schema is still valuable in Specification though. )

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I, too, am unnerved by the naming around "Trace Attributes"... if we pursue the basic model of this PR, something like SetCascadedTag is worth considering seriously (since both the cascaded and "non-cascaded" tags apply to the same object now: the Span).

@adriancole @michaelsembwever @yurishkuro thoughts on something like the following three methods on Span?

    SetTag(string, BasicType)
    SetCascadingTag(string, BasicType)
    Tag(string) --> string

Some problems right off the bat:

  • If "cascading tags" (trace attributes) are to be used as HTTP headers without bizarro escaping, we run headlong into the lengthy/complex caveats about casing, hyphens, etc. And it seems like the key restrictions for "plain" tags and cascading tags should be identical.
  • What to do if there's a "cascading" tag and a user calls SetTag with the same key? Who wins?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 -- Attribute vs Tag was a strange terminology distinction when I first started getting into this project; nothing unusual but the kind of naming that software is unfortunately rife with. So I think it can be clearer.

However, I'm not sure folding Attributes into Span Tags (via CascadingTag or similar) is going to solve the problem. A Tag is data added to a span, whereas an Attribute is data added to a Context. Based on that Attributes feel like they are not part of Span creation. We do want to propagate them, but they are not Span-centric.

It makes sense to think of the relevant part of DCP as propagating a Span when you have a zipkin-like model where client and server report data into the same Span, but when considering more generalized tracing systems, Context is the thing that is propagated, and Spans are created to report information on execution as the Context passes through. We can add Span info to the Context before propagating to inform the structure of the tree and Spans, but that doesn't make the Context a Span.

I realize this is arguably scope question currently being debated in #33 as well, but clarity is the goal of this line of discussion and I think it will be clearer if we don't conflate these two concepts.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(/me files a motion to decouple this important, unfinished discussion from the current PR)

@codefromthecrypt
Copy link

I like that we are calling this a propagation api vs a codec.. that's a better choice. There are some implementation glitches that this doesn't cause, but maybe just highlights better.

For example, span operationName often lives in code that doesn't have the responsibility of reading headers or thrift envelopes.

Also, I've made some comments that we should really say "core" attributes/propagated tags vs assuming they will be identifiers. Here are the core attributes in zipkin. Note that optional below doesn't mean it is optional to propagate. It means that it can be null.

  1: i64 trace_id,
  2: i64  span_id,
  3: optional i64 parent_span_id,
  5: optional bool sampled

https://github.com/twitter/finagle/blob/develop/finagle-thrift/src/main/thrift/tracing.thrift#L246

It might be the case that we called the above core fields "Id" as finagle calls it this. Personally, I think this is a very confusing choice of as the word ID doesn't in any way convey transport of flags.

TraceId(
  _traceId: Option[SpanId],
  _parentId: Option[SpanId],
  spanId: SpanId,
  _sampled: Option[Boolean],
  flags: Flags)

@codefromthecrypt
Copy link

Another late comment. It is probably acknowledged, but this api is what binds us or overlaps with other context apis. Ex. in finagle HttpContext.read is used to get a context which can also be used for ancillary things to tracing, such as deadlines. Maybe use-cases doc needs to enumerate that we don't expect to replace context apis with this, rather interact with them.

@codefromthecrypt
Copy link

codefromthecrypt commented Jan 24, 2016 via email

@bhs
Copy link
Contributor Author

bhs commented Jan 25, 2016

Thanks, all, for the comments so far. Here are the important outstanding issues as I see them... in no particular order:

  • what to call and/or change about SetTraceAttribute (leave as-is? rename? combine with tags? what about the key restrictions?)
  • how to refer to the parts of a Span that need to get sent in-band during propagation
  • how we should allow people to set operation names after Spans have already started
  • whether/how to change method names around starting and joining traces

One other nice side-effect of this proposal: the hacks where we returned a child context and tags for the forthcoming Span are gone; even better, that Span does not even need to represent out-of-band identity fields like parent_id in the Span Tags.

Finally: nobody has leapt to the defense of TraceContext at this level of the API... seems like it's a goner.

@bhs
Copy link
Contributor Author

bhs commented Jan 25, 2016

Per opentracing/opentracing.io#24, I cleaned up the incredibly gross parts of this (i.e., standardtracer), fixed some comments, fixed tests, and would now like to merge this.

I recognize that we are not done talking about SetTraceAttribute, but I also think that can be resolved in parallel.

I'll add a few comments inline...

panic(err)
}

// Handle the attributes.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI, I haven't tested this yet and so it probably has some subtle bug. I'll write a unittest if we agree that trace attrs are here to stay. :)

// ProcessIdentifier is a thin interface that guarantees all implementors
// represent a ProcessName and accepts arbitrary process-level tag assignment
// (e.g., build numbers, platforms, hostnames, etc).
type ProcessIdentifier interface {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed this since a tracing impl can add this if it feels like it, but ProcessIdentifier has no bearing on RawSpan or its direct dependencies so thus can be omitted here.

@bhs
Copy link
Contributor Author

bhs commented Jan 26, 2016

(this is now rebased against #39)

@bhs
Copy link
Contributor Author

bhs commented Jan 26, 2016

Per #40 (comment) , I think this is basically ready for merge unless someone wants to object. There are still some open issues, but they were not created by this PR (the trace attribute questions and whether we can consolidate span-creation mechanisms).

@dkuebric
Copy link

I think the biggest change here, as you point out, is the removal of TraceContext as an external interface. It's been seen as a stumbling block, but I think removing it introduces a strange confusion as well--a Span is a reportable observation representing execution within my application, but also I need to propagate it in order to report something somewhere else?

We currently have this definition of a Span, which seems good and clear:

A Span represents a logical unit of work in the system that has a start time and a duration.

Our definition of TraceContext is currently pretty complex, so I won't quote it here, but I think it boils down to:

A TraceContext provides identifying information that is associated with each Span reported, tying the entire trace together. It is propagated throughout the request, allowing it to carry not only Span identifying information, but also other Attributes added by an OT user.

Curious if anyone else shares this feeling (or if I'm an outlier)?

(PS -- I think this is pretty relevant to the discussion on #33 as well.)

@bhs
Copy link
Contributor Author

bhs commented Jan 26, 2016

Well, there is SpanPropagator which is responsible for getting spans across process boundaries... so in a sense it takes the place of TraceContext and the Codec and TraceContextSource pieces, all of which were only practically useful as (a) a place to stash the baggage / trace attrs, and (b) encapsulate the parts of a Span that propagate over the wire.

But my larger point is that there's still a type in this revised API that concerns itself with propagation. It's just not an encapsulated member of the Span type.

@bhs
Copy link
Contributor Author

bhs commented Jan 27, 2016

@dkuebric do you want to veto this or are you more "thinking out loud"?

Are there other dissenters to the core idea of removing TraceContext from the API?

If not, does someone want to LGTM this?

@yurishkuro
Copy link
Member

It LGTM - other discussions can continue, but this doesn't make things worse imo, yet leaves less things to learn/worry about.

bhs pushed a commit that referenced this pull request Jan 27, 2016
RFC: far fewer types (but comparable portability)
@bhs bhs merged commit a33c13c into master Jan 27, 2016
@bhs bhs deleted the fewer_types branch January 27, 2016 06:05
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants