Skip to content
This repository has been archived by the owner on May 23, 2023. It is now read-only.

Standard(s) for in-process propagation #23

Open
yurishkuro opened this issue Dec 4, 2016 · 80 comments
Open

Standard(s) for in-process propagation #23

yurishkuro opened this issue Dec 4, 2016 · 80 comments

Comments

@yurishkuro
Copy link
Member

OpenTracing 1.x leaves the question of in-process context propagation entirely in the hands of the applications themselves, which substantially reduces the usefulness of the standard since different frameworks instrumented with OT do not have a way to exchange context.

Standards for in-process context propagation inevitably have to be specific to individual languages. This is more of a placeholder issue. It is also somewhat similar to #22 (possible part of it), but can be done independently.

@yurishkuro yurishkuro added this to the OpenTracing 2.0 milestone Dec 4, 2016
@wu-sheng
Copy link
Member

wu-sheng commented Dec 4, 2016

In-process propagation, and support different frameworks instrumented, seems very hard, as I known.
And different platforms have different characteristics.

In Java, same Context in ThreadLocal, and provide inter-thread propagation spec, maybe a effective way.

Keep me in the loop. :)

@codefromthecrypt
Copy link

codefromthecrypt commented Dec 4, 2016 via email

@yurishkuro
Copy link
Member Author

yurishkuro commented Dec 4, 2016

@adriancole what I find peculiar is that several similar concepts of context storage/provider I've seen (including Jaeger's) pretend that they are just an interface, but in fact they cannot work with anything but a thread-local based implementation. For example, TChannel (which is a form of RPC framework) can be configured with a TracingContext, as a way of abstracting the actual mechanism of in-process propagation, but the mere fact that a context object is not passed to RPC methods but instead extracted via that abstraction essentially means the abstraction must be implemented via thread-locals.

NB: the TracingContext in Jaeger and TChannel is to a Span what gRPC Storage is to a Context. I.e. the same separation of how something is propagated vs. what is being propagated. They are just smaller in scope and only propagate a single Span.

@codefromthecrypt
Copy link

codefromthecrypt commented Dec 4, 2016 via email

@raphw
Copy link

raphw commented Dec 5, 2016

As a matter of fact, I have written a small thread-local-ish utility for this exact purpose. The problem with thread locals is that it only allows access from within a Thread. It is not possible to set an object from without a Thread even if the latter is known.

Also, I have started migrating away from the above DetachedThreadLocal and rather use the WeakConcurrentMap. The latter allows to define a form of key object that represents a state. This state is automatically garbage collected once the key is eligible for collection. Propagation works by sharing such keys where the context is removed once a context has become obsolete.

All in all, the problem always boils down to managing a form of global-state what is always tedious.

@wu-sheng
Copy link
Member

wu-sheng commented Dec 5, 2016

@adriancole what is incidental trace state? Why should be stored in something like threadlocal?

@wu-sheng
Copy link
Member

wu-sheng commented Dec 5, 2016

To all, threadlocal can't solve all problems, somthing like async process model will bring us more.

For example, Disruptor is a very useful in-memory async lib. If application use this, all context base on threadlocal will fail without doubt.

This is very important for my exp.

@codefromthecrypt
Copy link

codefromthecrypt commented Dec 5, 2016 via email

@wu-sheng
Copy link
Member

wu-sheng commented Dec 5, 2016

@adriancole, yes, that is my concern.

@wu-sheng
Copy link
Member

wu-sheng commented Dec 5, 2016

But I think only change threadlocal to something similar is not explicit enough to someone, who is not a tracer developer.

Provide some APIs, like carrier, inject, extract for rpc, are better choice for me. They will be easy to understand.

@cwe1ss
Copy link
Member

cwe1ss commented Dec 6, 2016

In terms of the specification, the more interesting question for me is the contract of this API:

  • Should it be possible to store just one span in the context?
    • This would probably be e.g. the RPC server span and all lower-level components would be child spans of this one span without any nesting.
  • Should it be implemented as a stack?
    • How do you know on POP that you got "your" span? (in case a child component failed to close/pop its span or called pop too often)
    • How does a component know that it is the outermost component and that it should close any open spans. Example: The RPC server creates a "root span" and adds it to the stack, a lower-level component (e.g. a DB access library) opens a child span but the code failed with an exception. The request exception handler now needs to know that the DB span must be closed.
  • Should it be implemented as a list or a tree to allow for more reference types?

It might be easier to decide which language-specific storage construct might be the right one, when we know which features we want it to support.

@sjoerdtalsma
Copy link

I am creating a library called 'context-propagation' meant purely to propagate any (threadlocal) context implementation from a parent-thread to any background thread in a stack-like fashion, using the Autocloseable try-with-resources mechanism to allow verified stack unrolling (and automatic skipping if a pop is missed).
It revolves around a Context and ContextManager interface (each with only two methods) and a utility to create a snapshot from all known ContextManager implementations at once.
If you're interested, please take a look here and let me know what you think:
https://github.com/talsma-ict/context-propagation

(we needed it for security propagation, but are now looking into adding opentracing and would like to re-use the mechanism that is currently in-place that serves us well).

@yurishkuro
Copy link
Member Author

@cwe1ss I think the only requirement is "I need to get the current span". The fact that Java implementation often uses stack is a side-effect of thread-local implementation, since you're using the same API to retrieve different spans at different times, and stack is the only thing that makes sense for that.

@cwe1ss
Copy link
Member

cwe1ss commented Dec 6, 2016

@yurishkuro there's also the logical requirement to "set the current span". Do you think this is implicit - meaning that every span that is created is automatically "the current span" or do you see this as an explicit feature/step?

In other words, do you think that the context should be part of the tracer interface in which a call to StartSpan would automatically store the span in the context and something new like tracer.CurrentSpan would return it from the context? Or would you prefer a new interface like traceContext that must be called explicitly by the user after a span has been created - e.g. traceContext.SetCurrentSpan and traceContext.GetCurrentSpan?

The former would make for a nice user-friendly API and the latter seems to be closer to what we have now in jaeger-context and my C# contrib library. But both have advantages and disadvantages that need further discussions of course.

@yurishkuro
Copy link
Member Author

Sorry, you're right, it needs get and set fort current span.

@bhs
Copy link
Contributor

bhs commented Dec 7, 2016

@sjoerdtalsma thanks for sharing the context propagation lib... nice to see something that's already in use and not pre-coupled to a distributed tracing system per se.

@yurishkuro @cwe1ss I am prototyping a few approaches to make this pluggable... I would like OT to support in-process context managers without getting into the business of doing all of the work (since there are so many approaches and each has its advantages/disadvantages).

@cwe1ss as far as the stack concept is concerned: in Dapper we basically had each in-process Context object keep a reference to its parent Context; when finish() was called (or its moral equivalent... the naming was different), the Context would re-install its parent Context in the thread-local storage. So there was essentially a linked list of Context objects within the process.

@sjoerdtalsma
Copy link

sjoerdtalsma commented Dec 7, 2016

@bensigelman For what it's worth, I heavily depend on the the java.util.ServiceLoader to make my library pluggable.
The abstract threadlocal implementation has the same 'linked stack' structure, except it tries to correct out-of-order closing as well.

Details on the out-of-order closing: If the closed context isn't 'current', current isn't touched.
While the parent is already-closed, it is 'skipped' in the search for the restored 'current'.
I am aware this should never happen in properly nested try-with-resources Autocloseable contexts.

Later today I'll be committing a 'java-globaltracer' project to github and planning to offer it to opentracing-contrib. There will be three main classes with only a single public class GlobalTracer (currently with 3 methods) following this design: http://bit.ly/2gkrRuo

@wu-sheng
Copy link
Member

wu-sheng commented Dec 8, 2016

@sjoerdtalsma , can you post the site of java-globaltracer project. I have suggest earlier in opentracing/opentracing-java#63. ServiceLoader to get a tracer maybe a good way. Even though, some tracer need arguments to init, like Jaeger Tracer by @yurishkuro told, they are not suitable for the module.

But I like this type of java tracer factory/manager very much.

@sjoerdtalsma
Copy link

@wu-sheng @adriancole , I pushed an initial attempt (with a working unit test using the MockTracer) to github yesterday evening:
https://github.com/talsma-ict/java-globaltracer

I would love to donate it to the 'contrib' git repository and have it published in Maven central under 'io.opentracing.contrib' but only after a thorough review process. Since I'm relatively new to the opentracing community I have no experience on the best way to go forward from here...

@wu-sheng
Copy link
Member

wu-sheng commented Dec 8, 2016

@sjoerdtalsma @adriancole , maybe we should create a new issue to discuss this?

I review the project generally, let's leave the detail discuss later.

But at first, jdk level should be fixed. Using Optional<T> led to needing jdk 1.8+, but many applications based on jdk 1.6+. I think you should implement your design on jdk 1.6, maybe the code is less grace, but it will easier to use.

@codefromthecrypt
Copy link

codefromthecrypt commented Dec 8, 2016 via email

@sjoerdtalsma
Copy link

@wu-sheng @adriancole I welcome feedback on the design and implementation choices by github issues.
https://github.com/talsma-ict/java-globaltracer
I'm also willing to transfer this repository to https://github.com/opentracing-contrib if you think this contribution is worthwhile.

@wu-sheng
Copy link
Member

wu-sheng commented Dec 8, 2016

@sjoerdtalsma , transfer to https://github.com/opentracing-contrib, LGTM. @yurishkuro @adriancole @bensigelman , what's your opinion?

Maybe accept it, discuss on the repo, and let's see what is going on later?

@bhs
Copy link
Contributor

bhs commented Dec 8, 2016

@sjoerdtalsma my preference would be to

  1. create (empty, ASF2.0) repos on opentracing-contrib for the pieces you'd like to move over
  2. do a PR for the initial contents where we discuss the finer points of the code itself (and I would def rather not try to do that all here in this thread!)
  3. merge those PRs as soon as they're ready, etc.

@wu-sheng
Copy link
Member

wu-sheng commented Dec 9, 2016

@bensigelman , can you add me to the https://github.com/opentracing-contrib org. I am interested in @sjoerdtalsma's project.

@wu-sheng
Copy link
Member

wu-sheng commented Dec 9, 2016

@sjoerdtalsma , let's me know, after you create a repo, and do a PR

@sjoerdtalsma
Copy link

sjoerdtalsma commented Dec 9, 2016

@bensigelman I agree on all three of your points.
re. 1: I cannot create this repo in opentracing-contrib, so it would be great if anyone with the power to do so could create a new ASF2.0 repo called 'java-globaltracer' or something similar. I'll take care of step 2 from there.

@wu-sheng
Copy link
Member

wu-sheng commented Dec 9, 2016

@bensigelman , maybe you should create a repo for @sjoerdtalsma

@objectiser
Copy link
Contributor

@bhs If transferring a Span from one thread to another, by creating a SpanClosure which is activated in the destination thread, how is the Span disassociated from the first thread (in situations where this is desirable)?

@bhs
Copy link
Contributor

bhs commented Mar 11, 2017

@wu-sheng

What is the difference between Snapshot and SpanClosure?

Definitely similar in spirit... The SpanClosure actually has an interface to implement, though, since it makes it clear that a SpanClosure/Snapshot may only be activated/deactivated once.

@bhs
Copy link
Contributor

bhs commented Mar 11, 2017

@objectiser

If transferring a Span from one thread to another, by creating a SpanClosure which is activated in the destination thread, how is the Span disassociated from the first thread (in situations where this is desirable)?

Yes, that's a good point. I had addressed that in a previous version of this (that I didn't send out) where everything was reference-counted and finish() was implicit (when the refcount got to zero).

I've added a new commit that includes a new method, SpanBuilder.startAndActivate(), that returns a SpanClosure. I also made SpanClosure autocloseable just to see how that felt syntactically.

The async code looks like this:

        // Create a parent SpanClosure for all of the async activity.
        try (SpanManager.SpanClosure parentSpanClosure = tracer.buildSpan("parent").startAndActivate();) {

            // Create 10 async children.
            for (int i = 0; i < 10; i++) {
                final int j = i;
                futures.add(otExecutor.submit(new Runnable() {
                    @Override
                    public void run() {
                        // START child body

                        try (SpanManager.SpanClosure childSpanClosure =
                                     tracer.buildSpan("child_" + j).startAndActivate();) {
                            Thread.currentThread().sleep(1000);
                            childSpanClosure.span().log("awoke");
                            Runnable r = new Runnable() {
                                @Override
                                public void run() {
                                    Span active = tracer.activeSpanManager().active();
                                    active.log("awoke again");
                                    // Create a grandchild for each child.
                                    Span grandchild = tracer.buildSpan("grandchild_" + j).start();
                                    grandchild.finish();
                                    active.finish();
                                }
                            };
                            subfutures.add(otExecutor.submit(r));
                        } catch (Exception e) { }

                        // END child body
                    }
                }));
            }
        } catch (Exception e) { }

        try {
            for (Future<?> f : futures) {
                f.get();
            }
            for (Future<?> f : subfutures) {
                f.get();
            }
        } catch (Exception e) { }

It's worth noting that existing synchronous start()/finish() code can work just as it does today.

Stepping back a bit...

This PR moves things in the direction of formalizing the difference between Span as an API to essentially record data about a span and SpanClosure and SpanManager as APIs to manage the relationship between specific spans and specific execution contexts / threads / etc. I think that's a good thing / direction, but just wanted to point it out.

@sjoerdtalsma
Copy link

sjoerdtalsma commented Mar 13, 2017

One question; are we 'okay' now with automatically/implicitly becoming child of the active span ?
I remember rather strong opinions on either side of that decision when discussing the java-spanmanager PR.

edit: I am personally okay with either choice as long as it is documented very clearly

@objectiser
Copy link
Contributor

@bhs There may still be an issue if the parent scope is being managed by two separate event handlers, if the event handler creating the SpanClosure cannot store it in some specific context that is available to the end event handler.

@bhs
Copy link
Contributor

bhs commented Mar 13, 2017

@sjoerdtalsma

One question; are we 'okay' now with automatically/implicitly becoming child of the active span ?
I remember rather strong opinions on either side of that decision when discussing the java-spanmanager PR.

I'm definitely ok with that. We did so in Dapper and – while it has some drawbacks – it's a way better default than not-becoming the child :)

Two thoughts, though:

  1. If.f. the programmer explicitly provides references, those would replace (not supplement) any automatic references, and
  2. We can introduce a new Reference type, maybe INFERRED_CHILD_OF or similar; this will allow tracing systems / UIs / etc to distinguish these relationships

@objectiser

There may still be an issue if the parent scope is being managed by two separate event handlers, if the event handler creating the SpanClosure cannot store it in some specific context that is available to the end event handler.

IMO, if different executors (or whatever) start and finish a Span, they would not share a single SpanClosure activation/deactivation. Either there would be zero or two of those pairs. Note that in my proposal it's still possible to start a Span without meddling with SpanManager at all.

@bhs
Copy link
Contributor

bhs commented Mar 19, 2017

A little later than I'd hoped, but here is a sincere attempt to address this issue for OT-Java: opentracing/opentracing-java#111

Please add comments there... I'm making special note of it because more people are watching this issue.

@beberlei
Copy link

The java implementation scares me, since this kind of complexity is absolutely not needed in a request scoped language such as PHP. Would it make more sense to agree that each language that does not have explicit in process context propagation such as Go, should provide a solution that is idiomatic for this language?

@felixfbecker
Copy link

Since I hear this a lot, not every PHP project is request scoped. PHP can be used just like any other language to build long-running service processes, that e.g. can serve RPCs calls over TCP, even with an event loop like in NodeJS. Take as an example https://github.com/felixfbecker/php-language-server

@yurishkuro
Copy link
Member Author

@beberlei yes, the propagation mechanism is specific to each language, although it is preferable to keep similar concepts as much as possible, such as "active span".

@felixfbecker if you look at a proposed mechanism for Python, it is actually a pluggable approach depending on which async framework in Python is used, because they require different ways of propagating the context via event loops. I assume the same will be true in PHP. For request-scoped PHP the implementation of active span source could be just a global variable.

@felixfbecker
Copy link

For request-scoped PHP that is not using async functions with callbacks (like async curl or async DB queries) and no event loops, it can be a global variable.

A runtime also doesn't necessarily need to have a concept of "active span" imo. In Node's event loop continuation-local-storage is super slow to propagate an active span, and it is so much easier to e.g. just pass the span around as the last parameter to functions.

@dkuebric
Copy link

@felixfbecker without an active span getter, it will be more cumbersome for users to combine their own instrumentation points with those built in to standard libraries (eg. web frameworks, db clients, etc), and to compose those with each other.

@tedsuo
Copy link
Member

tedsuo commented Jun 13, 2017

@yurishkuro has the right nuance here. In-process propagation is language dependent, but for languages that share paradigms we should try to avoid unnecessary inconsistency between API implementations. For example, I would not want the ruby implementation to stray too far from the python one.

@felixfbecker I think Javascript is a special case, where you do really want an ActiveSpan concept but the currently available mechanisms are insufficient/unofficial enough that it would be concerning to bake support for them into the official API. But if JS were to sort out continuation local storage in an effective manner, we would immediately want it for the reasons @dkuebric pointed out.

PHP concerns me because I'm ignorant, and don't understand the various possible execution contexts and how they are exposed to the PHP runtime, and to what degree the tracing you need to do is actually mixed up with foreign function calls to C code.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests