Skip to content
This repository has been archived by the owner on Jun 24, 2018. It is now read-only.

What's the story with threads? #3

Open
NicolasT opened this issue Sep 30, 2017 · 8 comments
Open

What's the story with threads? #3

NicolasT opened this issue Sep 30, 2017 · 8 comments

Comments

@NicolasT
Copy link
Contributor

I was trying to figure out how to handle concurrency (forkIO, async).

A potential API I came up with would be something like

data Reference = ChildOf | FollowsFrom

fork :: MonadTracing m => Reference -> Text -> m () -> m () -> m ThreadId

I implemented something along these lines (although my fork is Reference -> Text -> TracingT IO () -> TracingT IO ThreadId since I didn't generalize it), also introducing activeSpanContext :: Tracer -> IO (Maybe SpanContext) to the OpenTracing sig and implenting it in Jaeger accordingly.

Sadly enough, this does not work, because of the IORef (Maybe Span) being used in the Jaeger.Tracer type. What should be added instead, I guess, is some kind of dupTracer which creates a new IORef to be used within the thread, or something along those lines.

It may also require creating new sockets etc, or coordination between threads, because otherwise multiple threads could write to the same socket concurrently (may not be a problem with UDP transports, but would be with TCP-based tracing systems).

This seems to become messy quite quickly.

An alternative could be to have a single thread interacting with the tracing agent, and use some kind of queue to submit spans to it.

Related to this: need to figure out how to play nice with libraries like async. Getting things to 'Just Work' seems difficult, some new APIs will be required I believe. But ending up with something like

(a, b) <- runConcurrently $ (,) <$> Concurrent "getPage" (getPage url1) <*> Concurrent "getPage" (getPage url2)

where the sub-threads would ChildOf reference the main one could be sensible, staying close to the async APIs, and not requiring changes to getPage (which may be a library function, not tracing-enabled).

@ocharles
Copy link
Owner

I haven't entirely thought it through yet, but we should check out how the Go library deals with this. From memory, I believe it's roughly about capturing the Span you're in, and then activating it in child threads.

@ocharles
Copy link
Owner

opentracing/specification#23 may discuss this. In general, it seems that this is not part of the OpenTracing spec, so i'd be inclined to just hold off. https://github.com/opentracing/opentracing-java/blob/master/opentracing-api/src/main/java/io/opentracing/ActiveSpan.java#L64 is capture in the Java API, which already diverges from the specification (or more, extends it).

@NicolasT
Copy link
Contributor Author

A way forward, I think, could be to split 'current span' tracking etc (move it into TracingT?), and let an OpenTracing implementation only worry about creation of span instances, and emitting them.

Remains the shared use of tracerWriteBuffer, I guess?

Maybe a Tracer should have two parts of state:

  • Shared within a process (e.g. a socket)
  • Shared within a thread-of-execution (e.g. Thrift.WriteBuffer), and a new instance created upon fork
  • Not maintain 'current span' and thread-state as part of the Tracer, but as part of the MonadTracing environment, which can then take care of setting things up upon fork

@ocharles
Copy link
Owner

Well it wouldn't be much work to copy the Java API and have

data Continuation

captureActiveSpan :: Tracer -> IO (Maybe Continuation)
restore :: Tracer -> Continuation -> IO ()

But I need to give it a bit of thought, and might also need to think about these Maybes and see if there's anything else.

I'm a little cautious about committing to a particular monad or transformer though.

I'm not sure about the connection to tracerWriteBuffer, that's just an implementation detail of the Jaeger client to make sure all Thrift calls are delivery in a single UDP packet.

@NicolasT
Copy link
Contributor Author

NicolasT commented Oct 1, 2017

What I tried to say about the WriteBuffer is that having one as part of the Tracer structure, this basically makes the Tracer not thread-safe (or rather, can't be shared across threads assuming it's WriteBuffer is used by multiple threads concurrently).
A WriteBuffer is a IORef Builder with e.g.

writeBuf :: WriteBuffer -> LBS.ByteString -> IO ()
writeBuf w s = modifyIORef w ( <> lazyByteString s)

(https://git1-us-west.apache.org/repos/asf?p=thrift.git;a=blob;f=lib/hs/src/Thrift/Transport/IOBuffer.hs;hb=39310dad793ca69b4b7217a3b54430e682e5e2a4)
so I think (admittedly making some assumptions about how the Thrift library may work, given experience with the Python version of it) this should become part of per-thread Tracer state.

Thinking of it, doing a worker-thread(s?) + queue approach would

  • get rid of WriteBuffer sharing thread-safety issues
  • get rid of pitfalls with timely closing the Socket
  • allow to batch multiple spans into a single packet to be sent to the agent, which may be more efficient

@NicolasT
Copy link
Contributor Author

NicolasT commented Oct 1, 2017

Given the Continuation and Maybes story:

I believe there are 3 possible states:

  • Some code is running as part of a current (active) span
  • Some code is running (likely in a thread) and has no current (active) span, but does have a 'parent' span(context) which would becomes a reference (childof or followsfrom) whenever a span is created for the current thread
  • Some code is running and has no current nor parent span, e.g. at process startup

Not sure how to encode this ergonomically (wrt how a span is stored in the state) though. Will do some exploration-while-coding.

@NicolasT
Copy link
Contributor Author

NicolasT commented Oct 1, 2017

I was thinking of using some forall s. ST s-like trick to ensure a span can't be passed to a different execution context, in order to ensure a span with a specific spancontext won't (accidentally) be emitted twice (with different timings), but this would rule out the possibility to write code in which a span is started in one thread and closed in another (e.g. an RPC system where the server receives a client request, starts a corresponding span, pushes the request on a queue which is handled by some other thread, which then pushes the RPC result to yet another queue which is popped by a third thread to respond to a client, then closing the span).

@ocharles
Copy link
Owner

ocharles commented Oct 2, 2017

WriteBuffer can be trivially thread safe by moving to an MVar, so i'm not too worried about that. But really the Tracer should be forked to a separate thread with spans pushed to it for publishing, which means some kind of TQueue or something,

get rid of pitfalls with timely closing the Socket

pitfalls?

allow to batch multiple spans into a single packet to be sent to the agent, which may be more efficient

Certainly, and this comes from having the jaeger tracer fork a thread that periodically flushes.

ST like tricks seem like a rather heavy weight solution (it's a specialization of Oleg's region work), so i'd want to be sure we really need that.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants