OpenTelemetry integration misses the point: we need to propagate the span/context! #1389
Comments
@sethmaxwl was the scope of the initial OpenTelemetry PR just to link the publish and subscribe sides together? I think it misses the point for any consumer application if you want to link the publish to further downstream actions inside the subscriber. |
@hermanbanken Thanks for the comments! We are in the middle of trying to harmonize our treatment of OpenTelemetry for Pub/Sub across languages/libraries. The Node one was sort of a pilot project to see how it goes, and we're hoping to make that better. Feel free to add any comments you have (or anyone who wants to comment!) and we'll add it to the discussion. |
One remark I have is that how it works should be documented a bit more. For example, the (semantic) key name and the data format of the propagation key could be documented. I believe it was |
Good point, I think Aaron suggested the same regarding the W3C trace format. The existing propagator should be reusable. I struggled to get it all working with the recent otel-js. If you want you can raise a PR, or I can have a stab at it during my next tech debt day at work. I guess we should make the span we created the active one? Got to look a bit more into the workings of the subscriber class |
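Since the thread brings up reusing the W3C trace format for the propagation key, here is a minimal, dependency-free sketch of how a span context round-trips through a `traceparent`-style string that could be carried in a message attribute. This is illustrative only: the hand-rolled `SpanContext` shape mirrors the `@opentelemetry/api` type, and the idea of carrying a `traceparent` value in an attribute is an assumption about a possible design, not the library's current behavior.

```typescript
// Sketch: encoding/decoding a W3C traceparent-style value, as could be
// carried in a Pub/Sub message attribute. Hand-rolled types; not the
// real @opentelemetry/api.
interface SpanContext {
  traceId: string; // 32 lowercase hex chars
  spanId: string; // 16 lowercase hex chars
  traceFlags: number;
}

// version "00" per the W3C Trace Context spec
function toTraceparent(ctx: SpanContext): string {
  const flags = ctx.traceFlags.toString(16).padStart(2, '0');
  return `00-${ctx.traceId}-${ctx.spanId}-${flags}`;
}

function fromTraceparent(header: string): SpanContext | undefined {
  const m = /^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/.exec(header);
  if (!m) return undefined;
  return {traceId: m[1], spanId: m[2], traceFlags: parseInt(m[3], 16)};
}
```

The advantage over a JSON-serialized span context is that any existing W3C-compatible propagator on either side can parse the value directly.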
@hermanbanken Did you hear back from Google Support about your case? |
On September 1st they said that they would be working on it "next quarter". Public tracker: https://issuetracker.google.com/issues/183608476 |
That tracker seems to be inaccessible now... weird. |
Going to revisit it this week. Have to solve it for work :)
…On Sun, Feb 20, 2022 at 3:43 PM naseemkullah ***@***.***> wrote:
+1
|
I came up with this as a workaround (there is a comment in the event listener about what I think this library should provide):

```typescript
import {Message, PubSub} from '@google-cloud/pubsub';
import {
  ROOT_CONTEXT,
  SpanContext,
  SpanStatusCode,
  trace,
} from '@opentelemetry/api';

const tracer = trace.getTracer('foo');
const pubsub = new PubSub({projectId: 'abc'});
const enableOpenTelemetryTracing = {
  enableOpenTelemetryTracing: true,
};

export async function subscribe() {
  const subscription = pubsub.subscription(
    'testSubscription',
    enableOpenTelemetryTracing
  );
  subscription
    .on('message', (message: Message) => {
      // it would be convenient if everything in this handler
      // apart from the `handleMessage(message)` logic was provided
      // by the GCP Pub/Sub library when enableOpenTelemetryTracing is true
      const spanContext = getSpanContext(message);
      if (spanContext === undefined) {
        handleMessage(message);
      } else {
        const context = trace.setSpanContext(ROOT_CONTEXT, spanContext);
        tracer.startActiveSpan('handleMessage', {}, context, span => {
          try {
            handleMessage(message);
            span.setStatus({code: SpanStatusCode.OK});
          } catch (err) {
            span.setStatus({
              code: SpanStatusCode.ERROR,
              message: err instanceof Error ? err.message : undefined,
            });
            throw err;
          } finally {
            span.end();
          }
        });
      }
    })
    .on('error', (error: Error) => {
      console.error('Received error:', error);
    });
}

function handleMessage(message: Message) {
  console.log('Received message:', message.data.toString());
  message.ack();
}

function getSpanContext(message: Message): SpanContext | undefined {
  try {
    return JSON.parse(
      message.attributes['googclient_OpenTelemetrySpanContext']
    );
  } catch (error) {
    return undefined;
  }
}
```

Worth noting: while the message acknowledgment and any processing leading up to it are part of the same trace, the …

Edit: apparently "span links" are the proper way to link a message's consumption to its production (as @weyert mentions below), so the above code snippet should probably not be used |
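For reference, a minimal sketch of the span-links approach mentioned in the edit: instead of re-parenting the consumer span onto the producer's context, the consumer span carries a *link* to it. The `Link`/`SpanOptions` interfaces below are hand-rolled stand-ins shaped like the `@opentelemetry/api` types so the snippet stays dependency-free; the attribute name is the one used by the workaround above.

```typescript
// Sketch: build span options that *link* to the producer's span
// context rather than making it the parent. Hand-rolled types; not
// the real @opentelemetry/api.
interface SpanContext {
  traceId: string;
  spanId: string;
  traceFlags: number;
}
interface Link {
  context: SpanContext;
}
interface SpanOptions {
  links?: Link[];
}

function linkedSpanOptions(attributes: Record<string, string>): SpanOptions {
  const raw = attributes['googclient_OpenTelemetrySpanContext'];
  if (!raw) return {};
  try {
    const producerCtx = JSON.parse(raw) as SpanContext;
    return {links: [{context: producerCtx}]};
  } catch {
    // malformed attribute: start an unlinked span rather than fail
    return {};
  }
}
```

With the real API, these options would be passed when starting the handler span (e.g. via the tracer's span-creation options), so the consumption shows up as linked to, but not a child of, the publish trace.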
Thank you, that looks useful. I think we might be able to use span links to connect modifyAckDeadline and the other calls. But to me, even having it without links is already a great improvement |
The tracker page is still there for me, internally at least. Weird. Anyway, nothing has happened on this yet, but it's increasing in priority right now. Definitely not forgotten. |
Yeah, I hope to have a stab at it soon. So far more important stuff at work than this. |
You might not want to put too much into it, just because we've got a revamped design for OT that will be rolling out across all the languages, over time. Of note, it's going to propagate the span and context by way of the W3C spec, as well as doing better about "closing the loop" on the subscriber end. We've got a prototype implementation in Go, and that just needs to be expanded across all the languages. googleapis/google-cloud-go#5034 If you see anything you want in the Node implementation that's missing from the Go one, please feel free to comment! |
Yeah, the changes I wanted to propose via PR would be heavily based on the Go implementation :) Let me know if you prefer me to wait; that's fine, happy to wait for a Google implementation if it's coming in the short term |
Great news that this is being fixed! Just spent a day scratching my head as to why my traces were disconnected... |
Sorry for the radio silence on this topic, but it's still a major objective for us. None of this is official until it's official (please don't take it as an announcement/promise), but the current thought is that we're planning to shift it to use the more standard mechanism (w3c headers), and to get it piped all the way through. A preview of the Go version produces a really nice graph in Cloud Trace, showing the time spent in each phase of the library (batching, ack leasing, etc). There isn't currently anything that will be traced in the server, but the end to end trace should at least allow you to see how long it spends in the server before being sent out. |
Mentioned in the PR, but mentioning here too: there's a suuuuper early preview branch now if anyone wants to play around with what's done. I don't recommend using it in any kind of production scenario or assuming any of the interfaces or trace formats are final. That PR is here: #1659 And the branch in googleapis that anyone can make PRs against if you want: https://github.com/googleapis/nodejs-pubsub/tree/otel-2 I'll be out for a bit but plan to pick it up again in early 2023. |
@feywind Just wanted to check in and see how 2023 was treating this work, any updates yet? |
I've been posting updates on the PR, but I'm planning to make a preview version soon. |
Tiny sample size, I admit, but quick survey for those on this thread and wanting OTel support sooner rather than later: do you consider the trace structures and span names to be part of the public interface? Would it break your code if they changed? Or is it just a human debug tool? |
@feywind for me it’s a human debug tool :) I am personally not planning to generate metrics from the traces via e.g. the otel-collector |
Mainly it is a human debug tool, but there might be some alerting around it. I would expect changes to span names, or changes in the number of spans emitted, to be mentioned in the changelog/release notes, mainly because there is some small cost attached to each single span. |
Pretty much entirely human debug tool, but mentioning any significant changes in the changelog would be helpful for updating any internal process documentation/etc. |
Same here as others have said - only used for some alerting and saved searches in grafana, "production code" doesn't care |
Same here. |
Thanks everyone, I appreciate the input. Our main goal for the moment is more of a human debug solution, too, so I think we're all aligned. :) A sort of "soft interface" is an interesting idea, not needing a whole new major as long as updates are mentioned consistently somewhere. That'll be good to take into account. |
That #1833 is the version rebased onto |
Quick update, there are some beta testing instructions over here if anyone is interested. |
Has anyone had a chance to try this out? There are a few things that need to be finished up to get this released, but it'll probably be pretty soon. |
I have tried it during local development and it works pretty well for me! |
@feywind I have now been trying the beta in GCP, but the Pub/Sub message handler requests don't end up in the same trace as the one that triggered them. We use @opentelemetry/auto-instrumentations-node and Pub/Sub push subscriptions, with a Cloud Run instance being triggered by the messages. I can see that the incoming request body contains e.g.
where the trace ID matches the trace ID of the publisher, but the incoming HTTP request Note: I have very little prior OTel experience, but I'm now trying to set up tracing so that the incoming request trace contains everything the request handling caused. I don't even know if I'm supposed to do something manually, or whether push subscriptions are even supposed to be supported throughout the chain. |
@vesse It seems push subscriptions are slightly out of scope for this first pass. You should be able to pull that I'm not sure exactly what is placing a new |
I forgot to post here too: the |
The code where the Subscriber reads the span from the attributes just creates the span and then throws it away:
nodejs-pubsub/src/subscriber.ts (lines 517 to 527 at 9910793)
A good OpenTelemetry customer would then set the span as the current span in the `on("message")` handler, so that any subsequent processing is attached to the remote span. I'll also file a Google support case.
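To illustrate what "setting the span as the current span" buys you, here is a toy model of an active-span stack. It is emphatically not the real `@opentelemetry/api` (which does this via context propagation); it just shows why a span that is created but never activated leaves subsequent handler spans orphaned.

```typescript
// Toy model of active-span tracking; not the real @opentelemetry/api.
interface Span {
  name: string;
  parent?: string;
}

const activeStack: Span[] = [];

// New spans parent onto whatever span is currently active.
function startSpan(name: string): Span {
  const parent = activeStack[activeStack.length - 1];
  return {name, parent: parent?.name};
}

// Make `span` active for the duration of `fn`, then restore.
function withActiveSpan<T>(span: Span, fn: () => T): T {
  activeStack.push(span);
  try {
    return fn();
  } finally {
    activeStack.pop();
  }
}

// If the library only creates the subscribe span and throws it away,
// handler-side spans have no parent:
const orphan = startSpan('db.query');

// If it activated the subscribe span first, they would attach correctly:
const subscribeSpan = startSpan('subscribe');
const child = withActiveSpan(subscribeSpan, () => startSpan('db.query'));
```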
Environment details

- `@google-cloud/pubsub` version: master / v2.17.0

Steps to reproduce

- Set the `{ enableOpenTelemetryTracing: true }` options on topic and subscription.

It can be observed that actions are linked to the subscriber span, as can also be seen in this screenshot (spans take <1ms because nothing is done inside):
Relates to #1078.