-
Notifications
You must be signed in to change notification settings - Fork 809
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Cannot make OpenTelemetry work with GCP PubSub: how to seperate traces for different events in an EventEmitter listener? #3993
Comments
This is a context propagation topic. This explicit propagation inject/extract is not done above. Therefore the node.js built in context propagation via AsyncLocalStore context manager kicks in. because everything is in a single process the context propagation is completely different then between two separate node processes communicating via some Queue/TCP/... In your sample the messages/message operations are not encapsulated in an AsyncResource therefore You could take a look into one of the instrumenations in https://github.com/open-telemetry/opentelemetry-js-contrib/ to get a better insight into these details. regarding the difference between |
Let's say I don't care about producer side context; I just want each call of the event emitter handler get a separate trace, not nested from any previous message. (e.g. separate span context)
Thanks for pointing that out. This the new code: async run(channel) {
this.subscription.on('message', async(message) => {
const asyncResource = new AsyncResource('consumeMessage', { triggerAsyncId: message.id * 1000 + Math.trunc(Math.random() * 1000) })
await asyncResource.runInAsyncScope(tracer.startActiveSpan, tracer, { kind: otel.SpanKind.CONSUMER }, async(span) => {
console.log('executionId', executionAsyncId(), 'triggerId', triggerAsyncId())
await this.consumeMessage(message, { span })
})
asyncResource.emitDestroy()
channel.emit('subscribed')
} (I also had a bug passing message IDs, so While each process gets a separate
I also tried calling
I couldn't find any code directly using
Yes, and I tried to mimic that in my second approach but external libraries won't get instrumented... |
So, I could get it work by moving 'use strict'
import { tracer, enableConsoleSpanExporter } from './trace.js'
import { EventEmitter } from 'events'
import * as otel from '@opentelemetry/api'
import * as winston from 'winston'
import * as assert from 'assert'
async function sleep(amount) {
await new Promise((resolve, reject) => {
setTimeout(resolve, amount)
})
}
function createLogger({ logLevel = 'info' } = {}) {
return winston.createLogger({
level: logLevel,
exitOnError: false,
format: winston.format.combine(winston.format.timestamp(), winston.format.json(), winston.format.colorize()),
transports: [
new winston.transports.Console()
]
})
}
export class FakeQueue {
constructor() {
this.logger = createLogger({ logLevel: 'debug' })
this.subscription = new EventEmitter()
}
async run(channel) {
// this works as I expect.
this.subscription.on('message', async(message) => {
const ok = await tracer.startActiveSpan('consumeMessage', { kind: otel.SpanKind.CONSUMER }, async(span) => {
const ok = await this.consumeMessage(message, { span })
span.end() // I'm not sure if we need this
return ok
})
console.log('requeue', message.id, !ok)
ok ? message.ack() : message.nack()
})
channel.emit('subscribed')
}
async publishMessage(message, id, retryCount = 0) {
this.subscription.emit('message', {
ack: () => { },
nack: () => {
if (retryCount < 2) {
process.nextTick(() => this.publishMessage(message, id, retryCount + 1))
}
},
id,
retryCount,
data: JSON.stringify(message)
})
}
async consumeMessage(message, { span = tracer.startSpan() } = {}) {
// returns false for requeue
let payload
const logger = this.logger
span.setAttribute('messageId', message.id)
try {
payload = JSON.parse(message.data)
logger.info('received message', { payload })
span.setAttributes({ pageURL: payload.url })
const { response } = await tracer.startActiveSpan('fakeDownload', async(childSpan) => this.fakeDownload(payload.url, { span: childSpan }))
span.setAttribute('responseStatus', response.status)
if (response.status < 400) {
span.setStatus({ code: otel.SpanStatusCode.OK })
return true
} else {
span.setStatus({ code: otel.SpanStatusCode.ERROR })
return false
}
} catch (e) {
span.recordException(e)
span.setStatus({ code: otel.SpanStatusCode.ERROR })
return false
}
}
async fakeDownload(urlStr, { span = tracer.startSpan() } = {}) {
await sleep(Math.random() * 2000)
try {
const url = new URL(urlStr)
assert.ok(['http', 'https'].indexOf(url.protocol.split(':')[0]) > -1)
span.setStatus({ code: otel.SpanStatusCode.OK })
return { response: { status: 200 } }
} catch (e) {
span.recordException(e)
span.setStatus({ code: otel.SpanStatusCode.ERROR })
return { response: { status: 500 } }
} finally {
span.end()
}
}
}
enableConsoleSpanExporter()
const processor = new FakeQueue()
const runChannel = new EventEmitter()
runChannel.on('subscribed', () => {
processor.publishMessage({ url: 'http://google.com' }, 1)
processor.publishMessage({ url: 'httpXX://google.com' }, 2)
})
await processor.run(runChannel) |
My questions:
|
@reith Can I ask you to +1 the PR in the pubsub repo? googleapis/nodejs-pubsub#1659 (comment) |
You can use the
I haven't followed all your code because I think the in process messaging system is a bit of topic here. As long as you are in one process There is no dedicated API to don't consider rest of code in this function as part of current span - There is only
No,
That's not used here, it should be used by e.g. database drivers,.. to encapsulated individual operations (e.g. a DB request) sent over a shared/reused resource (e.g. a network connection). |
I'm using OpenTelemetry to create traces for messages I get from a PubSub subscription. I found two ways to do this:
tracer.startActiveSpan
to wrap my actual message handler function,tracer.startSpan
in my message handler.Using the first approach, subsequent messages may get assigned as children of previously processed messages. I say may because it depends on the workload of a PubSub, I guess, but I reproduced the problem without using an actual PubSub:
fake-queue.js
trace.js
The above code mimics the behavior of a node-js client for PubSub, at least for the
nack()
part. Notice that when I run the code, usingconsumeMessage
as message handler, I see only two traces are created. The second trace,f5732dff30bc02790c75c793852abb73
in my logs, is repeated three times. I'm looking for a way to make open-telemetry create different spans for each message processing. Does open-telemetry happen to provide a function that I can use in the line I commented withQ1
label?The other approach, using
consumeMessageInDetachedSpan
as message handler, creates four traces for me. That's what I expect. There are two problems though. It won't be able to extract context from a PubSub message, but that's not a big deal since the PubSub client doesn't actually propagate context yet, as far as I know. The other problem is instrumentation for external libraries doesn't work anymore. For example, using the second approach I see logs like this:{"level":"info","message":"received message","payload":{"id":1,"url":"http://google.com"},"timestamp":"2023-07-14T00:11:55.921Z"}
While using the first approach, span and trace ID are injected into the logs:
{"level":"info","message":"received message","payload":{"id":1,"url":"http://google.com"},"span_id":"3a98357837d3bd61","timestamp":"2023-07-14T00:09:53.677Z","trace_flags":"01","trace_id":"dff9d9a5a08f06911b8328c9d82d3ec6"}
Since my code calls other HTTP endpoints, I need HTTP instrumentation which doesn't work in the second approach.
Logs for first approach:
fake-1.log
and second approach:
fake-2.log
Dependencies versions:
The text was updated successfully, but these errors were encountered: