feat(otel): add opentelemety utility functions #272

AndrewWinterman · 2024-06-28T22:17:38Z

This PR extracts opentelemetry utility functions from my private project
and adds them to this project without calling them. It partially resolves #43

I'd like a broader discussion about whether these should be
automatically called by the library where possible, or if they should
simply be provided to clients to use if they so wish.

I did my best to follow OpenTelemetry semantic conventions as described
here
https://opentelemetry.io/docs/specs/semconv/messaging/messaging-spans/,
but they are at times ambiguous for RabbitMQ, e.g. is the destination
for a message the Queue or the Consumer Tag the message was delivered to?

Given the channel based approaches of this library, it is impossible for
the library to know the full execution of a consumer. Unless
autoack=false, we cannot actually know when to end the span associated
with a delivery, so at least in the consumer case, it's probably best to
allow the client to manage spans for themselves.

We can manage spans on the producer side, and at the very least
extract span identifiers to include on published headers automatically,
and provide utilities for pulling them back out again.
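
To make the producer-side idea concrete, here is a minimal, stdlib-only sketch of writing trace identifiers into message headers and reading them back. `headerCarrier`, `injectTraceparent` and `extractTraceparent` are hypothetical names, not functions from this PR. The carrier mirrors the Get/Set/Keys shape of OpenTelemetry's `propagation.TextMapCarrier`, and the value follows the W3C `traceparent` layout. A real implementation would presumably delegate to an OTel propagator rather than format the header by hand.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// headerCarrier adapts an AMQP-style header table (amqp091-go's Table is a
// map[string]interface{}) to a Get/Set/Keys carrier. Hypothetical type.
type headerCarrier map[string]interface{}

// Get returns the string stored under key, or "" if absent or not a string.
func (c headerCarrier) Get(key string) string {
	s, _ := c[key].(string)
	return s
}

// Set stores a string value under key.
func (c headerCarrier) Set(key, value string) { c[key] = value }

// Keys lists the header names in sorted order.
func (c headerCarrier) Keys() []string {
	keys := make([]string, 0, len(c))
	for k := range c {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	return keys
}

// injectTraceparent writes a W3C-style traceparent value into the headers.
// traceID is expected to be 32 hex characters and spanID 16.
func injectTraceparent(c headerCarrier, traceID, spanID string, sampled bool) {
	flags := "00"
	if sampled {
		flags = "01"
	}
	c.Set("traceparent", fmt.Sprintf("00-%s-%s-%s", traceID, spanID, flags))
}

// extractTraceparent reads the identifiers back out on the consumer side.
// Simplification: only the exact flag value "01" is treated as sampled.
func extractTraceparent(c headerCarrier) (traceID, spanID string, sampled, ok bool) {
	parts := strings.Split(c.Get("traceparent"), "-")
	if len(parts) != 4 || len(parts[1]) != 32 || len(parts[2]) != 16 {
		return "", "", false, false
	}
	return parts[1], parts[2], parts[3] == "01", true
}
```

A publisher would call `injectTraceparent` on the headers just before publishing, and a consumer would call `extractTraceparent` on the delivery's headers to build a link back to the publishing span.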

My intention with putting this PR up is to move the conversation
forward. Because the PR only provides private methods (if I left
members public please call them out), it can be safely merged while
these questions are worked out.

AndrewWinterman · 2024-06-28T23:05:37Z

I converted this to a draft. It's getting dinged for unused methods, which was intentional, so it's probably best to leave it in draft, unmerged, until we resolve some of those design questions.

Zerpet · 2024-07-02T12:13:10Z

Hey, thank you for taking the time to contribute to this library. I'll respond in-line to the topics in the OP.

I'd like a broader discussion about whether these should be automatically called by the library where possible, or if they should simply be provided to clients to use if they so wish.

Given that this project is a library, it's a great opportunity to provide automatic native instrumentation. That means we should automatically create spans, and inject/extract context where it makes sense.

they are at times ambiguous for rabbitmq-- e.g. is the destination for a message the Queue or the Consumer Tag the message was delivered to.

I agree that some semantics are ambiguous for RabbitMQ. I would advocate to adhere to the specific conventions for RabbitMQ described in this link, and do our best with ambiguities not covered (and document them!).

Given the channel based approaches of this library, it is impossible for
the library to know the full execution of a consumer. Unless
autoack=false, we cannot actually know when to end the span associated
with a delivery, so at least in the consumer case, it's probably best to
allow the client to manage spans for themselves.

Given the "subscription" workflow of RabbitMQ (polling/pulling is highly discouraged), I think we can record the attributes of the subscription (and add more if necessary), and inject those into the spans upon receiving messages, and just before forwarding them into the Go channel. This idea needs validation, but I would prefer this over leaving the consumption instrumentation to the users.

We can manage spans on the producer side, and at the very least
extract span identifiers to include on published headers automatically,
and provide utilities for pulling them back out again.

I agree 👍 This is a sensible idea.

lukebakken · 2024-07-02T14:01:07Z

For what it's worth, I suggest looking into how OTel was added to the .NET client. There is a LOT of discussion here:

Evaluating how to support tracing and OpenTelemetry rabbitmq-dotnet-client#776
Add OpenTelemetry support via ActivitySource rabbitmq-dotnet-client#1261
Adding proper OpenTelemetry integration via. registration helpers and better context propagation rabbitmq-dotnet-client#1528

AndrewWinterman · 2024-07-30T04:30:06Z

I'll wire up what I can. I still think there are some open questions on the consumer side.

Hey, thank you for taking the time to contribute to this library. I'll respond in-line to the topics in the OP.

I'd like a broader discussion about whether these should be automatically called by the library where possible, or if they should simply be provided to clients to use if they so wish.

Given that this project is a library, it's a great opportunity to provide automatic native instrumentation. That means we should automatically create spans, and inject/extract context where it makes sense.

okay, sounds good.

they are at times ambiguous for rabbitmq-- e.g. is the destination for a message the Queue or the Consumer Tag the message was delivered to.

I agree that some semantics are ambiguous for RabbitMQ. I would advocate to adhere to the specific conventions for RabbitMQ described in this link, and do our best with ambiguities not covered (and document them!).

Given the channel based approaches of this library, it is impossible for
the library to know the full execution of a consumer. Unless
autoack=false, we cannot actually know when to end the span associated
with a delivery, so at least in the consumer case, it's probably best to
allow the client to manage spans for themselves.

Given the "subscription" workflow of RabbitMQ (polling/pulling is highly discouraged), I think we can record the attributes of the subscription (and add more if necessary), and inject those into the spans upon receiving messages, and just before forwarding them into the Go channel. This idea needs validation, but I would prefer this over leaving the consumption instrumentation to the users.

The problem is that we need a way to transport the spans, and then an idiomatic way for clients to consume them downstream, so if you've called

	deliveries, err := c.channel.Consume(
		queue.Name, // name
		c.tag,      // consumerTag,
		*autoAck,   // autoAck
		false,      // exclusive
		false,      // noLocal
		false,      // noWait
		nil,        // arguments
	)

we now need to embed a span or a context (go authors say not to embed contexts, so span?) in the delivery. Is that amenable?

Each span is probably best treated as a new root span, and as implemented in my draft, gets a link to the publication that created it.

We can manage spans on the producer side, and at the very least
extract span identifiers to include on published headers automatically,
and provide utilities for pulling them back out again.

I agree 👍 This is a sensible idea.

AndrewWinterman · 2024-07-30T04:30:56Z

I suppose another way to go would be to provide a delivery#Context method that returns a newly constructed context. I think that's roughly equivalent to how http works.
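
A rough sketch of what such a Context method could look like, using only the standard library. `ctxDelivery`, `newCtxDelivery` and `traceparentFrom` are hypothetical names for illustration, not the library's API; the nil fallback copies the behaviour of `http.Request.Context`. Note that this does keep a context inside a struct, which is the trade-off raised earlier in the thread.

```go
package main

import "context"

// traceKey is an unexported key type so the stored value cannot collide
// with context values set by other packages.
type traceKey struct{}

// ctxDelivery is a hypothetical stand-in for amqp091-go's Delivery.
type ctxDelivery struct {
	Headers map[string]interface{}
	ctx     context.Context // set by the library when the delivery arrives
}

// Context returns the delivery's context, falling back to
// context.Background() when none was set, as http.Request.Context does.
func (d *ctxDelivery) Context() context.Context {
	if d.ctx != nil {
		return d.ctx
	}
	return context.Background()
}

// newCtxDelivery builds a delivery whose context carries the propagated
// "traceparent" header value, if the publisher set one.
func newCtxDelivery(headers map[string]interface{}) *ctxDelivery {
	ctx := context.Background()
	if tp, ok := headers["traceparent"].(string); ok {
		ctx = context.WithValue(ctx, traceKey{}, tp)
	}
	return &ctxDelivery{Headers: headers, ctx: ctx}
}

// traceparentFrom returns the propagated value stored in ctx, or "".
func traceparentFrom(ctx context.Context) string {
	tp, _ := ctx.Value(traceKey{}).(string)
	return tp
}
```

In a real integration the context would more likely carry an extracted span context than a raw header string; the string keeps the sketch free of external modules.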

AndrewWinterman · 2024-07-30T05:21:30Z

For what it's worth, I suggest looking into how OTel was added to the .NET client. There is a LOT of discussion here:

Evaluating how to support tracing and OpenTelemetry rabbitmq-dotnet-client#776

Add OpenTelemetry support via ActivitySource rabbitmq-dotnet-client#1261

Adding proper OpenTelemetry integration via. registration helpers and better context propagation rabbitmq-dotnet-client#1528

I admit I didn't go through this. I'm sure it's informative, but it's not super accessible to me (yet?). I haven't written any dotnet...

AndrewWinterman · 2024-07-30T05:21:59Z

channel.go

@@ -1492,7 +1492,7 @@ func (ch *Channel) Publish(exchange, key string, mandatory, immediate bool, msg
 /*
 PublishWithContext sends a Publishing from the client to an exchange on the server.

-NOTE: this function is equivalent to [Channel.Publish]. Context is not honoured.
+NOTE: Context termination is not honoured.


We're now using the context for span propagation in-process.

AndrewWinterman · 2024-07-30T05:23:23Z

delivery.go

+// the appropraite headers set. See [context-propagation] for more details
+//
+// [context-propagation]: https://opentelemetry.io/docs/concepts/context-propagation/
+func (d *Delivery) Span(ctx context.Context, options ...trace.SpanStartOption) (context.Context, trace.Span) {


this is an okay route-- clients can use their own context + span to indicate boundaries of a batch, and then get child spans for each delivery, with each span linked to the publication.

I also provide access to the Link for a delivery in case they want to combine multiple links into one span for their batch (that's what I would prefer in my use case, but that's because I'm defining a naturally batching consumer).

The tradeoff is that without storing additional state, we're relying on the client to tell us the context when we go to ack or nack, which could lead to errors.

If we instead store the span on the delivery, we can close it when we ack, after inserting a child settle span for the ack itself. This has the implication that every consumer needs to settle the delivery, even if they're in autoack mode, in order to see spans in their telemetry info (spans are usually not sent until they are closed). In autoack mode the settle method would just close the span, with no further implications at the wire level.

Zerpet · 2024-09-13

If we instead store the span on the delivery, we can close it when we ack, after inserting a child settle span for the ack itself

I prefer this approach, TBH. Relying on the user would make this implementation brittle. I'm OK with having a limitation in handling autoack, because that's really not a recommended practice. autoack is a synonym for "YOLO, I don't care about my data, just GO!"

AndrewWinterman · 2024-07-30T05:24:23Z

delivery.go

+	}
+}
+
+func (d *Delivery) Settle(ctx context.Context, response DeliveryResponse, multiple, requeue bool) error {


I feel deeply ambivalent about this approach, but the alternative would seem to be providing (Ack,Nack,Reject)Ctx methods. TBH that's probably better.

Zerpet · 2024-09-13

I agree that (Ack,Nack,Reject)Ctx methods are probably a better alternative, with the delivery settled automagically in those functions.
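
One possible shape for such a method, sketched with stand-in types so it compiles without the OTel module. `span`, `startSpan`, `spanDelivery`, `AckCtx` and `recordingTracer` are all hypothetical names, and the span name "ack" is a placeholder. The sketch assumes the delivery keeps its process span: the settle span wraps only the wire-level ack, and the stored span is ended afterwards.

```go
package main

import "context"

// span is a minimal stand-in for OpenTelemetry's trace.Span; only End is modelled.
type span interface{ End() }

// spanFunc adapts a plain function to the span interface.
type spanFunc func()

func (f spanFunc) End() { f() }

// startSpan stands in for a tracer's Start method (hypothetical signature).
type startSpan func(ctx context.Context, name string) (context.Context, span)

// spanDelivery is a hypothetical delivery that remembers its process span.
type spanDelivery struct {
	ack         func(multiple bool) error // wire-level ack, e.g. Acknowledger.Ack
	processSpan span                      // started when the delivery was received
	start       startSpan
}

// AckCtx wraps the wire-level ack in a short settle span that ends as soon
// as the ack returns, then ends the stored process span.
func (d *spanDelivery) AckCtx(ctx context.Context, multiple bool) error {
	_, settle := d.start(ctx, "ack")
	err := d.ack(multiple)
	settle.End()
	if d.processSpan != nil {
		d.processSpan.End()
		d.processSpan = nil // guard against ending the span twice
	}
	return err
}

// recordingTracer returns a startSpan that appends "start X" / "end X"
// events to log, which makes the span ordering easy to inspect.
func recordingTracer(log *[]string) startSpan {
	return func(ctx context.Context, name string) (context.Context, span) {
		*log = append(*log, "start "+name)
		return ctx, spanFunc(func() { *log = append(*log, "end "+name) })
	}
}
```

NackCtx and RejectCtx would follow the same pattern. In autoack mode the `ack` function could be a no-op, so the call only closes the spans.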

AndrewWinterman · 2024-07-30T05:28:36Z

opentelemetry.go

+}
+
+// extractSpanFromReturn creates a span for a returned message
+func extractSpanFromReturn(


ahh I haven't wired the return up yet. Probably gets a similar treatment to Delivery, if that works.

wzy9607 · 2024-09-15

The RabbitMQ semconv spec does not mention how message returns should be instrumented; maybe we should open an issue at https://github.com/open-telemetry/semantic-conventions/issues asking for clarification.

wzy9607 · 2024-08-31T09:51:01Z

I don't think we should add OpenTelemetry support directly to the amqp091-go package. It's better to implement the instrumentation in a sub-package or separate repo (e.g. the go-redis instrumentation redisotel and the gorm instrumentation go-gorm/opentelemetry), so that users who want to use amqp091-go without otel are not forced to depend indirectly on otel modules.
I have made a draft version of the instrumentation in https://github.com/wzy9607/amqp091otel, though I also haven't found a simple way to handle autoack=true.

However, I do think that, to support instrumentation, some utilities should be added to the amqp091-go package. For example:

1. A mechanism to add middleware to Publish and Consume, which would make it easier to instrument them and for users to use the instrumentation package. For reference, see go-redis's Hook mechanism and how redisotel utilizes it.
2. Standard methods for a consumer to start a context.Context from a Delivery and mark the end of it, so the instrumentation package can properly start and end a consumer process Span.
3. A method to obtain some channel properties and server info, which would make it easier for the instrumentation package to add attributes such as messaging.client.id and server.address.
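
As a sketch of the first item, a middleware chain for publishing could look roughly like this. `publishFunc`, `publishHook`, `chainHooks` and `tracingHook` are hypothetical names, and the wrap-the-next-function style is only loosely modelled on go-redis hooks; nothing here exists in amqp091-go today.

```go
package main

// publishFunc is a hypothetical signature for the operation being wrapped.
type publishFunc func(exchange, key string, body []byte) error

// publishHook wraps a publishFunc and returns the wrapped version.
type publishHook func(next publishFunc) publishFunc

// chainHooks applies hooks so that the first hook in the list is outermost.
func chainHooks(base publishFunc, hooks ...publishHook) publishFunc {
	for i := len(hooks) - 1; i >= 0; i-- {
		base = hooks[i](base)
	}
	return base
}

// tracingHook records events around the publish. A real instrumentation
// package would start a span before calling next and end it afterwards.
func tracingHook(log *[]string) publishHook {
	return func(next publishFunc) publishFunc {
		return func(exchange, key string, body []byte) error {
			*log = append(*log, "start publish "+exchange)
			err := next(exchange, key, body)
			*log = append(*log, "end publish "+exchange)
			return err
		}
	}
}
```

With a mechanism like this, an external instrumentation package would only need to register one hook, and users who never register a hook would not pull in any otel modules.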

Zerpet · 2024-09-13T09:29:04Z

Given the channel based approaches of this library, it is impossible for
the library to know the full execution of a consumer. Unless
autoack=false, we cannot actually know when to end the span associated
with a delivery, so at least in the consumer case, it's probably best to
allow the client to manage spans for themselves.

Given the "subscription" workflow of RabbitMQ (polling/pulling is highly discouraged), I think we can record the attributes of the subscription (and add more if necessary), and inject those into the spans upon receiving messages, and just before forwarding them into the Go channel. This idea needs validation, but I would prefer this over leaving the consumption instrumentation to the users.

The problem is that we need a way to transport the spans, and then an idiomatic way for clients to consume them downstream, so if you've called
	deliveries, err := c.channel.Consume(
		queue.Name, // name
		c.tag,      // consumerTag,
		*autoAck,   // autoAck
		false,      // exclusive
		false,      // noLocal
		false,      // noWait
		nil,        // arguments
	)
we now need to embed a span or a context (go authors say not to embed contexts, so span?) in the delivery. Is that amenable?

I think this is acceptable. We can inject Span, or the necessary attributes to build a span, in the message header or properties.

Each span is probably best treated as a new root span, and as implemented in my draft, gets a link to the publication that created it.

I like this idea 👍 It's probably easier to reason about using links between spans than about creating a sub-span from a "publish" span, especially if we consider the use case where a consumer may reject a message and re-queue it.

Zerpet · 2024-09-13T09:43:27Z

I don't think we should add OpenTelemetry support directly to the amqp091-go package. It's better to implement the instrumentation in a sub-package or separate repo (e.g. the go-redis instrumentation redisotel and the gorm instrumentation go-gorm/opentelemetry), so that users who want to use amqp091-go without otel are not forced to depend indirectly on otel modules. I have made a draft version of the instrumentation in https://github.com/wzy9607/amqp091otel, though I also haven't found a simple way to handle autoack=true.

I'm OK with having the OpenTelemetry bits in a different package. At the same time, the library should provide automatic instrumentation. I think it's acceptable to "force" consumers of the library to "depend" on OTEL modules, because the API libraries are non-functional/no-op calls without the OTEL SDK. It will be the user's decision to import the OTEL SDK to make the API calls functional.

However, I do think that, to support instrumentation, some utilities should be added to the amqp091-go package. For example:

1. A mechanism to add middleware to Publish and Consume, which would make it easier to instrument them and for users to use the instrumentation package. For reference, see go-redis's Hook mechanism and how redisotel utilizes it.
2. Standard methods for a consumer to start a context.Context from a Delivery and mark the end of it, so the instrumentation package can properly start and end a consumer process Span.
3. A method to obtain some channel properties and server info, which would make it easier for the instrumentation package to add attributes such as messaging.client.id and server.address.

Those suggestions are nice-to-have, but I'm not sure I understand why those utilities are necessary in order to support OTEL.

Zerpet

I left some comments in the discussions, and above in the main thread.

wzy9607 · 2024-09-15T05:09:49Z

I don't think we should add OpenTelemetry support directly to the amqp091-go package. It's better to implement the instrumentation in a sub-package or separate repo (e.g. the go-redis instrumentation redisotel and the gorm instrumentation go-gorm/opentelemetry), so that users who want to use amqp091-go without otel are not forced to depend indirectly on otel modules. I have made a draft version of the instrumentation in https://github.com/wzy9607/amqp091otel, though I also haven't found a simple way to handle autoack=true.

I'm OK with having the OpenTelemetry bits in a different package. At the same time, the library should provide automatic instrumentation. I think it's acceptable to "force" consumers of the library to "depend" on OTEL modules, because the API libraries are non-functional/no-op calls without the OTEL SDK. It will be the user's decision to import the OTEL SDK to make the API calls functional.

I'm fine with either approach.

1. A mechanism to add middleware to Publish and Consume, which would make it easier to instrument them and for users to use the instrumentation package. For reference, see go-redis's Hook mechanism and how redisotel utilizes it.
2. Standard methods for a consumer to start a context.Context from a Delivery and mark the end of it, so the instrumentation package can properly start and end a consumer process Span.
3. A method to obtain some channel properties and server info, which would make it easier for the instrumentation package to add attributes such as messaging.client.id and server.address.

Those suggestions are nice-to-have, but I'm not sure I understand why those utilities are necessary in order to support OTEL.

Item 2 is the same thing as this PR's Delivery.Span. Items 1 and 3 are utilities that make it easier for users to use a separate instrumentation package; they aren't strictly necessary.

wzy9607 · 2024-09-13T14:28:38Z

delivery.go

+}
+
+func (d Delivery) Settle(ctx context.Context, response DeliveryResponse, multiple, requeue bool) error {
+    defer settleDelivery(ctx, &d, response, multiple, requeue)


Based on my understanding of the messaging semconv ("“Settle” spans SHOULD be created for every manually or automatically triggered settlement operation. A single “Settle” span can account for a single message or for multiple messages (in case messages are passed for settling as batches). For each message it accounts for, the “Settle” span MAY link to the creation context of the message."),
the settle Span should start before calling Acknowledger.Ack() etc., and end right after Acknowledger.Ack() has returned.

wzy9607 · 2024-09-13T14:52:39Z

channel.go

+	if err != nil {
+		errFn(err)


errFn needs to be called regardless of whether there is an err or not, to properly end the Span.

Suggested change

-	if err != nil {
-		errFn(err)
+	errFn(err)
+	if err != nil {

Maybe also rename errFn to endFn to make the intention clearer.
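
One idiomatic way to guarantee that the end function always runs is a deferred call over a named return value. This is only a sketch: `publishTraced` and `errChannelClosed` are hypothetical names, and `send` stands in for whatever actually writes the frame.

```go
package main

import "errors"

// errChannelClosed is a stand-in error for the sketch, not a library value.
var errChannelClosed = errors.New("channel closed")

// publishTraced runs send and always reports the outcome to endFn, so the
// span is ended on success (err == nil) as well as on failure.
func publishTraced(send func() error, endFn func(err error)) (err error) {
	// The deferred closure reads the named result after send has returned.
	defer func() { endFn(err) }()
	return send()
}
```

Here `endFn` would typically record the error on the span when it is non-nil and then end the span in both cases.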

wzy9607 · 2024-09-13T14:57:30Z

opentelemetry.go

+    exchange, routinKey string,
+    immediate bool,
+) (context.Context, Publishing, func(err error)) {
+    spanName := fmt.Sprintf("%s publish", routinKey)


The spec recently changed to say "The span name SHOULD be {messaging.operation.name} {destination}".
And, to stay consistent with the example in the spec and the .NET implementation, the messaging.destination.name attribute should be the exchange.
So, maybe:

	destinationName := exchange
	if len(destinationName) == 0 {
		destinationName = "amq.default"
	}
	spanName := "publish " + destinationName
	...
	trace.WithAttributes(
		semconv.MessagingDestinationName(destinationName),

wzy9607 · 2024-09-13T15:56:50Z

opentelemetry.go

+            semconv.MessagingMessageID(publishing.MessageId),
+            semconv.MessagingMessageConversationID(publishing.CorrelationId),


I think the messaging.message.conversation_id and messaging.message.id attrs should only be set if non-empty, as in the .NET implementation.
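
A small sketch of that conditional logic, with a stand-in `attr` type in place of OpenTelemetry's `attribute.KeyValue` so it runs without extra modules. `attr` and `messageAttrs` are hypothetical names; only the two attribute keys come from the discussion above.

```go
package main

// attr is a stand-in for OpenTelemetry's attribute.KeyValue.
type attr struct{ Key, Value string }

// messageAttrs returns the message id and conversation id attributes,
// leaving out any whose value is empty.
func messageAttrs(messageID, correlationID string) []attr {
	var out []attr
	if messageID != "" {
		out = append(out, attr{"messaging.message.id", messageID})
	}
	if correlationID != "" {
		out = append(out, attr{"messaging.message.conversation_id", correlationID})
	}
	return out
}
```

The returned slice can be appended to the always-present attributes before the span is started.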

wzy9607 · 2024-09-13T16:10:30Z

opentelemetry.go

+            semconv.MessagingMessageID(publishing.MessageId),
+            semconv.MessagingMessageConversationID(publishing.CorrelationId),
+            semconv.MessagingSystemRabbitmq,
+            semconv.MessagingClientIDKey.String(publishing.AppId),


AppId is not the RabbitMQ client id, but an application-specified, per-message property.
I think config.Properties["connection_name"] could be used instead if set, see https://www.rabbitmq.com/docs/connections#client-provided-names.
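
A minimal sketch of that lookup, assuming the client properties are available as a plain `map[string]interface{}` (the underlying type of amqp091-go's Table). `clientID` is a hypothetical helper name.

```go
package main

// clientID returns the client-provided connection name from the connection
// properties, and false when it is missing, empty, or not a string.
func clientID(properties map[string]interface{}) (string, bool) {
	name, ok := properties["connection_name"].(string)
	if !ok || name == "" {
		return "", false
	}
	return name, true
}
```

When the second return value is false, the messaging.client.id attribute would simply be left off the span.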

AndrewWinterman and others added 2 commits June 28, 2024 15:11

Merge branch 'main' into feat/opentelemetry

d292598

AndrewWinterman marked this pull request as draft June 28, 2024 23:05

lukebakken requested review from Zerpet and lukebakken July 1, 2024 16:45

lukebakken self-assigned this Jul 1, 2024

AndrewWinterman and others added 3 commits July 29, 2024 22:20

feat(otel): take a stab at wiring otel up

75a6aeb

Merge branch 'feat/opentelemetry' of https://github.com/AndrewWinterm…

ccf814a

…an/amqp091-go into feat/opentelemetry

Merge branch 'main' into feat/opentelemetry

13a1894

AndrewWinterman commented Jul 30, 2024

AndrewWinterman added 3 commits July 29, 2024 22:29

fix: remove reference to outreach gobox lib

e0fa7c6

Merge branch 'feat/opentelemetry' of https://github.com/AndrewWinterm…

1aeb2d0

…an/amqp091-go into feat/opentelemetry

a smidge of polish

47aa58b

Zerpet reviewed Sep 13, 2024

wzy9607 reviewed Sep 15, 2024
