Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

thriftbp/httpbp: Remove tracing from default middlewares #664

Closed
wants to merge 1 commit into from

Conversation

fishy
Copy link
Member

@fishy fishy commented Sep 24, 2024

Servers/clients should manually add v2 tracing middlewares if needed, or switch to v2 servers/clients.

Servers/clients should manually add v2 tracing middlewares if needed, or
switch to v2 servers/clients.
@fishy fishy requested a review from a team as a code owner September 24, 2024 21:27
@fishy fishy requested review from kylelemons, konradreiche and pacejackson and removed request for a team September 24, 2024 21:27
@mathyourlife-reddit
Copy link

should manually add v2 tracing middlewares if needed

This change could be jarring for service owners to have tracing unceremoniously dropped and need to scramble to understand why their deployments are suddenly missing observability information.

Copy link
Contributor

@pacejackson pacejackson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Metrics are their own middleware, not relying on the span framework right?

@fishy
Copy link
Member Author

fishy commented Sep 25, 2024

Metrics are their own middleware, not relying on the span framework right?

correct

Copy link
Contributor

@pacejackson pacejackson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

This change could be jarring for service owners to have tracing unceremoniously dropped and need to scramble to understand why their deployments are suddenly missing observability information.

I think this change should be fine? IIRC this implementation of tracing doesn't actually do anything anymore?

@fishy
Copy link
Member Author

fishy commented Sep 25, 2024

👍

This change could be jarring for service owners to have tracing unceremoniously dropped and need to scramble to understand why their deployments are suddenly missing observability information.

I think this change should be fine? IIRC this implementation of tracing doesn't actually do anything anymore?

So we tried to add limited interop to make some of the tracing v0 work auto translate to new tracing system, but the limited interop will cause it to spam logs for server and client spans, as in the tracing v0 implementation we have the assumption that the spans are of a certain type (from v0 implementation) that's broken with the interop. that assumption being broken also likely means that server and local client spans are not really interop'd into new tracing system correctly (I don't think anyone actually tested/verified that, we only verified that local spans interop works as expected).

so removing the default middlewares actually helps us to enable interop for local spans (local spans are not implemented in the new tracing library so v0 interop is actually the only way to use them right now), without the fear of spamming the logs from server and client spans.

Copy link
Contributor

@kylelemons kylelemons left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd want to get signoff from observability. There is a strong desire to get complete traces for some of our biggest queries today, and requiring code changes for all services in the graph likely multiplies the complexity and timeline required. I think we should strongly prefer fixes rather than punts.

The other question I have is whether this will break propagation of traces through these frameworks -- if they will not propagate their trace information, then it makes getting a complete trace even harder still.

@trevorriles
Copy link

After chatting with @redloaf, my understanding is that the httpbpv2 and thriftbpv2 libraries are in alpha and almost ready to ship. The cleanest path is likely to push for services to upgrade to those libraries to get tracing. My biggest hangup on that is GQL. GQL is actively using the http server, and also the thrift and http clients. Breaking tracing in GQL would be a big step backward for us.

I need to look a bit closer, but maybe we can fix the casting logic to work with the bridge spans in the interim?

@fishy
Copy link
Member Author

fishy commented Oct 4, 2024

@trevorriles A few things I want to emphasize here (some are moved from the slack thread):

  1. The middlewares I removed from being default all call AsSpan. They call AsSpan because opentracing API is not enough for what we need them to do so we need to get the underlying raw span to achieve what we need to achieve. But since interop is relying on opentracing API, I highly suspect those interop don't really work, so removing them should not put any dent on the interop.
  2. This PR just removes some middlewares from being the default, the middlewares themselves are still there, and if some of them are actually working with the interop, service owner can still add them on top of the default ones to keep using them.

@trevorriles
Copy link

trevorriles commented Oct 7, 2024

Should the InjectServerSpan logic already be short circuited by this code when using the transition library, without the need to remove it entirely?

if mw := internalv2compat.V2TracingThriftServerMiddleware(); mw != nil {
return mw
}

Removing them won't break interop, but I think as @kylelemons mentioned it would break the best-effort support of passing through services that are not using the interop library yet.

@fishy
Copy link
Member Author

fishy commented Oct 7, 2024

Should the InjectServerSpan logic already be short circuited by this code when using the transition library, without the need to remove it entirely?

if mw := internalv2compat.V2TracingThriftServerMiddleware(); mw != nil {
return mw
}

Removing them won't break interop, but I think as @kylelemons mentioned it would break the best-effort support of passing through services that are not using the interop library yet.

Oh I didn't know/notice we have those short-circuit.

So I think the problem is that we didn't short-circuit on the client middleware which caused the log spam (and we may or may not want to add server/client short-circuit for httpbp)

basically we need to either add short-circuit or remove the middleware from default for server/client middlewares on thriftbp/httpbp v0.

fishy added a commit that referenced this pull request Oct 7, 2024
For the following 3 tracing middlewares:

- thriftbp server
- thriftbp client
- httpbp client

The current behavior is to short-circuit to v2 compat middleware if
injected, or fallback to v0 tracing implementation if not injected.
Change the fallback logic to no-op, as the v0 tracing implementation is
not really working any more and it will spam logs.

For httpbp server tracing middleware, remove it from defaults, as the
current v2 compat implementation is to skip it when injected as the API
is not sufficient for the v2 compat implementation to work.

This replaces and closes
#664.
@fishy
Copy link
Member Author

fishy commented Oct 7, 2024

@trevorriles ok I created #665 to solve this problem differently.

fishy added a commit to fishy/baseplate.go that referenced this pull request Oct 7, 2024
For the following 3 tracing middlewares:

- thriftbp server
- thriftbp client
- httpbp client

The current behavior is to short-circuit to v2 compat middleware if
injected, or fallback to v0 tracing implementation if not injected.
Change the fallback logic to no-op, as the v0 tracing implementation is
not really working any more and it will spam logs.

For httpbp server tracing middleware, remove it from defaults, as the
current v2 compat implementation is to skip it when injected as the API
is not sufficient for the v2 compat implementation to work.

This replaces and closes
reddit#664.
fishy added a commit to fishy/baseplate.go that referenced this pull request Oct 7, 2024
For the following 3 tracing middlewares:

- thriftbp server
- thriftbp client
- httpbp client

The current behavior is to short-circuit to v2 compat middleware if
injected, or fallback to v0 tracing implementation if not injected.
Change the fallback logic to no-op, as the v0 tracing implementation is
not really working any more and it will spam logs.

For httpbp server tracing middleware, remove it from defaults, as the
current v2 compat implementation is to skip it when injected as the API
is not sufficient for the v2 compat implementation to work.

This replaces and closes
reddit#664.
fishy added a commit to fishy/baseplate.go that referenced this pull request Oct 7, 2024
For the following 3 tracing middlewares:

- thriftbp server
- thriftbp client
- httpbp client

The current behavior is to short-circuit to v2 compat middleware if
injected, or fallback to v0 tracing implementation if not injected.
Change the fallback logic to no-op, as the v0 tracing implementation is
not really working any more and it will spam logs.

For httpbp server tracing middleware, remove it from defaults, as the
current v2 compat implementation is to skip it when injected as the API
is not sufficient for the v2 compat implementation to work.

This replaces and closes
reddit#664.
fishy added a commit to fishy/baseplate.go that referenced this pull request Oct 8, 2024
For the following 3 tracing middlewares:

- thriftbp server
- thriftbp client
- httpbp client

The current behavior is to short-circuit to v2 compat middleware if
injected, or fallback to v0 tracing implementation if not injected.
Change the fallback logic to no-op, as the v0 tracing implementation is
not really working any more and it will spam logs.

For httpbp server tracing middleware, remove it from defaults, as the
current v2 compat implementation is to skip it when injected as the API
is not sufficient for the v2 compat implementation to work.

This replaces and closes
reddit#664.
@fishy fishy closed this in #665 Oct 8, 2024
@fishy fishy closed this in ec4a670 Oct 8, 2024
@fishy fishy deleted the remove-tracing-default-middlewares branch October 8, 2024 21:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

6 participants