stats.Handler's HandleRPC is called with invalid context #6928
Comments
I just wrote a test locally, and I don't see any callouts missing the information propagated in the context. This includes the event that triggered this panic, OutTrailer. I'm still trying to figure out what is going on in the OpenTelemetry Collector, though.
I read the stats handler code in the collector repo and wrote a local server-side test that has no problem pulling a value out of the context populated in TagRPC in the case of trailers. I have to ask: is there a mini repro I can run? I see it runs behind layers of deployments in the top-level issue and can't seem to be reproduced locally, but I think a repro would be super useful here. I'd love to get to the bottom of this and am here to help. Thank you!
My team is experiencing the exact same issue in deployment, but we are unable to reproduce it locally. I can also confirm that v1.59.0 does not cause the panic but v1.60.1 does. I'll let you know if we're able to get a mini repro.
Hmm ok @rpadaki. Is this the same repository, or are you on a different project that is also running a stats handler that sees this same issue? |
We are a different project with the same issue, using the stats handler behind a load balancer. Still no success with a reproduction, unfortunately.
Is it a problem just in the status stats callout or all of the stats callouts server side? |
For increased visibility: we have some grpcdump logs comparison here with and without the issue on the OpenTelemetry Collector open-telemetry/opentelemetry-collector#9296 (comment) |
OK, thanks for that. Looking through the logs, it looks like there is a health-checking issue. How does this link to the configured stats handler, which has a problem pulling something out of the context on a WriteStatus call that it expects to be populated by the TagRPC call?
@rpadaki are you using the same OTel repro as the issue that came up? Also, we just discussed this as a team, and in order to narrow it down to our repository (there are a lot of layers here), we would need a reproduction, as my local test cannot reproduce this problem.
This issue is labeled as requiring an update from the reporter, and no update has been received after 6 days. If no update is provided in the next 7 days, this issue will be automatically closed. |
For clarity: I doubt I will have time to do any further research on this in the next 7 days.
What version of gRPC are you using?
This happens with v1.60.1 but not with v1.59.0
What version of Go are you using (go version)?
v1.21.0
What operating system (Linux, Windows, …) and version?
It happens at least on Amazon Linux, but I don't think this is specific to any particular operating system.
What did you do?
The opentelemetry-go-contrib project has some instrumentation for grpc-go. This project defines a stats.Handler. We use this in the OpenTelemetry Collector and have seen reports of crashes on its latest version; see open-telemetry/opentelemetry-collector/issues/9296. We unfortunately don't have a minimal example to reproduce this at this time.
What did you expect to see?
No crashes :) I would expect the context value set on the TagRPC call to always be recoverable on the HandleRPC call.
What did you see instead?
The context does not have this value. See details on open-telemetry/opentelemetry-collector/issues/9296, the crash trace is:
Additional details
I think this is a bug because of the comment here:
grpc-go/stats/opencensus/opencensus.go, lines 204 to 207 in ddd377f
The opentelemetry-go-contrib maintainers also think this is a bug. I filed open-telemetry/opentelemetry-go-contrib/pull/4825 to guard the code against this, but this still seems worth looking into in grpc-go.
I did a first pass to try to narrow down which change would have caused this; my guess is that it would be #6716 or, less likely, #6750. Maybe @zasweq can help here?