-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
kvclient,server: propagate pprof labels for BatchRequests #101404
Conversation
I wanted to get some feedback on this approach and whether we need a benchmark on how expensive it is for us to check for pprof labels on each sent BatchRequest. I tried this patch out on a roachprod cluster where the majority of ExportRequests were sent from n7,8,9 but were processed on n1,2,3 and it seems to work as expected. This is a profile on remote n1 while the coordinator of the job is on n9:
Profile with
|
@tbg I'm keen to get your take on this approach. We wanted something that we could potentially backport to older releases and thought that this change gave us enough coverage to begin with (all BatchRequests would propagate labels). The alternative would be writing grpc interceptors to get complete coverage but that will probably be harder to reason about for a backport. Are there any benchmarks or optimizations that you would like to see as part of this change? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @adityamaru, @dt, @miretskiy, and @tbg)
pkg/server/node.go
line 1251 at r2 (raw file):
} else { // We had this tag before the ResetAndAnnotateCtx() call above. ctx = logtags.AddTag(ctx, "tenant", tenantID.String())
This should really be AddTag(ctx, "tenant", tenantID)
so it doesn't become redactable. Feel free to add a commit to do that.
This change teaches DistSender to populate a BatchRequest's header with the pprof labels set on the sender's context. If the node processing the request on the server side has CPU profiling with labels enabled, then the labels stored in the BatchRequest will be applied to the root context of the goroutine executing the request on the server. Doing so will ensure that all server-side CPU samples of the root goroutine and all its spawned goroutine will be labeled correctly. Propagating these labels across RPC boundaries is useful to correlate server side samples with the sender. For example, in a CPU profile generated with this change, we will be able to identify which backup job sent an ExportRequest that is causing a CPU hotspot on a remote node. Fixes: cockroachdb#100166 Release note: None
Release note: None
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @dt, @knz, @miretskiy, and @tbg)
pkg/server/node.go
line 1251 at r2 (raw file):
Previously, knz (Raphael 'kena' Poss) wrote…
This should really be
AddTag(ctx, "tenant", tenantID)
so it doesn't become redactable. Feel free to add a commit to do that.
Done.
TFTR! bors r=dt |
Build succeeded: |
blathers backport 23.1 22.2 |
I will let this bake on master for a couple of weeks before merging the backports |
Encountered an error creating backports. Some common things that can go wrong:
You might need to create your backport manually using the backport tool. error creating merge commit from e9883f5 to blathers/backport-release-22.2-101404: POST https://api.github.com/repos/cockroachdb/cockroach/merges: 409 Merge conflict [] you may need to manually resolve merge conflicts with the backport tool. Backport to branch 22.2 failed. See errors above. 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf. |
This change teaches DistSender to populate a BatchRequest's header with the pprof labels set on the sender's context. If the node processing the request on the server side has CPU profiling with labels enabled, then the labels stored in the BatchRequest will be applied to the root context of the goroutine executing the request on the server. Doing so will ensure that all server-side CPU samples of the root goroutine and all its spawned goroutine will be labeled correctly.
Propagating these labels across RPC boundaries is useful to correlate server side samples with the sender. For example, in a CPU profile generated with this change, we will be able to identify which backup job sent an ExportRequest that is causing a CPU hotspot on a remote node.
Fixes: #100166
Release note: None