-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
kvserver: propagate job pprof labels to KV requests sent on behalf of the job #100166
Labels
A-disaster-recovery
A-jobs
C-enhancement
Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)
T-disaster-recovery
Comments
adityamaru
added
C-enhancement
Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)
A-jobs
labels
Mar 30, 2023
cc @cockroachdb/disaster-recovery |
adityamaru
added a commit
to adityamaru/cockroach
that referenced
this issue
Apr 13, 2023
This change teaches DistSender to populate a BatchRequest's header with the pprof labels set on the sender's context. If the node processing the request on the server side has CPU profiling with labels enabled, then the labels stored in the BatchRequest will be applied to the root context of the goroutine executing the request on the server. Doing so will ensure that all server-side CPU samples of the root goroutine and all its spawned goroutine will be labeled correctly. Propagating these labels across RPC boundaries is useful to correlate server side samples with the sender. For example, in a CPU profile generated with this change, we will be able to identify which backup job sent an ExportRequest that is causing a CPU hotspot on a remote node. Fixes: cockroachdb#100166 Release note: None
adityamaru
added a commit
to adityamaru/cockroach
that referenced
this issue
Apr 20, 2023
This change teaches DistSender to populate a BatchRequest's header with the pprof labels set on the sender's context. If the node processing the request on the server side has CPU profiling with labels enabled, then the labels stored in the BatchRequest will be applied to the root context of the goroutine executing the request on the server. Doing so will ensure that all server-side CPU samples of the root goroutine and all its spawned goroutine will be labeled correctly. Propagating these labels across RPC boundaries is useful to correlate server side samples with the sender. For example, in a CPU profile generated with this change, we will be able to identify which backup job sent an ExportRequest that is causing a CPU hotspot on a remote node. Fixes: cockroachdb#100166 Release note: None
craig bot
pushed a commit
that referenced
this issue
Apr 22, 2023
101404: kvclient,server: propagate pprof labels for BatchRequests r=dt a=adityamaru This change teaches DistSender to populate a BatchRequest's header with the pprof labels set on the sender's context. If the node processing the request on the server side has CPU profiling with labels enabled, then the labels stored in the BatchRequest will be applied to the root context of the goroutine executing the request on the server. Doing so will ensure that all server-side CPU samples of the root goroutine and all its spawned goroutine will be labeled correctly. Propagating these labels across RPC boundaries is useful to correlate server side samples with the sender. For example, in a CPU profile generated with this change, we will be able to identify which backup job sent an ExportRequest that is causing a CPU hotspot on a remote node. Fixes: #100166 Release note: None Co-authored-by: adityamaru <[email protected]>
blathers backport 23.1 22.2 |
I will let this bake on master for a couple of weeks before merging the backports |
blathers-crl bot
pushed a commit
that referenced
this issue
Apr 22, 2023
This change teaches DistSender to populate a BatchRequest's header with the pprof labels set on the sender's context. If the node processing the request on the server side has CPU profiling with labels enabled, then the labels stored in the BatchRequest will be applied to the root context of the goroutine executing the request on the server. Doing so will ensure that all server-side CPU samples of the root goroutine and all its spawned goroutine will be labeled correctly. Propagating these labels across RPC boundaries is useful to correlate server side samples with the sender. For example, in a CPU profile generated with this change, we will be able to identify which backup job sent an ExportRequest that is causing a CPU hotspot on a remote node. Fixes: #100166 Release note: None
smg260
pushed a commit
to smg260/cockroach
that referenced
this issue
Apr 24, 2023
This change teaches DistSender to populate a BatchRequest's header with the pprof labels set on the sender's context. If the node processing the request on the server side has CPU profiling with labels enabled, then the labels stored in the BatchRequest will be applied to the root context of the goroutine executing the request on the server. Doing so will ensure that all server-side CPU samples of the root goroutine and all its spawned goroutine will be labeled correctly. Propagating these labels across RPC boundaries is useful to correlate server side samples with the sender. For example, in a CPU profile generated with this change, we will be able to identify which backup job sent an ExportRequest that is causing a CPU hotspot on a remote node. Fixes: cockroachdb#100166 Release note: None
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Labels
A-disaster-recovery
A-jobs
C-enhancement
Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)
T-disaster-recovery
Similar to SQL queries (#86012) jobs would also like their pprof labels to be propagated to kvserver so that requests issued on behalf of the job are tagged as such when profiling is enabled. We already propagate pprof labels across DistSQL boundaries since #91630 so plumbing it into the kvserver will help paint a complete picture.
Note, the coordinator node of the job will always annotate its context with labels regardless of whether the coordinator node is being actively profiled or not. The performance impact of doing this is hypothesized to be negligible when compared to the long-running operation that the job will likely perform. Concretely, there are two ways we can go about propagating these labels to kvserver:
Option 1:
Every request that wishes to have its labels propagated sets a field on the
BatchRequest.Header
containing the profiler labels that the sender's context is annotated with.kvserver
then annotates the context that will serve the request if the node the request is being served on is actively being profiled. This ensures that we do not incur a performance hit when not actively profiling. Note, to get a complete picture, cluster-wide profiling would have to be enabled so as to propagate labels to requests across all nodes.This option is self-contained, and can probably be backported to stable release branches as well.
Option 2:
As prototyped in #60508 we set up gRPC interceptors that propagate the labels across RPC boundaries. This option would be more generalized and allows non-job SQL queries to also benefit from the propagation of labels. This option would probably be harder to backport.
Epic: CRDB-8964
Jira issue: CRDB-26281
The text was updated successfully, but these errors were encountered: