Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

kvserver: propagate job pprof labels to KV requests sent on behalf of the job #100166

Closed
adityamaru opened this issue Mar 30, 2023 · 3 comments · Fixed by #101404
Closed

kvserver: propagate job pprof labels to KV requests sent on behalf of the job #100166

adityamaru opened this issue Mar 30, 2023 · 3 comments · Fixed by #101404
Assignees
Labels
A-disaster-recovery A-jobs C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) T-disaster-recovery

Comments

@adityamaru
Copy link
Contributor

adityamaru commented Mar 30, 2023

Similar to SQL queries (#86012) jobs would also like their pprof labels to be propagated to kvserver so that requests issued on behalf of the job are tagged as such when profiling is enabled. We already propagate pprof labels across DistSQL boundaries since #91630 so plumbing it into the kvserver will help paint a complete picture.

Note, the coordinator node of the job will always annotate its context with labels regardless of whether the coordinator node is being actively profiled or not. The performance impact of doing this is hypothesized to be negligible when compared to the long-running operation that the job will likely perform. Concretely, there are two ways we can go about propagating these labels to kvserver:

Option 1:

Every request that wishes to have its labels propagated sets a field on the BatchRequest.Header containing the profiler labels that the sender's context is annotated with. kvserver then annotates the context that will serve the request if the node the request is being served on is actively being profiled. This ensures that we do not incur a performance hit when not actively profiling. Note, to get a complete picture, cluster-wide profiling would have to be enabled so as to propagate labels to requests across all nodes.

This option is self-contained, and can probably be backported to stable release branches as well.

Option 2:

As prototyped in #60508 we set up gRPC interceptors that propagate the labels across RPC boundaries. This option would be more generalized and allows non-job SQL queries to also benefit from the propagation of labels. This option would probably be harder to backport.

Epic: CRDB-8964

Jira issue: CRDB-26281

@adityamaru adityamaru added C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) A-jobs labels Mar 30, 2023
@blathers-crl blathers-crl bot added the T-jobs label Mar 30, 2023
@blathers-crl
Copy link

blathers-crl bot commented Mar 30, 2023

cc @cockroachdb/disaster-recovery

adityamaru added a commit to adityamaru/cockroach that referenced this issue Apr 13, 2023
This change teaches DistSender to populate a BatchRequest's
header with the pprof labels set on the sender's context.
If the node processing the request on the server side has CPU
profiling with labels enabled, then the labels stored in the BatchRequest
will be applied to the root context of the goroutine executing the
request on the server. Doing so will ensure that all server-side
CPU samples of the root goroutine and all its spawned goroutine
will be labeled correctly.

Propagating these labels across RPC boundaries is useful to correlate
server side samples with the sender. For example, in a CPU profile
generated with this change, we will be able to identify which backup
job sent an ExportRequest that is causing a CPU hotspot on a remote node.

Fixes: cockroachdb#100166
Release note: None
adityamaru added a commit to adityamaru/cockroach that referenced this issue Apr 20, 2023
This change teaches DistSender to populate a BatchRequest's
header with the pprof labels set on the sender's context.
If the node processing the request on the server side has CPU
profiling with labels enabled, then the labels stored in the BatchRequest
will be applied to the root context of the goroutine executing the
request on the server. Doing so will ensure that all server-side
CPU samples of the root goroutine and all its spawned goroutine
will be labeled correctly.

Propagating these labels across RPC boundaries is useful to correlate
server side samples with the sender. For example, in a CPU profile
generated with this change, we will be able to identify which backup
job sent an ExportRequest that is causing a CPU hotspot on a remote node.

Fixes: cockroachdb#100166
Release note: None
craig bot pushed a commit that referenced this issue Apr 22, 2023
101404: kvclient,server: propagate pprof labels for BatchRequests r=dt a=adityamaru

This change teaches DistSender to populate a BatchRequest's header with the pprof labels set on the sender's context. If the node processing the request on the server side has CPU profiling with labels enabled, then the labels stored in the BatchRequest will be applied to the root context of the goroutine executing the request on the server. Doing so will ensure that all server-side CPU samples of the root goroutine and all its spawned goroutine will be labeled correctly.

Propagating these labels across RPC boundaries is useful to correlate server side samples with the sender. For example, in a CPU profile generated with this change, we will be able to identify which backup job sent an ExportRequest that is causing a CPU hotspot on a remote node.

Fixes: #100166
Release note: None

Co-authored-by: adityamaru <[email protected]>
@craig craig bot closed this as completed in e9883f5 Apr 22, 2023
@adityamaru
Copy link
Contributor Author

blathers backport 23.1 22.2

@adityamaru
Copy link
Contributor Author

I will let this bake on master for a couple of weeks before merging the backports

blathers-crl bot pushed a commit that referenced this issue Apr 22, 2023
This change teaches DistSender to populate a BatchRequest's
header with the pprof labels set on the sender's context.
If the node processing the request on the server side has CPU
profiling with labels enabled, then the labels stored in the BatchRequest
will be applied to the root context of the goroutine executing the
request on the server. Doing so will ensure that all server-side
CPU samples of the root goroutine and all its spawned goroutine
will be labeled correctly.

Propagating these labels across RPC boundaries is useful to correlate
server side samples with the sender. For example, in a CPU profile
generated with this change, we will be able to identify which backup
job sent an ExportRequest that is causing a CPU hotspot on a remote node.

Fixes: #100166
Release note: None
smg260 pushed a commit to smg260/cockroach that referenced this issue Apr 24, 2023
This change teaches DistSender to populate a BatchRequest's
header with the pprof labels set on the sender's context.
If the node processing the request on the server side has CPU
profiling with labels enabled, then the labels stored in the BatchRequest
will be applied to the root context of the goroutine executing the
request on the server. Doing so will ensure that all server-side
CPU samples of the root goroutine and all its spawned goroutine
will be labeled correctly.

Propagating these labels across RPC boundaries is useful to correlate
server side samples with the sender. For example, in a CPU profile
generated with this change, we will be able to identify which backup
job sent an ExportRequest that is causing a CPU hotspot on a remote node.

Fixes: cockroachdb#100166
Release note: None
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-disaster-recovery A-jobs C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) T-disaster-recovery
Projects
No open projects
Archived in project
Development

Successfully merging a pull request may close this issue.

1 participant