rpc,sql: propagate pprof labels for SQL queries to KV #86012

Closed
tbg opened this issue Aug 12, 2022 · 12 comments
Labels
A-kv-observability A-observability-inf C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)

Comments

@tbg
Member

tbg commented Aug 12, 2022

Is your feature request related to a problem? Please describe.

While CPU profiling, we usually generate pprof labels that can help pin work to specific queries. This is all pretty half-baked (I think that, as is, it misses lots of queries due to the code path it annotates), but it has been helpful in some investigations.

// Attach statement metadata as pprof labels for the duration of the
// statement's execution on this goroutine and its descendants.
labels := pprof.Labels(
	"appname", ex.sessionData().ApplicationName,
	"addr", remoteAddr,
	"stmt.tag", ast.StatementTag(),
	"stmt.no.constants", stmtNoConstants,
)
pprof.Do(ctx, labels, func(ctx context.Context) {
	ev, payload, err = ex.execStmtInOpenState(ctx, parserStmt, prepared, pinfo, res, canAutoCommit)
})

However, these profiler labels don't propagate across RPC boundaries, so for any query that fans out, only the local portion will have the labels (since there is a direct goroutine descendancy relationship between the goroutine we annotated and the goroutines that do the work). In particular, if the expensive portion of the processing happens on a remote node, it will not be annotated. The investigator may then be misled into thinking that another, local (but maybe not expensive) query is driving the load, which could lead to a lengthy detour.

Describe the solution you'd like

Propagate pprof labels across RPC boundaries. The right way to do this is likely via gRPC interceptors. I wrote them as part of a past prototype, #60508, so some code exists but isn't merged.
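A minimal sketch of what such interceptors could look like, assuming a unary RPC path and a hypothetical crdb-pprof- metadata key prefix (the actual prototype in #60508 may differ):

// Hypothetical sketch (not the #60508 code): forward pprof labels over unary
// RPCs via gRPC metadata and re-apply them on the server's handler goroutine.
package rpcutil

import (
	"context"
	"runtime/pprof"
	"strings"

	"google.golang.org/grpc"
	"google.golang.org/grpc/metadata"
)

// pprofLabelPrefix is an illustrative metadata key prefix for forwarded labels.
const pprofLabelPrefix = "crdb-pprof-"

// UnaryClientInterceptor copies the caller's pprof labels into outgoing metadata.
func UnaryClientInterceptor(
	ctx context.Context, method string, req, reply interface{},
	cc *grpc.ClientConn, invoker grpc.UnaryInvoker, opts ...grpc.CallOption,
) error {
	pprof.ForLabels(ctx, func(key, value string) bool {
		ctx = metadata.AppendToOutgoingContext(ctx, pprofLabelPrefix+key, value)
		return true
	})
	return invoker(ctx, method, req, reply, cc, opts...)
}

// UnaryServerInterceptor re-applies forwarded labels around the handler, so that
// CPU samples taken on the receiving node carry the sender's labels.
func UnaryServerInterceptor(
	ctx context.Context, req interface{},
	info *grpc.UnaryServerInfo, handler grpc.UnaryHandler,
) (interface{}, error) {
	md, ok := metadata.FromIncomingContext(ctx)
	if !ok {
		return handler(ctx, req)
	}
	var kvs []string
	for k, vs := range md {
		if strings.HasPrefix(k, pprofLabelPrefix) && len(vs) > 0 {
			kvs = append(kvs, strings.TrimPrefix(k, pprofLabelPrefix), vs[0])
		}
	}
	if len(kvs) == 0 {
		return handler(ctx, req)
	}
	var resp interface{}
	var err error
	pprof.Do(ctx, pprof.Labels(kvs...), func(ctx context.Context) {
		resp, err = handler(ctx, req)
	})
	return resp, err
}

In this sketch the client only pays for the metadata copy when labels are actually set, and the server only re-labels the handler goroutine when forwarded labels are present.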

Ideally, while revisiting this, we'd also move the label annotation to a more useful place, noting what was done (but not merged) in #85977, as I realized there that we weren't capturing all classes of statements. If we are to use this feature, it can't have big blind spots; wrong information is worse than no information in this case.

Describe alternatives you've considered

We could decide that we don't wish to leverage profiler labels to find offending queries, and instead rely on something else, for example #85977. However, pprof labels seem like a good stopgap measure and I think most of the work is already done.

Additional context

Jira issue: CRDB-18526

@tbg tbg added C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) T-observability-inf labels Aug 12, 2022
@irfansharif
Contributor

Would we always annotate contexts with profiler labels containing statement data, even if that node isn't currently being profiled? We might be grabbing a CPU profile on a remote node where the statement didn't originate, and would be relying on propagation having already happened. We should evaluate the overhead, if any, of always annotating. #85948 describes some techniques we use in KV to make this more alloc-efficient.

@tbg
Member Author

tbg commented Aug 12, 2022

That's a good point. If the originator of the work sits elsewhere, we need to tell it to attach the labels.
The cheap way to do it is to profile the entire cluster at the same time, over a long enough interval (so that the overlap is ~100% of the time); I think 10s will already be enough.

We could also try to always add labels, but that involves a longer performance fight, and I tend to think that fleet-wide profiling is a good enough workaround.

@sumeerbhola
Collaborator

I would really like for us to build per-statement distributed CPU profiling. In a past life this has been invaluable in understanding slow queries and driving system performance improvements, in a way that tracing was inadequate for. It is very hard to understand which code paths are responsible for a slow query in a production environment, since there are other queries running concurrently (this can be non-trivial even in complex benchmarks where many queries are running) and it is hard to even know which nodes a query will touch. See https://github.com/cockroachlabs/support/issues/2033#issuecomment-1409654095 for an example where traces did not help us understand the root cause of the slowness.

This would roughly work as follows (a rough code sketch follows the list):

  • At most one query profile can be running at a node at a time. This is fine since this will be used infrequently, and we can assume human coordination to typically only run one of these in a cluster at a time.

  • At the node level there is a refcount of the "root" goroutines that have "joined" the currently running profile, and the statement ID of that running profile. When a new "root" goroutine (say incoming RPC's goroutine) sees that its payload is requesting profiling it compares its payload's statement ID with the running profile and either (a) starts a cpu profile (if no profile is running), and joins it by setting a go profiler label, (b) joins the existing profiling by setting a profiler label, (c) remembers that it could not profile so it can return a profiling-error when it eventually returns the results. (a) and (b) increment the refcount.

    • The profile for a statement can be started and stopped at a node multiple times (due to multiple passes, e.g. multiple BatchRequests, etc.).
  • When the "root" goroutine is done, it decrements the refcount and if the refcount drops to 0, it stops the profile, fetches the result, filters out the part that does not match the profiler label, and packages this to piggyback in the RPC response.

  • The "real" root is at the gateway node that started executing the statement, and it aggregates the responses received and its own profile, buckets it by node, and writes it to files when the statement execution is complete. This allows us to look at profiles per node (to discover which node was responsible for the slowness) and also as an aggregate.
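A minimal sketch of the per-node bookkeeping described in the list above. All names (registry, join, leave, the placeholder start/stop functions, the stmt.id label) are hypothetical; error handling and the actual profile collection and filtering are elided:

// Hypothetical sketch of the node-level refcounted "join the running
// statement profile" logic. Not actual CockroachDB code.
package stmtprofiler

import (
	"context"
	"errors"
	"runtime/pprof"
	"sync"
)

var errProfileBusy = errors.New("a different statement is already being profiled on this node")

type registry struct {
	mu       sync.Mutex
	stmtID   string // statement ID of the currently running profile, if any
	refCount int    // number of "root" goroutines that have joined it
}

// join is called by a "root" goroutine (e.g. an incoming RPC handler) whose
// payload requests profiling for stmtID.
func (r *registry) join(stmtID string) error {
	r.mu.Lock()
	defer r.mu.Unlock()
	switch {
	case r.refCount == 0:
		// (a) No profile is running: start one and join it.
		r.stmtID = stmtID
		startNodeCPUProfile() // placeholder for starting a CPU profile into a buffer
	case r.stmtID == stmtID:
		// (b) Same statement: join the existing profile.
	default:
		// (c) A different statement holds the profiler; report this in the response.
		return errProfileBusy
	}
	r.refCount++
	return nil
}

// leave is called when the "root" goroutine finishes. When the refcount drops
// to zero, the profile is stopped and its samples (filtered to this statement's
// label) would be packaged to piggyback on the RPC response.
func (r *registry) leave() {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.refCount--
	if r.refCount == 0 {
		stopNodeCPUProfileAndCollect(r.stmtID) // placeholder
		r.stmtID = ""
	}
}

// run wraps the root goroutine's work in a pprof label so that samples can be
// attributed to (and later filtered down to) this statement.
func (r *registry) run(ctx context.Context, stmtID string, work func(context.Context)) {
	if err := r.join(stmtID); err != nil {
		work(ctx) // could not profile; a real implementation would record err for the response
		return
	}
	defer r.leave()
	pprof.Do(ctx, pprof.Labels("stmt.id", stmtID), work)
}

func startNodeCPUProfile()                   {} // placeholder
func stopNodeCPUProfileAndCollect(id string) {} // placeholder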

@kevin-v-ngo

kevin-v-ngo commented Feb 7, 2023

After chatting with @maryliag: SQL Obs will need to collaborate with KV or potentially other subject matter experts. FYI @thtruo

@thtruo
Contributor

thtruo commented Feb 7, 2023

Thanks for the heads up Kevin! cc @nkodali for your awareness

@jordanlewis
Member

DistSQL flow requests carry their initiating SQL in the request.

So, to enable this for DistSQL, it should be straightforward: we can apply pprof labels when the DistSQL flow begins execution on the remote node, tagging the stack with stmt.no.constants (and maybe also with distsql.remote or something, so it's clear which samples are remote).
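For illustration, a sketch of what that tagging could look like at the point where the remote flow starts running; the helper name, the stmtNoConstants parameter, and the runFlow callback are stand-ins, not the actual DistSQL API:

package distsqlprof

import (
	"context"
	"runtime/pprof"
)

// runRemoteFlowWithLabels is a hypothetical helper: it labels the remote
// DistSQL flow's goroutines with the statement fingerprint carried in the
// flow setup request, plus a marker that the samples come from a remote flow.
func runRemoteFlowWithLabels(ctx context.Context, stmtNoConstants string, runFlow func(context.Context)) {
	labels := pprof.Labels(
		"stmt.no.constants", stmtNoConstants, // same label the gateway sets
		"distsql.remote", "true",             // distinguishes remote flow samples
	)
	pprof.Do(ctx, labels, runFlow)
}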

@jordanlewis
Member

jordanlewis commented Feb 13, 2023

Update: #82464 solved the missing DistSQL label propagation in 22.2, but there's still a need to propagate the pprof labels into the KV layer as per the original issue.

@tbg
Member Author

tbg commented Feb 15, 2023

Update: #82464 solved this in 22.2.

That's great! This is a significant improvement over the status quo. Are we aware of cases where this wouldn't have been enough? I think if we had a node which had lots of leases for write-heavy ranges, but the writes originated from a remote gateway, we would still be flying blind (if there were lots of reads, it's likely though not guaranteed that there'd be a local DistSQL leaf, which would have the tags).

@jordanlewis
Member

Yeah, there are still plenty of cases where we'd need the KV tag propagation: the cases you mention, as well as point reads (which don't use DistSQL), index joins, zigzag joins, lookup joins, etc. My comment about "solving" was just about DistSQL, but it wasn't entirely clear; I've edited it to prevent confusion.

@tbg
Member Author

tbg commented Jun 16, 2023

Summarizing the current state.

The motivating example: two nodes A and B, where A drives all of the load (i.e. is the SQL gateway) for B, and we want to understand which query/user/app drives the load. Currently, if we profile on B, we don't get any labels (note that we don't necessarily know which node A is; there might be 100s of nodes in this cluster).
Even if we profile on A and B concurrently, we don't get any labels, because labels on A are not encoded on the wire in RPCs (and would not be added to the goroutine's labels in the receiving handler on B even if they were).

Scope of this issue:

  • the scope of this issue continues to be propagating pprof labels (if they are set) from the sender side of an RPC to the receiver side (if it is also profiling).
  • we have "server: add debug/pprof/ui/cpu/?node=all endpoint for cluster-wide CPU profile" #102734, so we can easily get all nodes to profile simultaneously and obtain a merged profile with complete labelling; this approach just needs to be adopted by Support for releases that support it (23.2+).

Related ideas (please discuss them on their respective tracking issues):

Footnotes

  1. https://github.com/cockroachdb/cockroach/issues/86012#issuecomment-1411004870

@adityamaru
Contributor

the scope of this issue continues to be propagating pprof labels (if they are set) from the sender side of an RPC to the receiver side (if it is also profiling).

@tbg have you seen #101404? The hypothesis was that handling this for BatchRequests gets us most of the way there.

@tbg
Member Author

tbg commented Jun 16, 2023

I hadn't! Great, that's the missing piece. We could always do this "more holistically", but this is fine; let's close this issue, since all follow-up items are tracked separately.

@tbg tbg closed this as completed Jun 16, 2023