
cli: build in pprof-loop.sh for CPU profiles and Go execution traces #97174

Open
tbg opened this issue Feb 15, 2023 · 9 comments
Labels
A-observability-inf C-escalation-improvement Having this feature would have made an escalation easier O-support Would prevent or help troubleshoot a customer escalation - bugs, missing observability/tooling, docs P-3 Issues/test failures with no fix SLA T-observability

Comments

@tbg
Member

tbg commented Feb 15, 2023

Is your feature request related to a problem? Please describe.

We have the pprof-loop script[^1], which helps us periodically collect cluster-wide profiles. This is necessary, for example, when we are experiencing rare events that need to be introspected with Go runtime support (NUMA issues, GC pressure, generally unexplainable latency in traces), or when there are intermittent spikes of high CPU activity that are difficult to catch with a manual profile.[^2]

In all such cases, we have customers run the script over a longer period of time until the event of interest occurs.

The script is hard to use, since it needs to be invoked on all nodes in the cluster simultaneously and followed by an artifact-collection step.
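
For concreteness, here is a minimal Go sketch of what the per-node loop boils down to. Everything specific here is an assumption for illustration (node address, file naming, 10s duration, insecure cluster); the real script is a shell loop around curl, and the endpoint is Go's standard net/http/pprof CPU profile handler:

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"os"
	"time"
)

func main() {
	const node = "http://localhost:8080" // hypothetical node HTTP address
	for {
		// Go's net/http/pprof CPU endpoint blocks for ?seconds=N and
		// then returns the collected profile, so the loop naturally
		// produces back-to-back 10s profiles.
		resp, err := http.Get(node + "/debug/pprof/profile?seconds=10")
		if err != nil {
			fmt.Fprintln(os.Stderr, err)
			time.Sleep(time.Second)
			continue
		}
		// Timestamped filenames so consecutive profiles don't collide.
		name := fmt.Sprintf("cpu.%s.pprof", time.Now().Format("20060102T150405"))
		f, err := os.Create(name)
		if err != nil {
			fmt.Fprintln(os.Stderr, err)
			resp.Body.Close()
			continue
		}
		if _, err := io.Copy(f, resp.Body); err != nil {
			fmt.Fprintln(os.Stderr, err)
		}
		f.Close()
		resp.Body.Close()
	}
}
```

Running one of these per node, by hand, and then gathering the output directories is exactly the friction described above.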

If we added an out-of-the-box solution that fanned out to the cluster (or a specified set of nodes) and collected the results in a single directory, this would be much easier.

Describe the solution you'd like

Build that out-of-the-box solution, with an option to collect either a CPU profile or a Go execution trace (both are important in different contexts, though CPU is easier since we're almost there). Here is a prototype: #96749

Replace the custom 10s fan-out CPU profile with an invocation of this tool, for a 10s CPU profile and a subsequent 1s runtime trace.
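
As a rough illustration (this is not the actual prototype in #96749; node addresses and file layout are assumptions, and an insecure cluster is assumed), a hedged sketch of the fan-out, using the standard net/http/pprof handlers for a 10s CPU profile followed by a 1s execution trace:

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"os"
	"path/filepath"
	"sync"
)

// fetch downloads url into dest.
func fetch(url, dest string) error {
	resp, err := http.Get(url)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	f, err := os.Create(dest)
	if err != nil {
		return err
	}
	defer f.Close()
	_, err = io.Copy(f, resp.Body)
	return err
}

func main() {
	// Hypothetical node addresses; the real tool would discover these.
	nodes := []string{"http://n1:8080", "http://n2:8080", "http://n3:8080"}
	const outDir = "profiles" // single collection directory
	if err := os.MkdirAll(outDir, 0755); err != nil {
		panic(err)
	}
	var wg sync.WaitGroup
	for i, node := range nodes {
		wg.Add(1)
		go func(i int, node string) {
			defer wg.Done()
			// Per the proposal: a 10s CPU profile, then a 1s runtime trace.
			if err := fetch(node+"/debug/pprof/profile?seconds=10",
				filepath.Join(outDir, fmt.Sprintf("n%d.cpu.pprof", i+1))); err != nil {
				fmt.Fprintln(os.Stderr, err)
			}
			if err := fetch(node+"/debug/pprof/trace?seconds=1",
				filepath.Join(outDir, fmt.Sprintf("n%d.trace.out", i+1))); err != nil {
				fmt.Fprintln(os.Stderr, err)
			}
		}(i, node)
	}
	wg.Wait()
}
```

A built-in version would additionally handle node discovery, authentication, retries, and output naming, which is where most of the real work lies.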

Describe alternatives you've considered

Additional context

Jira issue: CRDB-28055

Epic CRDB-32402

Footnotes

[^1]: https://github.com/cockroachdb/cockroach/blob/master/scripts/pprof-loop.sh

[^2]: Though in such cases, hopefully the about-to-be-introduced CPU profiler will get us there right away!

@tbg tbg added C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) T-observability-inf labels Feb 15, 2023
@kevinkokomani
Contributor

For me, ideally there would be a place on the DB Console -> Advanced Debug page where I can select the types of profiles I want and the node(s) to gather them from, hit a button, and continually gather profiles until I hit the button again (or for a configurable length of time). I would then get a zip download containing node-specific folders, with subdirectories for each profile type I gathered and the timestamp in each profile's filename.

@tbg
Member Author

tbg commented May 16, 2023

@kevinkokomani points out that the script likely doesn't work with secure clusters. Another reason to build it into CRDB. We could change the script so that the user supplies a working `curl` invocation, but this adds even more friction (they'd need to use `cockroach auth-session login <sql_user> --certs-dir=<certs_dir>`, etc.).
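
For illustration, a hedged sketch of what that friction looks like for any HTTP-based collector on a secure cluster; the cookie header and placeholder token below are assumptions for the sketch, not an exact recipe:

```go
package main

import (
	"io"
	"log"
	"net/http"
	"os"
)

func main() {
	req, err := http.NewRequest("GET",
		"https://n1:8080/debug/pprof/profile?seconds=10", nil)
	if err != nil {
		log.Fatal(err)
	}
	// Placeholder: paste the session token printed by
	// `cockroach auth-session login <sql_user> --certs-dir=<certs_dir>`.
	req.Header.Set("Cookie", "session=<token-from-auth-session-login>")
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()
	// Write the profile to stdout; redirect to a file in practice.
	if _, err := io.Copy(os.Stdout, resp.Body); err != nil {
		log.Fatal(err)
	}
}
```

A built-in tool could reuse the node's own credentials and skip this round-trip entirely.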

@tbg
Member Author

tbg commented May 17, 2023

There is #102734, which is related though not quite the same, since pprof-loop also allows runtime traces, etc., and targets a single node.

@kevinkokomani it would be helpful to get TSEs' opinions on which gaps are most important after #102734.

@kevinkokomani
Contributor

kevinkokomani commented May 25, 2023

@tbg Sorry, I'm just seeing this. Reading through #102734, it seems to propose a merged point-in-time CPU profile for troubleshooting cluster-wide issues. I'm not sure that addresses the same issue we want to address here. Having a merged cluster-wide CPU profile is nice, but what we're after in this issue is the ability to capture profiles in situations where the spikes are very short and sharp. When the CPU increases are sustained, we can easily grab a CPU profile at our leisure, even for multiple nodes. But when they're not sustained and are instead "random" and spiky, the pprof-loop is our only recourse. Continually gathering CPU profiles in chunks over a period of time also gives us a continuous stream of data for easily comparing CPU valleys and peaks, which can be quite useful.

That said, is this all moot given that (AFAIK, at least) we are planning to implement automatic CPU profiling when spikes are detected, similar to heap profiling? If that is true but productionizing pprof-loop is still seen as useful, would that be because there are expected cases where spikes are short and severe enough not to be captured by an automatic profiler?

In terms of which profiling endpoints are most important in general: we are normally well covered on heap profiles thanks to automatic heap profiling plus the ability to gather them on demand. CPU profiles are the next most common thing we need to look at (if not the most common). All other endpoints are used much less often.

edit: I've also shared this to field some more opinions.

@NigelNavarro
Collaborator

@tbg here are some of my thoughts about pprofs as a whole:

  • As it currently stands, the "graph" option in the CPU/Heap pprof doesn't always load (it shows a text message instead, suggesting we install graphviz, even when the latest version is installed).
    • The thing is, this does not mean the pprof is corrupted: if you take the base URL (such as http://localhost:12345/ui/) and append a page other than the main "graph" page (such as http://localhost:12345/ui/flamegraph), the flamegraph loads successfully. I don't know who or what is in charge of rendering the pprof details, but this inconsistency needs to be addressed.
  • We have attempted before (see this draft TSE KB) to help ourselves understand the contents of a pprof. While those with much longer experience reading these graphs can generally decipher what may be going on based on historical analysis, we realistically don't have an encyclopedia or definition table to help us understand the patterns and behaviors in the pprofs themselves. That isn't sustainable long term and will continue to be something we page #KV to assist with.
  • Gathering pprofs the moment we request them is crucial for understanding issues as they happen. Times when we would like pprofs but cannot readily obtain them include (but are not limited to):
    • CPU peak saturation and near-OOM scenarios
    • Intermittent spikes in CPU that only last for seconds/minutes
    • Cluster unresponsive scenarios (DB Console/CLI unavailability, Asymmetric Network Partitions, etc.)
  • This is where automatically saved pprofs would be helpful, taken periodically and/or during a window of the user's choosing (thinking along the lines of a cluster setting for X profiles per second or something).
    • I'm not sure of the intricacies of profiling, but if disk space is a concern, perhaps there is a solution that uses pointers instead, allowing a pprof request to point to a certain timeframe, with the pprof generated afterwards from historical data (like from a snapshot).
  • Similar to the previous point, historical CPU/memory analysis is just as important as being able to capture a pprof ad hoc. As Kevin mentions above, comparing CPU behavior over time is quite helpful in establishing patterns that may point to a specific workload.
  • Tying pprof behavior back to the workload has always been a primary concern. In each of our cases where pprofs were gathered, we had to use the results of the analysis to approximate which query or set of transactions caused the CPU/memory spike. I know we have some things to address this in 23.1; however, we must keep making an active effort to make it easier to correlate CPU/memory pprofs with the offending transactions.

To summarize, CPU and memory pprofs are incredibly powerful... but who are they most useful for? If we're going to make them much more usable for TSEs, readability and the freedom to quickly acquire pprofs will be the primary requirements.

@tbg
Member Author

tbg commented May 26, 2023

As it currently stands, the "graph" option in the CPU/Heap pprof doesn't always load (it shows a text message instead, suggesting we install graphviz, even when the latest version is installed).
The thing is, this does not mean the pprof is corrupted [...]

@NigelNavarro see the workaround for that issue in #101523 (comment), mind documenting this somewhere the TSEs can find?

Thanks for the other points. I think the CPU pprofs should be much better in 23.1 because they contain labels for the SQL statements. I think the jury is still out on how well the automatic CPU profiles work; for one datapoint, they default to off, so they would only be available after an additional round-trip to the customer and a recurrence of the problem.

To summarize, CPU and Memory pprofs are incredibly powerful... but who are they most useful for? If we're going to make them much more usable for TSEs, readability and freedom to quickly acquire pprofs are going to be the primary requirements to make this happen.

I think we struggle to give the L2 teams the right profiles at the right time; that seems like a good plumbing problem to solve. If the profiles then also turn out to be good enough for TSEs, that would be an added bonus, but I think proper labels should go a very long way, at least when the spikes are workload-induced.

@NigelNavarro
Collaborator

@NigelNavarro see the workaround for that issue in #101523 (comment), mind documenting this somewhere the TSEs can find?

Sure thing, caaan do! I've added it to the "troubleshooting steps" section of this KB I created not too long ago. You're welcome to edit it and add any additional verbiage about what the hack is actually doing if you'd like.

Thanks for reading all of my pprof observations/concerns. I'm excited to see what we come up with!

@tbg
Member Author

tbg commented Jun 16, 2023

In #102734, @adityamaru added an endpoint to collect a cluster-wide CPU profile, so pointing today's pprof-loop at that endpoint should take the difficulty out of the process, at least when a CPU profile is requested. For traces, we could take a similar approach; filed #105035.

@maryliag maryliag added C-escalation-improvement Having this feature would have made an escalation easier O-support Would prevent or help troubleshoot a customer escalation - bugs, missing observability/tooling, docs and removed C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) labels Jul 6, 2023
@exalate-issue-sync exalate-issue-sync bot added C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) and removed O-support Would prevent or help troubleshoot a customer escalation - bugs, missing observability/tooling, docs C-escalation-improvement Having this feature would have made an escalation easier labels Jul 11, 2023
@jlinder jlinder added O-support Would prevent or help troubleshoot a customer escalation - bugs, missing observability/tooling, docs C-escalation-improvement Having this feature would have made an escalation easier and removed C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) labels Jul 11, 2023
@thtruo
Contributor

thtruo commented Aug 8, 2023

Had a quick offline conversation with @kevinkokomani about the desired TSE UX for sidestepping debug zips and getting access to CPU profiles in a more convenient manner. Noting it here so we don't lose track:

I imagine this would be implemented as a button in the DB Console which provides the following configuration options:

  • which profiles to get (heap, CPU, goroutine, etc)
  • which nodes to get profiles from
  • how long to get profiles for
    • one time
    • for a discrete amount of time (where this pprof loop idea comes in)
    • indefinitely, until I issue a cancel

@dhartunian dhartunian added P-2 Issues/test failures with a fix SLA of 3 months and removed P-2 Issues/test failures with a fix SLA of 3 months labels Jan 16, 2024
@dhartunian dhartunian added the P-3 Issues/test failures with no fix SLA label Jan 19, 2024