[WIP,DNM] server,tracing: integrate on-demand profiling with CRDB tracing #98235

Draft · adityamaru wants to merge 1 commit into master

Conversation

adityamaru (Contributor)

This change introduces a BackgroundProfiler service that is started
during server startup on each node in the cluster. The BackgroundProfiler
is responsible for collecting on-demand CPU profiles and runtime traces
for a particular operation. The profiler can be subscribed to by in-process
listeners. The first Subscriber initializes the collection of the CPU and execution
trace profiles. While the profiles are being collected, only Subscribers carrying
the same `profileID` are allowed to join the running profiles. The profiles
are stopped and persisted to local storage when the last Subscriber
unsubscribes. The `profileID` is a unique identifier of the operation that is
being traced. Since only one profile can be running in a process at a time,
any Subscriber with a `profileID` different from the current one is rejected.
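
Below is a minimal Go sketch of the subscription contract described above. All
names and signatures are hypothetical and only meant to illustrate the
first-subscriber-starts / last-subscriber-stops semantics, not the PR's actual API:

// Hypothetical sketch; not the PR's actual API.
package profilersketch

import (
    "context"
    "errors"
    "sync"
)

// ProfileID identifies the operation being profiled; in this PR it is the
// TraceID of the first root span that subscribes.
type ProfileID uint64

type BackgroundProfiler struct {
    mu          sync.Mutex
    activeID    ProfileID
    subscribers int
}

// Subscribe starts CPU/execution-trace collection for the first caller and
// lets later callers join only if they carry the same ProfileID.
func (p *BackgroundProfiler) Subscribe(ctx context.Context, id ProfileID) (context.Context, error) {
    p.mu.Lock()
    defer p.mu.Unlock()
    switch {
    case p.subscribers == 0:
        p.activeID = id
        // Start the CPU profile and runtime trace here.
    case p.activeID != id:
        return ctx, errors.New("another operation is already being profiled")
    }
    p.subscribers++
    // The returned ctx would carry pprof labels tying the caller's work to
    // the running profile.
    return ctx, nil
}

// Unsubscribe stops and persists the profiles once the last subscriber leaves.
func (p *BackgroundProfiler) Unsubscribe(id ProfileID) {
    p.mu.Lock()
    defer p.mu.Unlock()
    if p.activeID != id || p.subscribers == 0 {
        return
    }
    p.subscribers--
    if p.subscribers == 0 {
        // Stop the profiles and write them to local storage here.
        p.activeID = 0
    }
}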

The in-process listeners described above will be CRDB's internal tracing
spans. This change introduces a `WithBackgroudProfiling` option that can be used
to instruct a tracing span to subscribe to the BackgroundProfiler. This option
is propagated to all local and remote child spans created as part of the trace.
Only local root spans that have background profiling enabled will Subscribe to
the profiler on creation. As mentioned above, only one operation can be profiled
at a time. We use the first root span's `TraceID` as the BackgroundProfiler's `profileID`.
All subsequent root spans that are part of the same trace are able to join
the running profile. Tracing spans unsubscribe from the profile in `Finish()`.
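
Building on the hypothetical BackgroundProfiler sketched above, a local root
span with profiling enabled might interact with it roughly as follows (again
illustrative only, not the PR's actual tracer internals):

// startRootSpanProfiling subscribes a local root span on creation and returns
// a cleanup that mirrors the span unsubscribing in Finish().
func startRootSpanProfiling(ctx context.Context, p *BackgroundProfiler, traceID uint64) (context.Context, func()) {
    // The first root span's TraceID becomes the profiler's profileID; later
    // root spans of the same trace join the running profile.
    ctx, err := p.Subscribe(ctx, ProfileID(traceID))
    if err != nil {
        // A different operation is already being profiled; proceed without profiling.
        return ctx, func() {}
    }
    return ctx, func() { p.Unsubscribe(ProfileID(traceID)) }
}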

Every Subscriber is returned a wrapped `ctx` with pprof labels that tie its execution
to the profile being collected by the BackgroundProfiler. These labels are used
to post-process the collected CPU profile and keep only the samples that correspond
to our subscribers. The end result is a filtered CPU profile prefixed `cpuprofiler.`
and a process-wide execution trace prefixed `runtimetrace.`, both persisted to local storage.
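
The label plumbing can be done with the standard runtime/pprof API; the sketch
below shows one way the wrapped ctx could be produced. The label keys
("profile_id", "op") are assumptions, not the PR's actual keys:

package labelsketch

import (
    "context"
    "runtime/pprof"
)

// labelSubscriber returns a ctx whose pprof labels tie the caller's work to
// the running profile, so post-processing can keep only matching samples.
func labelSubscriber(ctx context.Context, profileID, opName string) context.Context {
    ctx = pprof.WithLabels(ctx, pprof.Labels("profile_id", profileID, "op", opName))
    // Apply the labels to the current goroutine; goroutines spawned while the
    // labels are set inherit them.
    pprof.SetGoroutineLabels(ctx)
    return ctx
}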

This change only introduces the infrastructure to enable on-demand profiling.
The test in `profiler_test.go` results in a CPU profile with information about
each labelled root operation collected on-demand:

❯ go tool pprof cpuprofiler.2023-03-08T14_51_52.402
Type: cpu
Time: Mar 8, 2023 at 9:51am (EST)
Duration: 10.11s, Total samples = 8.57s (84.77%)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) tags
 9171346462634118014: Total 8.6s
                      906.0ms (10.57%): op2
                      902.0ms (10.53%): op1
                      890.0ms (10.39%): op0
                      886.0ms (10.34%): op7
                      866.0ms (10.11%): op4
                      866.0ms (10.11%): op5
                      854.0ms ( 9.96%): op3
                      806.0ms ( 9.40%): op8
                      804.0ms ( 9.38%): op6
                      790.0ms ( 9.22%): op9

Execution traces do not surface pprof labels in Go yet, but a future
patch could consider cherry-picking https://go-review.googlesource.com/c/go/+/446975,
which allows the user to focus on goroutines run with the specified pprof labels:

[execution trace screenshot]

With this framework in place, one could envision the following use cases:

  • stmt diagnostics requests get a new option to request profiling.
    When requested, any local root trace span (i.e. while any part of the trace is
    active on a given node) subscribes to profiles, and references to the profiles collected
    are stored as payloads in the span. They're then included in the stmt bundle.

  • even outside of diagnostics, we could mark traces as wanting to capture debug info
    for "slow spans". Such spans could, on creation, set a timer that, once it fires,
    subscribes to (short) execution traces periodically as a way to snapshot the goroutine's
    actions (a rough sketch follows this list). These could be referenced in the span for later retrieval.
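
A rough Go sketch of the slow-span idea, assuming a hypothetical attach
callback that stores a reference to the snapshot in the span (none of this is
part of the PR):

package slowspansketch

import (
    "bytes"
    "runtime/trace"
    "time"
)

// armSlowSpanTimer fires once the span has been open longer than threshold and
// captures a short execution trace as a snapshot of what the goroutines are doing.
// The caller would Stop() the returned timer in the span's Finish().
func armSlowSpanTimer(threshold, snapshotLen time.Duration, attach func([]byte)) *time.Timer {
    return time.AfterFunc(threshold, func() {
        var buf bytes.Buffer
        if err := trace.Start(&buf); err != nil {
            return // another execution trace is already running
        }
        time.Sleep(snapshotLen)
        trace.Stop()
        attach(buf.Bytes())
    })
}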

Informs: #97215

@cockroach-teamcity (Member)

This change is Reviewable

@tbg (Member) left a comment

Haven't reviewed in detail, just left two high-level comments. Overall the basic approach sounds good; I hope the conversation continues.

// runtime default is 100Hz. This value will probably require tuning
// depending on our use case, e.g. profiling a single query execution vs.
// profiling a long-running job for a few minutes vs. profiling a
// statement fingerprint.

We have some sugar around this that we should use

if err := debug.CPUProfileDo(st, profileType, func() error {
    duration := 30 * time.Second
    if req.Seconds != 0 {
        duration = time.Duration(req.Seconds) * time.Second
    }
    if err := pprof.StartCPUProfile(&buf); err != nil {
        // Construct a gRPC error to return to the caller.
        return serverError(ctx, err)
    }
    defer pprof.StopCPUProfile()
    select {
    case <-ctx.Done():
        return ctx.Err()
    case <-time.After(duration):
        return nil
    }
}); err != nil {

syncutil.Mutex

// profileID is a unique ID that can be associated with the operation that
// we are profiling. Only subscribers that are executing as part of this

Is this too restrictive? If multiple traces are inflight across the cluster and touch on some node, one will fail, right? This may seem like an unimportant point now but traces need to be reliable. We already have two consumers of CPU profiler (status page/pprof endpoint, auto CPU profiler) and this new one is possibly more prolific.

It's a bit of a bigger lift (and an annoying one at that) but a pprof file is a protobuf and so is decomposable. We could allow multiple subscribers, have them share a single file, but associate the time span of each subscription in metadata along with the file (mapping subscriber ID to a slice of [start,end) timestamps). For that to work, the BackgroundProfiler should be in the path of any CPU profile.

This is all fairly onerous, I certainly don't think this should be in the first cut, but maybe it's worth pondering anyway.
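
For concreteness, the per-subscriber metadata could look something like the
hypothetical shape below (purely illustrative):

package sharedprofilesketch

import "time"

// ProfileSpan is a [Start, End) window during which a subscriber was attached.
type ProfileSpan struct {
    Start, End time.Time
}

// SharedProfileMeta associates one shared CPU profile file with the
// subscription windows of every subscriber, so the profile can be sliced
// per subscriber after the fact.
type SharedProfileMeta struct {
    ProfilePath string                   // the single pprof file all subscribers share
    Spans       map[string][]ProfileSpan // subscriber ID -> subscription windows
}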

Note that

// SetCPUProfiling is called from the pprofui to inform the system that a CPU
// profile is being recorded. If an error is returned, a profile was already in
// progress and the caller must try again later.
func (s *Settings) SetCPUProfiling(to CPUProfileType) error {
    if to == CPUProfileNone {
        atomic.StoreInt32(&s.cpuProfiling, int32(CPUProfileNone))
    } else if !atomic.CompareAndSwapInt32(&s.cpuProfiling, int32(CPUProfileNone), int32(to)) {
        return errors.New("a CPU profile is already in process, try again later")
    }
    if log.V(1) {
        log.Infof(context.Background(), "active CPU profile type set to: %d", to)
    }
    return nil
}
is our current "mediator of access" to the profiling singleton, so here we're introducing a second one and should consolidate.
