[WIP,DNM] server,tracing: integrate on-demand profiling with CRDB tracing #98235
base: master
Conversation
This change introduces a BackgroundProfiler service that is started during server startup on each node in the cluster. The BackgroundProfiler is responsible for collecting on-demand CPU profiles and runtime traces for a particular operation. The profiler can be subscribed to by an in-process listener. The first Subscriber initializes the collection of the CPU profile and execution trace. While the profiles are being collected, only Subscribers carrying the same `profileID` are allowed to join the running profiles. The profiles are stopped and persisted to local storage when the last Subscriber unsubscribes. The `profileID` is a unique identifier of the operation being traced. Since only one profile can be running in a process at a time, any Subscriber with a `profileID` different from the current one is rejected.

The in-process listeners described above will be CRDB's internal tracing spans. This change introduces a `WithBackgroudProfiling` option that can be used to instruct a tracing span to subscribe to the BackgroundProfiler. This option is propagated to all local and remote child spans created as part of the trace. Only local, root spans that have background profiling enabled will Subscribe to the profiler on creation. As mentioned above, only one operation can be profiled at a time. We use the first root span's `TraceID` as the BackgroundProfiler's `profileID`. All subsequent root spans that are part of the same trace will be able to join the running profile. Tracing spans unsubscribe from the profile on Finish().

Every Subscriber is returned a wrapped ctx with pprof labels that tie its execution to the profile being collected by the BackgroundProfiler. These labels are used to post-process the collected CPU profile and filter the samples down to only those that correspond to our subscribers. The end result is a filtered CPU profile prefixed with `cpuprofiler.` and a process-wide execution trace prefixed with `runtimetrace.`, both persisted to local storage. (A rough sketch of this subscribe/label flow follows this description.)

This change only introduces the infrastructure to enable on-demand profiling. The test in `profiler_test.go` results in a CPU profile with information about each labelled root operation collected on-demand:

```
❯ go tool pprof cpuprofiler.2023-03-08T14_51_52.402
Type: cpu
Time: Mar 8, 2023 at 9:51am (EST)
Duration: 10.11s, Total samples = 8.57s (84.77%)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) tags
 9171346462634118014: Total 8.6s
                      906.0ms (10.57%): op2
                      902.0ms (10.53%): op1
                      890.0ms (10.39%): op0
                      886.0ms (10.34%): op7
                      866.0ms (10.11%): op4
                      866.0ms (10.11%): op5
                      854.0ms ( 9.96%): op3
                      806.0ms ( 9.40%): op8
                      804.0ms ( 9.38%): op6
                      790.0ms ( 9.22%): op9
```

Execution traces do not surface pprof labels in Go yet, but a future patch could consider cherry-picking https://go-review.googlesource.com/c/go/+/446975. This would allow the user to focus on goroutines run with the specified pprof labels.

With this framework in place one could envision the following use cases:

- stmt diagnostics requests get a new option to request profiling. When requested, any local root trace span (i.e. while any part of the trace is active on a given node) subscribes to profiles, and references to the profiles collected are stored as payloads in the span. They're then included in the stmt bundle.
- even outside of diagnostics, traces could be marked as wanting to capture debug info for "slow spans". Such spans on creation could set a timer that, once it fires, subscribes to (short) execution traces periodically as a way to snapshot the goroutine's actions. These could be referenced in the span for later retrieval.

Informs: cockroachdb#97215
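For illustration, here is a minimal sketch of the subscription handshake described above. The names used (the `BackgroundProfiler` fields, `Subscribe`, and the `profile_id` label key) are assumptions for the sketch, not the actual API in this patch:

```go
package profiler

import (
	"context"
	"runtime/pprof"
	"strconv"

	"github.com/cockroachdb/cockroach/pkg/util/syncutil"
	"github.com/cockroachdb/errors"
)

// BackgroundProfiler is a sketch of the service described above; field names
// are illustrative assumptions.
type BackgroundProfiler struct {
	mu struct {
		syncutil.Mutex
		profileID   uint64
		subscribers int
	}
}

// Subscribe admits a caller to the currently running profile. The first
// subscriber would also start the CPU profile and execution trace (elided
// here); subscribers with a different profileID are rejected because only one
// profile can run per process.
func (p *BackgroundProfiler) Subscribe(ctx context.Context, profileID uint64) (context.Context, error) {
	p.mu.Lock()
	defer p.mu.Unlock()
	switch {
	case p.mu.profileID == 0:
		p.mu.profileID = profileID // first subscriber: start profiles here
	case p.mu.profileID != profileID:
		return nil, errors.Newf("profile %d already in progress", p.mu.profileID)
	}
	p.mu.subscribers++
	// Return a ctx carrying pprof labels so the subscriber's CPU samples can
	// be tied back to this profile during post-processing.
	labelled := pprof.WithLabels(ctx, pprof.Labels("profile_id", strconv.FormatUint(profileID, 10)))
	pprof.SetGoroutineLabels(labelled)
	return labelled, nil
}
```

Unsubscribe (not shown) would decrement the subscriber count and, once it reaches zero, stop the profiles and persist them to local storage.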
Haven't reviewed in detail, just left two high-level comments. Overall the basic approach sounds good; I hope the conversation continues.
```go
// runtime default is 100Hz. This value will probably require tuning
// depending on our use case. E.g.: profiling a single query execution vs
// profiling a long-running job for a few minutes vs profiling a
// statement fingerprint.
```
We have some sugar around this that we should use
cockroach/pkg/server/status_local_file_retrieval.go
Lines 41 to 57 in 1af6635
```go
if err := debug.CPUProfileDo(st, profileType, func() error {
	duration := 30 * time.Second
	if req.Seconds != 0 {
		duration = time.Duration(req.Seconds) * time.Second
	}
	if err := pprof.StartCPUProfile(&buf); err != nil {
		// Construct a gRPC error to return to the caller.
		return serverError(ctx, err)
	}
	defer pprof.StopCPUProfile()
	select {
	case <-ctx.Done():
		return ctx.Err()
	case <-time.After(duration):
		return nil
	}
}); err != nil {
```
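One way the new profiler could pick up that sugar, sketched here under assumptions (the struct, the `done` channel, and the import paths are hypothetical; only the `debug.CPUProfileDo` call shape is taken from the quote above), is to run its collection window inside the helper:

```go
package profiler

import (
	"context"
	"io"
	"runtime/pprof"

	"github.com/cockroachdb/cockroach/pkg/server/debug"
	"github.com/cockroachdb/cockroach/pkg/settings/cluster"
)

// backgroundProfilerSketch is a stand-in for the BackgroundProfiler; only the
// settings handle matters for this sketch.
type backgroundProfilerSketch struct {
	st *cluster.Settings
}

// collectCPUProfile wraps the collection window in debug.CPUProfileDo so it
// cooperates with the existing "one CPU profile at a time" bookkeeping.
func (p *backgroundProfilerSketch) collectCPUProfile(
	ctx context.Context, typ cluster.CPUProfileType, out io.Writer, done <-chan struct{},
) error {
	return debug.CPUProfileDo(p.st, typ, func() error {
		if err := pprof.StartCPUProfile(out); err != nil {
			return err
		}
		defer pprof.StopCPUProfile()
		select {
		case <-ctx.Done():
			return ctx.Err()
		case <-done: // assumed: closed when the last subscriber unsubscribes
			return nil
		}
	})
}
```

The `done` channel here merely stands in for however the last unsubscribe is signalled in the actual implementation.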
```go
	syncutil.Mutex

	// profileID is a unique ID that can be associated with the operation that
	// we are profiling. Only subscribers that are executing as part of this
```
Is this too restrictive? If multiple traces are inflight across the cluster and touch on some node, one will fail, right? This may seem like an unimportant point now but traces need to be reliable. We already have two consumers of the CPU profiler (status page/pprof endpoint, auto CPU profiler) and this new one is possibly more prolific.
It's a bit of a bigger lift (and an annoying one at that), but a pprof file is a protobuf and so is decomposable. We could allow multiple subscribers, have them share a single file, but associate the time span of each subscription in metadata along with the file (mapping subscriber ID to a slice of [start,end) timestamps). For that to work, the BackgroundProfiler should be in the path of any CPU profile.
This is all fairly onerous, and I certainly don't think this should be in the first cut, but maybe it's worth pondering anyway.
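Since the profile is just a protobuf, the decomposition could happen offline with `github.com/google/pprof/profile`. A rough sketch, where `filterByLabel` and the `profile_id` label key are hypothetical names, not anything in this PR:

```go
package main

import (
	"os"

	"github.com/google/pprof/profile"
)

// filterByLabel rewrites a pprof CPU profile, keeping only the samples whose
// pprof label `key` contains the value `want` (e.g. one subscriber's ID).
func filterByLabel(inPath, outPath, key, want string) error {
	in, err := os.Open(inPath)
	if err != nil {
		return err
	}
	defer in.Close()

	p, err := profile.Parse(in)
	if err != nil {
		return err
	}

	filtered := p.Sample[:0]
	for _, s := range p.Sample {
		for _, v := range s.Label[key] {
			if v == want {
				filtered = append(filtered, s)
				break
			}
		}
	}
	p.Sample = filtered
	p = p.Compact() // drop locations/functions no longer referenced

	out, err := os.Create(outPath)
	if err != nil {
		return err
	}
	defer out.Close()
	return p.Write(out)
}
```

Slicing by per-subscriber [start,end) windows would additionally need the metadata described above, since the profile format does not record a per-sample timestamp.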
Note that
cockroach/pkg/settings/cluster/cluster_settings.go
Lines 101 to 114 in 4aa59bc
```go
// SetCPUProfiling is called from the pprofui to inform the system that a CPU
// profile is being recorded. If an error is returned, a profile was already in
// progress and the caller must try again later.
func (s *Settings) SetCPUProfiling(to CPUProfileType) error {
	if to == CPUProfileNone {
		atomic.StoreInt32(&s.cpuProfiling, int32(CPUProfileNone))
	} else if !atomic.CompareAndSwapInt32(&s.cpuProfiling, int32(CPUProfileNone), int32(to)) {
		return errors.New("a CPU profile is already in process, try again later")
	}
	if log.V(1) {
		log.Infof(context.Background(), "active CPU profile type set to: %d", to)
	}
	return nil
}
```