[WIP,DNM] server,tracing: integrate on-demand profiling with CRDB tracing #98235

Draft · adityamaru wants to merge 1 commit into master

Conversation

adityamaru (Contributor)

This change introduces a BackgroundProfiler service that is started
during server startup on each node in the cluster. The BackgroundProfiler
is responsible for collecting on-demand CPU profiles and runtime traces
for a particular operation. The profiler can be subscribed to by in-process
listeners. The first Subscriber initializes the collection of the CPU and execution
trace profiles. While the profiles are being collected, only Subscribers carrying
the same `profileID` are allowed to join the running profiles. The profiles
are stopped and persisted to local storage when the last Subscriber
unsubscribes. The `profileID` is a unique identifier of the operation that is
being traced. Since only one profile can be running in a process at a time,
any Subscriber with a `profileID` different from the current one is rejected.
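
Below is a minimal Go sketch of the subscription contract described above. All
names and signatures are hypothetical and only meant to illustrate the
first-subscriber-starts / last-subscriber-stops semantics, not the PR's actual API:

// Hypothetical sketch; not the PR's actual API.
package profilersketch

import (
    "context"
    "errors"
    "sync"
)

// ProfileID identifies the operation being profiled; in this PR it is the
// TraceID of the first root span that subscribes.
type ProfileID uint64

type BackgroundProfiler struct {
    mu          sync.Mutex
    activeID    ProfileID
    subscribers int
}

// Subscribe starts CPU/execution-trace collection for the first caller and
// lets later callers join only if they carry the same ProfileID.
func (p *BackgroundProfiler) Subscribe(ctx context.Context, id ProfileID) (context.Context, error) {
    p.mu.Lock()
    defer p.mu.Unlock()
    switch {
    case p.subscribers == 0:
        p.activeID = id
        // Start the CPU profile and runtime trace here.
    case p.activeID != id:
        return ctx, errors.New("another operation is already being profiled")
    }
    p.subscribers++
    // The returned ctx would carry pprof labels tying the caller's work to
    // the running profile.
    return ctx, nil
}

// Unsubscribe stops and persists the profiles once the last subscriber leaves.
func (p *BackgroundProfiler) Unsubscribe(id ProfileID) {
    p.mu.Lock()
    defer p.mu.Unlock()
    if p.activeID != id || p.subscribers == 0 {
        return
    }
    p.subscribers--
    if p.subscribers == 0 {
        // Stop the profiles and write them to local storage here.
        p.activeID = 0
    }
}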

The in-process listeners described above will be CRDB's internal tracing
spans. This change introduces a `WithBackgroudProfiling` option that can be used
to instruct a tracing span to subscribe to the BackgroundProfiler. This option
is propagated to all local and remote child spans created as part of the trace.
Only local root spans that have background profiling enabled will Subscribe to
the profiler on creation. As mentioned above, only one operation can be profiled
at a time. We use the first root span's `TraceID` as the BackgroundProfiler's `profileID`.
All subsequent root spans that are part of the same trace are able to join
the running profile. Tracing spans unsubscribe from the profile in `Finish()`.
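
Building on the hypothetical BackgroundProfiler sketched above, a local root
span with profiling enabled might interact with it roughly as follows (again
illustrative only, not the PR's actual tracer internals):

// startRootSpanProfiling subscribes a local root span on creation and returns
// a cleanup that mirrors the span unsubscribing in Finish().
func startRootSpanProfiling(ctx context.Context, p *BackgroundProfiler, traceID uint64) (context.Context, func()) {
    // The first root span's TraceID becomes the profiler's profileID; later
    // root spans of the same trace join the running profile.
    ctx, err := p.Subscribe(ctx, ProfileID(traceID))
    if err != nil {
        // A different operation is already being profiled; proceed without profiling.
        return ctx, func() {}
    }
    return ctx, func() { p.Unsubscribe(ProfileID(traceID)) }
}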

Every Subscriber is returned a wrapped `ctx` with pprof labels that tie its execution
to the profile being collected by the BackgroundProfiler. These labels are used
to post-process the collected CPU profile and keep only the samples that correspond
to our subscribers. The end result is a filtered CPU profile prefixed `cpuprofiler.`
and a process-wide execution trace prefixed `runtimetrace.`, both persisted to local storage.
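
The label plumbing can be done with the standard runtime/pprof API; the sketch
below shows one way the wrapped ctx could be produced. The label keys
("profile_id", "op") are assumptions, not the PR's actual keys:

package labelsketch

import (
    "context"
    "runtime/pprof"
)

// labelSubscriber returns a ctx whose pprof labels tie the caller's work to
// the running profile, so post-processing can keep only matching samples.
func labelSubscriber(ctx context.Context, profileID, opName string) context.Context {
    ctx = pprof.WithLabels(ctx, pprof.Labels("profile_id", profileID, "op", opName))
    // Apply the labels to the current goroutine; goroutines spawned while the
    // labels are set inherit them.
    pprof.SetGoroutineLabels(ctx)
    return ctx
}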

This change only introduces the infrastructure to enable on-demand profiling.
The test in `profiler_test.go` results in a CPU profile with information about
each labelled root operation collected on-demand:

❯ go tool pprof cpuprofiler.2023-03-08T14_51_52.402
Type: cpu
Time: Mar 8, 2023 at 9:51am (EST)
Duration: 10.11s, Total samples = 8.57s (84.77%)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) tags
 9171346462634118014: Total 8.6s
                      906.0ms (10.57%): op2
                      902.0ms (10.53%): op1
                      890.0ms (10.39%): op0
                      886.0ms (10.34%): op7
                      866.0ms (10.11%): op4
                      866.0ms (10.11%): op5
                      854.0ms ( 9.96%): op3
                      806.0ms ( 9.40%): op8
                      804.0ms ( 9.38%): op6
                      790.0ms ( 9.22%): op9

Execution traces do not surface pprof labels in Go yet, but a future
patch could consider cherry-picking https://go-review.googlesource.com/c/go/+/446975,
which allows the user to focus on goroutines run with the specified pprof labels:

[execution trace screenshot]

With this framework in place, one could envision the following use cases:

  • stmt diagnostics requests get a new option to request profiling.
    When requested, any local root trace span (i.e. while any part of the trace is
    active on a given node) subscribes to profiles, and references to the profiles collected
    are stored as payloads in the span. They're then included in the stmt bundle.

  • even outside of diagnostics, we could mark traces as wanting to capture debug info
    for "slow spans". Such spans could, on creation, set a timer that, once it fires,
    subscribes to (short) execution traces periodically as a way to snapshot the goroutine's
    actions (a rough sketch follows this list). These could be referenced in the span for later retrieval.
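
A rough Go sketch of the slow-span idea, assuming a hypothetical attach
callback that stores a reference to the snapshot in the span (none of this is
part of the PR):

package slowspansketch

import (
    "bytes"
    "runtime/trace"
    "time"
)

// armSlowSpanTimer fires once the span has been open longer than threshold and
// captures a short execution trace as a snapshot of what the goroutines are doing.
// The caller would Stop() the returned timer in the span's Finish().
func armSlowSpanTimer(threshold, snapshotLen time.Duration, attach func([]byte)) *time.Timer {
    return time.AfterFunc(threshold, func() {
        var buf bytes.Buffer
        if err := trace.Start(&buf); err != nil {
            return // another execution trace is already running
        }
        time.Sleep(snapshotLen)
        trace.Stop()
        attach(buf.Bytes())
    })
}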

Informs: #97215

@cockroach-teamcity (Member)

This change is Reviewable

@tbg (Member) left a comment

Haven't reviewed in detail, just left two high-level comments. Overall the basic approach sounds good; I hope the conversation continues.

// runtime default is 100Hz. This value will probably require tuning
// depending on our use case, e.g. profiling a single query execution vs.
// profiling a long-running job for a few minutes vs. profiling a
// statement fingerprint.

We have some sugar around this that we should use

if err := debug.CPUProfileDo(st, profileType, func() error {
    duration := 30 * time.Second
    if req.Seconds != 0 {
        duration = time.Duration(req.Seconds) * time.Second
    }
    if err := pprof.StartCPUProfile(&buf); err != nil {
        // Construct a gRPC error to return to the caller.
        return serverError(ctx, err)
    }
    defer pprof.StopCPUProfile()
    select {
    case <-ctx.Done():
        return ctx.Err()
    case <-time.After(duration):
        return nil
    }
}); err != nil {

syncutil.Mutex

// profileID is a unique ID that can be associated with the operation that
// we are profiling. Only subscribers that are executing as part of this

Is this too restrictive? If multiple traces are inflight across the cluster and touch on some node, one will fail, right? This may seem like an unimportant point now but traces need to be reliable. We already have two consumers of CPU profiler (status page/pprof endpoint, auto CPU profiler) and this new one is possibly more prolific.

It's a bit of a bigger lift (and an annoying one at that) but a pprof file is a protobuf and so is decomposable. We could allow multiple subscribers, have them share a single file, but associate the time span of each subscription in metadata along with the file (mapping subscriber ID to a slice of [start,end) timestamps). For that to work, the BackgroundProfiler should be in the path of any CPU profile.

This is all fairly onerous, I certainly don't think this should be in the first cut, but maybe it's worth pondering anyway.
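
For concreteness, the per-subscriber metadata could look something like the
hypothetical shape below (purely illustrative):

package sharedprofilesketch

import "time"

// ProfileSpan is a [Start, End) window during which a subscriber was attached.
type ProfileSpan struct {
    Start, End time.Time
}

// SharedProfileMeta associates one shared CPU profile file with the
// subscription windows of every subscriber, so the profile can be sliced
// per subscriber after the fact.
type SharedProfileMeta struct {
    ProfilePath string                   // the single pprof file all subscribers share
    Spans       map[string][]ProfileSpan // subscriber ID -> subscription windows
}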

Note that

// SetCPUProfiling is called from the pprofui to inform the system that a CPU
// profile is being recorded. If an error is returned, a profile was already in
// progress and the caller must try again later.
func (s *Settings) SetCPUProfiling(to CPUProfileType) error {
    if to == CPUProfileNone {
        atomic.StoreInt32(&s.cpuProfiling, int32(CPUProfileNone))
    } else if !atomic.CompareAndSwapInt32(&s.cpuProfiling, int32(CPUProfileNone), int32(to)) {
        return errors.New("a CPU profile is already in process, try again later")
    }
    if log.V(1) {
        log.Infof(context.Background(), "active CPU profile type set to: %d", to)
    }
    return nil
}
is our current "mediator of access" to the profiling singleton, so here we're introducing a second one and should consolidate.
