Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

runtime/pprof: NewCPUProfile + cpuProfile.Start to allow profile configuration #42502

Open
rogerlucena opened this issue Nov 11, 2020 · 30 comments
Assignees
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. help wanted Proposal Proposal-Accepted
Milestone

Comments

@rogerlucena
Copy link

rogerlucena commented Nov 11, 2020

I think SetCPUProfileRate is, albeit confusing, still functioning. It needs to be called in the right order with StartCPUProfile. Last time I looked (~a month ago), it probably prints some warnings, but it does change the rate.

The pprof package doesn't seem to have a way to set the profiling rate. Not sure if this is really a useful feature.

Originally posted by @cherrymui in #40094 (comment)

The pprof package indeed does not allow us to customize the CPU profiling rate at the moment. This was something critical for my application in google/badwolf#155, since the default profiling rate of 100 was extremely low and did not allow me no see anything in my CPU profile.

To work around this, I had to make a call to runtime.SetCPUProfileRate(800) (with a rate of around 800 hz to see something meaningful and helpful in my CPU profile) before calling pprof.StartCPUProfile. Since there is a hard-coded call to runtime.SetCPUProfileRate(100) inside the pprof.StartCPUProfile, a warning is being printed to my terminal at the moment saying "runtime: cannot set cpu profile rate until previous profile has finished" as this call to runtime.SetCPUProfileRate(100) failed (the boolean cpuprof.on was already set to true in the previous call to runtime.SetCPUProfileRate(800)). This warning is annoying since it is misleading for someone who does not know the internals of pprof.StartCPUProfile, but the important here is that the CPU profiling is done with the first profiling rate that I manually set (800, as I wanted and as wanted from the user point of view).

Given this, what I am trying to say is that the public/exported method runtime.SetCPUProfileRate is still useful, and sometimes even critical for allowing us to use well pprof and to see something relevant on our profiles, depending on the application. I am not alone on this as you can see here (stack overflow) and here (google groups discussion). Even one principal software engineer (L8) I have been working with at Google found it useful when reviewing my PR google/badwolf#155 above and copied the code to use it in one of his applications, on which setting a higher profiling rate was critical as well.

That said, I believe we could, from now on, follow three different paths:

  1. The ideal would be for the pprof.StartCPUProfile method already receive as argument the CPU profile rate we want, allowing us to customize it, with this argument having its default value set to 100 if it was not specified by the function call. This would be the ideal, if we had support for default arguments in Go. But, we can achieve something similar if we had a similar method with signature pprof.StartCPUProfileWithSpecifiedRate(w io.Writer, hz int), in addition to the pprof.StartCPUProfile(w io.Writer) method we already have. This way, the runtime.SetCPUProfileRate method would not necessarily need to be an exported method anymore.

  2. We could keep runtime.SetCPUProfileRate public/exported, but we would have to fix the problem with the misleading warning message "runtime: cannot set cpu profile rate until previous profile has finished" if we had already manually set the profiling rate before the call to pprof.StartCPUProfile. For this we could:

    2.1. Have an auxiliar boolean method runtime.isCPUProfileRateAlreadySet, and call it inside pprof.StartCPUProfile before the hard-coded call to runtime.SetCPUProfileRate(100) there, trying to set the profiling rate to 100 only if this rate was not already previously set. This way, we would not see any strange warning after a call to pprof.StartCPUProfile if we had already previously set the profiling rate manually. I like this solution because it makes minimal changes on your code, it does not change any signature and it resolves exactly the problem we are trying to solve here, while keeping the warning message for other calls to runtime.SetCPUProfileRate that were not from pprof.StartCPUProfile in the situation above.

    2.2. Make runtime.SetCPUProfileRate return an error in the case the profiling rate was already previously set, allowing the caller to handle it and decide what to do from then on (inside pprof.StartCPUProfile we could ignore this error for example). This would change the signature of runtime.SetCPUProfileRate.

  3. We could move the SetCPUProfileRate public signature to the pprof package (with some changes as explained above to avoid the misleading warning message), and deprecate it / make it unexported inside the runtime package.

These were just some of the alternatives that I thought about for now. I would like to hear more from you on what you think about them and if you have any other suggestions for fixing what I explained above.

@ianlancetaylor ianlancetaylor changed the title Alternatives for the future of runtime.SetCPUProfileRate and pprof.StartCPUProfile runtime/pprof: support setting CPU profiling rate to a value other than 100 Nov 11, 2020
@ianlancetaylor ianlancetaylor added help wanted NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. labels Nov 11, 2020
@ianlancetaylor ianlancetaylor added this to the Backlog milestone Nov 11, 2020
imxyb added a commit to imxyb/go that referenced this issue Nov 12, 2020
runtime/pprof: method StartCPUProfile adds a parameter to allow custom cpu rate.
@gopherbot
Copy link
Contributor

Change https://golang.org/cl/269437 mentions this issue: runtime/pprof: method StartCPUProfile adds a parameter to allow custom cpu rate.

@imxyb
Copy link
Contributor

imxyb commented Nov 12, 2020

so we can add method StartCPUProfileWithSpecifiedRate for this issue? @ianlancetaylor

@ianlancetaylor
Copy link
Contributor

I don't think that is the best name, but, yes, something along those lines could work.

@imxyb
Copy link
Contributor

imxyb commented Nov 12, 2020

ok, let me think about it

imxyb added a commit to imxyb/go that referenced this issue Nov 12, 2020
runtime/pprof: method StartCPUProfile adds a parameter to allow custom cpu rate.
imxyb added a commit to imxyb/go that referenced this issue Nov 12, 2020
runtime/pprof: method StartCPUProfile adds a parameter to allow custom cpu rate.
@gopherbot
Copy link
Contributor

Change https://golang.org/cl/269537 mentions this issue: runtime/pprof: method StartCPUProfile adds a parameter to allow custom cpu rate.

@ianlancetaylor ianlancetaylor changed the title runtime/pprof: support setting CPU profiling rate to a value other than 100 proposal: runtime/pprof: add StartCPUProfileWithRate Nov 16, 2020
@ianlancetaylor ianlancetaylor modified the milestones: Backlog, Proposal Nov 16, 2020
@ianlancetaylor ianlancetaylor removed NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. help wanted labels Nov 16, 2020
@ianlancetaylor
Copy link
Contributor

I turned this issue into a proposal for the API change.

@hyangah
Copy link
Contributor

hyangah commented Nov 16, 2020

@ianlancetaylor Thanks for turning this into a proposal.

The current proposal is specific to StartCPUProfileWithRate. If we decide to add an additional API, can we come up with a version that's flexible enough to be extensible in the future? As far as I remember we had a couple of API change requests before and most of them were rejected or put on hold because of the desire to avoid API changes.

  1. PMU-based profiling: it seems one of the main blockers of the proposal proposal: runtime/pprof: add PMU-based profiles #36821 is currently the API change (in addition to the implementation details). This proposal was originally proposing pprof.StartCPUProfileWithConfig.
  2. Symbolization: I once proposed an option to skip symbolization because the pprof tool may have a way to perform additional symbolization out of band.
  3. Different policy for profiling rate setup: the example issue @rhysh reported (runtime/pprof: Linux CPU profiles inaccurate beyond 250% CPU use #35057) made me wonder if we may end up with a completely different cpu profiling policies in the future that requires beyond a simple hz adjustment. For example, some users may want to try per-thread timer (runtime/pprof: details of Linux SIGPROF delivery may cause very skewed profiles #14434) if the runtime supports. Some users may want to adjust the profile rate based on the GOMAXPROCS (runtime/pprof: Linux CPU profiles inaccurate beyond 250% CPU use #35057).

@rsc
Copy link
Contributor

rsc commented Nov 18, 2020

@hyangah It sounds like you are suggesting some kind of

cpu := pprof.NewCPUProfile()
... configure prof ...
cpu.Start()

or even

cpu := new(pprof.CPUProfile)
...
cpu.Start()

?

@rsc
Copy link
Contributor

rsc commented Dec 2, 2020

It sounds like maybe the interface in the previous message is the answer if we want to solve this problem.
But do we need to solve this problem? How often do people need profile rates other than 100 Hz?
The full stack trace makes the profile event a bit expensive, and I wouldn't want people to set the rate very high and then complain about the overhead.

@hyangah
Copy link
Contributor

hyangah commented Dec 2, 2020

@rsc Choosing the right profiling rate is hard. There are many theories and different opinion around choosing the right value - http://www.brendangregg.com/blog/2014-06-22/perf-cpu-sample.html is talking about a lockstep sampling issue, #35057 is discussing scaling the frequency based on the system's capability because 100hz is sometimes too high, and @rogerlucena wants higher than 100hz (aside from whether 800 hz is reasonable or not). It looks to me it's a parameter people still want to experiment and tune.

@rogerlucena
Copy link
Author

rogerlucena commented Dec 7, 2020

But do we need to solve this problem? How often do people need profile rates other than 100 Hz?

@rsc For my application it was crucial to set a sampling rate higher than 100 Hz to see something useful in the profiling output, as I explained and detailed in the Issue description above. Apart from that, in this description I also referenced other people for which customizing the sampling rate can still be useful: stack overflow and google groups discussion.

@mknyszek mknyszek added the compiler/runtime Issues related to the Go compiler and/or runtime. label Aug 11, 2022
@aclements
Copy link
Member

One question I haven't seen brought up is what happens if the user attempts to create multiple pprof.CPUProfiles and start them simultaneously.

A simple answer would be that you are only allowed to have one running profile at a time, so the second call to Start panics. Maybe we start there because it's simple.

Alternatively, we could support multiple concurrent profiles. The main use case I see for that is to enable an always-running, low-frequency, fleet-wide profile without blocking the ability to capture a short, high-frequency profile. In the runtime, we would probably implement this by using the highest profile frequency at any given time for all profiles. We could optionally sub-sample that for lower rate profiles to prevent unexpectedly large profile data. The pprof format supports variable-weight samples, so sampling at a higher rate than requested is not a problem from a format perspective.

@aclements
Copy link
Member

Given proposals to extend CPUProfile to support more Linux perf events (#53286), I'm wondering whether CPUProfile is the best name. If we eventually support other events like "branch mispredicts" and "cache misses", which seem like natural extensions, these are no longer "CPU" profiles. I'm not sure what a better name is. The key properties seem to be that this type of profile has to be collected over a window of time and it samples program locations (PC + stack) in response to some event. This contrasts with the profiles captured by runtime/pprof.Profile, which are instantaneous snapshots (possibly of something accumulated since the start of the program), and are sampled over other dimensions (except the "goroutine" profile, which isn't sampled but has always been weird).

@erifan
Copy link

erifan commented Sep 28, 2022

I agree that CPUProfile may not be the best name. For PMU events such as "branch mispredicts", "cycles" and "cache misses", if we use the period as the trigger condition, then I think a core property of them is that we sample once the event occurs a certain number of times. For os-timer, sampling is performed at regular intervals. Therefore, it is difficult for CPUProfile to cover these two different properties. And from the perspective of profiling the execution state of the CPU, perhaps CPUProfile is a good name. But it would be best if there was a better name.

@mknyszek
Copy link
Contributor

Since we're in the freeze and this is work for a new API, I don't think anything is going to happen here. I'm moving this into Go 1.21. This has been sitting here for quite a long time, and at this point I think we just need someone to implement it. Marking as help wanted.

@mknyszek mknyszek modified the milestones: Go1.20, Go1.21 Nov 23, 2022
@aclements
Copy link
Member

The follow-up discussion on adding PMU configuration has some effects on the API here. It adds a Stop method and changes SetRate to SetPeriod.

Bringing these together and working out a few more details, I think the API for this issue is:

// A CPUProfile configures a sampled profile of this program's CPU time.
// The zero value of a CPUProfile samples time spent executing at a default rate
// that is intended to have minimal impact on performance.
//
// Once a profile has been configured, call Start to begin recording the profile and
// Stop to end recording. Start and Stop may be called multiple times on the same CPUProfile.
type CPUProfile struct { ... }

// SetPeriod sets the sampling rate of this profile, as the number of nanoseconds
// of CPU time between samples.
//
// If the period is below a minimum threshold, it will set the period to the minimum threshold.
//
// SetPeriod returns an error if p is already running.
func (p *CPUProfile) SetPeriod(period int64) error

// Start enables CPU profiling for the current process.
// While profiling, the profile will be buffered and written to w.
// Start returns an error if p is already running.
// If another profile (besides p) is running, Start may return an error,
// or it may start a second, concurrent profile.
func (p *CPUProfile) Start(w io.Writer) error

// Stop stops a running CPUProfile and flushes any pending writes.
// It returns only after all writes for the profile have completed.
// If p is not running, Stop does nothing and returns nil.
//
// It is the caller's responsibility to close the profile's Writer if appropriate.
//
// The CPUProfile can be started again after calling Stop.
func (p *CPUProfile) Stop() error

Does this seem right? Some of this documentation would be generalized when and if we add PMU events, but for this issue I wanted to keep it more specific.

@erifan
Copy link

erifan commented Nov 24, 2022

The https://github.com/golang/go/issues/53286 also adds the following method.

// SetSampleKernel sets whether to count the kernel events, pass in “true” to count the kernel events, otherwise not count.
// This function has no effect on os-timer based events, because os-timer events always count kernel events.
// To count the kernel events, the CAP_PERFORM permission or the admin permission is required.
func (c *CPUProfile) SetSampleKernel(bool) error

This has been sitting here for quite a long time, and at this point I think we just need someone to implement it. Marking as help wanted.

I can do it, and I have already sent out a prototype for these two proposals, see https://go-review.googlesource.com/c/go/+/410799/9

@gopherbot
Copy link
Contributor

Change https://go.dev/cl/410797 mentions this issue: runtime: add perfEventOpen, fcntl, and ioctl and some consts for PMU profiling

@mknyszek mknyszek modified the milestones: Go1.21, Backlog, Go1.22 Jun 9, 2023
@mknyszek
Copy link
Contributor

mknyszek commented Jun 9, 2023

It seems this didn't quite make it for Go 1.21. Since it's an accepted proposal we should probably do something about it in the next dev cycle. Kicking to Go 1.22.

@aadamowski
Copy link

Just adding my 2c in favor of making the rate configurable: if the rate is hardcoded to 100Hz, this does not stop real world CPUs from getting faster.

The given profiling/test run that managed to capture some samples in the past may start getting fewer samples, or even no samples at all when run on faster and faster CPUs - eventually the time passing between pprof.StartCPUProfile() and pprof.StopCPUProfile() may become smaller than the period of 1/100th of a second.

I just dealt with this issue on a pretty high-powered server, and had to set the rate to 500Hz in order to capture just 3 samples from a run with relatively complex logic.

I think that maybe Hz isn't the right unit for controlling sampling rate, since it's absolute. Something relative to CPU speed (e.g. number of CPU cycles) might be more appropriate.

@rhysh
Copy link
Contributor

rhysh commented Jul 31, 2024

The tree is frozen for Go 1.23. Should we move this to Go 1.24, @mknyszek ?

Just adding my 2c in favor of making the rate configurable: if the rate is hardcoded to 100Hz, this does not stop real world CPUs from getting faster.

On Linux, the current approach of asking the kernel to track on-CPU time means we're limited by its tick frequency. Often that's set to 250 Hz, though some sites configure it to be 1000 Hz (or 100 Hz). See #35057 for what it can look like when a process (or now, a thread) earns more than one sample per kernel tick. Sampling at faster than kernel's tick interval will require a different approach, not just a config change. (It would be neat to use perf_event_open, see #36821 and #53286 for how that could look and why we don't have it yet.)

eventually the time passing between pprof.StartCPUProfile() and pprof.StopCPUProfile() may become smaller than the period of 1/100th of a second.

But also on Linux, we now set the initial expiration of the timer to be uniformly distributed between 0 and 10 ms rather than fixed at 10 ms. That provides a fair chance of getting a sample even for very short profiling windows or very light workloads (assuming the timing of the work isn't correlated with the kernel's timer interrupt). So it should be safe and fair to collect several profiles (or thousands, if you like) and merge them to get a more detailed view.

@prattmic prattmic modified the milestones: Go1.23, Go1.24 Jul 31, 2024
@insilications
Copy link

The PR google/badwolf#155 referenced by OP also helped me to reveal a problem that wasn't showing up with the hard-coded profiling. This change, along with possible future support for the PMU, is very welcome.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment