grunning: add library for precise on-CPU time measurement #85977
Conversation
@rickystewart: PTAL at the build tags, to make sure I've got them right. @tbg, @sumeerbhola: PTAL at everything else. Future PRs will wire this up into the elastic CPU limiter we're building towards for latency isolation for backups.
Package grunning is a library that's able to retrieve on-CPU running time for individual goroutines. It relies on a patched Go runtime and provides a primitive for fine-grained CPU attribution and control through a single API:

```go
package grunning

// Time returns the time spent by the current goroutine in the
// running state.
func Time() time.Duration
```

The motivating RFC is over at cockroachdb#82356. Informs cockroachdb#82625.

We build CRDB using the patched Go runtime for all officially supported platforms when built using Bazel (cockroachdb#84867). Engineers building CRDB also happen to use two platforms we don't use a patched Go for:

- FreeBSD (we don't have cross-compilers set up), and
- M1/M2 Macs (we don't have a code-signing pipeline, yet).

We use '(darwin && arm64) || freebsd || !bazel' as the build tag to exclude such platforms. See cockroachdb#84867 for more details.

This package tests various properties we should expect of the running time value. It does not make assertions, given the CI environments we run these tests under (CPU-starved, with lots of OS thread pre-emption, dissimilar to healthy CRDB deployments). This is also why they're skipped under stress.

Still, these tests are useful to understand the properties we expect running time to have:

```
=== RUN   TestEquivalentGoroutines
    thread=03 expected≈10.00% got= 9.98% of on-cpu time
    thread=06 expected≈10.00% got=10.00% of on-cpu time
    thread=02 expected≈10.00% got=10.01% of on-cpu time
    thread=10 expected≈10.00% got=10.01% of on-cpu time
    thread=07 expected≈10.00% got= 9.99% of on-cpu time
    thread=04 expected≈10.00% got= 9.99% of on-cpu time
    thread=09 expected≈10.00% got=10.00% of on-cpu time
    thread=01 expected≈10.00% got= 9.99% of on-cpu time
    thread=08 expected≈10.00% got=10.02% of on-cpu time
    thread=05 expected≈10.00% got=10.02% of on-cpu time
--- PASS: TestEquivalentGoroutines (0.56s)
=== RUN   TestProportionalGoroutines
    thread=01 got  1.82% of on-cpu time: expected≈ 1.00x got=1.00x
    thread=02 got  3.64% of on-cpu time: expected≈ 2.00x got=2.00x
    thread=03 got  5.47% of on-cpu time: expected≈ 3.00x got=3.00x
    thread=04 got  7.28% of on-cpu time: expected≈ 4.00x got=4.00x
    thread=05 got  9.09% of on-cpu time: expected≈ 5.00x got=4.99x
    thread=06 got 10.91% of on-cpu time: expected≈ 6.00x got=5.99x
    thread=07 got 12.73% of on-cpu time: expected≈ 7.00x got=6.99x
    thread=08 got 14.54% of on-cpu time: expected≈ 8.00x got=7.99x
    thread=09 got 16.36% of on-cpu time: expected≈ 9.00x got=8.99x
    thread=10 got 18.16% of on-cpu time: expected≈10.00x got=9.97x
--- PASS: TestProportionalGoroutines (1.72s)
=== RUN   TestPingPongHog
    pinger/ponger expected≈1.00x got=0.96x
--- PASS: TestPingPongHog (0.91s)
```

Release note: None
@irfansharif now's a good time to diversify away from me as a reviewer, since I am sort of swamped and also likely won't be involved with admission control much. I assume Sumeer has got you covered, but it's also worth thinking about who else might be the third person on the squad, ideally someone from KV or an Obs team.
@kvoli or @lidorcarmel: mind giving this a look from the KV side of things? The immediate use is going to be in admission control, but I know both of you are thinking about load balancing/allocation generally, and at some level, I think this will play a part (even if indirectly through AC): #83490.
Thanks! bors r+
Build succeeded:
Reviewed 7 of 8 files at r1, 1 of 1 files at r2, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained
pkg/util/grunning/enabled_test.go
line 111 at r2 (raw file):
```go
for i := trip; i > 0; i-- {
	ret += i
	ret = ret ^ (i + 0xcafebabe)
}
```
why the different computation compared to TestEquivalentGoroutines? I am assuming the only thing that matters in this test is that the different goroutines do different numbers of iterations.
pkg/util/grunning/enabled_test.go
line 118 at r2 (raw file):
```go
}
results := make([]int64, 10)
```
why is TestEquivalentGoroutines using a map when a slice like in this test would suffice?
pkg/util/grunning/enabled_test.go
line 133 at r2 (raw file):
```go
total := int64(0)
for _, result := range results {
	total += result
}
```
Does the barrier due to the WaitGroup obviate the need to do an atomic load here? If yes, why is there an atomic store above, when the different goroutines are writing to different ints?
pkg/util/grunning/enabled_test.go
line 186 at r2 (raw file):
```go
}()
ping <- true // start ping-pong
<-stop
```
The test immediately stops the CPU hog? What is the purpose of the CPU hog?
It is not quite clear what this test is trying to demonstrate. Perhaps say something like
// Test that demonstrates that if 2 goroutines alternately cycle between running and waiting, they will get similar running times.
pkg/util/grunning/enabled_test.go
line 201 at r2 (raw file):
```go
for n := 0; n < b.N; n++ {
	_ = grunning.Time()
}
```
what was the result of this benchmark?
Sent test changes over at #86185.
Reviewable status: complete! 0 of 0 LGTMs obtained
pkg/util/grunning/enabled_test.go
line 111 at r2 (raw file):
Previously, sumeerbhola wrote…
why the different computation compared to TestEquivalentGoroutines? I am assuming the only thing that matters in this test is that the different goroutines do different numbers of iterations.
The computation is just something I copied off of golang/go#36821, and did so originally when evaluating CPU accuracy of this measurement compared to profiler data (same as that golang/go issue). And yes, the only thing that matters is the different number of iterations.
pkg/util/grunning/enabled_test.go
line 118 at r2 (raw file):
Previously, sumeerbhola wrote…
why is TestEquivalentGoroutines using a map when a slice like in this test would suffice?
No real reasons, I was being sloppy.
pkg/util/grunning/enabled_test.go
line 133 at r2 (raw file):
Previously, sumeerbhola wrote…
Does the barrier due to the WaitGroup obviate the need to do an atomic load here? If yes, why is there an atomic store above, when the different goroutines are writing to different ints?
It does, I was being sloppy.
pkg/util/grunning/enabled_test.go
line 186 at r2 (raw file):
Previously, sumeerbhola wrote…
The test immediately stops the CPU hog? What is the purpose of the CPU hog?
It is not quite clear what this test is trying to demonstrate. Perhaps say something like
// Test that demonstrates that if 2 goroutines alternately cycle between running and waiting, they will get similar running times.
The test isn't stopping the CPU hog, that line is waiting for the CPU hog to finish. I'll add a comment.
pkg/util/grunning/enabled_test.go
line 201 at r2 (raw file):
Previously, sumeerbhola wrote…
what was the result of this benchmark?
```
$ dev bench pkg/util/grunning -f BenchmarkGRunningTime -v
goos: linux
goarch: amd64
cpu: Intel(R) Xeon(R) CPU @ 2.20GHz
BenchmarkGRunningTime
BenchmarkGRunningTime-24    38336452    31.59 ns/op
PASS
```