Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

grunning: add library for precise on-CPU time measurement #85977

Merged
merged 1 commit into from
Aug 12, 2022

Conversation

irfansharif
Copy link
Contributor

Package grunning is a library that's able to retrieve on-CPU running
time for individual goroutines. It relies on using a patched Go and
provides a primitive for fine-grained CPU attribution and control
through a single API:

package grunning

// Time returns the time spent by the current goroutine in the
// running state.
func Time() time.Duration

The motivating RFC is over at #82356. Informs #82625.
We build CRDB using use the patched Go runtime for all officially
supported platforms when built using Bazel (#84867). Engineers commonly
building CRDB also use happen to use two platforms we don't use a
patched Go for:

This package tests various properties we should expect over the running time
value. It does not make assertions given the CI environments we run these
under (CPU-starved, lot of OS thread pre-emption, dissimilar to healthy
CRDB deployments). This is also why they're skipped under stress. Still,
these tests are useful to understand the properties we expect running
time to have:

=== RUN   TestEquivalentGoroutines
    thread=03 expected≈10.00% got= 9.98% of on-cpu time
    thread=06 expected≈10.00% got=10.00% of on-cpu time
    thread=02 expected≈10.00% got=10.01% of on-cpu time
    thread=10 expected≈10.00% got=10.01% of on-cpu time
    thread=07 expected≈10.00% got= 9.99% of on-cpu time
    thread=04 expected≈10.00% got= 9.99% of on-cpu time
    thread=09 expected≈10.00% got=10.00% of on-cpu time
    thread=01 expected≈10.00% got= 9.99% of on-cpu time
    thread=08 expected≈10.00% got=10.02% of on-cpu time
    thread=05 expected≈10.00% got=10.02% of on-cpu time
--- PASS: TestEquivalentGoroutines (0.56s)

=== RUN   TestProportionalGoroutines
    thread=01 got  1.82% of on-cpu time: expected≈ 1.00x got=1.00x
    thread=02 got  3.64% of on-cpu time: expected≈ 2.00x got=2.00x
    thread=03 got  5.47% of on-cpu time: expected≈ 3.00x got=3.00x
    thread=04 got  7.28% of on-cpu time: expected≈ 4.00x got=4.00x
    thread=05 got  9.09% of on-cpu time: expected≈ 5.00x got=4.99x
    thread=06 got 10.91% of on-cpu time: expected≈ 6.00x got=5.99x
    thread=07 got 12.73% of on-cpu time: expected≈ 7.00x got=6.99x
    thread=08 got 14.54% of on-cpu time: expected≈ 8.00x got=7.99x
    thread=09 got 16.36% of on-cpu time: expected≈ 9.00x got=8.99x
    thread=10 got 18.16% of on-cpu time: expected≈10.00x got=9.97x
--- PASS: TestProportionalGoroutines (1.72s)

=== RUN   TestPingPongHog
    pinger/ponger expected≈1.00x got=0.96x
--- PASS: TestPingPongHog (0.91s)

Release note: None

@cockroach-teamcity
Copy link
Member

This change is Reviewable

@irfansharif
Copy link
Contributor Author

@rickystewart: PTAL at the build tags, to make sure I've got them right. @tbg, @sumeerbhola: PTAL at everything else. Future PRs will wire this up into the elastic CPU limiter we're building towards for latency isolation for backups.

@irfansharif irfansharif force-pushed the 220810.cputime branch 4 times, most recently from f961c60 to 8605a33 Compare August 11, 2022 19:20
@irfansharif irfansharif requested a review from a team August 11, 2022 19:20
pkg/util/grunning/enabled_test.go Outdated Show resolved Hide resolved
pkg/util/grunning/enabled.go Show resolved Hide resolved
Package grunning is a library that's able to retrieve on-CPU running
time for individual goroutines. It relies on using a patched Go and
provides a primitive for fine-grained CPU attribution and control
through a single API:

    package grunning

    // Time returns the time spent by the current goroutine in the
    // running state.
    func Time() time.Duration

The motivating RFC is over at cockroachdb#82356. Informs cockroachdb#82625.

We build CRDB using use the patched Go runtime for all officially
supported platforms when built using Bazel (cockroachdb#84867). Engineers commonly
building CRDB also use happen to use two platforms we don't use a
patched Go for:
- FreeBSD (we don't have cross-compilers setup), and
- M1/M2 Macs (we don't have a code-signing pipeline, yet).
We use '(darwin && arm64) || freebsd || !bazel' as the build tag to
exclude such platforms. See cockroachdb#84867 for more details.

This package tests various properties we should expect over the running time
value. It does not make assertions given the CI environments we run these
under (CPU-starved, lot of OS thread pre-emption, dissimilar to healthy
CRDB deployments). This is also why they're skipped under stress. Still,
these tests are useful to understand the properties we expect running
time to have:

    === RUN   TestEquivalentGoroutines
        thread=03 expected≈10.00% got= 9.98% of on-cpu time
        thread=06 expected≈10.00% got=10.00% of on-cpu time
        thread=02 expected≈10.00% got=10.01% of on-cpu time
        thread=10 expected≈10.00% got=10.01% of on-cpu time
        thread=07 expected≈10.00% got= 9.99% of on-cpu time
        thread=04 expected≈10.00% got= 9.99% of on-cpu time
        thread=09 expected≈10.00% got=10.00% of on-cpu time
        thread=01 expected≈10.00% got= 9.99% of on-cpu time
        thread=08 expected≈10.00% got=10.02% of on-cpu time
        thread=05 expected≈10.00% got=10.02% of on-cpu time
    --- PASS: TestEquivalentGoroutines (0.56s)

    === RUN   TestProportionalGoroutines
        thread=01 got  1.82% of on-cpu time: expected≈ 1.00x got=1.00x
        thread=02 got  3.64% of on-cpu time: expected≈ 2.00x got=2.00x
        thread=03 got  5.47% of on-cpu time: expected≈ 3.00x got=3.00x
        thread=04 got  7.28% of on-cpu time: expected≈ 4.00x got=4.00x
        thread=05 got  9.09% of on-cpu time: expected≈ 5.00x got=4.99x
        thread=06 got 10.91% of on-cpu time: expected≈ 6.00x got=5.99x
        thread=07 got 12.73% of on-cpu time: expected≈ 7.00x got=6.99x
        thread=08 got 14.54% of on-cpu time: expected≈ 8.00x got=7.99x
        thread=09 got 16.36% of on-cpu time: expected≈ 9.00x got=8.99x
        thread=10 got 18.16% of on-cpu time: expected≈10.00x got=9.97x
    --- PASS: TestProportionalGoroutines (1.72s)

    === RUN   TestPingPongHog
        pinger/ponger expected≈1.00x got=0.96x
    --- PASS: TestPingPongHog (0.91s)

Release note: None
@tbg
Copy link
Member

tbg commented Aug 12, 2022

@irfansharif now's a good time to diversify away from me as a reviewer since I am sort of swamped and also likely won't be involved with admission control much. I assume Sumeer has got you covered but it's also worth thinking who else might be the third person on the squad, ideally someone from KV or an Obs team.

@tbg tbg removed their request for review August 12, 2022 14:14
@irfansharif
Copy link
Contributor Author

@kvoli or @lidorcarmel: mind giving this a look from the KV side of things? The immediate use is going to be in admission control, but I know the both of you are thinking about load balancing/allocation generally, and at some level, I think this will play a part (even if indirectly through AC): #83490.

@irfansharif
Copy link
Contributor Author

Thanks!

bors r+

@craig
Copy link
Contributor

craig bot commented Aug 12, 2022

Build succeeded:

@craig craig bot merged commit ff3fc7e into cockroachdb:master Aug 12, 2022
@irfansharif irfansharif deleted the 220810.cputime branch August 12, 2022 17:25
Copy link
Collaborator

@sumeerbhola sumeerbhola left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 7 of 8 files at r1, 1 of 1 files at r2, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained


pkg/util/grunning/enabled_test.go line 111 at r2 (raw file):

		for i := trip; i > 0; i-- {
			ret += i
			ret = ret ^ (i + 0xcafebabe)

why the different computation compared to TestEquivalentGoroutines? I am assuming the only thing that matters in this test is that the different goroutines do different number of iterations.


pkg/util/grunning/enabled_test.go line 118 at r2 (raw file):

	}

	results := make([]int64, 10)

why is TestEquivalentGoroutines using a map when a slice like in this test would suffice?


pkg/util/grunning/enabled_test.go line 133 at r2 (raw file):

	total := int64(0)
	for _, result := range results {
		total += result

Does the barrier due to the WaitGroup obviate the need to do an atomic load here? If yes, why is there an atomic store above, when the different goroutines are writing to different ints?


pkg/util/grunning/enabled_test.go line 186 at r2 (raw file):

	}()
	ping <- true // start ping-pong
	<-stop

The test immediately stops the CPU hog? What is the purpose of the CPU hog?

It is not quite clear what this test is trying to demonstrate. Perhaps say something like
// Test that demonstrates that if 2 goroutines alternately cycle between running and waiting, they will get similar running times.


pkg/util/grunning/enabled_test.go line 201 at r2 (raw file):

	for n := 0; n < b.N; n++ {
		_ = grunning.Time()
	}

what was the result of this benchmark?

Copy link
Contributor Author

@irfansharif irfansharif left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sent test changes over at #86185.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained


pkg/util/grunning/enabled_test.go line 111 at r2 (raw file):

Previously, sumeerbhola wrote…

why the different computation compared to TestEquivalentGoroutines? I am assuming the only thing that matters in this test is that the different goroutines do different number of iterations.

The computation is just something I copied off of golang/go#36821, and did so originally when evaluating CPU accuracy of this measurement compared to profiler data (same as that golang/go issue). And yes, the only thing that matters is the different number of iterations.


pkg/util/grunning/enabled_test.go line 118 at r2 (raw file):

Previously, sumeerbhola wrote…

why is TestEquivalentGoroutines using a map when a slice like in this test would suffice?

No real reasons, I was being sloppy.


pkg/util/grunning/enabled_test.go line 133 at r2 (raw file):

Previously, sumeerbhola wrote…

Does the barrier due to the WaitGroup obviate the need to do an atomic load here? If yes, why is there an atomic store above, when the different goroutines are writing to different ints?

It does, I was being sloppy.


pkg/util/grunning/enabled_test.go line 186 at r2 (raw file):

Previously, sumeerbhola wrote…

The test immediately stops the CPU hog? What is the purpose of the CPU hog?

It is not quite clear what this test is trying to demonstrate. Perhaps say something like
// Test that demonstrates that if 2 goroutines alternately cycle between running and waiting, they will get similar running times.

The test isn't stopping the CPU hog, that line is waiting for the CPU hog to finish. I'll add a comment.


pkg/util/grunning/enabled_test.go line 201 at r2 (raw file):

Previously, sumeerbhola wrote…

what was the result of this benchmark?

$ dev bench pkg/util/grunning -f BenchmarkGRunningTime -v  
goos: linux
goarch: amd64
cpu: Intel(R) Xeon(R) CPU @ 2.20GHz
BenchmarkGRunningTime
BenchmarkGRunningTime-24        38336452                31.59 ns/op
PASS

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants