x/build, x/build/cmd/coordinator: support for performance test execution #49207

prattmic · 2021-10-28T18:16:31Z

Background

#48803 tracks the creation of a performance monitoring system for the Go toolchain. This issue covers the first bullet: adding support to the build coordinator for running performance tests.

The performance tests we plan to run fall into one of these categories:

Standard “testing” package benchmarks living in the main Go repo or x/benchmarks.
“bent” third-party build- and micro-benchmarks.
Additional large-scale third-party application benchmarks added to x/benchmarks.

Initial Work

Limitations

To start collecting data sooner rather than later, the initial version as described here will simplify the problem by applying the following limitations, which we intend to eventually remove:

Builds will be scheduled using the default coordinator priority (LIFO by commit time), rather than a more complex bisection scheme.
Benchmarks only run against the Go toolchain version under test, not the baseline.
No support for TryBot performance testing.
No special snapshotting for benchmark external dependencies.

MVP Design

Builds will initially run on VMs of consistent size and microarchitecture (buildlet named host-linux-amd64-perf). We will characterize the noise level using VMs and may later switch to sole-tenant VMs or a dedicated physical machine to reduce noise. The benchmarks have a few external dependencies, notably rsync and perflock. These will be pre-installed in the machine image.

The new build configuration named linux-amd64-perf runs the performance tests as x/benchmarks sub-repo tests. It is configured to run only x/benchmarks tests by setting RunBench = true.

The benchmarks in x/benchmarks are not all run via go test, so runSubrepoTests will have special support for running x/benchmarks benchmarks. The tooling in x/benchmarks is still in flux to improve usability, so rather than encoding minor details about the tools into the coordinator, the coordinator will simply execute a to-be-written tool via go run golang.org/x/benchmarks/cmd/bench, and that tool is responsible for the details of running all benchmarks.
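As a rough illustration of how little the coordinator then needs to know, here is a minimal local sketch using os/exec. It assumes a Unix-like filesystem layout and that the command is started from within an x/benchmarks checkout; in the real system the coordinator drives a remote buildlet rather than starting a local process, and benchCommand is a hypothetical helper.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"path/filepath"
)

// benchCommand (hypothetical) returns the single command a driver needs to
// know about: "go run golang.org/x/benchmarks/cmd/bench", using the go binary
// of the toolchain under test. Everything benchmark-specific lives in cmd/bench.
func benchCommand(goroot string) *exec.Cmd {
	goBin := filepath.Join(goroot, "bin", "go")
	cmd := exec.Command(goBin, "run", "golang.org/x/benchmarks/cmd/bench")
	// Appended last so it takes precedence over any inherited GOROOT:
	// os/exec uses the last value when a key appears more than once in Env.
	cmd.Env = append(os.Environ(), "GOROOT="+goroot)
	// cmd/bench writes results to stdout; diagnostics go to stderr.
	cmd.Stdout = os.Stdout
	cmd.Stderr = os.Stderr
	return cmd
}

func main() {
	cmd := benchCommand("/usr/local/go")
	// Print rather than run: running needs network access and a module
	// context that can resolve golang.org/x/benchmarks.
	fmt.Println(cmd.String())
}
```

Keeping the contract down to one command plus a GOROOT means changes to individual benchmark suites only need to touch cmd/bench.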

cmd/bench outputs results to stdout in the Go Benchmark Data Format. The coordinator uploads these results to perfdata.golang.org, adding additional configuration keys like:

Go toolchain commit
x/benchmarks commit
Build time

Old benchmark support

The coordinator has some support, dating from 2017, for running benchmarks out of x/benchmarks. This support was disabled in 2018 due to lack of support for some migrations in the coordinator. See CL 354315 for the full enumeration of this code.

This proposal will initially remove nearly all this support, as it is not relevant to the benchmarks we want to run today, and may be confusing to future readers. Some parts will be reused or repurposed, such as:

BuildConfig.RunBench to indicate performance test builders.
Client code to upload results to perfdata.golang.org.

Future Work

The design above is the minimum necessary to start running tests and collecting data, and is a starting point for future improvements we will want. Here I discuss the future changes we expect to make and the general expected design. We expect the priority and design of these to change as we learn from running the MVP.

Baseline testing

To minimize noise from environmental changes like OS updates, we would like to run all tests against both the toolchain under test and a “fixed” baseline toolchain, which only changes occasionally (monthly?).

The main change here is to adjust runAllSharded to build both the baseline and test toolchain.

The baseline toolchain version is exposed to the tests as GOROOT_BASELINE.

To take advantage of toolchain snapshotting, we likely want to extend buildStatus.build to support individually fetching the baseline and test toolchain from different snapshots.

We expect this to be the first extension from the MVP design.
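One way a benchmark driver might consume GOROOT_BASELINE is sketched below: it builds the list of toolchains to run and tags each with a label so that results from the two can be told apart. The toolchain type, toolchainsFromEnv, and the label values are assumptions for illustration.

```go
package main

import (
	"fmt"
	"os"
)

// toolchain (hypothetical) is one Go installation to benchmark.
type toolchain struct {
	Label  string // value for a "toolchain" configuration key
	GOROOT string
}

// toolchainsFromEnv returns the toolchain under test and, if GOROOT_BASELINE
// is set, the baseline toolchain as well. getenv is injected for testing.
func toolchainsFromEnv(getenv func(string) string) []toolchain {
	var tcs []toolchain
	if g := getenv("GOROOT"); g != "" {
		tcs = append(tcs, toolchain{Label: "experiment", GOROOT: g})
	}
	if b := getenv("GOROOT_BASELINE"); b != "" {
		tcs = append(tcs, toolchain{Label: "baseline", GOROOT: b})
	}
	return tcs
}

func main() {
	// A driver would run every benchmark once per toolchain, emitting
	// "toolchain: <label>" ahead of that toolchain's results.
	for _, tc := range toolchainsFromEnv(os.Getenv) {
		fmt.Printf("%s\t%s\n", tc.Label, tc.GOROOT)
	}
}
```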

Scheduling priority

With many benchmarks and only a single buildlet, we expect that there may not be enough capacity to run every single commit. The coordinator’s current scheduling algorithm is LIFO by commit time.

During lulls (such as weekends), the system will backfill in LIFO order, which may leave large gaps of untested commits. Instead, we would like to adjust the algorithm to prefer testing commits that will shrink the largest untested gap, i.e., effectively a binary-search ordering. This ordering may even be an improvement worth applying to all builds, not just performance tests.

Adding support for this will require plumbing more information about the completed builds into the scheduler.

Benchmark dependency snapshotting / caching

“bent” has large external dependencies fetched over the internet: it fetches many third-party packages (via a simple go get). Future benchmarks may fetch pre-built binary assets.

If these operations prove to be very expensive parts of testing, we may want to explore snapshotting these dependencies to save time across builds. The coordinator’s built-in snapshotting mechanism may not provide any speed boost vs fetching over the internet. Instead, read-only and checksummed copies of the dependencies could be saved on the buildlet for use across multiple builds.

The most important aspect here is to ensure that a test of a bad version of the toolchain can’t corrupt the cache in a way that breaks future builds.

cc @mknyszek @aclements @dr2chase @jeremyfaller @golang/release


gopherbot · 2021-10-28T19:23:53Z

Change https://golang.org/cl/354315 mentions this issue: cmd/coordinator,dashboard,internal/buildgo: remove benchmark support

gopherbot · 2021-10-29T21:31:40Z

Change https://golang.org/cl/359854 mentions this issue: cmd/bench: add benchmark wrapper

Running benchmarks has been disabled since 2018. Remove all the code to keep things more maintainable and understandable. We will be adding new benchmarking support soon, and may reuse some of this code, but don't want half-working code adding confusion.

For golang/go#49207.

Change-Id: I11d52b0315bed4d91651c162af11853895012868
Reviewed-on: https://go-review.googlesource.com/c/build/+/354315
Trust: Michael Pratt
Run-TryBot: Michael Pratt
TryBot-Result: Go Bot
Reviewed-by: Dmitri Shuralyov
Reviewed-by: Carlos Amedee
Reviewed-by: Alexander Rakoczy

The coordinator is getting support for running the benchmarks in this repository. Since the benchmarks and interface are in flux, encoding all of the details of running Go tests, bent arguments, etc. into the coordinator will likely cause churn and frustrating migration issues. Instead, add cmd/bench, which serves as the simple entrypoint for the coordinator. The coordinator runs cmd/bench with the GOROOT to test (eventually multiple GOROOTs), and this binary takes care of the remaining details.

Right now, we just do a basic go test golang.org/x/benchmarks/... and a simple invocation of bent. Note that bent does not pass without https://golang.org/cl/354634.

For golang/go#49207

Change-Id: I5c9cf89540cab605c0a64e17af85311d37985c25
Reviewed-on: https://go-review.googlesource.com/c/benchmarks/+/359854
Trust: Michael Pratt
Run-TryBot: Michael Pratt
TryBot-Result: Go Bot
Reviewed-by: Michael Knyszek

gopherbot · 2021-11-04T19:51:47Z

Change https://golang.org/cl/361418 mentions this issue: cmd/coordinator: upload performance test results to perfdata.golang.org

gopherbot · 2021-11-04T19:51:48Z

Change https://golang.org/cl/361417 mentions this issue: cmd/coordinator: run performance tests from x/benchmarks

gopherbot · 2021-11-04T20:45:52Z

Change https://golang.org/cl/354311 mentions this issue: dashboard: add linux-amd64-perf host and builder

Add initial support for running the performance tests from x/benchmarks. Since there are a variety of different test suites (some `go test`, bent, etc.), x/benchmarks provides a basic wrapper command, golang.org/x/benchmarks/cmd/bench, which knows the minute details. The coordinator just needs to run that one command.

This build mode is limited to builds of x/benchmarks on builders with RunBench set to true. Currently there are none; a future CL will add the initial such linux-amd64 builder.

For golang/go#49207

Change-Id: Ie006ec4a3757a5c2fed0925a3f9eb91edeaa5224
Reviewed-on: https://go-review.googlesource.com/c/build/+/361417
Trust: Michael Pratt
Run-TryBot: Michael Pratt
TryBot-Result: Go Bot
Reviewed-by: Alexander Rakoczy

The performance test output contains benchfmt-formatted benchmark results. Upload the output wholesale to perfdata.golang.org for long-term storage and analysis.

The results for now are a bit rough, as the output may also contain unrelated lines that look like benchfmt. For example, "go: downloading github.com/BurntSushi/toml v0.3.1" adds a "go" label with the value "downloading ...". In the future, we will ideally filter these a bit better (perhaps in x/benchmarks/cmd/bench).

For golang/go#49207

Change-Id: Ifd2512c93902a74f9040db0f9d0c600348fc1849
Reviewed-on: https://go-review.googlesource.com/c/build/+/361418
Reviewed-by: Alexander Rakoczy
Trust: Michael Pratt
Run-TryBot: Michael Pratt
TryBot-Result: Go Bot

Add a new builder to run the x/benchmarks performance tests on linux-amd64.

For now, this runs on a GCE C2 instance type, as these instances have well-defined, consistent CPUs and other server architecture components. In basic noise testing, even standard VMs of this type appear to be fairly low noise. As we gain experience with actual monitoring, we may change this to a sole-tenant VM type or even a dedicated machine if necessary.

For golang/go#49207

Change-Id: I17eaeeb5349af925249940bebd5b860a2579e6df
Reviewed-on: https://go-review.googlesource.com/c/build/+/354311
Trust: Michael Pratt
Run-TryBot: Michael Pratt
TryBot-Result: Go Bot
Reviewed-by: Alexander Rakoczy

gopherbot · 2021-11-05T17:33:28Z

Change https://golang.org/cl/361656 mentions this issue: internal/coordinator/pool: count C2 and N2 quotas separately

We currently use E2, C2, and N2 instances on GCE. C2 and N2 instances have their own quotas, which are accounted separately from the CPUS quotas. This could probably be cleaned up to keep track of all CPU quotas and handle more instance types, but this should work for the time being.

See: https://cloud.google.com/compute/quotas#cpu_quota

For golang/go#49207

Change-Id: Ida1e8de3c857560637095d57e972bca7222284ed
Reviewed-on: https://go-review.googlesource.com/c/build/+/361656
Trust: Alexander Rakoczy
Run-TryBot: Alexander Rakoczy
TryBot-Result: Go Bot
Reviewed-by: Heschi Kreinick
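A minimal sketch of the per-family accounting, assuming machine type names start with the family (for example c2-standard-8) and that only the E2, C2 and N2 families from CL 361656 need handling. quotaMetric is hypothetical; C2_CPUS, N2_CPUS and CPUS are the GCE quota metric names for these families. Other families such as N2D or C2D have their own quotas and are deliberately not handled here.

```go
package main

import (
	"fmt"
	"strings"
)

// quotaMetric maps a GCE machine type to the regional CPU quota it draws
// from. Only the three families used by the builders are distinguished.
func quotaMetric(machineType string) string {
	switch {
	case strings.HasPrefix(machineType, "c2-"):
		return "C2_CPUS"
	case strings.HasPrefix(machineType, "n2-"):
		return "N2_CPUS"
	default: // E2 and other general-purpose types in this sketch
		return "CPUS"
	}
}

func main() {
	// Hypothetical set of running instances.
	running := []struct {
		machineType string
		vcpus       int
	}{
		{"e2-standard-16", 16},
		{"c2-standard-8", 8},
		{"n2-standard-4", 4},
		{"e2-highcpu-2", 2},
	}
	usage := map[string]int{}
	for _, r := range running {
		usage[quotaMetric(r.machineType)] += r.vcpus
	}
	fmt.Println(usage)
}
```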

gopherbot · 2021-11-05T18:33:01Z

Change https://golang.org/cl/361734 mentions this issue: cmd/bent: remove required dependencies

gopherbot · 2021-11-05T19:25:55Z

Change https://golang.org/cl/361754 mentions this issue: dashboard: SkipSnapshot for linux-amd64-perf

Since this builder doesn't build the go repo, it will be waiting forever for a snapshot. Instead, just build Go for each run.

For golang/go#49207

Change-Id: I34a73b507278db402c478b4f5956633996772aae
Reviewed-on: https://go-review.googlesource.com/c/build/+/361754
Trust: Alexander Rakoczy
Run-TryBot: Alexander Rakoczy
TryBot-Result: Go Bot
Reviewed-by: Dmitri Shuralyov

gopherbot · 2021-11-05T20:47:37Z

Change https://golang.org/cl/361874 mentions this issue: cmd/coordinator: add basic metadata to perfdata upload

Make rsync optional with fallback to cp. Remove use of /usr/bin/time and replace with measuring time directly from Go.

For golang/go#49207

Change-Id: Ief5a7a90f9460ddec1d5a51c99d4a534e38a5d9c
Reviewed-on: https://go-review.googlesource.com/c/benchmarks/+/361734
Trust: Michael Pratt
Run-TryBot: Michael Pratt
TryBot-Result: Go Bot
Reviewed-by: Cherry Mui
Reviewed-by: David Chase

The added metadata keys make it possible to tell what was run, and include a convenience field stating whether this was a post-submit build or a trybot run.

For golang/go#49207

Change-Id: Iba979bcfd5a3bbdc11e2df0b8de4094cc7212356
Reviewed-on: https://go-review.googlesource.com/c/build/+/361874
Trust: Michael Pratt
Run-TryBot: Michael Pratt
TryBot-Result: Go Bot
Reviewed-by: Alexander Rakoczy

gopherbot · 2021-11-08T22:27:10Z

Change https://golang.org/cl/362375 mentions this issue: cmd/bench: wait for load average to drop before starting

gopherbot · 2022-01-06T23:05:48Z

Change https://golang.org/cl/376096 mentions this issue: cmd/bench: benchmark baseline toolchain

gopherbot · 2022-01-07T21:20:53Z

Change https://golang.org/cl/376634 mentions this issue: cmd/coordinator: baseline toolchain for benchmarks

If BENCH_BASELINE_GOROOT is set, additionally benchmark that toolchain. The benchfmt label 'toolchain' differentiates the 'experiment' and 'baseline' toolchains.

For golang/go#49207.

Change-Id: I737fa56786dc482172942462c5776c4c2773c0c5
Reviewed-on: https://go-review.googlesource.com/c/benchmarks/+/376096
Trust: Michael Pratt
Run-TryBot: Michael Pratt
TryBot-Result: Gopher Robot
Reviewed-by: Michael Knyszek

When benchmarking, we want to benchmark both the toolchain under test (i.e., buildStatus.Rev) and an older "baseline" toolchain to compare against. For now, the baseline toolchain is the latest stable release. In the future we may want to update more frequently, but this is a simple starting point.

This CL determines the baseline toolchain commit for a given test and installs it on the buildlet at BENCH_BASELINE_GOROOT. golang.org/x/benchmarks/cmd/bench is responsible for utilizing the baseline toolchain. CL 376096 is the corresponding change to cmd/bench.

Most of the baseline toolchain logic is limited to runBenchmarkTests(). In theory, it logically fits a bit better with the rest of the toolchain logic in build() et al., but keeping it limited to runBenchmarkTests() helps keep the common build() path from getting much more complex for a minor edge-case feature.

For golang/go#49207.
For golang/go#48803.

Change-Id: Id63f8333cf9d1ff952850c3347e999b5e98f7294
Reviewed-on: https://go-review.googlesource.com/c/build/+/376634
Reviewed-by: Alex Rakoczy
Trust: Michael Pratt
Run-TryBot: Michael Pratt
TryBot-Result: Gopher Robot
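The CL does not spell out how the latest stable release is determined. One hypothetical approach is to pick the highest goX.Y or goX.Y.Z tag while skipping beta and release-candidate tags, comparing components numerically so that go1.17 sorts after go1.9. parseRelease, lessVersion and latestStable are illustrative names.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseRelease parses a stable release tag such as "go1.17" or "go1.17.5"
// into [major, minor, patch]. Anything else, including pre-release tags
// like "go1.18beta1" or "go1.18rc1", is rejected.
func parseRelease(tag string) ([3]int, bool) {
	var v [3]int
	if !strings.HasPrefix(tag, "go") {
		return v, false
	}
	parts := strings.Split(strings.TrimPrefix(tag, "go"), ".")
	if len(parts) < 2 || len(parts) > 3 {
		return v, false
	}
	for i, p := range parts {
		n, err := strconv.Atoi(p)
		if err != nil || n < 0 {
			return v, false
		}
		v[i] = n
	}
	return v, true
}

// lessVersion reports whether a sorts before b, comparing numerically.
func lessVersion(a, b [3]int) bool {
	for i := range a {
		if a[i] != b[i] {
			return a[i] < b[i]
		}
	}
	return false
}

// latestStable returns the highest stable release tag in tags.
func latestStable(tags []string) (string, bool) {
	best, found := "", false
	var bestV [3]int
	for _, t := range tags {
		v, ok := parseRelease(t)
		if !ok {
			continue
		}
		if !found || lessVersion(bestV, v) {
			best, bestV, found = t, v, true
		}
	}
	return best, found
}

func main() {
	tags := []string{"go1.9", "go1.16.12", "go1.17", "go1.17.5", "go1.18beta1"}
	if tag, ok := latestStable(tags); ok {
		fmt.Println("baseline:", tag)
	}
}
```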

The coordinator runs tests on a freshly booted VM, which may still be running background boot tasks when bench starts. For minimal noise, wait for the system load average to drop (indicating background tasks have completed) before continuing with benchmarking.

For golang/go#49207.

Change-Id: I8df01592fea31d49eae54074213e202b21d5728a
Reviewed-on: https://go-review.googlesource.com/c/benchmarks/+/362375
Reviewed-by: Michael Knyszek
Trust: Michael Pratt
Run-TryBot: Michael Pratt
TryBot-Result: Gopher Robot

gopherbot · 2022-01-13T00:51:35Z

Change https://golang.org/cl/378214 mentions this issue: cmd/bench: move toolchain selection closer to execution

gopherbot · 2022-01-13T05:07:16Z

Change https://golang.org/cl/378274 mentions this issue: cmd/bench: integrate the Sweet benchmarks

gopherbot · 2022-01-13T16:47:47Z

Change https://golang.org/cl/378336 mentions this issue: dashboard: extend perf builder timeout

gopherbot · 2022-01-13T16:47:48Z

Change https://golang.org/cl/378334 mentions this issue: dashboard,internal/coordinator/pool: VM delete timeout from host config

Allow individual host configurations to override the VM delete timeout if they are used for longer than normal builds.

For golang/go#49207.

Change-Id: I9c5c80e5ee7dac2375cff17c64871ae2211f6309
Reviewed-on: https://go-review.googlesource.com/c/build/+/378334
Trust: Michael Pratt
Run-TryBot: Michael Pratt
TryBot-Result: Gopher Robot
Reviewed-by: Dmitri Shuralyov
Reviewed-by: Alex Rakoczy

Right now we "iterate" over toolchains (GOROOTs) at the outermost part of the tool, but bringing that in closer lets us do things like only build the benchmarking tools once.

This change also introduces abstractions around the Go tool from the Sweet tool to simplify and deduplicate some code. For instance, building bent currently fails with the baseline GOROOT because the GOROOT environment variable isn't set correctly, whereas the "gotest" benchmarks do set it correctly.

For golang/go#49207.

Change-Id: I6816e1112174f951d3bc22c2b1033b8e98dc0327
Reviewed-on: https://go-review.googlesource.com/c/benchmarks/+/378214
Reviewed-by: Michael Pratt
Trust: Michael Knyszek
Run-TryBot: Michael Knyszek
TryBot-Result: Gopher Robot
Reviewed-by: David Chase

The benchmarks in the perf builder may take several hours to complete. Extend the VM deletion timeout so that the VMs stick around long enough to complete benchmarking.

For golang/go#49207.

Change-Id: I3e9d2a1df657406ef0f80b9c0cb713df3b716ca8
Reviewed-on: https://go-review.googlesource.com/c/build/+/378336
Trust: Michael Pratt
Run-TryBot: Michael Pratt
TryBot-Result: Gopher Robot
Reviewed-by: Dmitri Shuralyov
Reviewed-by: Alex Rakoczy
Reviewed-by: Carlos Amedee

gopherbot · 2022-01-31T23:16:56Z

Change https://golang.org/cl/382097 mentions this issue: cmd/bench: run 10 iterations for each bent benchmark

gopherbot · 2022-02-03T18:14:28Z

Change https://golang.org/cl/382894 mentions this issue: env: include make in linux/amd64 builder images

Some benchmarks in x/benchmarks from external sources wrap the go tool in make. Add make to the linux/amd64 builders where these benchmarks will run.

For golang/go#49207.

Change-Id: I4ea16c0aa63d1b520c61d0a2b9dabffdd8bb7094
Reviewed-on: https://go-review.googlesource.com/c/build/+/382894
Trust: Michael Knyszek
Run-TryBot: Michael Knyszek
TryBot-Result: Gopher Robot
Reviewed-by: Carlos Amedee

For golang/go#49207.

Change-Id: Ib18c5f574e30333a7d9d80019e26d6a565f4db1e
Reviewed-on: https://go-review.googlesource.com/c/benchmarks/+/378274
Reviewed-by: Michael Pratt
Trust: Michael Knyszek

For golang/go#49207.

Change-Id: I83fa87a603cf26ed61d324975166388db1801487
Reviewed-on: https://go-review.googlesource.com/c/benchmarks/+/382097
Reviewed-by: David Chase
Reviewed-by: Michael Pratt
Trust: Michael Knyszek
Run-TryBot: Michael Knyszek
TryBot-Result: Gopher Robot

gopherbot · 2022-04-01T21:50:08Z

Change https://go.dev/cl/397655 mentions this issue: cmd/coordinator: record commit time in RFC3339

This is consistent with the format used by perfdata.golang.org for upload-time and x/perf/cmd/bench for runstamp.

For golang/go#49207.

Change-Id: I0c800629c23eb830803d3017806ca6c9c8907b87
Reviewed-on: https://go-review.googlesource.com/c/build/+/397655
Trust: Michael Pratt
Run-TryBot: Michael Pratt
Reviewed-by: Michael Knyszek
TryBot-Result: Gopher Robot
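A minimal sketch of the conversion in CL 397655, assuming the commit time starts out as Unix seconds (as printed by git log --format=%ct). commitTimeRFC3339 is a hypothetical helper and the commit-time key in main is illustrative. Normalizing to UTC also means the resulting strings sort chronologically as plain text.

```go
package main

import (
	"fmt"
	"time"
)

// commitTimeRFC3339 converts a commit timestamp in Unix seconds into an
// RFC 3339 string in UTC.
func commitTimeRFC3339(unixSec int64) string {
	return time.Unix(unixSec, 0).UTC().Format(time.RFC3339)
}

func main() {
	// Emitted as a benchfmt configuration line ahead of the results.
	fmt.Printf("commit-time: %s\n", commitTimeRFC3339(time.Now().Unix()))
}
```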

prattmic · 2022-05-31T16:47:31Z

There will likely be more follow-up work here (some of the "future work" items), but they aren't planned right now and the core work is done.

prattmic added the Performance and NeedsFix labels Oct 28, 2021

prattmic added this to the Backlog milestone Oct 28, 2021

gopherbot added the Builders label Oct 28, 2021

toothrot mentioned this issue Nov 5, 2021

x/build: coordinator shouldn't wait forever for builders that do not snapshot #49400

Open

dmitshur changed the title from "x/build: coordinator support for performance test execution" to "x/build, x/build/cmd/coordinator: support for performance test execution" May 31, 2022

dmitshur mentioned this issue May 31, 2022

x/build/cmd/coordinator: tracking bug for running benchmarks in the coordinator. #19871

Closed

prattmic closed this as completed May 31, 2022

dmitshur modified the milestones: Backlog, Unreleased May 31, 2022

dmitshur mentioned this issue Jun 17, 2022

x/build: record and collect compiler benchmarks #17167

Open

golang locked and limited conversation to collaborators May 31, 2023

gopherbot added the FrozenDueToAge label May 31, 2023
