x/build, x/build/cmd/coordinator: support for performance test execution #49207
Change https://golang.org/cl/354315 mentions this issue: |
Change https://golang.org/cl/359854 mentions this issue: |
Running benchmarks has been disabled since 2018. Remove all the code to keep things more maintainable and understandable. We will be adding new benchmarking support soon, and may reuse some of this code, but don't want half-working code adding confusion. For golang/go#49207. Change-Id: I11d52b0315bed4d91651c162af11853895012868 Reviewed-on: https://go-review.googlesource.com/c/build/+/354315 Trust: Michael Pratt <[email protected]> Run-TryBot: Michael Pratt <[email protected]> TryBot-Result: Go Bot <[email protected]> Reviewed-by: Dmitri Shuralyov <[email protected]> Reviewed-by: Carlos Amedee <[email protected]> Reviewed-by: Alexander Rakoczy <[email protected]>
The coordinator is getting support for running the benchmarks in this repository. Since the benchmarks and interface are in flux, encoding all of the details of running Go tests, bent arguments, etc into the coordinator will likely cause churn and frustrating migration issues. Instead, add cmd/bench which serves as the simple entrypoint for the coordinator. The coordinator runs cmd/bench with the GOROOT to test (eventually multiple GOROOTs), and this binary takes care of the remaining details. Right now, we just do a basic go test golang.org/x/benchmarks/... and simple invocation of bent. Note that bent does not pass without https://golang.org/cl/354634. For golang/go#49207 Change-Id: I5c9cf89540cab605c0a64e17af85311d37985c25 Reviewed-on: https://go-review.googlesource.com/c/benchmarks/+/359854 Trust: Michael Pratt <[email protected]> Run-TryBot: Michael Pratt <[email protected]> TryBot-Result: Go Bot <[email protected]> Reviewed-by: Michael Knyszek <[email protected]>
Change https://golang.org/cl/361418 mentions this issue: |
Change https://golang.org/cl/361417 mentions this issue: |
Change https://golang.org/cl/354311 mentions this issue: |
Add initial support for running the performance tests from x/benchmarks. Since there are a variety of different test suites (some `go test`, bent, etc), x/benchmarks provides a basic wrapper command, golang.org/x/benchmarks/cmd/bench, which knows the minute details. The coordinator just needs to run that one command. This build mode is limited to builds of x/benchmarks on builders with RunBench set to true. Currently there are none; a future CL will add the initial such linux-amd64 builder. For golang/go#49207 Change-Id: Ie006ec4a3757a5c2fed0925a3f9eb91edeaa5224 Reviewed-on: https://go-review.googlesource.com/c/build/+/361417 Trust: Michael Pratt <[email protected]> Run-TryBot: Michael Pratt <[email protected]> TryBot-Result: Go Bot <[email protected]> Reviewed-by: Alexander Rakoczy <[email protected]>
The performance test output contains benchfmt-formatted benchmark results. Upload the output wholesale to perfdata.golang.org for long-term storage and analysis. The results for now are a bit rough, as the output may also contain unrelated lines that look like benchfmt. For example, "go: downloading github.com/BurntSushi/toml v0.3.1" adds a "go" label with the value "downloading ...". In the future, we will ideally filter these a bit better (perhaps in x/benchmarks/cmd/bench). For golang/go#49207 Change-Id: Ifd2512c93902a74f9040db0f9d0c600348fc1849 Reviewed-on: https://go-review.googlesource.com/c/build/+/361418 Reviewed-by: Alexander Rakoczy <[email protected]> Trust: Michael Pratt <[email protected]> Run-TryBot: Michael Pratt <[email protected]> TryBot-Result: Go Bot <[email protected]>
Add a new builder to run the x/benchmarks performance tests on linux-amd64. For now, this runs on a GCE C2 instance type, as these instances have well-defined, consistent CPUs and other server architecture components. In basic noise testing, even standard VMs of this type appear to be fairly low noise. As we gain experience with actual monitoring, we may change this to a sole-tenant VM type or even a dedicated machine if necessary. For golang/go#49207 Change-Id: I17eaeeb5349af925249940bebd5b860a2579e6df Reviewed-on: https://go-review.googlesource.com/c/build/+/354311 Trust: Michael Pratt <[email protected]> Run-TryBot: Michael Pratt <[email protected]> TryBot-Result: Go Bot <[email protected]> Reviewed-by: Alexander Rakoczy <[email protected]>
Change https://golang.org/cl/361656 mentions this issue: |
We currently use E2, C2, and N2 instances on GCE. C2 and N2 instances have their own quotas, which are accounted separately from the CPUS quotas. This could probably be cleaned up to keep track of all CPU quotas and handle more instance types, but this should work for the time being. See: https://cloud.google.com/compute/quotas#cpu_quota For golang/go#49207 Change-Id: Ida1e8de3c857560637095d57e972bca7222284ed Reviewed-on: https://go-review.googlesource.com/c/build/+/361656 Trust: Alexander Rakoczy <[email protected]> Run-TryBot: Alexander Rakoczy <[email protected]> TryBot-Result: Go Bot <[email protected]> Reviewed-by: Heschi Kreinick <[email protected]>
Change https://golang.org/cl/361734 mentions this issue: |
Change https://golang.org/cl/361754 mentions this issue: |
Since this builder doesn't build the go repo, it will be waiting forever for a snapshot. Instead, just build Go for each run. For golang/go#49207 Change-Id: I34a73b507278db402c478b4f5956633996772aae Reviewed-on: https://go-review.googlesource.com/c/build/+/361754 Trust: Alexander Rakoczy <[email protected]> Run-TryBot: Alexander Rakoczy <[email protected]> TryBot-Result: Go Bot <[email protected]> Reviewed-by: Dmitri Shuralyov <[email protected]>
Change https://golang.org/cl/361874 mentions this issue: |
Make rsync optional with fallback to cp. Remove use of /usr/bin/time and replace with measuring time directly from Go. For golang/go#49207 Change-Id: Ief5a7a90f9460ddec1d5a51c99d4a534e38a5d9c Reviewed-on: https://go-review.googlesource.com/c/benchmarks/+/361734 Trust: Michael Pratt <[email protected]> Run-TryBot: Michael Pratt <[email protected]> TryBot-Result: Go Bot <[email protected]> Reviewed-by: Cherry Mui <[email protected]> Reviewed-by: David Chase <[email protected]>
These make it possible to tell what was run, as well as a convenience field stating whether this was a post-submit build or a trybot run. For golang/go#49207 Change-Id: Iba979bcfd5a3bbdc11e2df0b8de4094cc7212356 Reviewed-on: https://go-review.googlesource.com/c/build/+/361874 Trust: Michael Pratt <[email protected]> Run-TryBot: Michael Pratt <[email protected]> TryBot-Result: Go Bot <[email protected]> Reviewed-by: Alexander Rakoczy <[email protected]>
Change https://golang.org/cl/362375 mentions this issue: |
Change https://golang.org/cl/376096 mentions this issue: |
Change https://golang.org/cl/376634 mentions this issue: |
If BENCH_BASELINE_GOROOT is set, additionally benchmark that toolchain. The benchfmt label 'toolchain' differentiates the 'experiment' and 'baseline' toolchains. For golang/go#49207. Change-Id: I737fa56786dc482172942462c5776c4c2773c0c5 Reviewed-on: https://go-review.googlesource.com/c/benchmarks/+/376096 Trust: Michael Pratt <[email protected]> Run-TryBot: Michael Pratt <[email protected]> TryBot-Result: Gopher Robot <[email protected]> Reviewed-by: Michael Knyszek <[email protected]>
When benchmarking, we want to benchmark both the toolchain under test (i.e., buildStatus.Rev) as well as an older "baseline" toolchain, which will be compared against. For now, the baseline toolchain is the latest stable release. In the future we may want to update more frequently, but this is a simple starting point. This CL determines the baseline toolchain commit for a given test and installs it on the buildlet at BENCH_BASELINE_GOROOT. golang.org/x/benchmarks/cmd/bench is responsible for utilizing the baseline toolchain. CL 376096 is the corresponding change to cmd/bench. Most of the baseline toolchain logic is limited to runBenchmarkTests(). In theory, it logically fits a bit better with the rest of the toolchain logic in build() et al, but keeping it limited to runBenchmarkTests() helps keep the common build() path from getting much more complex for a minor edge-case feature. For golang/go#49207. For golang/go#48803. Change-Id: Id63f8333cf9d1ff952850c3347e999b5e98f7294 Reviewed-on: https://go-review.googlesource.com/c/build/+/376634 Reviewed-by: Alex Rakoczy <[email protected]> Trust: Michael Pratt <[email protected]> Run-TryBot: Michael Pratt <[email protected]> TryBot-Result: Gopher Robot <[email protected]>
The coordinator runs tests on a freshly booted VM which may still be running background boot tasks when bench starts. For minimal noise, wait for the system load average to drop (indicating background tasks have completed) before continuing with benchmarking. For golang/go#49207. Change-Id: I8df01592fea31d49eae54074213e202b21d5728a Reviewed-on: https://go-review.googlesource.com/c/benchmarks/+/362375 Reviewed-by: Michael Knyszek <[email protected]> Trust: Michael Pratt <[email protected]> Run-TryBot: Michael Pratt <[email protected]> TryBot-Result: Gopher Robot <[email protected]>
Change https://golang.org/cl/378214 mentions this issue: |
Change https://golang.org/cl/378274 mentions this issue: |
Change https://golang.org/cl/378336 mentions this issue: |
Change https://golang.org/cl/378334 mentions this issue: |
Allow individual host configurations to override the VM delete timeout if they are using for longer than normal builds. For golang/go#49207. Change-Id: I9c5c80e5ee7dac2375cff17c64871ae2211f6309 Reviewed-on: https://go-review.googlesource.com/c/build/+/378334 Trust: Michael Pratt <[email protected]> Run-TryBot: Michael Pratt <[email protected]> TryBot-Result: Gopher Robot <[email protected]> Reviewed-by: Dmitri Shuralyov <[email protected]> Reviewed-by: Alex Rakoczy <[email protected]>
Right now we "iterate" over toolchains (GOROOTs) at the outer-most part of the tool, but bringing that in closer lets us do things like only build the benchmarking tools once. This change also introduces abstractions around the Go tool from the Sweet tool to simplify and deduplicate some code. For instance, building bent currently fails with the baseline GOROOT because the GOROOT environment variable isn't set correctly, but the "gotest" benchmarks do. For golang/go#49207. Change-Id: I6816e1112174f951d3bc22c2b1033b8e98dc0327 Reviewed-on: https://go-review.googlesource.com/c/benchmarks/+/378214 Reviewed-by: Michael Pratt <[email protected]> Trust: Michael Knyszek <[email protected]> Run-TryBot: Michael Knyszek <[email protected]> TryBot-Result: Gopher Robot <[email protected]> Reviewed-by: David Chase <[email protected]>
The benchmarks in the perf builder may take several hours to complete. Extend the VM deletion timeout so that they stick around long enough to complete benchmarking. For golang/go#49207. Change-Id: I3e9d2a1df657406ef0f80b9c0cb713df3b716ca8 Reviewed-on: https://go-review.googlesource.com/c/build/+/378336 Trust: Michael Pratt <[email protected]> Run-TryBot: Michael Pratt <[email protected]> TryBot-Result: Gopher Robot <[email protected]> Reviewed-by: Dmitri Shuralyov <[email protected]> Reviewed-by: Alex Rakoczy <[email protected]> Reviewed-by: Carlos Amedee <[email protected]>
Change https://golang.org/cl/382097 mentions this issue: |
Change https://golang.org/cl/382894 mentions this issue: |
Some benchmarks in x/benchmarks from external sources wrap the go tool in make. Add make to the linux/amd64 builders where these benchmarks will run. For golang/go#49207. Change-Id: I4ea16c0aa63d1b520c61d0a2b9dabffdd8bb7094 Reviewed-on: https://go-review.googlesource.com/c/build/+/382894 Trust: Michael Knyszek <[email protected]> Run-TryBot: Michael Knyszek <[email protected]> TryBot-Result: Gopher Robot <[email protected]> Reviewed-by: Carlos Amedee <[email protected]>
For golang/go#49207. Change-Id: Ib18c5f574e30333a7d9d80019e26d6a565f4db1e Reviewed-on: https://go-review.googlesource.com/c/benchmarks/+/378274 Reviewed-by: Michael Pratt <[email protected]> Trust: Michael Knyszek <[email protected]>
For golang/go#49207. Change-Id: I83fa87a603cf26ed61d324975166388db1801487 Reviewed-on: https://go-review.googlesource.com/c/benchmarks/+/382097 Reviewed-by: David Chase <[email protected]> Reviewed-by: Michael Pratt <[email protected]> Trust: Michael Knyszek <[email protected]> Run-TryBot: Michael Knyszek <[email protected]> TryBot-Result: Gopher Robot <[email protected]>
Change https://go.dev/cl/397655 mentions this issue: |
This is consistent with the format used by perfdata.golang.org for upload-time and x/perf/cmd/bench for runstamp. For golang/go#49207. Change-Id: I0c800629c23eb830803d3017806ca6c9c8907b87 Reviewed-on: https://go-review.googlesource.com/c/build/+/397655 Trust: Michael Pratt <[email protected]> Run-TryBot: Michael Pratt <[email protected]> Reviewed-by: Michael Knyszek <[email protected]> TryBot-Result: Gopher Robot <[email protected]>
There will likely be more follow-up work here (some of the "future work" items), but they aren't planned right now and the core work is done.
Background
#48803 tracks the creation of a performance monitoring system for the Go toolchain. This issue covers the first bullet: adding support to the build coordinator for running performance tests.
The performance tests we plan to run fall into one of these categories:
Initial Work
Limitations
To start collecting data sooner rather than later, the initial version as described here will simplify the problem by applying the following limitations, which we intend to eventually remove:
MVP Design
Builds will initially run on VMs of consistent size and microarchitecture (buildlet named `host-linux-amd64-perf`). We will characterize the noise level using VMs and may later switch to sole tenant VMs or a dedicated physical machine to reduce noise. The benchmarks have a few external dependencies, notably `rsync` and `perflock`. These will be pre-installed in the machine image.
The new build configuration named `linux-amd64-perf` runs the performance tests as x/benchmarks sub-repo tests. It is configured to run only x/benchmarks tests by setting RunBench = true.
The benchmarks in x/benchmarks are not all run via `go test`, so runSubrepoTests will have special support for running x/benchmarks benchmarks. The tooling in x/benchmarks is still in flux to improve usability, so rather than encoding minor details about the tools into the coordinator, it will simply execute a to-be-written tool, `go run golang.org/x/benchmarks/cmd/bench`, which is responsible for the details of running all benchmarks.
`cmd/bench` outputs results to stdout in the Go Benchmark Data Format. The coordinator uploads these results to perfdata.golang.org, adding additional configuration keys like:
Old benchmark support
The coordinator has some support for running benchmarks out of x/benchmarks from 2017. This support was disabled in 2018 due to lack of support for some migrations in the coordinator. See CL 354315 for the full enumeration of this code.
This proposal will initially remove nearly all this support, as it is not relevant to the benchmarks we want to run today, and may be confusing to future readers. Some parts will be reused or repurposed, such as:
Future Work
The design above is the minimum necessary to start running tests and collecting data, and is a starting point for future improvements we will want. Here I discuss the future changes we expect to make and the general expected design. We expect the priority and design of these to change as we learn from running the MVP.
Baseline testing
To minimize noise from environmental changes like OS updates, we would like to run all tests against both the toolchain under test and a “fixed” baseline toolchain, which only changes occasionally (monthly?).
The main change here is to adjust runAllSharded to build both the baseline and test toolchain.
The baseline toolchain version is exposed to the tests as GOROOT_BASELINE.
To take advantage of toolchain snapshotting, we likely want to extend buildStatus.build to support individually fetching the baseline and test toolchain from different snapshots.
We expect this to be the first extension from the MVP design.
Scheduling priority
With many benchmarks and only a single buildlet, we expect that there may not be enough capacity to run every single commit. The coordinator’s current scheduling algorithm is LIFO by commit time.
During lulls (such as weekends), the system will backfill in LIFO order, which may leave large gaps of untested commits. Instead, we would like to adjust the algorithm to prefer testing commits which will shrink the largest untested gap. i.e., effectively a binary search ordering. This order may even be an improvement to apply to all builds, not just performance tests.
Adding support for this will require plumbing more information about the completed builds into the scheduler.
Benchmark dependency snapshotting / caching
“bent” has large external dependencies fetched over the internet. “bent” fetches many third-party packages (via simple `go get`). Future benchmarks may fetch pre-built binary assets.
If these operations prove to be very expensive parts of testing, we may want to explore snapshotting these dependencies to save time across builds. The coordinator’s built-in snapshotting mechanism may not provide any speed boost vs fetching over the internet. Instead, read-only and checksummed copies of the dependencies could be saved on the buildlet for use across multiple builds.
The most important aspect here is to ensure that a test of a bad version of the toolchain can’t corrupt the cache in a way that breaks future builds.
cc @mknyszek @aclements @dr2chase @jeremyfaller @golang/release