
Speed up bors cycle time #8456

Closed · alexcrichton opened this issue Aug 11, 2013 · 10 comments

@alexcrichton (Member) commented Aug 11, 2013
Right now the cycle time for a passing pull request is 3+ hours, up from about 2 hours a week ago, and up from roughly 1 hour a few months ago, if I remember correctly. This isn't really an artifact of Rust getting slower, but rather of @bors working even harder at testing. It does look like there's some easy low-hanging fruit which could speed up cycle time by at least an hour or two.

Currently the slowest builds are *-nopt (3+ hours), *-all (~2 hours), and *-vg (~1.5 hours). The baseline build time seems to be ~1 hour, and I think that we can get back to that level of speed with these suggestions:

  1. For *-nopt builds, build the compiler with optimizations, but don't run tests with optimizations. This could be enabled by #8450 (Allow disabling optimizations in tests only); I believe that the runtime now being unoptimized on *-nopt builds is where the huge slowdown is coming from. (A configuration sketch follows this list.)

  2. Parallelize the *-all builds. From what I understand, these builders essentially perform 4 separate builds on linux/mac: H64 -> T32, H64 -> T64, H32 -> T64, H32 -> T32 (where H == host word size, T == target word size). Two of these are already tested by other builders (H64 -> T64, H32 -> T32), and the remaining two don't really need to run on the same builder; they could be built in parallel.

    Basically, I think we could get the same test coverage by removing the *-all builders and introducing *-cross32 and *-cross64 builders instead (where the host differs from the target architecture).

  3. Either resolve #749 (Investigate running tests under Address Sanitizer), which should run a lot faster than valgrind, or run only a subset of the tests under valgrind. Doing either of these is kind of tricky, but I think that in the immediate future it's more reasonable to run a subset of the tests under valgrind. I'm not sure whether others would agree, but I'd think we could run only the run-pass, libXtest, and run-fail suites on the valgrind bots and get mostly the same signal as running valgrind over the entire build.

    Perhaps the snapshot builders could run valgrind over the entire suite, but inevitably we'd get a failure which would then be difficult to debug (though I think it would be worth it).
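Something like the following buildbot master snippet is what I have in mind for suggestions 1 and 2. This is only a rough sketch against buildbot's stock BuildFactory/ShellCommand/BuilderConfig API; the builder names, slave names, and configure flags (including --disable-optimize-tests, which is how I understand #8450 to work) are illustrative, not our actual bot configuration:

```python
# Rough sketch only: parallel cross builders plus an optimized-compiler /
# unoptimized-tests nopt builder. Names and flags are illustrative.
from buildbot.config import BuilderConfig
from buildbot.process.factory import BuildFactory
from buildbot.steps.shell import ShellCommand

def rust_factory(configure_args):
    f = BuildFactory()
    f.addStep(ShellCommand(command=["./configure"] + configure_args))
    f.addStep(ShellCommand(command=["make", "-j8"]))
    f.addStep(ShellCommand(command=["make", "check"]))
    return f

builders = [
    # The two cross configurations the *-all builders currently run in
    # series, split out so they run concurrently on separate slaves.
    BuilderConfig(name="linux-cross32", slavenames=["linux64-slave"],
                  factory=rust_factory(["--target=i686-unknown-linux-gnu"])),
    BuilderConfig(name="linux-cross64", slavenames=["linux32-slave"],
                  factory=rust_factory(["--target=x86_64-unknown-linux-gnu"])),
    # Suggestion 1: optimized compiler, unoptimized tests (#8450).
    BuilderConfig(name="linux-64-nopt-t", slavenames=["linux64-slave"],
                  factory=rust_factory(["--disable-optimize-tests"])),
]
```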

I think that with these three possibilities combined, we could get the cycle time back down to an hour and start processing a lot more builds in one day.

@graydon, @brson, would it be possible to configure the bots this way?

@graydon (Contributor) commented Aug 11, 2013

Maybe! But I'm also interested in figuring out the sources of the slowdown in the first place.

@brson (Contributor) commented Aug 11, 2013

The purpose of the -all builds is primarily to ensure that the build system continues to support cross-compiling, in both cross-target and multi-host configurations. In the multi-host configuration, where you are building a compiler for both 64- and 32-bit targets, there's no easy way to avoid building 4 complete sets of libraries. We could make the multi-host build more efficient in a few ways: primarily by not building all three stages on the secondary hosts (jumping straight to stage2 instead), and by bootstrapping the secondary hosts from an earlier stage of the primary. We could also run fewer tests, or none at all, in the cross builds; what matters most is that the build system continues to function.
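To put a number on that, here's a toy illustration (triples shortened; this isn't our actual build logic, just the counting argument):

```python
# Why a 2-host, 2-target build means 4 complete library builds: every
# compiler we ship (host) needs libraries for every platform it targets.
hosts = ["x86_64", "i686"]    # compilers we produce
targets = ["x86_64", "i686"]  # platforms those compilers can target

lib_sets = [(h, t) for h in hosts for t in targets]
for h, t in lib_sets:
    print("libraries for host=%s target=%s" % (h, t))
print("%d complete sets of libraries" % len(lib_sets))  # 4
```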

The valgrind builds could run single-threaded if they are not already.

#7425 is about optimizing the compiler on the nopt bots.

@alexcrichton (Member, Author)

Would it be possible to add *-crossXX builders which actually run tests, but for only one host -> target architecture, and then also keep the *-all builders without running any tests? That way we could make sure the cross-build system works, while also running tests against the builds, but in parallel rather than serialized.

Also, #8450 should close #7425 after the configuration is changed on the bots.

@graydon (Contributor) commented Aug 12, 2013

#4572, and generally reorganizing the metadata-read path for small files, is likely the next biggest win.

@alexcrichton (Member, Author)

After implementing suggestion 1, along with some runtime optimizations, the longest build time is down from 3h to 1h40m. Currently the longest builders are the *-all-opt builders. The next slowest is *-opt-vg, clocking in at 1h20m, followed by linux-32-nopt-t at around 1h10m.

For the next speedup win, it looks like the -all-opt builders are running tests in configurations which are already tested by the *-opt builders. Specifically:

  • check-stage2-T-x86_64-unknown-linux-gnu-H-x86_64-unknown-linux-gnu-cfail
  • check-stage2-T-x86_64-unknown-linux-gnu-H-x86_64-unknown-linux-gnu-std
  • check-stage2-T-x86_64-unknown-linux-gnu-H-x86_64-unknown-linux-gnu-extra
  • check-stage2-T-x86_64-unknown-linux-gnu-H-x86_64-unknown-linux-gnu-rpass
  • check-stage2-T-x86_64-unknown-linux-gnu-H-x86_64-unknown-linux-gnu-rustpkg
  • check-stage2-T-x86_64-unknown-linux-gnu-H-x86_64-unknown-linux-gnu-rusti
  • check-stage2-T-x86_64-unknown-linux-gnu-H-x86_64-unknown-linux-gnu-rfail

Basically these combinations are being run:

  • T-x86_64-H-x86_64
  • T-i686-H-x86_64

Oddly enough, these combinations are not being run:

  • T-x86_64-H-i686
  • T-i686-H-i686

@brson, would it be possible to stop running the T-x86_64-H-x86_64 tests, on the grounds that the functionality is already tested elsewhere? Also, is it an oversight or is it intended that the H-i686 tests aren't running?

@graydon (Contributor) commented Aug 14, 2013

I'll look into this today. I also think there's reason to believe the tests are not parallelizing due to the new scheduler, and possibly there are negative interactions with valgrind on the -vg bots.

@graydon (Contributor) commented Aug 20, 2013

There are other strategic moves we could make with the buildbot setup, though I'm hesitant to do any of them before simply fixing the speed bottlenecks in the compiler and testsuite. For reference's sake:

  • Sully's suggestion: run most tests using the check-fast target, then run the tests marked xfail-fast manually to complete coverage.
  • Niko's suggestion: run the valgrind tests periodically (say, once every 10 landings) and halt bors if they fail. This requires modifying bors somewhat, leaves us having to bisect to find culprits, and blocks everyone on the breakage rather than just the person responsible for the bad change, but it does mean we'd avoid sitting on valgrind breakage for too long.
  • My suggestion: split the testsuite into 'mod N' pieces and run N times as many builders, so that builder K runs a test if the test number mod N is K. This won't help with the straight-line build time, but it will absorb arbitrarily large testsuites (see the sketch below).
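A quick sketch of the mod-N split, with made-up test names (builder K takes test i when i mod N == K):

```python
# Deterministic 'mod N' sharding: number the tests in a fixed order, then
# give builder K exactly the tests whose index is congruent to K mod N.
def shard(tests, n_builders, k):
    # Sort first so every builder agrees on the numbering.
    return [t for i, t in enumerate(sorted(tests)) if i % n_builders == k]

suite = ["run-pass/hello.rs", "run-pass/vec.rs", "compile-fail/bad.rs",
         "run-fail/panic.rs", "std/io.rs", "extra/json.rs"]
for k in range(3):
    print("builder %d runs %s" % (k, shard(suite, 3, k)))
```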

@brson (Contributor) commented Aug 21, 2013

@alexcrichton the reason all four configurations are not tested is essentially a historical accident: I never paid much attention to what make check does in this configuration.

Under the cross configuration the tests could mostly be eliminated, but I would like to keep ensuring that the makefiles work (tests.mk is crazy complicated), so I'd like to keep some subset of the cross tests running.

Actually, though, given that the most important use case here is ensuring that the Android cross tests work, we should be doing the cross tests differently: testing the build arch compiling for each cross target arch (e.g. x86_64->i686, x86_64->x86_64), not all host archs compiling for the build arch (e.g. x86_64->x86_64, i686->x86_64).
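Concretely, with pairs written host -> target (arch names shortened; just an illustration of the two matrices, not real build code):

```python
# Current vs. proposed cross-test matrices from the comment above.
build_arch = "x86_64"

# Today: every host arch compiles for the build arch.
current = ["%s->%s" % (h, build_arch) for h in ["x86_64", "i686"]]
# Proposed: the build arch compiles for each target it cross-builds.
proposed = ["%s->%s" % (build_arch, t) for t in ["x86_64", "i686"]]

print("current: ", current)   # ['x86_64->x86_64', 'i686->x86_64']
print("proposed:", proposed)  # ['x86_64->x86_64', 'x86_64->i686']
```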

@alexcrichton (Member, Author)

Closing this for now. The cycle time is around 1hr 10min now, which is very tolerable. The queue does not have a huge backup nowadays, so I don't think that this is a pressing matter any more.

More specific issues may be opened in the future, but I don't think that this blanket issue is necessary any more.

@l0kod (Contributor) commented Aug 20, 2014

Distributed build systems for Rust: http://discuss.rust-lang.org/t/distributed-build-systems-for-rust/400
