
Add type-system-chess to stress trait resolution. #1680

Closed
Dragon-Hatcher wants to merge 3 commits

Conversation

Dragon-Hatcher

This adds the Rust version of my Dragon-Hatcher/type-system-chess in order to stress trait resolution. There is basically no code generation involved, only type checking.

On my laptop, `CARGO_INCREMENTAL=0 cargo check ; cargo clean` takes 38.68 seconds. This time can easily be adjusted by adding or removing positions to analyze.
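For readers unfamiliar with the technique, here is a minimal illustrative sketch (type-level Peano arithmetic, not code from type-system-chess) of how computation can be pushed entirely into trait resolution, so that `cargo check` alone forces the solver to do all the work:

```rust
use std::marker::PhantomData;

// Type-level natural numbers.
struct Zero;
struct Succ<N>(PhantomData<N>);

// Addition, expressed as trait resolution rather than runtime code.
trait Add<Rhs> {
    type Output;
}

// 0 + N = N
impl<N> Add<N> for Zero {
    type Output = N;
}

// (M + 1) + N = (M + N) + 1
impl<M: Add<N>, N> Add<N> for Succ<M> {
    type Output = Succ<<M as Add<N>>::Output>;
}

// "Type equality": resolving `A: Same<B>` succeeds only if A and B
// normalize to the same type.
trait Same<T> {}
impl<T> Same<T> for T {}
fn assert_same<A: Same<B>, B>() {}

type Two = Succ<Succ<Zero>>;
type Four = <Two as Add<Two>>::Output;

fn main() {
    // The trait solver must evaluate 2 + 2 just to type-check this call.
    assert_same::<Four, Succ<Succ<Succ<Succ<Zero>>>>>();
}
```

The benchmark applies this same style of trait machinery at a much larger scale, which is what makes it a stress test for the trait solver.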

@Kobzol
Contributor

Kobzol commented Jul 28, 2023

Wow, pretty cool project I gotta say :) We'll see what others think about it being a benchmark.

CC @lcnr - does this look like something that could e.g. usefully stress test the new trait solver?

@lcnr
Contributor

lcnr commented Jul 28, 2023

if you have the capacity to add it as a secondary benchmark, then I think this is somewhat valuable. It could be a good indicator, preventing us from accidentally introducing some non-linear complexity somewhere.

@Kobzol
Contributor

Kobzol commented Jul 28, 2023

Ok, thank you. We will discuss this at the performance team meeting next week.

In any case, 40s seems a bit too much. We run each benchmark three times, and adding (at least) 120s to the suite is a bit much for my taste. Could you (eventually) reduce this down to ~10-15s?

@Dragon-Hatcher
Author

Changing the board configuration to just the two kings in the center of the board brings the time down to around that range; however, it then uses fewer of the types involved. Could we instead run it just once? I noticed some of the other benchmarks do that.

@Kobzol
Contributor

Kobzol commented Jul 30, 2023

That's also possible, but the fewer the iterations, the larger the noise could be. It would only make sense if the benchmark couldn't easily be reduced to a smaller time (as is the case with e.g. cargo) and if the additional time was actually relevant. If the performed work is the same, just expanded from 15s to 45s, then I'm not sure it's worth it. Exponential behaviour and other problems will probably also show up in a 15s compilation, which is already quite long.

@Dragon-Hatcher
Author

Okay. If the performance team is interested in this, we can see how long it takes to run on the CI pipeline.

@nnethercote
Contributor

I took a look with Cachegrind. The profile is attached. It certainly looks different to any other profile I've ever seen!
cgann-type-system-chess.txt

In general, hash map insertion is extremely hot.

Some specific code snippets follow, with instruction counts.

In compiler/rustc_middle/src/traits/mod.rs:

20,794,676,850 (8.6%)  #[derive(Clone, Debug, PartialEq, Eq, Lift, HashStable, TyEncodable, TyDecodable)]
             .         #[derive(TypeVisitable, TypeFoldable)]
             .         pub enum ObligationCauseCode<'tcx> {

In library/core/src/option.rs:

             .         #[unstable(feature = "spec_option_partial_eq", issue = "none", reason = "exposed only for rustc")]
             .         impl<T: PartialEq> SpecOptionPartialEq for T {
             .             #[inline]
             .             default fn eq(l: &Option<T>, r: &Option<T>) -> bool {
16,796,501,172 (7.0%)          match (l, r) { 
             .                     (Some(l), Some(r)) => *l == *r,
             .                     (None, None) => true,
             .                     _ => false,
             .                 }
             .             }
             .         }  

In compiler/rustc_middle/src/traits/mod.rs:

11,197,133,402 (4.6%)  #[derive(Clone, Debug, PartialEq, Eq, Lift, HashStable, TyEncodable, TyDecodable)]
             .         #[derive(TypeVisitable, TypeFoldable)]
             .         pub struct ImplDerivedObligationCause<'tcx> {

In library/alloc/src/boxed.rs:

            .         #[stable(feature = "rust1", since = "1.0.0")]
            .         impl<T: ?Sized + PartialEq, A: Allocator> PartialEq for Box<T, A> {
            .             #[inline]
            .             fn eq(&self, other: &Self) -> bool {
9,597,542,916 (4.0%)          PartialEq::eq(&**self, &**other)
            .             }

And with Callgrind I was able to identify that this code is the heart of the problem. I haven't investigated what kind of data is in there, e.g. whether the obligations list gets extremely long.

I have profiles of 1000 of the top crates on crates.io on my machine, and none of them look anything like this profile. So my inclination is to not create a benchmark for this, because it's so unusual. Having said that, there's a good chance that some small changes will speed up compilation of this code drastically.

@lcnr
Contributor

lcnr commented Aug 4, 2023

Because I am unsure where else to put it:

I spent an hour or so playing with https://github.com/Dragon-Hatcher/type-system-chess and -Ztrait-solver=next after rust-lang/rust#114287. The status is that 1) the new solver seems to be faster than the old one here (when only using S2 of main) and 2) the recursion depth computation varies wildly between the two solvers. The new solver needs a recursion limit of 512 to compile all impls. It also always overflows once we reify S3 right now; I still have to figure out why exactly. I will continue looking into this with the goal of getting at least Fool's Mate to work. This is a really fun benchmark for the trait solver, even if it isn't too indicative of real crates 😁
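For anyone who wants to reproduce this experiment, a rough sketch of the setup (the crate-root attribute is standard Rust; `-Ztrait-solver=next` is the nightly flag mentioned above, and its spelling may differ on newer toolchains):

```rust
// In the crate root of the benchmark: raise the recursion limit so the new
// solver can compile all the impls (a limit of 512 per the comment above).
#![recursion_limit = "512"]

// Then type-check with the new solver enabled on a nightly toolchain, e.g.:
//
//     RUSTFLAGS="-Ztrait-solver=next" cargo +nightly check
```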

@Dragon-Hatcher
Author

@nnethercote I'm not that knowledgeable in this area, but is it possible that even though the workload doesn't look like that of real-world crates, it could still be a useful benchmark? Real crates don't spend 100% of their time doing trait resolution, but lots of crates still need to do heavy trait resolution. If we improve this benchmark then those areas could get faster. My thinking when I submitted this was that it would give a pure signal on that front. OTOH maybe the workloads here are too extreme to be a good signal, and optimizing for this benchmark could be harmful to normal crates.

@nnethercote
Contributor

nnethercote commented Aug 7, 2023

Real crates don't spend 100% of their time doing trait resolution, but lots of crates still need to do heavy trait resolution. If we improve this benchmark then those areas could get faster.

Like I said above:

I have profiles of 1000 of the top crates on crates.io on my machine, and none of them look anything like this profile.

This is a unique program 😄

Equally importantly, there are constraints on the benchmark suite size. More benchmarks means it takes longer to run, and there are more results to interpret on every run. We currently have 19 primary and 23 secondary benchmarks. I already think there are too many secondary benchmarks. We just don't have space for every stress test, unfortunately.

I just tried changing the `PredicateObligations` type in the compiler from a `Vec` to a `HashSet`. That could help speed this up a lot, because most of the time is spent deduplicating very long obligation vectors. (I measured them; the biggest ones have over 250,000 elements, and after de-duplicating their size shrinks by 2x or more.) Unfortunately `PredicateObligations` is used in many places, working through all of those places is taking some time, and I'm not sure if I'll hit a showstopper along the way.
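As an illustration of why that kind of change can matter, here is a generic sketch (not the actual rustc code or types): keeping a plain `Vec` deduplicated by linear search is quadratic in the number of obligations, whereas a hash set makes each insertion roughly constant time.

```rust
use std::collections::HashSet;
use std::hash::Hash;

// Sketch only, not the rustc implementation. Keeping a Vec deduplicated by
// linear search costs O(n) per insertion and O(n^2) overall, which hurts once
// the list grows to hundreds of thousands of elements.
fn push_dedup_vec<T: PartialEq>(items: &mut Vec<T>, item: T) {
    if !items.contains(&item) {
        items.push(item);
    }
}

// A hash set makes each insertion roughly O(1) at the cost of hashing; real
// compiler code might prefer an insertion-order-preserving set type instead.
fn push_dedup_set<T: Eq + Hash>(items: &mut HashSet<T>, item: T) {
    items.insert(item);
}

fn main() {
    let (mut v, mut s) = (Vec::new(), HashSet::new());
    for x in [1, 2, 2, 3, 3, 3] {
        push_dedup_vec(&mut v, x);
        push_dedup_set(&mut s, x);
    }
    assert_eq!(v.len(), 3);
    assert_eq!(s.len(), 3);
}
```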

nnethercote added a commit to nnethercote/rust that referenced this pull request Aug 9, 2023
This gives massive (~7x) compile time and memory usage reductions for the trait system stress test in rust-lang/rustc-perf#1680.
@nnethercote
Contributor

FWIW, I got about a 7x reduction in compile time and memory usage for this program in rust-lang/rust#114611. There are a couple of secondary benchmarks that also get wins of a couple of percent. And looking more closely at the top 1000 crates, I see a handful there that might also benefit by 1 or 2%.

@Dragon-Hatcher
Author

Really cool!

bors added a commit to rust-lang-ci/rust that referenced this pull request Aug 18, 2023
…piler-errors

Speed up compilation of `type-system-chess`

[`type-system-chess`](rust-lang/rustc-perf#1680) is an unusual program that implements a compile-time chess position solver in the trait system(!). This PR is about making it compile faster.

r? `@ghost`
@lqd
Member

lqd commented Aug 18, 2023

rust-lang/rust#114611 has landed, so we can close this PR.

@Dragon-Hatcher
Author

Thanks for your work @nnethercote!

github-actions bot pushed a commit to rust-lang/miri that referenced this pull request Aug 21, 2023
Speed up compilation of `type-system-chess`
lnicola pushed a commit to lnicola/rust-analyzer that referenced this pull request Apr 7, 2024
Speed up compilation of `type-system-chess`
RalfJung pushed a commit to RalfJung/rust-analyzer that referenced this pull request Apr 27, 2024
Speed up compilation of `type-system-chess`