
Add type-system-chess to stress trait resolution. #1680

Closed
Dragon-Hatcher wants to merge 3 commits

Conversation

Dragon-Hatcher

This adds the Rust version of my Dragon-Hatcher/type-system-chess in order to stress trait resolution. There is basically no code generation involved, only type checking.

On my laptop, `CARGO_INCREMENTAL=0 cargo check ; cargo clean` takes 38.68 seconds. This time can easily be adjusted by adding or removing positions to analyze.
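For readers unfamiliar with the technique, here is a minimal illustrative sketch (type-level Peano arithmetic, not code from type-system-chess) of how computation can be pushed entirely into trait resolution, so that `cargo check` alone forces the solver to do all the work:

```rust
use std::marker::PhantomData;

// Type-level natural numbers.
struct Zero;
struct Succ<N>(PhantomData<N>);

// Addition, expressed as trait resolution rather than runtime code.
trait Add<Rhs> {
    type Output;
}

// 0 + N = N
impl<N> Add<N> for Zero {
    type Output = N;
}

// (M + 1) + N = (M + N) + 1
impl<M: Add<N>, N> Add<N> for Succ<M> {
    type Output = Succ<<M as Add<N>>::Output>;
}

// "Type equality": resolving `A: Same<B>` succeeds only if A and B
// normalize to the same type.
trait Same<T> {}
impl<T> Same<T> for T {}
fn assert_same<A: Same<B>, B>() {}

type Two = Succ<Succ<Zero>>;
type Four = <Two as Add<Two>>::Output;

fn main() {
    // The trait solver must evaluate 2 + 2 just to type-check this call.
    assert_same::<Four, Succ<Succ<Succ<Succ<Zero>>>>>();
}
```

The benchmark applies this same style of trait machinery at a much larger scale, which is what makes it a stress test for the trait solver.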

@Kobzol
Contributor

Kobzol commented Jul 28, 2023

Wow, pretty cool project I gotta say :) We'll see what others think about it being a benchmark.

CC @lcnr - does this look like something that could e.g. usefully stress test the new trait solver?

@lcnr
Contributor

lcnr commented Jul 28, 2023

if you have the capacity to add it as a secondary benchmark, then I think this is somewhat valuable. It could be a good indicator, preventing us from accidentally introducing some non-linear complexity somewhere.

@Kobzol
Contributor

Kobzol commented Jul 28, 2023

Ok, thank you. We will discuss this at the performance team meeting next week.

In any case, 40s seems a bit too much. We run each benchmark three times, and adding (at least) 120s to the suite is a bit much for my taste. Could you (eventually) reduce this down to ~10-15s?

@Dragon-Hatcher
Author

Changing the board configuration to just the two kings in the center of the board brings the time down to around that range; however, it then uses fewer of the types involved. Could we instead run it just once? I noticed some of the other benchmarks do that.

@Kobzol
Contributor

Kobzol commented Jul 30, 2023

That's also possible, but the fewer the iterations, the larger the noise could be. It would only make sense if the benchmark couldn't easily be reduced to a smaller time (as is the case with e.g. cargo) and if the additional time was actually relevant. If the performed work is the same, just expanded from 15s to 45s, then I'm not sure it's worth it. Exponential behaviour and other problems will probably also show up in a 15s compilation, which is already quite long.

@Dragon-Hatcher
Author

Okay. If the performance team is interested in this, we can see how long it takes to run on the CI pipeline.

@nnethercote
Contributor

I took a look with Cachegrind. The profile is attached. It certainly looks different to any other profile I've ever seen!
cgann-type-system-chess.txt

In general, hash map insertion is extremely hot.

Some specific code snippets follow, with instruction counts.

In compiler/rustc_middle/src/traits/mod.rs:

20,794,676,850 (8.6%)  #[derive(Clone, Debug, PartialEq, Eq, Lift, HashStable, TyEncodable, TyDecodable)]
             .         #[derive(TypeVisitable, TypeFoldable)]
             .         pub enum ObligationCauseCode<'tcx> {

In library/core/src/option.rs:

             .         #[unstable(feature = "spec_option_partial_eq", issue = "none", reason = "exposed only for rustc")]
             .         impl<T: PartialEq> SpecOptionPartialEq for T {
             .             #[inline]
             .             default fn eq(l: &Option<T>, r: &Option<T>) -> bool {
16,796,501,172 (7.0%)          match (l, r) { 
             .                     (Some(l), Some(r)) => *l == *r,
             .                     (None, None) => true,
             .                     _ => false,
             .                 }
             .             }
             .         }  

In compiler/rustc_middle/src/traits/mod.rs:

11,197,133,402 (4.6%)  #[derive(Clone, Debug, PartialEq, Eq, Lift, HashStable, TyEncodable, TyDecodable)]
             .         #[derive(TypeVisitable, TypeFoldable)]
             .         pub struct ImplDerivedObligationCause<'tcx> {

In library/alloc/src/boxed.rs:

            .         #[stable(feature = "rust1", since = "1.0.0")]
            .         impl<T: ?Sized + PartialEq, A: Allocator> PartialEq for Box<T, A> {
            .             #[inline]
            .             fn eq(&self, other: &Self) -> bool {
9,597,542,916 (4.0%)          PartialEq::eq(&**self, &**other)
            .             }

And with Callgrind I was able to identify that this code is the heart of the problem. I haven't investigated what kind of data is in there, e.g. whether the obligations list gets extremely long.

I have profiles of 1000 of the top crates on crates.io on my machine, and none of them look anything like this profile. So my inclination is to not create a benchmark for this, because it's so unusual. Having said that, there's a good chance that some small changes will speed up compilation of this code drastically.

@lcnr
Contributor

lcnr commented Aug 4, 2023

Because I am unsure where else to put it:

I spent an hour or so playing with https://github.com/Dragon-Hatcher/type-system-chess and -Ztrait-solver=next after rust-lang/rust#114287. The status is that 1) the new solver seems to be faster than the old one here (when only using S2 of main) and 2) the recursion depth computation varies wildly between the two solvers. The new solver needs a recursion limit of 512 to compile all impls. It also always overflows once we reify S3 right now; I still have to figure out why exactly. I will continue looking into this with the goal of getting at least Fool's Mate to work. This is a really fun benchmark for the trait solver, even if it isn't too indicative of real crates 😁
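For anyone who wants to reproduce this experiment, a rough sketch of the setup (the crate-root attribute is standard Rust; `-Ztrait-solver=next` is the nightly flag mentioned above, and its spelling may differ on newer toolchains):

```rust
// In the crate root of the benchmark: raise the recursion limit so the new
// solver can compile all the impls (a limit of 512 per the comment above).
#![recursion_limit = "512"]

// Then type-check with the new solver enabled on a nightly toolchain, e.g.:
//
//     RUSTFLAGS="-Ztrait-solver=next" cargo +nightly check
```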

@Dragon-Hatcher
Author

@nnethercote I'm not that knowledgeable in this area, but is it possible that even though the workload doesn't look like that of real-world crates, it could still be a useful benchmark? Real crates don't spend 100% of their time doing trait resolution, but lots of crates still need to do heavy trait resolution. If we improve this benchmark then those areas could get faster. My thinking when I submitted this was that it would give a pure signal on that front. OTOH maybe the workloads here are too extreme to be a good signal, and optimizing for this benchmark could be harmful to normal crates.

@nnethercote
Contributor

nnethercote commented Aug 7, 2023

Real crates don't spend 100% of their time doing trait resolution, but lots of crates still need to do heavy trait resolution. If we improve this benchmark then those areas could get faster.

Like I said above:

I have profiles of 1000 of the top crates on crates.io on my machine, and none of them look anything like this profile.

This is a unique program 😄

Equally importantly, there are constraints on the benchmark suite size. More benchmarks means it takes longer to run, and there are more results to interpret on every run. We currently have 19 primary and 23 secondary benchmarks. I already think there are too many secondary benchmarks. We just don't have space for every stress test, unfortunately.

I just tried changing the `PredicateObligations` type in the compiler from a `Vec` to a `HashSet`. That could help speed this up a lot, because most of the time is spent deduplicating very long obligation vectors. (I measured them; the biggest ones have over 250,000 elements, and after de-duplicating their size shrinks by 2x or more.) Unfortunately `PredicateObligations` is used in many places, working through all of those places is taking some time, and I'm not sure if I'll hit a showstopper along the way.
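As an illustration of why that kind of change can matter, here is a generic sketch (not the actual rustc code or types): keeping a plain `Vec` deduplicated by linear search is quadratic in the number of obligations, whereas a hash set makes each insertion roughly constant time.

```rust
use std::collections::HashSet;
use std::hash::Hash;

// Sketch only, not the rustc implementation. Keeping a Vec deduplicated by
// linear search costs O(n) per insertion and O(n^2) overall, which hurts once
// the list grows to hundreds of thousands of elements.
fn push_dedup_vec<T: PartialEq>(items: &mut Vec<T>, item: T) {
    if !items.contains(&item) {
        items.push(item);
    }
}

// A hash set makes each insertion roughly O(1) at the cost of hashing; real
// compiler code might prefer an insertion-order-preserving set type instead.
fn push_dedup_set<T: Eq + Hash>(items: &mut HashSet<T>, item: T) {
    items.insert(item);
}

fn main() {
    let (mut v, mut s) = (Vec::new(), HashSet::new());
    for x in [1, 2, 2, 3, 3, 3] {
        push_dedup_vec(&mut v, x);
        push_dedup_set(&mut s, x);
    }
    assert_eq!(v.len(), 3);
    assert_eq!(s.len(), 3);
}
```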

nnethercote added a commit to nnethercote/rust that referenced this pull request Aug 9, 2023
This gives massive (~7x) compile time and memory usage reductions for the trait system stress test in rust-lang/rustc-perf#1680.
@nnethercote
Contributor

FWIW, I got about a 7x reduction in compile time and memory usage for this program in rust-lang/rust#114611. There are a couple of secondary benchmarks that also get wins of a couple of percent. And looking more closely at the top 1000 crates, I see a handful there that might also benefit by 1 or 2%.

@Dragon-Hatcher
Author

Really cool!

bors added a commit to rust-lang-ci/rust that referenced this pull request Aug 18, 2023
…piler-errors

Speed up compilation of `type-system-chess`

[`type-system-chess`](rust-lang/rustc-perf#1680) is an unusual program that implements a compile-time chess position solver in the trait system(!). This PR is about making it compile faster.

r? `@ghost`
@lqd
Member

lqd commented Aug 18, 2023

rust-lang/rust#114611 has landed, so we can close this PR.

@Dragon-Hatcher
Author

Thanks for your work @nnethercote!

github-actions bot pushed a commit to rust-lang/miri that referenced this pull request Aug 21, 2023
Speed up compilation of `type-system-chess`
lnicola pushed a commit to lnicola/rust-analyzer that referenced this pull request Apr 7, 2024
Speed up compilation of `type-system-chess`
RalfJung pushed a commit to RalfJung/rust-analyzer that referenced this pull request Apr 27, 2024
Speed up compilation of `type-system-chess`