
perf regression between nightly 2023-06-28 and 2023-07-01 #113372

Closed
Dirbaio opened this issue Jul 5, 2023 · 25 comments · Fixed by #114948
Labels
I-compiletime Issue: Problems and improvements with respect to compile times. requires-nightly This issue requires a nightly compiler in some way.

Comments

@Dirbaio
Contributor

Dirbaio commented Jul 5, 2023

There's some performance regression between nightly-2023-06-28 and nightly-2023-07-01.

Compilation times for my company's firmware project went from 13s on nightly-2023-06-28 to 1m 43s on nightly-2023-07-01. This is for "change a single line, then build", not for a cold build.

The project is a single, rather big crate making extensive use of async and AFIT, plus TAIT through the #[embassy_executor::task] macro to place futures in statics. Unfortunately it's not open source. I also haven't been able to isolate the regression to a single bad code pattern. It seems all async code contributes to the slowness: as I remove code, compilation gets faster, roughly in proportion to how much I remove.
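
For context, here's a minimal sketch of that pattern (made-up names on a 2023-era nightly; this is not the macro's real expansion): the task macro uses TAIT to name the otherwise-unnameable future type so it can be placed in static storage, which means the compiler ends up reasoning about the whole concrete nested future type.

```rust
// Rough sketch of the pattern, assuming a 2023-era nightly with
// `type_alias_impl_trait`; names are made up and this is not the macro's real expansion.
#![feature(type_alias_impl_trait)]

use core::cell::UnsafeCell;
use core::future::Future;
use core::mem::MaybeUninit;

// TAIT: names the otherwise-unnameable future type returned by the task.
type TaskFut = impl Future<Output = ()>;

async fn my_task() {
    // ...deeply nested async firmware logic...
}

// Defining use: ties `TaskFut` to the concrete future type of `my_task`.
fn task() -> TaskFut {
    my_task()
}

// Stand-in for the executor's task storage; `Sync` is asserted only to keep the sketch small.
struct Storage<F>(UnsafeCell<MaybeUninit<F>>);
unsafe impl<F> Sync for Storage<F> {}

// The whole nested future type now sits behind a static, so codegen ends up reasoning
// about the full concrete type (e.g. auto traits such as `Freeze`).
static TASK: Storage<TaskFut> = Storage(UnsafeCell::new(MaybeUninit::uninit()));
```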

I've taken flamegraphs with -Zself-profile. The extra time is spent in codegen_module -> fn_abi_of_instance -> deduced_param_attrs -> is_freeze_raw -> evaluate_obligation

nightly-2023-06-28, 13s: flamegraph
nightly-2023-07-01, 1m 43s: flamegraph

I do have one open-source project which also seems affected by the regression, but not as egregiously. To reproduce, cd examples/nrf; cargo build --release.

3 attempts of "change 1 line then build":
nightly-2023-06-28: 2.51s 2.85s 3.51s. flamegraph
nightly-2023-07-01: 4.22s 4.10s 4.09s. flamegraph

fn_abi_of_instance takes 14% of the time on the new nightly but almost none on the old one, so it seems to be the same issue.

Also, unfortunately, nightly-2023-06-29 and nightly-2023-06-30 have an unrelated (?) bug that makes compilation fail with bogus "impl ?Sized is not a Future" errors: #113155 (introduced in #98867, fixed in #113165). I don't think that's the cause of the perf regression, but it prevents bisecting it (the bisect just points me to #98867; scripting it to fail only on timeout would probably point to #113165, though I haven't tried).

@compiler-errors
Member

It's probably #113108.

Do you have really deeply nested futures? It's probably that.

@Dirbaio
Contributor Author

Dirbaio commented Jul 5, 2023

Do you have really deeply nested futures?

Yes, I do.

@compiler-errors
Member

compiler-errors commented Jul 5, 2023

Even if it is #113108, I don't really know what I can do about this situation without at least being able to see some shared code.

For the record -- that PR doesn't just introduce a regression for no good reason, it fixes real codegen errors that are due to the fact that we weren't actually revealing opaque types during codegen even though we should¹. The problem is that you've probably got extremely deep futures that were hiding behind the (accidental) type erasure of opaque types, and now that we're revealing them, we're doing more work when checking auto traits like Freeze.

But that's just my conjecture for the cause of the regression.

Footnotes

  1. In specific situations that are unfortunately easier to encounter with futures, especially ones that borrow lifetimes
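
To illustrate the point about revealing (made-up functions, not code from this issue): each `-> impl Future` layer used to act as an opaque boundary during codegen, but the revealed hidden type contains every layer beneath it, so an auto-trait check on the outermost future now has to walk the whole tower.

```rust
use core::future::Future;

fn layer0() -> impl Future<Output = ()> {
    async {}
}

fn layer1() -> impl Future<Output = ()> {
    async { layer0().await } // hidden type contains layer0's future
}

fn layer2() -> impl Future<Output = ()> {
    async { layer1().await } // hidden type contains layer1's, which contains layer0's
}

// Proving something like `typeof(layer2()): Freeze` on the revealed hidden type means
// visiting every nested future in the tower; real code nests far deeper than three levels.
```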

@compiler-errors
Member

compiler-errors commented Jul 5, 2023

You may be able to mitigate this regression by introducing boxing in cases where you know that the futures are gonna be super deep, though it's sometimes kinda random where you'd have to introduce it.
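
Roughly, the boxing looks like this (hypothetical functions; it requires an allocator, which is exactly the catch on embedded):

```rust
use core::future::Future;
use core::pin::Pin;

async fn deep_inner() {
    // ...imagine many more nested awaits below this point...
}

// Boxing erases the concrete nested type: callers only ever see a trait object.
fn boxed_inner() -> Pin<Box<dyn Future<Output = ()> + Send>> {
    Box::pin(deep_inner())
}

async fn outer() {
    // `outer`'s future stores a pointer-sized field here instead of embedding the
    // entire inner future, which caps how deep the compiler has to recurse.
    boxed_inner().await;
}
```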

You're probably working on embedded stuff anyways, so nevermind on the boxing lol

@Dirbaio
Contributor Author

Dirbaio commented Jul 5, 2023

Is the ekv repro I've linked above not big enough? I've tried to see if I can cook up some artificial example that bloats compile times more, unsuccessfully so far. I'll keep trying.

And yeah I can't Box :(

@compiler-errors
Member

Oh, sorry, didn't see the link. Just saw "company code" and stopped reading 😓

@compiler-errors
Member

Hm.... I don't know if I can reasonably debug a 2s -> 4s compile time regression. I really need code that's doing a significant amount of more work. If you can find something that's more dramatic of a regression than that, I would really appreciate it.

Dirbaio added a commit to embassy-rs/embassy that referenced this issue Jul 5, 2023
Newer nightlies have a bad perf regression rust-lang/rust#113372
Dirbaio added a commit to embassy-rs/ekv that referenced this issue Jul 5, 2023
@Dirbaio
Contributor Author

Dirbaio commented Jul 5, 2023

Okay, I found a pattern of death: embassy-rs/ekv@53918ac

with nightly-2023-06-28 it builds fine in 2.5s.
with nightly-2023-07-01 it hangs and eats all my RAM.

@compiler-errors
Member

Interesting. I'll look at that PR and see if there's anything that can be done to optimize the post-mono trait solving to be less unfriendly to large futures, but I will note that creating arbitrarily large futures will eventually cause the compiler to slow down...

What I'd really like to understand is how a real-world program ends up with such large futures that this regression becomes meaningful -- but I understand how that may be hard to summarize without sharing too much info about private code 😶

@Dirbaio
Contributor Author

Dirbaio commented Jul 5, 2023

I don't think I'm doing anything particularly cursed, just writing async code 😅

The deepest future I have in my firmware is this I think (removing the code for it made the biggest difference in the compile time):

  • top level task with a loop handling incoming network requests
  • A request header with a "request ID" is parsed from the socket, and a match on it dispatches to the right handler.
  • One of the request IDs is a "config update", which requires...
  • Parsing a stream of key+value pairs from the socket and writing them to ekv.
  • ekv itself has quite a few layers with complex logic (transaction, record, file, page, then a Flash async trait; you can see that in that repo).
  • flash access goes through a Mutex because other parts of the system can use other flash partitions concurrently
  • flash is implemented using more async in the HAL

so yeah that can easily be 30 nested futures 😓.

It's not particularly cursed though! It's straightforward code that you could also write blocking, just with async and .await in the right places. I don't go out of my way to write things as state machines manually and then wrap them with async to reduce future depth. I don't want to; I'm using async precisely so I don't have to do that.
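
As a rough sketch with made-up types (not ekv's real API), the layering looks something like this, with each layer's future embedding the one below it:

```rust
// Hypothetical layers standing in for the transaction/record/page/flash stack; this
// uses async fn in traits (a nightly feature at the time of this issue, stable since 1.75).
trait Flash {
    async fn write(&mut self, addr: u32, data: &[u8]); // returns an unnameable future
}

struct Pages<F: Flash> {
    flash: F,
}

impl<F: Flash> Pages<F> {
    async fn write_page(&mut self, addr: u32, data: &[u8]) {
        self.flash.write(addr, data).await; // embeds F's write future
    }
}

struct Transaction<F: Flash> {
    pages: Pages<F>,
}

impl<F: Flash> Transaction<F> {
    async fn write_record(&mut self, data: &[u8]) {
        self.pages.write_page(0, data).await; // embeds Pages' future
    }
}

async fn handle_config_update<F: Flash>(tx: &mut Transaction<F>, data: &[u8]) {
    // The request-dispatch loop, socket parsing, mutex, and HAL layers add more levels on top.
    tx.write_record(data).await;
}
```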

@compiler-errors
Member

Makes sense. I'll see what I can do about it.

@KittyBorgX KittyBorgX added the perf-regression Performance regression. label Jul 6, 2023
@compiler-errors compiler-errors added I-slow Issue: Problems and improvements with respect to performance of generated code. and removed perf-regression Performance regression. labels Jul 6, 2023
@compiler-errors
Member

@KittyBorgX: that tag is usually used to track PRs that are perf regressions with the internal perf testing suite (rust-timer). We have another tag for issues like this: I-slow.

@lqd
Member

lqd commented Jul 6, 2023

@compiler-errors I-compiletime (I-slow is for performance of generated code)

@compiler-errors compiler-errors added I-compiletime Issue: Problems and improvements with respect to compile times. and removed I-slow Issue: Problems and improvements with respect to performance of generated code. labels Jul 6, 2023
@compiler-errors
Member

compiler-errors commented Jul 6, 2023

Corrected! Should've read the rest of the tag description 😸

@KittyBorgX
Member

@KittyBorgX: that tag is usually used to track PRs that are perf regressions with the internal perf testing suite (rust-timer). We have another tag for issues like this: I-slow.

Ah my bad 😅 I'm still new to this and I'm learning haha :)

@compiler-errors compiler-errors added the requires-nightly This issue requires a nightly compiler in some way. label Jul 10, 2023
@compiler-errors
Member

@Dirbaio: I haven't been able to get around to this yet, but since y'all are using AFIT, I'm not gonna consider this a regression that needs to be reverted on beta or anything like that since you're presumably just pinned to the nightly right before this regression.

If I haven't gotten to it in another week or so, please ping me so I can reconsider reverting the PR that introduced this regression; I'm just trying to make sure I understand the situation and can set a reasonable timeline for a fix.

@Dirbaio
Contributor Author

Dirbaio commented Jul 10, 2023

Thanks for the update. If there's anything I can do to help please let me know. For example, would it help if I minimized it to a self-contained .rs file using no libs?

@compiler-errors
Member

That would certainly be cool if you could.

@compiler-errors
Member

@Dirbaio I'm free enough to look at this again -- how do I reproduce that ekv example? Just a cargo build? I don't see a hang, but maybe I'm missing a feature flag or something?

@Dirbaio
Contributor Author

Dirbaio commented Aug 1, 2023

git clone https://github.com/embassy-rs/ekv --branch perf-regression-repro
cd ekv/examples/nrf
cargo +nightly build --release

@compiler-errors
Member

ah, nrf is what i needed to be building 🫠

@Dirbaio
Contributor Author

Dirbaio commented Aug 17, 2023

Minimized: https://github.com/Dirbaio/perf-regression-repro/blob/main/src/main.rs

A lot of stuff needs to happen for it to reproduce 😅

  • lto=fat and opt-level=s
  • The future must be stored in a static (I guess this is what triggers checking that it is Freeze)
  • The future must be initialized via a function taking a closure
  • Lots of nested futures are needed (this is what causes the exponential blowup)
  • A generic async fn is needed
  • The generic async fn needs to call an async method on the trait (rough sketch of this shape at the end of this comment)
The more mainNN() levels you add, the slower it gets (roughly exponentially), until the compiler runs out of memory after eating 20 GB+ of RAM:

30: 0m6.349s
31: 0m10.144s
32: 0m16.839s
33: 0m23.656s
34: 0m43.653s
35: 1m15.742s
36: 1m50.963s
37: killed by OOM

EDIT: it turns out putting the future in a static is not needed; it's the closure that causes the hang.

Also, I've found that this repro hangs with nightly-2023-06-28 and older too, so it's not a repro of the regression, just of a pre-existing hang.

The original non-minimized repro has fewer nesting levels (20) but more complex code inside; it hangs on 2023-07-01 but works on 2023-06-28 and earlier. There's something about ekv that makes newer nightlies choke harder, but I haven't been able to minimize it yet...
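
For reference, a rough sketch of the shape those conditions describe (made-up names; the real minimized repro is the main.rs linked above):

```rust
// Made-up reconstruction of the shape only; the actual minimized repro is the main.rs
// linked above. Uses async fn in traits (a nightly feature at the time of this issue).
use core::future::Future;

trait Io {
    async fn op(&mut self);
}

struct Dev;

impl Io for Dev {
    async fn op(&mut self) {}
}

// A generic async fn calling an async method on the trait.
async fn use_io<T: Io>(io: &mut T) {
    io.op().await;
}

// Each mainN adds one nesting level; the repro repeats this for dozens of levels.
async fn main0(io: &mut Dev) { use_io(io).await }
async fn main1(io: &mut Dev) { main0(io).await }
async fn main2(io: &mut Dev) { main1(io).await }
// ...and so on up to mainNN...

// The outermost future is created through a closure-taking initializer, which (per the
// EDIT above) is the ingredient that actually triggers the hang, not the static.
fn init<F, Fut>(f: F) -> Fut
where
    F: FnOnce() -> Fut,
    Fut: Future<Output = ()>,
{
    f()
}

fn build() -> impl Future<Output = ()> {
    init(|| async {
        let mut dev = Dev;
        main2(&mut dev).await;
    })
}
```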

@Dirbaio
Contributor Author

Dirbaio commented Aug 17, 2023

I can confirm #114948 fixes the issue! It fixes all the hang repros found, and also the regression on the project I originally found it in (company code)

"change one line" build times on company code:

  • nightly-2023-06-28: 17s-20s
  • nightly-2023-08-16: 1m 43s
  • PR #114948: 22s-24s (EDIT: not entirely apples-to-apples because I haven't built with PGO+LTO+BOLT, unlike the published nightlies)

@Dirbaio
Contributor Author

Dirbaio commented Aug 18, 2023

🥳

thank you @compiler-errors !
