-
Notifications
You must be signed in to change notification settings - Fork 148
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add stm327xx-hal benchmark #802
Conversation
lol, now rust-lang/rust#79560 is affecting rustc-perf too 😆 |
Wow, does that mean |
In instruction counts, on a specific benchmark that stresses that part of rustdoc, but yes :) |
Impressive! Congratulations and thanks @bugadani :) |
Well, the idea and basically a review comment for every character belongs to @jyn514 :) |
This should hopefully be working now that |
It does stress the impl block overlap pass of rustc, which is usually not very relevant, but in this instance it is. In the past, I used this crate to locally assess the improvements of my PR: rust-lang/rust#78323 (comment) It would be amazing to have it be part of the benchmark suite. |
AFAIK that's a different kine of STM devices. F4 and H7 series are significantly different. Probably comparably horrific, though. |
@bugadani yeah they are different devices but I think their crates are structured more or less the same, so cause the compiler to run into the equal situation. IDK why I've taken stm32f4 instead of one of the other devices but I don't think it matters much. |
It probably matters little, which is used. But the embedded world will be happy for any improvements that affect these (and a lot more device support crates, actually)! |
Indeed, I don't really care which crate is added, as long as it has tons of inherent impl blocks on the same type :). |
@Mark-Simulacrum I think this is ready to go, is it waiting on anything from me? Or did you want to add diesel instead (#807) instead of adding both or just stm32? |
This is adding 10 minutes to the check CI builder - from 20 minutes to 30 with this PR. Some of that is likely noise, but this is too large a delta for the perf server; I don't know how easy it would be to cut down the benchmark here but as-is I cannot justify spending that much time on just one crate on the perf server. |
@adamgreig do you have suggestions for making this smaller while still stressing a large number of blanket impls? Right now I'm only passing |
Do you mean By the way, it's possible to generate the crate all in a single mod.rs file instead of one file per module, if that would be useful for this PR. Edit: another option is to use a smaller crate in the same family, e.g. stm32f4 or stm32f0, which use the same code generator but have fewer items. I expect they'd still stress the large number of blanket impls, though, unless that's coming from something especially pathological in stm32h7. |
Thanks! I was passing |
Ah, sorry, I've misread and thought this was building stm32h7 rather than stm32h7xx-hal and my suggestions were in that direction. Not sure what you could do to cut down stm32h7xx-hal build times, but maybe @richardeoin will have ideas? I assume the slow documentation is in stm32h7xx-hal rather than stm32h7 itself? |
On my fastest machine stm32h7 takes 35 seconds in --release mode and stm32h743 takes 15.6 (for comparison, syn takes 5 seconds). The difference between stm32h743 and 742 is negligible. Rustdoc on the other hand takes 40 seconds for stm32h7 and a whopping 96.4 for stm32h743. That makes me hopeful that we can fix this just by merging rust-lang/rust#77700. |
Those times are still likely not tenable for perf, FWIW - syn takes over a minute, and we're not really in a position to add another 3-4 minutes to our duration right now. |
The |
@est31 it sounds like you've already done a lot of work in this area, are you interested in taking the PR over? |
@jyn514 sure can do that |
@jyn514 as simulacrum doesn't want to merge things that add a minute, maybe instead of this PR we add a simple stress test? |
One minute is fine. Adding 3+ minutes is not currently tenable :) |
A stress test sounds fine to me :) I just want to make sure this is measured somehow. |
@est31 Any progress? If we wanted to add a full compile test instead of a stress-test we could also use a slightly simpler HAL impl than F7, I'm already using https://github.com/stm32-rs/stm32f0xx-hal to do compile time and code size (which would be a cool feature to add to rustc-perf, too) comparisons. |
On crates with many blanket impls, such as `stm32`, this inner closure is executed many hundreds of thousands of times. This makes it very slightly faster. Unfortunately, I don't have a good test case for showing that it's faster until rust-lang/rustc-perf#802 is merged.
Before: ``` collector error: Failed to benchmark 'stm32f4', recorded: failed to obtain pkgid in '"/tmp/.tmpLF63db"' ``` After: ``` collector error: Failed to benchmark 'stm32f4', recorded: failed to obtain pkgid in '"/tmp/.tmpAmktlm"': expected success, got exit code: 101 stderr=error: a Cargo.lock must exist for this command stdout= ```
This confuses rustc-fake, which expects to only be run when profiled (on the code being timed, not on dependencies).
@Mark-Simulacrum I changed this to use stm32f4, which takes 18 seconds on a doc run and 106 seconds overall. Is that fast enough? |
Oh, forgot to copy/paste #807 (comment) here - I'm hoping to land diesel first most likely, but largely land these two benchmarks and evaluate based on the metrics we're now collecting where things stand (and either back things out or not, depending on the effects on queue depth). |
Merging this for now, and we can evaluate whether to keep it in depending on how the queue does. So far the addition of diesel looks to have been a minor increase in overlap, leading to slightly longer queues, but my recollection of queueing theory suggests that there's going to be a steep cliff rather than gradually getting worse. |
rustdoc: Don't enter an infer_ctxt in get_blanket_impls for impls that aren't blanket impls Less broken version of rust-lang#82856. get_blanket_impls is a *very* hot region of rustdoc, so even small changes like this should help. Unfortunately I don't have benchmarks for this until rust-lang/rustc-perf#802 is merged.
See rust-lang/rust#77700 (comment), rust-lang/rust#78761. This is mostly a rustdoc benchmark but I wouldn't be surprised if it stresses rustc too.