Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Discussion of defaults settings for build dependencies for fastest compile times #10481

Open
lqd opened this issue Mar 15, 2022 · 8 comments
Open
Labels
C-feature-request Category: proposal for a feature. Before PR, ping rust-lang/cargo if this is not `Feature accepted` Performance Gotta go fast!

Comments

@lqd
Copy link
Member

lqd commented Mar 15, 2022

Problem

There may be new settings and default values that could improve compile times for build dependencies.

Proposed Solution

In particular, turning off debuginfo, and to a lesser extent stripping, and turning off incremental compilation for local build dependencies.

Notes

As discussed in this zulip thread, we've been looking at compile times over a wider range of the crates.io ecosystem.

Looking at cargo schedules, it seems there are quite a few projects where build scripts, proc-macros, and their dependencies are on the critical path, and the reason why I'm opening this thread: to discuss the default settings used to build the "for host" nodes.

There were already great insights about these a few years ago: cargo focuses on fast compile times rather than raw throughput of the built artifacts, leading to the current defaults for build dependencies.

I'd like to specifically discuss these in the dev.build-override profile:

  • -Cdebuginfo=2: is this removable ? I myself have never tried running a build script or proc-macro in a debugger (and I'd suspect the latter to be hindered by the proc macro bridge + rustc execution model) but that would impact backtraces and lineinfo, if panics happen (though I'm not sure how compile errors generated by panicking proc-macros look under a different debuginfo level).
  • -Cstrip=debuginfo: on the targets where the linker is aware of this (rather than stripping after the fact) this can be a small win (e.g. on build scripts). This one could arguably be added to the release profile as well if for some reason we wanted to be as aggressive as possible, but it's probably slightly better but within noise in release builds I've tried.
  • -Cincremental: this looks removable for local build-dependencies. Incremental compilation is a 30%-50% pessimization that is amortized by building things more than once. This is already tracked in this cargo issue about build scripts: it would improve them slightly. They are an interesting node: they usually are small, and absolute improvements would be small. However, they are somewhat slow to build even for simple scripts, and they take up a couple spots in the cargo schedule: when they are on the critical path they can slow things down by serializing subgraphs and delaying dependent units of work (e.g. removing syn's and proc-macro2's simple build scripts can improve from scratch compile times up to 12% even though their own build time is lower than that). This could also improve local proc-macros a bit, but it's not clear cut (or noticeable in my tests below, but I don't believe you could have local proc-macros published on crates.io anyways).

I've tried to validate these in the real world:

  • since this is quite simple to try, I've made a prototype of this
  • did a perf.rlo run to "ensure" there are no surprising regressions (but perf.rlo itself only tracks leaf crates build times in most situations)
  • and benchmarked it locally over our extended set of crates (780 or so of the most popular crates on crates.io), with check and debug builds. There's a readme in there with more info, simple summaries and hyperfine results for each crate. (Note that these benchmarks were done with the slightly more aggressive "symbols" stripping, but it seems more sensible to only strip "debuginfo" in the odd case a panic happens. That's a 1% improvement at best anyways. I have not measured stripping in release builds either)

These wins seem to indeed translate to real world uses. In the cases we're interested in, heavy build dependencies on the critical path, that is quite noticeable: a lot of nice improvements with a tight confidence interval, both for check builds and debug builds. The regressions are smaller in magnitude and with a wide confidence interval: there are a few super short benchmarks (20-30% of all the crates), making variance in timing more impactful in their results (both improvements and regressions). (Note that hyperfine's summaries are about changes over 100%, and these are not a percentage change. For example, snafu's results are "1.24 times faster", and that is a "1 / 1.24" change, around -20% and not -24%. hyperfine will return "2 times faster" to mean -50%). These overall look like an improvement to me. For check builds at j12 there are around 300 out of the 780 crates, with >=3% mean improvement, 175 >= 5%, 115 >= 10%, 50 >= 13%, etc. For debug builds at j8, there are 215 with >= 5% mean improvement.

Possible problems:

  • lack of debuggability ? documentation would be needed to show which settings to choose for maximum debuggability of build dependencies when needed, rather than fastest compile times
  • backtraces and panics ?
  • stripping may be an overhead on platforms where it's done after the fact (osx ?)
  • less reuse when a dependency is shared between a build dependency and a regular dependency: the different defaults will require another compilation. (That already happens today but is opt-in, changing the debuginfo level, panic method, etc)

Alternatives:

  • having debuginfo at level 1 instead of 2 ? Most of the wins come from the lack of debuginfo, maybe level 1 could be a compromise between speed and debuggability.
  • since all these are just default values and easily overridable by users, we could also just document these values without changing the defaults. (I'd probably prefer these defaults as the "pit of success" for the common cases and faster compiles, but could have easily missed important use-cases making this change unacceptable)

I have been working on a PR, and am currently still updating cargo unit tests (especially the ones related to the expectation of reuse, and some debugging is needed in the examples scraping tests).

That means I won't be done before today's t-cargo meeting, and thus opening an issue as @joshtriplett suggested, in case this could be a topic at the meeting (not that it's particularly urgent or anything)

@lqd lqd added the C-feature-request Category: proposal for a feature. Before PR, ping rust-lang/cargo if this is not `Feature accepted` label Mar 15, 2022
@bjorn3
Copy link
Member

bjorn3 commented Mar 15, 2022

though I'm not sure how compile errors generated by panicking proc-macros look under a different debuginfo level

-Cdebuginfo=1 emits just the information necessary for backtraces with inlined frames and source locations and without locals debuginfo. Turning off debuginfo completely will cause inlined frames to be omitted and source locations to not be shown.

-Cincremental: this looks removable. Incremental compilation is a 30%-50% pessimization that is amortized by building things more than once.

Incr comp is always disabled for non-local dependencies.

having debuginfo at level 1 instead of 2 ?

This would be my personal preference even if only by slightly over no debuginfo.

@lqd
Copy link
Member Author

lqd commented Mar 15, 2022

Incr comp is always disabled for non-local dependencies.

Indeed I'm specifically talking about local build scripts and proc macros in that paragraph, I've added some clarification.

This one is of course the least impactful and could be avoided.

@joshtriplett
Copy link
Member

joshtriplett commented Mar 15, 2022

We discussed this in today's @rust-lang/cargo meeting.

We were generally in favor of this kind of optimization, as long as it doesn't have an undue impact on people's ongoing development.

We'd like to omit the changes for strip for now, as there are some platforms where that may add more work than others.

For incremental, we'd be interested in the impact this has: how much time does it save in the case where the build script is unmodified, versus how much time it adds to building the build script when modified (e.g. adding one println statement). Given that the former happens far more often than the latter, we'd be willing to accept a noticeable ratio there, but we'd like to know the tradeoff we're making.

For debuginfo, we'd likely be in favor of making that change, but we'd probably also want a message added to cases such as build script failures or panics in proc macros to let people know that they could get better backtraces by turning on debuginfo.

@lqd
Copy link
Member Author

lqd commented Mar 17, 2022

In light of the above feedback (thanks a bunch for that!), I have more precise data and analysis.

I have more numbers in this gist for each of the possible options we discussed above for the build-override profile, on snafu (the most improved crate in the benchmarks), syn (it has a build script and is likely one of the most popular dependencies to be impacted by this change) and ripgrep (because it's a binary and has a build script):

  • baseline
  • debug=1
  • debug=off
  • debug=off, strip=debuginfo
  • debug=off, strip=symbols
  • debug=off, strip=symbols, incremental=false
  • as a bonus: debug=off, strip=symbols, incremental=false, and using LLD (I plan on doing more tests in the future, with cg_clif, mold, etc)

1. debuginfo

I believe potential concerns about debuggability are mostly for build scripts on stable: proc macros currently don't print backtraces by default. Much like debugging macros, you have to use nightly and -Z proc-macro-backtrace (although it also seems to currently print an ICE message suggesting people to open a github issue -- or maybe it does ICE, à la -Z treat-err-as-bug).

So interestingly on stable, turning off debuginfo here, and even strip=symbols, would not change the current behavior whatsoever. It would shrink their size significantly though: a no-op proc macro .so is 7.7MB (turning off debuginfo brings them down to 7.3MB -- that's also their size under the release profile -- strip=debuginfo to 2.2MB, and strip=symbols to 1.8MB).

For build scripts, it's of course the same as regular binaries. If you're not expecting to run them in a debugger, and unless you have many files and functions in your build script, the panic location is often enough. And if you're not stripping, that backtrace will also contain all of libstd's frames.

For debuginfo, we'd likely be in favor of making that change, but we'd probably also want a message added to cases such as build script failures or panics in proc macros to let people know that they could get better backtraces by turning on debuginfo.

I'll be looking into that for cargo and rustc, then.

This would be my personal preference even if only by slightly over no debuginfo.

The above explanations about debuggability, and the fact that it also was an additional 6% faster on snafu, is why my preference is to turn it off by default rather than change it to 1, especially if we have messages suggesting to bump it whenever there are errors.

2. stripping

We'd like to omit the changes for strip for now, as there are some platforms where that may add more work than others.

Agreed completely. It will be interesting to see the effects on osx and windows in the future. As you can see in the gist, it's a small improvement in most cases (or within noise), although on ripgrep it's still noticeable at -1.5%.

Let's table this for now.

3. incremental

Speaking of small improvements sometimes within noise, as you can see in the gist on syn and ripgrep, turning off incremental for local build-scripts is not the greatest of wins. There are no local proc-macros on crates.io so I don't have numbers for this specifically.

For incremental, we'd be interested in the impact this has: how much time does it save in the case where the build script is unmodified, versus how much time it adds to building the build script when modified.

The most popular build scripts are "quick to build (for humans)", in the hundreds of milliseconds (but since that's slow, we've also been looking into that) and multiple times longer than the time they take to run. The time saved is rather small in the absolute, as is the delay when the build script is modified. The wins are usually in lessening the overheads when they're on the critical path (the spots they take in the schedule and the delays they introduce) like the other build dependencies.

Since it's also a small change, let's table it as well for now.


I'll then focus on error messages and debuginfo for now.

Maybe the profiles documentation could still mention these other two settings, as potential wins people could try to have the fastest compile times (and/or smallest build-override binaries; e.g. also in the release.build-override profile if people wanted to have smaller artifacts)). Would that be ok ?

@nnethercote
Copy link
Contributor

I'm not a Cargo expert, but my two cents: if there will a message explaining how to get better backtraces, I suggest going with debuginfo=off. The backtraces you get that way have function names, so they're not terrible. And the speed benefit of going from debuginfo=1 to debuginfo=off is significant.

@lqd
Copy link
Member Author

lqd commented Mar 21, 2022

I've opened a draft PR for this: #10493

@klensy
Copy link
Contributor

klensy commented Aug 25, 2022

Some time ago i submitted similar PR rust-lang/rust#98140 which gives nice improvements (1972mb to 132mb size redution on x86_64-pc-windows-msvc for ui tests) and it literally -Cstrip=debuginfo.

@link2xt
Copy link

link2xt commented Sep 8, 2023

Somewhat related, in my case some dependencies are both build dependencies and actual dependencies. build-overrides result in the same dependency being compiled twice, which does not help with build times: #12645

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-feature-request Category: proposal for a feature. Before PR, ping rust-lang/cargo if this is not `Feature accepted` Performance Gotta go fast!
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants