rustc: Implement ThinLTO #44841

alexcrichton · 2017-09-25T16:47:28Z

This commit is an implementation of LLVM's ThinLTO for consumption in rustc
itself. Currently today LTO works by merging all relevant LLVM modules into one
and then running optimization passes. "Thin" LTO operates differently by having
more sharded work and allowing parallelism opportunities between optimizing
codegen units. Further down the road Thin LTO also allows incremental LTO
which should enable even faster release builds without compromising on the
performance we have today.

This commit uses a -Z thinlto flag to gate whether ThinLTO is enabled. It then
also implements two forms of ThinLTO:

In one mode we'll only perform ThinLTO over the codegen units produced in a
single compilation. That is, we won't load upstream rlibs, but we'll instead
just perform ThinLTO amongst all codegen units produced by the compiler for
the local crate. This is intended to emulate a desired end point where we have
codegen units turned on by default for all crates and ThinLTO allows us to do
this without performance loss.
In anther mode, like full LTO today, we'll optimize all upstream dependencies
in "thin" mode. Unlike today, however, this LTO step is fully parallelized so
should finish much more quickly.

There's a good bit of comments about what the implementation is doing and where
it came from, but the tl;dr; is that currently most of the support here is
copied from upstream LLVM. This code duplication is done for a number of
reasons:

Controlling parallelism means we can use the existing jobserver support to
avoid overloading machines.
We will likely want a slightly different form of incremental caching which
integrates with our own incremental strategy, but this is yet to be
determined.
This buys us some flexibility about when/where we run ThinLTO, as well as
having it tailored to fit our needs for the time being.
Finally this allows us to reuse some artifacts such as our TargetMachine
creation, where all our options we used today aren't necessarily supported by
upstream LLVM yet.

My hope is that we can get some experience with this copy/paste in tree and then
eventually upstream some work to LLVM itself to avoid the duplication while
still ensuring our needs are met. Otherwise I fear that maintaining these
bindings may be quite costly over the years with LLVM updates!

rust-highfive · 2017-09-25T16:47:31Z

r? @arielb1

(rust_highfive has picked a reviewer for you, use r? to override)

alexcrichton · 2017-09-25T16:50:03Z

r? @michaelwoerister

Note that the first commit here is from #44783, so shouldn't need extra review, the thin support should all be in the second. Example output of the modified time_graph module is here (that's a thin LTO measurement for Cargo), and I've also collected some various data about ThinLTO, codegen units, compiletime, and runtime.

alexcrichton · 2017-09-25T16:51:56Z

src/librustc/session/config.rs

-            if codegen_unit_name.contains(NUMBERED_CODEGEN_UNIT_MARKER) {
-                // If we use the numbered naming scheme for modules, we don't want
-                // the files to look like <crate-name><extra>.<crate-name>.<index>.<ext>
-                // but simply <crate-name><extra>.<index>.<ext>


@michaelwoerister mind helping me understand what was going on here? For ThinLTO we need to make sure that all the objects and such have unique names, which this comment seems to indicate we will have achieved. (although in practice I didn't see crate name hashes and such in those names)

With this name munging left in though I found that lots of objects were overwriting one another by accident, because I think the backend of shuffling files around was "getting weird". I couldn't find a downside to removing this logic, though, so I was curious if you knew what this was originally added for?

This logic looks like a measure for keeping filenames/paths short. We could probably just get rid of the "numbered" codegen unit naming scheme (incremental does not use that). That was only introduced to stay close to the then existing behavior. That would have the downside of codegen units having somewhat misleading names but they should be unique.

michaelwoerister · 2017-09-25T17:54:45Z

That's a nice surprise :)

The compilation speedups look great. How many cores does the machine you're testing on have?
Do you know how runtime speeds compare for debug builds with different # of codegen units? Or asked differently, is there any real downside to using multiple codegen units for debug builds?
Runtime performance is worse, in some cases significantly so it seems. It think this is acceptable but we might want to think about how we introduce this feature, what recommendations we give, etc.
Increasing the number of codegen units seems to not have a detrimental effect on runtime performance. This is great for peak memory consumption (because we can split work into smaller chunks) and also great for incremental compilation.
This is awesome 🎉

cc @rust-lang/compiler

alexcrichton · 2017-09-25T19:13:15Z

How many cores does the machine you're testing on have?

Oops an excellent question! I had 8 cores.

Or asked differently, is there any real downside to using multiple codegen units for debug builds?

AFAIK no, I realized when doing all this that we should just turn this on by default. I was gonna do that in a separate PR orthogonal to this though.

Runtime performance is worse, in some cases significantly so it seems.

Hm sort of! I think it's a lot better than it looks though. For example, here's a comparison between 1 and 16 codegen units, without ThinLTO enabled (of the regex benchmark suite, threshold 5% regression at least). And then here's the same comparison when those 16 CGUs are compiled with ThinLTO. Notably nearly every benchmark regesses (sometimes by 10x) with just vanilla codegen units, whereas with ThinLTO the worst regression is 18ns/iter -> 27ns/iter, and ThinLTO even improved the performance in one case!

In other words, my conclusion is that runtime performance is "basically the same" modulo compiler wizardry details. My guess is that any possible performance loss could be easily fixed by #[inline] if truly necessary, but in most cases I doubt this'll be necessary. AFAIK ThinLTO is supposed to be the same as LTO perf-wise on the LLVM layer, so it may just continue to evolve at that layer of implementation as well.

Overally though I definitely wouldn't classify it as overall a significant performance loss, only a very minor performance loss in some esoteric situations, at best. From what I've seen it does everything you'd expect it to do across the board. Then again, that's why this is unstable to start out with :)

bors · 2017-09-25T20:39:37Z

☔ The latest upstream changes (presumably #44085) made this pull request unmergeable. Please resolve the merge conflicts.

hanna-kruppe · 2017-09-26T09:43:54Z

Amazing!!

On the subject of performance, note that ThinLTO has historically been missing some optimizations of classical LTO (just due to them not being implemented). One class of such optimizations specifically called out on LLVM's "open projects" page is "propagating more global informations across the program". I assume upstream has made progress on this since LLVM 4.0, so maybe we'll see improved performance just by updating to LLVM 5.0?

This commit changes the default of rustc to use 32 codegen units when compiling in debug mode, typically an opt-level=0 compilation. Since their inception codegen units have matured quite a bit, gaining features such as: * Parallel translation and codegen enabling codegen units to get worked on even more quickly. * Deterministic and reliable partitioning through the same infrastructure as incremental compilation. * Global rate limiting through the `jobserver` crate to avoid overloading the system. The largest benefit of codegen units has forever been faster compilation through parallel processing of modules on the LLVM side of things, using all the cores available on build machines that typically have many available. Some downsides have been fixed through the features above, but the major downside remaining is that using codegen units reduces opportunities for inlining and optimization. This, however, doesn't matter much during debug builds! In this commit the default number of codegen units for debug builds has been raised from 1 to 32. This should enable most `cargo build` compiles that are bottlenecked on translation and/or code generation to immediately see speedups through parallelization on available cores. Work is being done to *always* enable multiple codegen units (and therefore parallel codegen) but it requires rust-lang#44841 at least to be landed and stabilized, but stay tuned if you're interested in that aspect!

mersinvald · 2017-09-26T15:53:41Z

@alexcrichton I've just ran some build time benchmarks on my project that uses a lot of popular rust libraries and codegen (diesel, hyper, serde, tokio, futures, reqwest) on my Intel Core i5 laptop (skylake 2c/4t) and got these results:

1 unit:    92 secs
2 units:   81 secs
4 units:   83 secs
8 units:   85 secs
16 units:  90 secs
32 units:  102 secs

Cargo profile:

# The development profile, used for `cargo build`.
[profile.dev]
opt-level = 0
debug = true 
lto = false
debug-assertions = true 
codegen-units = 4

rustc 1.22.0-nightly (17f56c5 2017-09-21)

As expected, the best results is for codegen units of number of system threads, and 32 is way to much for an average machine.

Did you concider an option to select number of codegen units depending on number of cpus, with num_cpus crate?

Thank you for working on compile times!

alexcrichton · 2017-09-26T15:57:42Z

@mersinvald very interesting! Did you mean to comment on #44853 though? If so, maybe we can continue over there?

mersinvald · 2017-09-26T16:02:55Z

@alexcrichton ok, sorry :)

sgrif · 2017-09-26T23:00:47Z

@alexcrichton FYI if you were trying to convince us that you weren't a robot, this and #44853 back-to-back aren't helping. ;)

…rister rustc: Default 32 codegen units at O0 This commit changes the default of rustc to use 32 codegen units when compiling in debug mode, typically an opt-level=0 compilation. Since their inception codegen units have matured quite a bit, gaining features such as: * Parallel translation and codegen enabling codegen units to get worked on even more quickly. * Deterministic and reliable partitioning through the same infrastructure as incremental compilation. * Global rate limiting through the `jobserver` crate to avoid overloading the system. The largest benefit of codegen units has forever been faster compilation through parallel processing of modules on the LLVM side of things, using all the cores available on build machines that typically have many available. Some downsides have been fixed through the features above, but the major downside remaining is that using codegen units reduces opportunities for inlining and optimization. This, however, doesn't matter much during debug builds! In this commit the default number of codegen units for debug builds has been raised from 1 to 32. This should enable most `cargo build` compiles that are bottlenecked on translation and/or code generation to immediately see speedups through parallelization on available cores. Work is being done to *always* enable multiple codegen units (and therefore parallel codegen) but it requires #44841 at least to be landed and stabilized, but stay tuned if you're interested in that aspect!

bors · 2017-09-29T12:58:58Z

☔ The latest upstream changes (presumably #44853) made this pull request unmergeable. Please resolve the merge conflicts.

alexcrichton · 2017-09-30T22:45:55Z

Rebased and should be ready for review!

r? @michaelwoerister

bors · 2017-10-06T02:12:20Z

⌛ Testing commit 5f66c99efd92eef321dbe007e5a71b7b757c8f8b with merge f8577605e7e59dd1354533f9465a365cc4cb814d...

bors · 2017-10-06T03:16:10Z

💔 Test failed - status-travis

kennytm · 2017-10-06T05:18:18Z

Multiple incremental tests on x86_64-gnu ICE'd.

...
[00:58:39] ---- [incremental] incremental/spike.rs stdout ----
[00:58:39] 	
[00:58:39] error in revision `rpass2`: compilation failed!
[00:58:39] status: exit code: 101
[00:58:39] command: "/checkout/obj/build/x86_64-unknown-linux-gnu/stage2/bin/rustc" "/checkout/src/test/incremental/spike.rs" "-L" "/checkout/obj/build/x86_64-unknown-linux-gnu/test/incremental" "--target=x86_64-unknown-linux-gnu" "--cfg" "rpass2" "-Z" "incremental=/checkout/obj/build/x86_64-unknown-linux-gnu/test/incremental/spike.inc" "--error-format" "json" "-C" "prefer-dynamic" "-o" "/checkout/obj/build/x86_64-unknown-linux-gnu/test/incremental/spike.stage2-x86_64-unknown-linux-gnu" "-Crpath" "-O" "-Lnative=/checkout/obj/build/x86_64-unknown-linux-gnu/native/rust-test-helpers" "-Z" "query-dep-graph" "-Z" "query-dep-graph" "-Zincremental-info" "-L" "/checkout/obj/build/x86_64-unknown-linux-gnu/test/incremental/spike.stage2-x86_64-unknown-linux-gnu.incremental.libaux"
[00:58:39] stdout:
[00:58:39] ------------------------------------------
[00:58:39] 
[00:58:39] ------------------------------------------
[00:58:39] stderr:
[00:58:39] ------------------------------------------
[00:58:39] error: internal compiler error: unexpected panic
[00:58:39] 
[00:58:39] note: the compiler unexpectedly panicked. this is a bug.
[00:58:39] 
[00:58:39] note: we would appreciate a bug report: https://github.com/rust-lang/rust/blob/master/CONTRIBUTING.md#bug-reports
[00:58:39] 
[00:58:39] note: rustc 1.22.0-dev running on x86_64-unknown-linux-gnu
[00:58:39] 
[00:58:39] incremental: session directory: 6 files hard-linked
[00:58:39] incremental: session directory: 0 files copied
[00:58:39] thread 'rustc' panicked at 'no entry found for key', /checkout/src/libcore/option.rs:839:4
[00:58:39] note: Run with `RUST_BACKTRACE=1` for a backtrace.
[00:58:39] 
[00:58:39] 
[00:58:39] ------------------------------------------
[00:58:39] 
[00:58:39] thread '[incremental] incremental/spike.rs' panicked at 'explicit panic', /checkout/src/tools/compiletest/src/runtest.rs:2433:8
[00:58:39] 
[00:58:39] 
[00:58:39] failures:
[00:58:39]     [incremental] incremental/add_private_fn_at_krate_root_cc/struct_point.rs
[00:58:39]     [incremental] incremental/cache_file_headers.rs
[00:58:39]     [incremental] incremental/change_add_field/struct_point.rs
[00:58:39]     [incremental] incremental/change_private_fn/struct_point.rs
[00:58:39]     [incremental] incremental/change_private_fn_cc/struct_point.rs
[00:58:39]     [incremental] incremental/change_private_impl_method/struct_point.rs
[00:58:39]     [incremental] incremental/change_private_impl_method_cc/struct_point.rs
[00:58:39]     [incremental] incremental/change_pub_inherent_method_body/struct_point.rs
[00:58:39]     [incremental] incremental/change_pub_inherent_method_sig/struct_point.rs
[00:58:39]     [incremental] incremental/change_symbol_export_status.rs
[00:58:39]     [incremental] incremental/commandline-args.rs
[00:58:39]     [incremental] incremental/issue-35593.rs
[00:58:39]     [incremental] incremental/issue-38222.rs
[00:58:39]     [incremental] incremental/krate-inherent.rs
[00:58:39]     [incremental] incremental/krate-inlined.rs
[00:58:39]     [incremental] incremental/remapped_paths_cc/main.rs
[00:58:39]     [incremental] incremental/remove-private-item-cross-crate/main.rs
[00:58:39]     [incremental] incremental/spans_in_type_debuginfo.rs
[00:58:39]     [incremental] incremental/spike.rs
[00:58:39] 
[00:58:39] test result: �[31mFAILED�(B�[m. 59 passed; 19 failed; 0 ignored; 0 measured; 0 filtered out

alexcrichton · 2017-10-06T18:50:51Z

@bors: r=michaelwoerister

bors · 2017-10-06T18:50:52Z

📌 Commit 1cb0c99 has been approved by michaelwoerister

bors · 2017-10-07T08:03:21Z

⌛ Testing commit 1cb0c99d055072c9dfcaf922b26fc66f25cbbb43 with merge 54b42bae05a52565bcf4ebb2780d7f94deae900d...

bors · 2017-10-07T09:36:08Z

💔 Test failed - status-travis

kennytm · 2017-10-07T09:43:38Z

src/test/run-pass/thin-lto-inlines.rs

+        let foo = foo as usize as *const u8;
+        let bar = bar::bar as usize as *const u8;
+
+        assert_eq!(*foo, *bar);


asm.js failed on this line...

[01:25:14] failures: [01:25:14] [01:25:14] ---- [run-pass] run-pass/thin-lto-inlines.rs stdout ---- [01:25:14] [01:25:14] error: test run failed! [01:25:14] status: exit code: 101 [01:25:14] command: "/emsdk-portable/node/4.1.1_64bit/bin/node" "/checkout/obj/build/x86_64-unknown-linux-gnu/test/run-pass/thin-lto-inlines.stage2-asmjs-unknown-emscripten.js" [01:25:14] stdout: [01:25:14] ------------------------------------------ [01:25:14] 3 3 [01:25:14] [01:25:14] ------------------------------------------ [01:25:14] stderr: [01:25:14] ------------------------------------------ [01:25:14] thread 'main' panicked at 'assertion failed: `(left == right)` [01:25:14] left: `115`, [01:25:14] right: `99`', /checkout/src/test/run-pass/thin-lto-inlines.rs:36:8 [01:25:14] note: Run with `RUST_BACKTRACE=1` for a backtrace. [01:25:14] [01:25:14] ------------------------------------------ [01:25:14] [01:25:14] thread '[run-pass] run-pass/thin-lto-inlines.rs' panicked at 'explicit panic', /checkout/src/tools/compiletest/src/runtest.rs:2433:8 [01:25:14] note: Run with `RUST_BACKTRACE=1` for a backtrace.

This commit is an implementation of LLVM's ThinLTO for consumption in rustc itself. Currently today LTO works by merging all relevant LLVM modules into one and then running optimization passes. "Thin" LTO operates differently by having more sharded work and allowing parallelism opportunities between optimizing codegen units. Further down the road Thin LTO also allows *incremental* LTO which should enable even faster release builds without compromising on the performance we have today. This commit uses a `-Z thinlto` flag to gate whether ThinLTO is enabled. It then also implements two forms of ThinLTO: * In one mode we'll *only* perform ThinLTO over the codegen units produced in a single compilation. That is, we won't load upstream rlibs, but we'll instead just perform ThinLTO amongst all codegen units produced by the compiler for the local crate. This is intended to emulate a desired end point where we have codegen units turned on by default for all crates and ThinLTO allows us to do this without performance loss. * In anther mode, like full LTO today, we'll optimize all upstream dependencies in "thin" mode. Unlike today, however, this LTO step is fully parallelized so should finish much more quickly. There's a good bit of comments about what the implementation is doing and where it came from, but the tl;dr; is that currently most of the support here is copied from upstream LLVM. This code duplication is done for a number of reasons: * Controlling parallelism means we can use the existing jobserver support to avoid overloading machines. * We will likely want a slightly different form of incremental caching which integrates with our own incremental strategy, but this is yet to be determined. * This buys us some flexibility about when/where we run ThinLTO, as well as having it tailored to fit our needs for the time being. * Finally this allows us to reuse some artifacts such as our `TargetMachine` creation, where all our options we used today aren't necessarily supported by upstream LLVM yet. My hope is that we can get some experience with this copy/paste in tree and then eventually upstream some work to LLVM itself to avoid the duplication while still ensuring our needs are met. Otherwise I fear that maintaining these bindings may be quite costly over the years with LLVM updates!

alexcrichton · 2017-10-07T15:18:05Z

@bors: r=michaelwoerister

bors · 2017-10-07T15:18:06Z

📌 Commit 4ca1b19 has been approved by michaelwoerister

bors · 2017-10-07T18:11:23Z

⌛ Testing commit 4ca1b19 with merge 33f1f8654b50e49db838b64e76c4af59bc55ddb5...

bors · 2017-10-07T19:39:27Z

💔 Test failed - status-appveyor

retep998 · 2017-10-07T22:04:29Z

failures:
---- [run-make] run-make\issue-26092 stdout ----
	
error: make failed
status: exit code: 2
command: "make"
stdout:
------------------------------------------
PATH="/c/projects/rust/build/x86_64-pc-windows-gnu/test/run-make/issue-26092.stage2-x86_64-pc-windows-gnu:C:\projects\rust\build\x86_64-pc-windows-gnu\stage2\bin:/c/projects/rust/build/x86_64-pc-windows-gnu/stage0-tools/x86_64-pc-windows-gnu/release/deps:/c/projects/rust/build/x86_64-pc-windows-gnu/stage0-sysroot/lib/rustlib/x86_64-pc-windows-gnu/lib:/c/Program Files (x86)/Inno Setup 5:/c/Python27:/c/projects/rust/mingw64/bin:/usr/bin:/c/Perl/site/bin:/c/Perl/bin:/c/Windows/system32:/c/Windows:/c/Windows/System32/Wbem:/c/Windows/System32/WindowsPowerShell/v1.0:/c/Program Files/7-Zip:/c/Program Files/Microsoft/Web Platform Installer:/c/Tools/GitVersion:/c/Tools/PsTools:/c/Program Files/Git LFS:/c/Program Files (x86)/Subversion/bin:/c/Program Files/Microsoft SQL Server/120/Tools/Binn:/c/Program Files/Microsoft SQL Server/Client SDK/ODBC/110/Tools/Binn:/c/Program Files (x86)/Microsoft SQL Server/120/Tools/Binn:/c/Program Files/Microsoft SQL Server/120/DTS/Binn:/c/Program Files (x86)/Microsoft SQL Server/120/Tools/Binn/ManagementStudio:/c/Tools/WebDriver:/c/Program Files (x86)/Microsoft SDKs/TypeScript/1.4:/c/Program Files (x86)/Microsoft Visual Studio 12.0/Common7/IDE/PrivateAssemblies:/c/Program Files (x86)/Microsoft SDKs/Azure/CLI/wbin:/c/Ruby193/bin:/c/Tools/NUnit/bin:/c/Tools/xUnit:/c/Tools/MSpec:/c/Tools/Coverity/bin:/c/Program Files (x86)/CMake/bin:/c/go/bin:/c/Program Files/Java/jdk1.8.0/bin:/c/Python27:/c/Program Files/nodejs:/c/Program Files (x86)/iojs:/c/Program Files/iojs:/c/Users/appveyor/AppData/Roaming/npm:/c/Program Files/Microsoft SQL Server/130/Tools/Binn:/c/Program Files (x86)/MSBuild/14.0/Bin:/c/Tools/NuGet:/c/Program Files (x86)/Microsoft Visual Studio 14.0/Common7/IDE/CommonExtensions/Microsoft/TestWindow:/c/Program Files/Microsoft DNX/Dnvm:/c/Program Files/Microsoft SQL Server/Client SDK/ODBC/130/Tools/Binn:/c/Program Files (x86)/Microsoft SQL Server/130/Tools/Binn:/c/Program Files (x86)/Microsoft SQL Server/130/DTS/Binn:/c/Program Files/Microsoft SQL Server/130/DTS/Binn:/c/Program Files (x86)/Microsoft SQL Server/110/DTS/Binn:/c/Program Files (x86)/Microsoft SQL Server/120/DTS/Binn:/c/Program Files (x86)/Apache/Maven/bin:/c/Python27/Scripts:/c/Tools/NUnit3:/c/Program Files/Mercurial:/c/Program Files/LLVM/bin:/c/Program Files/dotnet:/c/Program Files/erl8.3/bin:/c/Tools/curl/bin:/c/Program Files/Amazon/AWSCLI:/c/Program Files (x86)/Microsoft SQL Server/140/DTS/Binn:/c/Program Files (x86)/Microsoft Visual Studio 14.0/Common7/IDE/Extensions/Microsoft/SQLDB/DAC/140:/c/Program Files (x86)/Yarn/bin:/c/Program Files/Git/cmd:/c/Program Files/Git/usr/bin:/c/ProgramData/chocolatey/bin:/c/Tools/vcpkg:/c/Program Files (x86)/nodejs:/c/Program Files/Microsoft Service Fabric/bin/Fabric/Fabric.Code:/c/Program Files/Microsoft SDKs/Service Fabric/Tools/ServiceFabricLocalClusterManager:/c/Users/appveyor/AppData/Local/Yarn/bin:/c/Users/appveyor/AppData/Roaming/npm:/c/Program Files/AppVeyor/BuildAgent:/c/projects/rust:/c/projects/rust/handle" 'C:\projects\rust\build\x86_64-pc-windows-gnu\stage2\bin\rustc.exe' --out-dir /c/projects/rust/build/x86_64-pc-windows-gnu/test/run-make/issue-26092.stage2-x86_64-pc-windows-gnu -L /c/projects/rust/build/x86_64-pc-windows-gnu/test/run-make/issue-26092.stage2-x86_64-pc-windows-gnu  -o "" blank.rs 2>&1 | \
		grep -i 'No such file or directory'
------------------------------------------
stderr:
------------------------------------------
make: *** [Makefile:4: all] Error 1
------------------------------------------
thread '[run-make] run-make\issue-26092' panicked at 'explicit panic', src\tools\compiletest\src\runtest.rs:2433:8
note: Run with `RUST_BACKTRACE=1` for a backtrace.
failures:
    [run-make] run-make\issue-26092
test result: FAILED. 160 passed; 1 failed; 0 ignored; 0 measured; 0 filtered out

petrochenkov · 2017-10-07T22:13:52Z

Spurious. (#43402)
@bors retry

bors · 2017-10-07T22:18:26Z

⌛ Testing commit 4ca1b19 with merge ac76206...

rustc: Implement ThinLTO This commit is an implementation of LLVM's ThinLTO for consumption in rustc itself. Currently today LTO works by merging all relevant LLVM modules into one and then running optimization passes. "Thin" LTO operates differently by having more sharded work and allowing parallelism opportunities between optimizing codegen units. Further down the road Thin LTO also allows *incremental* LTO which should enable even faster release builds without compromising on the performance we have today. This commit uses a `-Z thinlto` flag to gate whether ThinLTO is enabled. It then also implements two forms of ThinLTO: * In one mode we'll *only* perform ThinLTO over the codegen units produced in a single compilation. That is, we won't load upstream rlibs, but we'll instead just perform ThinLTO amongst all codegen units produced by the compiler for the local crate. This is intended to emulate a desired end point where we have codegen units turned on by default for all crates and ThinLTO allows us to do this without performance loss. * In anther mode, like full LTO today, we'll optimize all upstream dependencies in "thin" mode. Unlike today, however, this LTO step is fully parallelized so should finish much more quickly. There's a good bit of comments about what the implementation is doing and where it came from, but the tl;dr; is that currently most of the support here is copied from upstream LLVM. This code duplication is done for a number of reasons: * Controlling parallelism means we can use the existing jobserver support to avoid overloading machines. * We will likely want a slightly different form of incremental caching which integrates with our own incremental strategy, but this is yet to be determined. * This buys us some flexibility about when/where we run ThinLTO, as well as having it tailored to fit our needs for the time being. * Finally this allows us to reuse some artifacts such as our `TargetMachine` creation, where all our options we used today aren't necessarily supported by upstream LLVM yet. My hope is that we can get some experience with this copy/paste in tree and then eventually upstream some work to LLVM itself to avoid the duplication while still ensuring our needs are met. Otherwise I fear that maintaining these bindings may be quite costly over the years with LLVM updates!

bors · 2017-10-08T00:40:54Z

☀️ Test successful - status-appveyor, status-travis
Approved by: michaelwoerister
Pushing ac76206 to master...

michaelwoerister · 2017-10-09T07:45:38Z

🎉 🎉 🎉 🎉 🎉

rust-highfive assigned arielb1 Sep 25, 2017

rust-highfive assigned michaelwoerister and unassigned arielb1 Sep 25, 2017

alexcrichton commented Sep 25, 2017

View reviewed changes

alexcrichton mentioned this pull request Sep 25, 2017

rustc: Default 32 codegen units at O0 #44853

Merged

arielb1 added the S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. label Sep 26, 2017

alexcrichton force-pushed the thinlto branch 3 times, most recently from d8ae4f8 to d3b0c49 Compare September 26, 2017 21:29

alexcrichton force-pushed the thinlto branch from d3b0c49 to 51e37aa Compare September 30, 2017 22:43

alexcrichton added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Sep 30, 2017

alexcrichton force-pushed the thinlto branch from 5f66c99 to 1cb0c99 Compare October 6, 2017 18:50

kennytm reviewed Oct 7, 2017

View reviewed changes

alexcrichton force-pushed the thinlto branch from 1cb0c99 to 4ca1b19 Compare October 7, 2017 15:17

bors merged commit 4ca1b19 into rust-lang:master Oct 8, 2017

bors mentioned this pull request Oct 8, 2017

rustc: Don't inline in CGUs at -O0 #45075

Merged

alexcrichton deleted the thinlto branch October 8, 2017 02:08

petrochenkov mentioned this pull request Oct 8, 2017

Some run-pass tests fail after recent LTO/CGU changes #45103

Closed

alexcrichton mentioned this pull request Oct 16, 2017

Tracking issue for enabling multiple CGUs in release mode by default #45320

Closed

11 tasks

kennytm mentioned this pull request Apr 5, 2018

ci: Remove x86_64-gnu-incremental builder #49674

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

rustc: Implement ThinLTO #44841

rustc: Implement ThinLTO #44841

alexcrichton commented Sep 25, 2017

rust-highfive commented Sep 25, 2017

alexcrichton commented Sep 25, 2017

alexcrichton Sep 25, 2017

michaelwoerister Sep 25, 2017

michaelwoerister commented Sep 25, 2017

alexcrichton commented Sep 25, 2017

bors commented Sep 25, 2017

hanna-kruppe commented Sep 26, 2017 •

edited

Loading

mersinvald commented Sep 26, 2017

alexcrichton commented Sep 26, 2017

mersinvald commented Sep 26, 2017 •

edited

Loading

sgrif commented Sep 26, 2017

bors commented Sep 29, 2017

alexcrichton commented Sep 30, 2017

bors commented Oct 6, 2017

bors commented Oct 6, 2017

kennytm commented Oct 6, 2017

alexcrichton commented Oct 6, 2017

bors commented Oct 6, 2017

bors commented Oct 7, 2017

bors commented Oct 7, 2017

kennytm Oct 7, 2017

alexcrichton commented Oct 7, 2017

bors commented Oct 7, 2017

bors commented Oct 7, 2017

bors commented Oct 7, 2017

retep998 commented Oct 7, 2017

petrochenkov commented Oct 7, 2017

bors commented Oct 7, 2017

bors commented Oct 8, 2017

michaelwoerister commented Oct 9, 2017

rustc: Implement ThinLTO #44841

rustc: Implement ThinLTO #44841

Conversation

alexcrichton commented Sep 25, 2017

rust-highfive commented Sep 25, 2017

alexcrichton commented Sep 25, 2017

alexcrichton Sep 25, 2017

Choose a reason for hiding this comment

michaelwoerister Sep 25, 2017

Choose a reason for hiding this comment

michaelwoerister commented Sep 25, 2017

alexcrichton commented Sep 25, 2017

bors commented Sep 25, 2017

hanna-kruppe commented Sep 26, 2017 • edited Loading

mersinvald commented Sep 26, 2017

alexcrichton commented Sep 26, 2017

mersinvald commented Sep 26, 2017 • edited Loading

sgrif commented Sep 26, 2017

bors commented Sep 29, 2017

alexcrichton commented Sep 30, 2017

bors commented Oct 6, 2017

bors commented Oct 6, 2017

kennytm commented Oct 6, 2017

alexcrichton commented Oct 6, 2017

bors commented Oct 6, 2017

bors commented Oct 7, 2017

bors commented Oct 7, 2017

kennytm Oct 7, 2017

Choose a reason for hiding this comment

alexcrichton commented Oct 7, 2017

bors commented Oct 7, 2017

bors commented Oct 7, 2017

bors commented Oct 7, 2017

retep998 commented Oct 7, 2017

petrochenkov commented Oct 7, 2017

bors commented Oct 7, 2017

bors commented Oct 8, 2017

michaelwoerister commented Oct 9, 2017

hanna-kruppe commented Sep 26, 2017 •

edited

Loading

mersinvald commented Sep 26, 2017 •

edited

Loading