Some DCE issues with ThinLTO leaving in construction of unused parameters #47409

gamozolabs · 2018-01-13T09:46:57Z

Since ThinLTO became default in Rust my bootloader went from 17 KiB to about 42 KiB in size, which puts it over the 32 KiB limit. To get my bootloader to fit in this limit I've been able to strip out all of core::fmt by not having any parameters to my panic_fmt routine and this allows LTO to effectively delete all construction of the core::fmt parameters to panic_fmt and thus save huge amounts of code.

Since ThinLTO became default it seems that certain functions using volatile memory or inline assembly struggle to be optimized fully, however this is all I've found that replicates the issue, perhaps more common use of Rust could also trigger it.

For the tiny test case (included at the end of this issue) historically (from at least 2016 to November 2017) produced a .text section of ~100 bytes, and a total virtual program size of ~500 bytes. Since ThinLTO became default this tiny program grows to ~2400 bytes of .text and ~3200 bytes overall.

While this 2 KiB increase seems small, keep in mind it is for an application that is trivial and only has one thing using core::fmt (the bounds check panic on the array index). When a program grows and more panic come about these changes to be a much larger increase, in my case over 20 KiB.

While I'm writing an RFC to address this issue more permanently by proposing an optional (developer opt-in) panic routine which does not rely on core::fmt (probably just a &'static str for the filename and u32 for the line for bare bones debugging where nothing more can be afforded), I wanted to open this issue just in case it's a more fundamental optimization issue that could be fixed.

The core::intrinsics::abort() is critical in the panic_fmt for the code size increase to occur. Other things that work here as well are statements like asm!("nop") or really any assembly statement. Further, assembly statements that are included in functions that are called from panic_fmt also cause the optimization problem to occur. This makes workarounds impossible in my case as my panic dumps to a serial port which ultimately requires inline assembly to access. So for my bootloader I have resorted to using an old version of Rust.

What I find particularly strange is that while I've defined my panic_fmt to not take any parameters this doesn't seem to be a big enough hint to Rust/LLVM that they are not used and thus all construction of these parameters can be DCE'd.

For a size listing of this test program using all nightly builds of Rust in the past 1.5 years see: https://gist.github.com/gamozolabs/d72c05e2b200f02397f1495cf33c2674

We can see from this that the size increase occured between rustc 1.24.0-nightly (f9b0897c5 2017-12-02) and rustc 1.24.0-nightly (1956d5535 2017-12-03) which happens to be when ThinLTO went default.

I've tried a few different tests using various panic_fmt declarations, with parameters, without, as extern, as pub, etc, and they seem to all be affected by this issue.

Here's an example basic piece of code to be built with nightly-x86_64-pc-windows-msvc using args -C lto -C panic=abort -O:

Here is the disassembly difference between this same code but one with ThinLTO and one with full LTO: https://gist.github.com/gamozolabs/41b2059a55d5e0b60d7251dd8a31c055

#![no_std]
#![feature(lang_items, start, core_intrinsics)]

#[lang = "panic_fmt"]
#[no_mangle]
pub extern fn rust_begin_panic(_msg: core::fmt::Arguments,
                               _file: &'static str,
                               _line: u32,
                               _column: u32) -> ! {
    unsafe { core::intrinsics::abort() }
}

// These functions are used by the compiler, but not
// for a bare-bones hello world. These are normally
// provided by libstd.
#[lang = "eh_personality"]
#[no_mangle]
pub extern fn rust_eh_personality() {
}

// This function may be needed based on the compilation target.
#[lang = "eh_unwind_resume"]
#[no_mangle]
pub extern fn rust_eh_unwind_resume() {
}

#[start]
#[no_mangle]
#[allow(non_snake_case)]
pub fn mainCRTStartup(argc: isize, _argv: *const *const u8) -> isize
{
    let foo = [1, 2, 3, 4];
    foo[argc as usize]
}

The text was updated successfully, but these errors were encountered:

gamozolabs · 2018-01-14T00:33:41Z

Further testing shows that building with -Z thinlto=off reverts to using old full LTO which on recent builds does behave as expected (deletes construction of core::fmt args to panic_fmt). This is now my current workaround and I'm no longer using an old version of Rust for my bootloader. My concern is in the future this may be unsupported, but I have no idea.

However I still think it's worth looking into. I dumped the LLVM IR --emit llvm-ir and with thinlto=on and thinlto=off it produces identical IR. This leads me to believe that the issue lies more in LLVM than Rust, however I'm not familiar enough with the internals of Rust to conclude this. Further is this something we could optimize at MIR level as a Rust-specific optimization pass?

Edit: Disregard above statement --emit llvm-ir seems to disable ThinLTO so I cannot use it to accurately determine if the issue is with Rust or LLVM.

-B

alexcrichton · 2018-01-14T05:38:47Z

Thanks for the report @gamozolabs!

This sounds like it does indeed get blamed to ThinLTO! The previous form of LTO threw everything into one giant module which allows LLVM to see as much as possible and optimize as aggressively as possible. Typically this isn't necessary for applications, but as you're seeing there's still some that require the old form of LTO!

I've tagged this as a regression from stable to beta for now to ensure we can deal with this before the next release. There's a few things going on here of note.

The compiler, when using -C lto, has switched to using ThinLTO by default rather than the previous "throw everything in a module" (what I'll call fat LTO) by default. This I believe is the source of your regression if you were previously compiling with LTO.
There is an option to disable ThinLTO via -Z thinlto=no, but as the name implies it's not a stable option. Additionally there's no Carog.toml option for specifying ThinLTO on or off.

So given that I think there's two primary questions to consider when evaluating this:

Should we turn "fat LTO" back on by default when LTO is requested? We'd still use ThinLTO if -C lto isn't passed (just amongst the codegen units we're producing) however though.
Should we stabilize -Z thinlto and add corresponding options to Cargo.toml?

Curious what others from @rust-lang/compiler might think.

Mark-Simulacrum · 2018-01-15T03:19:10Z

We could make -C lto take an additional parameter, so something like -C lto=auto (current behavior, default), -C lto=fat (old, "fat" LTO), -C lto=thin (current behavior again). We could document auto as not having a stable meaning, and expose these same options in Cargo's lto option.

eddyb · 2018-01-15T03:28:47Z

I don't feel like the distinction should matter in general, but rather this is more of a LLVM limitation.
OTOH, we already have LLVM-specific options in -C.

hanna-kruppe · 2018-01-15T07:36:38Z

Aside: It may be possible to improve ThinLTO here. My theory is that panic_fmt is not internalized because there are multiple modules, and thus deadargelim doesn't fire. Other interprocedural optimizations have similar problem and in those cases adding data to the module summary and teaching those optimizations about that has helped. For deadargelim, you probably can't eliminate the arguments (because you can't update all the callers unilaterally), but you could store the fact that an argument is unused in the module summary and callers who are aware of this can pass undef instead.

michaelwoerister · 2018-01-15T10:49:14Z

I think we should provide a way to get full LTO on stable/beta as it still provides better runtime performance overall.

alexcrichton · 2018-01-16T05:47:00Z

@Mark-Simulacrum your suggestion sounds great to me! We could do what we did with debuginfo and accept lto = 'fat' in Cargo.toml as well as lto = true.

nikomatsakis · 2018-01-17T17:29:35Z

I like @Mark-Simulacrum's suggestion. I guess an open question is whether the default should be LTO or not -- but I think we can probably change the default to LTO if we think that's better. I'd be curious to hear from @gamozolabs if they agree.

alexcrichton · 2018-01-17T17:54:02Z

Ok I've opened #47521 to implement @Mark-Simulacrum's suggestion, and if we decide to merge I can follow up with a Cargo.toml option as well soon after merging.

I'm sort of ambivalent about which form of LTO should be the default when specifying only -C lto (either fat or thin LTO), although I'd slightly prefer ThinLTO as it should have all the same the-artifact's-runtime-is-faster benefits with a huge benefit on compile times. The downside of ThinLTO, as seen here though, is that it doesn't have the same code-size benefits.

michaelwoerister · 2018-01-18T12:36:45Z

I'm fine with defaulting to ThinLTO as long as full LTO is available on stable.

nikomatsakis · 2018-01-18T15:21:53Z

triage: P-high

This should get into beta, but there is a pending PR #47521

This commit reverts a small portion of the switch to ThinLTO by default which changed the meaning of `-C lto` from "put the whole crate graph into one codegen unit" to "perform ThinLTO over the whole crate graph". This backport has no corresponding commit on master as rust-lang#47521 is making the same change but in a slightly different manner. This commit is intended to be a surgical change with low impact on beta. Closes rust-lang#47409

[beta] Turn back on "fat" LTO by default This commit reverts a small portion of the switch to ThinLTO by default which changed the meaning of `-C lto` from "put the whole crate graph into one codegen unit" to "perform ThinLTO over the whole crate graph". This backport has no corresponding commit on master as #47521 is making the same change but in a slightly different manner. This commit is intended to be a surgical change with low impact on beta. Closes #47409

alexcrichton · 2018-01-23T04:16:38Z

I think this is closed now with #47548 merged

This commit primarily adds the ability to control what kind of LTO happens when rustc performs LTO, namely allowing values to be specified to the `-C lto` option, such as `-C lto=thin` and `-C lto=fat`. (where "fat" is the previous kind of LTO, throw everything in one giant module) Along the way this also refactors a number of fields which store information about whether LTO/ThinLTO are enabled to unify them all into one field through which everything is dispatched, hopefully removing a number of special cases throughout. This is intended to help mitigate rust-lang#47409 but will require a backport as well, and this would unfortunately need to be an otherwise insta-stable option.

alexcrichton added T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. regression-from-stable-to-beta Performance or correctness regression from stable to beta. labels Jan 14, 2018

Mark-Simulacrum added this to the 1.24 milestone Jan 15, 2018

alexcrichton mentioned this issue Jan 17, 2018

rustc: Add -C lto=val option #47521

Merged

gamozolabs mentioned this issue Jan 17, 2018

Panic related strings are still in binary with custom panic_fmt #47526

Open

rust-highfive added the P-high High priority label Jan 18, 2018

alexcrichton mentioned this issue Jan 18, 2018

[beta] Turn back on "fat" LTO by default #47548

Merged

gamozolabs mentioned this issue Jan 19, 2018

RFC: Introduce panic_thin, a fmtless alternative to panic_fmt rust-lang/rfcs#2305

Closed

alexcrichton closed this as completed Jan 23, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Some DCE issues with ThinLTO leaving in construction of unused parameters #47409

Some DCE issues with ThinLTO leaving in construction of unused parameters #47409

gamozolabs commented Jan 13, 2018 •

edited

Loading

gamozolabs commented Jan 14, 2018 •

edited

Loading

alexcrichton commented Jan 14, 2018

Mark-Simulacrum commented Jan 15, 2018

eddyb commented Jan 15, 2018

hanna-kruppe commented Jan 15, 2018

michaelwoerister commented Jan 15, 2018

alexcrichton commented Jan 16, 2018

nikomatsakis commented Jan 17, 2018

alexcrichton commented Jan 17, 2018

michaelwoerister commented Jan 18, 2018

nikomatsakis commented Jan 18, 2018

alexcrichton commented Jan 23, 2018

Some DCE issues with ThinLTO leaving in construction of unused parameters #47409

Some DCE issues with ThinLTO leaving in construction of unused parameters #47409

Comments

gamozolabs commented Jan 13, 2018 • edited Loading

gamozolabs commented Jan 14, 2018 • edited Loading

alexcrichton commented Jan 14, 2018

Mark-Simulacrum commented Jan 15, 2018

eddyb commented Jan 15, 2018

hanna-kruppe commented Jan 15, 2018

michaelwoerister commented Jan 15, 2018

alexcrichton commented Jan 16, 2018

nikomatsakis commented Jan 17, 2018

alexcrichton commented Jan 17, 2018

michaelwoerister commented Jan 18, 2018

nikomatsakis commented Jan 18, 2018

alexcrichton commented Jan 23, 2018

gamozolabs commented Jan 13, 2018 •

edited

Loading

gamozolabs commented Jan 14, 2018 •

edited

Loading