ThinLTO bloats size of bare metal programs by up to 1200% #47770

Closed
japaric opened this issue Jan 26, 2018 · 3 comments
Labels
I-slow Issue: Problems and improvements with respect to performance of generated code.

Comments

@japaric
Member

japaric commented Jan 26, 2018

Original report: japaric/stm32f103xx-hal#44

STR (steps to reproduce)

$ git clone https://github.com/japaric/stm32f103xx-hal

$ cd stm32f103xx-hal

$ git checkout 9f80811a6026f0849ce2afee5baf0cf086563b65

$ xargo build --example reactive-serial-circ --release
(..)
    Finished release [optimized + debuginfo] target(s) in 13.52 secs
(..)
    Finished release [optimized + debuginfo] target(s) in 2.24 secs
(..)
    Finished release [optimized + debuginfo] target(s) in 38.37 secs

$ arm-none-eabi-size target/thumbv7m-none-eabi/release/examples/reactive-serial-circ
   text    data     bss     dec     hex filename
  12154       8      20   12182    2f96 (..)

$ xargo bloat --example reactive-serial-circ --release
File  .text   Size Name
0.0%   2.3%   170B [15 Others]
0.4%  18.9% 1.4KiB core::fmt::Formatter::pad_integral
0.2%  12.9%   962B core::fmt::Formatter::pad
0.2%   9.7%   720B stm32f103xx_hal::rcc::CFGR::freeze
0.2%   9.0%   672B core::str::slice_error_fail
0.1%   7.3%   544B core::fmt::write
0.1%   7.2%   538B <char as core::fmt::Debug>::fmt
0.1%   5.9%   440B _ZN20reactive_serial_circ4main17h416bbf939f973349E.llvm.E85F65E9
0.1%   4.6%   340B cortex_m::peripheral::nvic::<impl cortex_m::peripheral::NVIC>::enable
0.1%   3.6%   266B _ZN4core3fmt10ArgumentV110show_usize17h636fea4f06967745E.llvm.EA6B2883
0.1%   3.6%   266B core::fmt::num::<impl core::fmt::Display for usize>::fmt
0.1%   3.6%   266B core::fmt::num::<impl core::fmt::Debug for usize>::fmt
0.0%   2.4%   182B _ZN4core12char_private5check17h7f91014bbb6a3313E.llvm.9871CEB7
0.0%   1.6%   118B DMA1_CHANNEL5
0.0%   1.4%   106B cortex_m_rt::reset_handler
0.0%   1.2%    92B core::result::unwrap_failed
0.0%   1.0%    78B core::slice::slice_index_order_fail
0.0%   1.0%    78B core::slice::slice_index_len_fail
0.0%   1.0%    74B core::panicking::panic_bounds_check
0.0%   0.9%    70B <core::ops::range::Range<Idx> as core::fmt::Debug>::fmt
0.0%   0.8%    60B core::panicking::panic
1.9% 100.0% 7.3KiB .text section size, the file size is 376.2KiB

Changing profile.release.codegen-units to 1 (as suggested in other issues) does not help. Fully
disabling ThinLTO by passing -Z thinlto=no to rustc (is there a Cargo.toml setting for that?) fixes
the binary size problem without meaningfully affecting compilation speed (if anything, it's
slightly faster).

$ cat .cargo/config
[target.thumbv7m-none-eabi]
runner = 'arm-none-eabi-gdb'
rustflags = [
  "-C", "link-arg=-Tlink.x",
  "-C", "linker=arm-none-eabi-ld",
  "-Z", "linker-flavor=ld",
  "-Z", "thinlto=no", # <-
]

[build]
target = "thumbv7m-none-eabi"

$ xargo build --example reactive-serial-circ --release
(..)
    Finished release [optimized + debuginfo] target(s) in 12.69 secs
(..)
    Finished release [optimized + debuginfo] target(s) in 1.98 secs
(..)
    Finished release [optimized + debuginfo] target(s) in 38.54 secs

$ arm-none-eabi-size target/thumbv7m-none-eabi/release/examples/reactive-serial-circ
   text    data     bss     dec     hex filename
    890       8      20     918     396 (..)

$ xargo bloat --example reactive-serial-circ --release
File  .text Size Name
0.0%   0.0%   0B [0 Others]
0.1%  37.2% 250B reactive_serial_circ::init
0.1%  29.2% 196B cortex_m_rt::reset_handler
0.0%  16.1% 108B DMA1_CHANNEL5
0.0%   1.5%  10B core::result::unwrap_failed
0.0%   1.5%  10B SVCALL
0.0%   1.5%  10B MEM_MANAGE
0.0%   1.5%  10B DEBUG_MONITOR
0.0%   1.5%  10B PENDSV
0.0%   1.5%  10B USAGE_FAULT
0.0%   1.5%  10B SYS_TICK
0.0%   1.5%  10B NMI
0.0%   1.5%  10B DEFAULT_HANDLER
0.0%   1.5%  10B BUS_FAULT
0.0%   1.5%  10B HARD_FAULT
0.0%   0.6%   4B core::panicking::panic_fmt
0.0%   0.6%   4B cortex_m_rt::default_handler
0.3% 100.0% 672B .text section size, the file size is 261.6KiB
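
To put a number on the difference between the two `arm-none-eabi-size` outputs above, here is a minimal sketch in Rust. The names `SizeRow`, `parse_size_row` and `growth_percent` are made up for this illustration and are not part of any tool mentioned in this report. It assumes the Berkeley-style row layout shown above (text, data, bss as the first three decimal columns).

```rust
/// One row of Berkeley-format `size` output; all values are in bytes.
/// (Hypothetical helper type, introduced only for this illustration.)
#[derive(Debug, PartialEq)]
struct SizeRow {
    text: u64,
    data: u64,
    bss: u64,
}

/// Parses the numeric row that follows the
/// `text data bss dec hex filename` header.
/// Returns `None` if any of the first three columns is missing
/// or is not a decimal number.
fn parse_size_row(row: &str) -> Option<SizeRow> {
    let mut fields = row.split_whitespace();
    let text = fields.next()?.parse().ok()?;
    let data = fields.next()?.parse().ok()?;
    let bss = fields.next()?.parse().ok()?;
    Some(SizeRow { text, data, bss })
}

/// Percentage by which `bloated` exceeds `baseline`.
fn growth_percent(baseline: u64, bloated: u64) -> f64 {
    (bloated as f64 - baseline as f64) / baseline as f64 * 100.0
}
```

Fed the two rows from this report, `growth_percent(890, 12154)` comes out at about 1266%, meaning the ThinLTO build's text section is roughly 13.7 times the size of the build with ThinLTO disabled.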

Meta

$ rustc -Vv
rustc 1.25.0-nightly (ae920dcc9 2018-01-22)

Given the severity of the issue, I'm going to recommend that my users disable ThinLTO, as I already
do with parallel codegen and incremental compilation.

cc @alexcrichton @aturon
cc #47745

@nagisa
Member

nagisa commented Jan 26, 2018

Are you confident that -Ccodegen-units=1 is passed down to rustc? It is hard to imagine that the formatting machinery would get (re-)included after ThinLTO.
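
For context on where references to the formatting machinery can come from at all: `Result::unwrap` builds its panic message by Debug-formatting the error value, which is one path to symbols such as `core::result::unwrap_failed` and `core::fmt::write`. The sketch below only demonstrates that behaviour on a hosted `std` target with the default unwinding panic strategy. `ClockError`, `configure` and `unwrap_panic_message` are hypothetical names, and nothing here is taken from stm32f103xx-hal or claims to be the cause of the bloat in this report.

```rust
use std::panic;

/// Hypothetical error type; `unwrap` requires the error to implement `Debug`.
#[derive(Debug)]
struct ClockError {
    code: u8,
}

/// Hypothetical fallible setup step.
fn configure(ok: bool) -> Result<u32, ClockError> {
    if ok {
        Ok(8_000_000)
    } else {
        Err(ClockError { code: 3 })
    }
}

/// Triggers `unwrap` on an `Err` and returns the resulting panic message.
fn unwrap_panic_message() -> String {
    // Silence the default hook so the panic is not printed to stderr.
    let previous_hook = panic::take_hook();
    panic::set_hook(Box::new(|_| {}));
    let outcome = panic::catch_unwind(|| configure(false).unwrap());
    panic::set_hook(previous_hook);

    match outcome {
        // Not reached in this sketch, because `configure(false)` always fails.
        Ok(_) => String::new(),
        // A formatted panic message is carried as a `String` payload.
        Err(payload) => payload
            .downcast_ref::<String>()
            .cloned()
            .unwrap_or_default(),
    }
}
```

The returned message contains the Debug rendering of the error (`ClockError { code: 3 }`), so the call site has to reference the formatting code unless the optimizer can prove the panic path unreachable or the message unused.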

@alexcrichton
Member

This is currently expected as ThinLTO doesn't optimize for program size as much as full-program LTO does. The defaults are switching back in #47521, so this should be closed once that merges.

nagisa added the I-slow label on Jan 26, 2018
@alexcrichton
Member

OK, I'm going to close this now that #47521 is merged. Thanks for the report regardless, @japaric!
