Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Split a func into cold/hot parts, reducing binary size #80042

Merged
merged 1 commit into from
Jan 12, 2021

Conversation

sivadeilra
Copy link

I noticed that the Size::bits function is called in many places,
and is inlined into them. On x86_64-pc-windows-msvc, this function
is inlined 527 times, and compiled separately (non-inlined) 3 times.

Each of those inlined calls contains code that panics. This commit
moves the panic! call into a separate function and marks that
function with #[cold].

This reduces binary size by 24 KB. Not much, but it's something.
Changes like this often reduce pressure on instruction-caches,
since it reduces the amount of code that is inlined into hot code
paths. Or more precisely, it removes cold code from hot cache lines.

@rust-highfive
Copy link
Collaborator

Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @oli-obk (or someone else) soon.

If any changes to this PR are deemed necessary, please add them as extra commits. This ensures that the reviewer can see what has changed since they last reviewed the code. Due to the way GitHub handles out-of-date commits, this should also make it reasonably obvious what issues have or haven't been addressed. Large or tricky changes may require several passes of review and changes.

Please see the contribution instructions for more information.

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Dec 14, 2020
@jyn514 jyn514 added I-heavy Issue: Problems and improvements with respect to binary size of generated code. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Dec 15, 2020
@camelid
Copy link
Member

camelid commented Dec 15, 2020

Does this reduce the size of the rustc binary only or every binary?

@sivadeilra
Copy link
Author

Does this reduce the size of the rustc binary only or every binary?

It reduces the size of the rustc_driver-xxx.dll binary. If you're asking about the intermediate .rlib files, then I assume it is going to reduce the size of the ones affected (rustc_target, rustc_middle, rustc_mir, rustc_codegen_llvm, mainly.) But I'm more interested in the size of the final output than in the size of the intermediate files.

@oli-obk
Copy link
Contributor

oli-obk commented Dec 15, 2020

@bors try @rust-timer queue

@rust-timer
Copy link
Collaborator

Awaiting bors try build completion.

@bors
Copy link
Contributor

bors commented Dec 15, 2020

⌛ Trying commit bc8faf0cd652e410864acef7a511e3106a19b2fd with merge 145cc5b50bad8a74379045e8cadab5a126bcae1d...

@bors
Copy link
Contributor

bors commented Dec 15, 2020

☀️ Try build successful - checks-actions
Build commit: 145cc5b50bad8a74379045e8cadab5a126bcae1d (145cc5b50bad8a74379045e8cadab5a126bcae1d)

@rust-timer
Copy link
Collaborator

Queued 145cc5b50bad8a74379045e8cadab5a126bcae1d with parent 99baddb, future comparison URL.

@rustbot label: +S-waiting-on-perf

@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Dec 15, 2020
@rust-timer
Copy link
Collaborator

Finished benchmarking try commit (145cc5b50bad8a74379045e8cadab5a126bcae1d): comparison url.

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. Please note that if the perf results are neutral, you should likely undo the rollup=never given below by specifying rollup- to bors.

Importantly, though, if the results of this run are non-neutral do not roll this PR up -- it will mask other regressions or improvements in the roll up.

@bors rollup=never
@rustbot label: +S-waiting-on-review -S-waiting-on-perf

@rustbot rustbot removed the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Dec 15, 2020
@bjorn3
Copy link
Member

bjorn3 commented Dec 15, 2020

The difference seems to be within noise.

@sivadeilra
Copy link
Author

I'm not expecting a big result from this. But eliminating 500 duplicates of cold panic code seems worth it, right?

@oli-obk
Copy link
Contributor

oli-obk commented Dec 16, 2020

But eliminating 500 duplicates of cold panic code seems worth it, right?

Indeed. Do you think we'll get the same benefit even when keeping the original checked_mul and unwrap_or_else logic and just using the new function within the unwrap closure?

@sivadeilra
Copy link
Author

But eliminating 500 duplicates of cold panic code seems worth it, right?

Indeed. Do you think we'll get the same benefit even when keeping the original checked_mul and unwrap_or_else logic and just using the new function within the unwrap closure?

Possibly. I can check the generated assembly. I've seen checked_mul generate fairly efficient code, so I'm not worried about that part. Mainly it's the panicking and formatting that can be surprisingly expensive. Not just to execute, but just the code itself can be surprisingly large, which can be painful when it's duplicated many times.

Ideally, we would notice that a particular basic block always leads to a #[cold] path, and would "outline" that BB (prevent it from being inlined, and so prevent it from being dup'd). That would solve the general case. However, that's not something that rustc does automatically at this point. If it is ever added, then a lot of hand-tuning like this could be removed.

@sivadeilra
Copy link
Author

I tried switching the code back to using checked_mul. The generated assembly code is a bit worse; it uses a mul instruction, rather than shifting, and a jo to branch on overflow.

Maybe a better approach would be for the Size struct to contain a u64 that counts the number of bits, not the number of bytes, in a size. That way, the bytes function would just be fn bytes(self) -> u64 { self.bits >> 3 }. No conditionals at all. The constructor (Size::new) would test for overflow. Which would basically never happen -- we'll never have a legit type whose Size is 0x3fff_ffff_ffff_ffff in bytes, that is, a size that would overflow when expressed as a bit-count but not overflow when expressed as a byte-count.

@oli-obk
Copy link
Contributor

oli-obk commented Dec 18, 2020

Hmm... it seems a bit wrong to store the bits if we can't actually encode the bit size, but must always encode a byte size.

One thing we could try is to make the Size struct use rustc_layout_scalar_valid_range_end to declare that the maximum value must be smaller than u64::max_value() >> 3 and see whether llvm can figure out that taking the raw field will thus allow us to multiply by 8 without causing any problems. This would also have the fun side effect of allowing the free bits of Size to be used in enum optimizations across the compiler. Additionally, we now need to check that the size is smaller than u64::max_value() >> 3 at Size construction time. Not sure if that gains us anything to checking it at the use site.

What do you think of that idea? We can also open a compiler MCP (major change proposal) to make sure that we get the input of the team on this, since it technically could be observed by users, but I don't think in any case in which some code currently compiles and won't compile after this change.

@bjorn3
Copy link
Member

bjorn3 commented Dec 18, 2020

One thing we could try is to make the Size struct use rustc_layout_scalar_valid_range_end to declare that the maximum value must be smaller than u64::max_value() >> 3 and see whether llvm can figure out that taking the raw field will thus allow us to multiply by 8 without causing any problems.

As far as I know LLVM is not told about rustc_layout_scalar_valid_range_end.

@sivadeilra
Copy link
Author

I think using rustc_layout_scalar_valid_range_end is interesting. Option<Size> is used in a few places. Not enough that I think the benefits are huge, but it's always good to enable more layout optimizations.

I think the key thing is to move the validation to the constructor of Size. Everything else is just "how do we represent this". It seems very unlikely that any code relies on constructing a very large Size and then carefully avoids calling my_size.bits() on it; the calls to bits() greatly outnumber the calls to bytes().

Me personally, I think this is below the threshold of the MCP process; the compiler team has much bigger issues facing them. If we make this change (verifying the lower maximum value for Size::from_*), and it turns out that something does depend on it, then we'll get a deterministic failure. That seems to fit with the philosophy of the nightly / beta release process, rather than the MCP process. We can commit it to master, and let it go through the usual stabilization process. If it causes problems, which I think is very unlikely, then it will be easy to undo this change, since it affects only the internals of a type, and not its publicly-visible API.

@oli-obk
Copy link
Contributor

oli-obk commented Dec 21, 2020

I agree on all points. Do you want to implement this as a replacement for this PR or do you have other steps in mind?

@sivadeilra
Copy link
Author

I agree on all points. Do you want to implement this as a replacement for this PR or do you have other steps in mind?

I'll implement this and submit an update to this PR. Sorry for the delay in responding -- holidays, etc.

I noticed that the Size::bits function is called in many places,
and is inlined into them. On x86_64-pc-windows-msvc, this function
is inlined 527 times, and compiled separately (non-inlined) 3 times.

Each of those inlined calls contains code that panics. This commit
moves the `panic!` call into a separate function and marks that
function with `#[cold]`.

This reduces binary size by 24 KB. By itself, that's not a substantial
reduction. However, changes like this often reduce pressure on
instruction-caches, since it reduces the amount of code that is inlined
into hot code paths. Or more precisely, it removes cold code from hot
cache lines. It also removes all conditionals from Size::bits(),
which is called in many places.
@sivadeilra
Copy link
Author

@oli-obk I think this is ready now.

@oli-obk
Copy link
Contributor

oli-obk commented Jan 11, 2021

Thanks!

@bors r+ rollup

@bors
Copy link
Contributor

bors commented Jan 11, 2021

📌 Commit 4721b65 has been approved by oli-obk

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Jan 11, 2021
bors added a commit to rust-lang-ci/rust that referenced this pull request Jan 12, 2021
Rollup of 9 pull requests

Successful merges:

 - rust-lang#79997 (Emit a reactor for cdylib target on wasi)
 - rust-lang#79998 (Use correct ABI for wasm32 by default)
 - rust-lang#80042 (Split a func into cold/hot parts, reducing binary size)
 - rust-lang#80324 (Explain method-call move errors in loops)
 - rust-lang#80864 (std/core docs: fix wrong link in PartialEq)
 - rust-lang#80870 (resolve: Simplify built-in macro table)
 - rust-lang#80885 (rustdoc: Resolve `&str` as `str`)
 - rust-lang#80904 (Fix small typo)
 - rust-lang#80923 (Merge different function exits)

Failed merges:

r? `@ghost`
`@rustbot` modify labels: rollup
@bors bors merged commit 56504a0 into rust-lang:master Jan 12, 2021
@rustbot rustbot added this to the 1.51.0 milestone Jan 12, 2021
@sivadeilra sivadeilra deleted the cold_bits branch January 12, 2021 20:21
@@ -238,22 +238,38 @@ pub enum Endian {
#[derive(Copy, Clone, PartialEq, Eq, PartialOrd, Ord, Hash, Debug, Encodable, Decodable)]
#[derive(HashStable_Generic)]
pub struct Size {
// The top 3 bits are ALWAYS zero.
raw: u64,
Copy link
Member

@RalfJung RalfJung Mar 27, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am concerned by this comment that has been added here.

  • It doesn't seem actually enforced -- with Size::from_bytes I can easily construct a Size that violates this.
  • I would like to use this type to represent the size of any possible Rust object in Miri (including for things like size_of_val_raw, where the object does not actually have to exist in the address space). However, the size limit for that is isize::MAX. Something seems off if the compiler's Size type cannot be used to represent the size of all objects in the language...

Currently Miri enforces dl.obj_size_bound() as the size limit, but that is probably wrong -- it does not match what the reference and size_of_val_raw say the limit should be (they both say it is isize::MAX).

// This is the largest value of `bits` that does not cause overflow
// during rounding, and guarantees that the resulting number of bytes
// cannot cause overflow when multiplied by 8.
if bits > 0xffff_ffff_ffff_fff8 {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am also very confused by this check. bits / 8 + ((bits % 8) + 7) / 8 will never overflow I think, so what does this have to do with "overflow during rounding"?

Dylan-DPC added a commit to Dylan-DPC/rust that referenced this pull request Mar 30, 2022
allow large Size again

This basically reverts most of rust-lang#80042, and instead does the panic in `bits()` with a `#[cold]` function to make sure it does not get inlined.

rust-lang#80042 added a comment about an invariant ("The top 3 bits are ALWAYS zero") that is not actually enforced, and if it were enforced that would be a problem for rust-lang#95388. So I think we should not have that invariant, and I adjusted the code accordingly.

r? `@oli-obk` Cc `@sivadeilra`
bors added a commit to rust-lang-ci/rust that referenced this pull request Mar 31, 2022
allow large Size again

This basically reverts most of rust-lang#80042, and instead does the panic in `bits()` with a `#[cold]` function to make sure it does not get inlined.

rust-lang#80042 added a comment about an invariant ("The top 3 bits are ALWAYS zero") that is not actually enforced, and if it were enforced that would be a problem for rust-lang#95388. So I think we should not have that invariant, and I adjusted the code accordingly.

r? `@oli-obk` Cc `@sivadeilra`
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
I-heavy Issue: Problems and improvements with respect to binary size of generated code. S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants