Split a func into cold/hot parts, reducing binary size #80042
Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @oli-obk (or someone else) soon. If any changes to this PR are deemed necessary, please add them as extra commits. This ensures that the reviewer can see what has changed since they last reviewed the code. Due to the way GitHub handles out-of-date commits, this should also make it reasonably obvious what issues have or haven't been addressed. Large or tricky changes may require several passes of review and changes. Please see the contribution instructions for more information.
Does this reduce the size of the
It reduces the size of the
@bors try @rust-timer queue
Awaiting bors try build completion.
⌛ Trying commit bc8faf0cd652e410864acef7a511e3106a19b2fd with merge 145cc5b50bad8a74379045e8cadab5a126bcae1d...
☀️ Try build successful - checks-actions
Queued 145cc5b50bad8a74379045e8cadab5a126bcae1d with parent 99baddb, future comparison URL. @rustbot label: +S-waiting-on-perf
Finished benchmarking try commit (145cc5b50bad8a74379045e8cadab5a126bcae1d): comparison url. Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. Please note that if the perf results are neutral, you should likely undo the rollup=never given below by specifying Importantly, though, if the results of this run are non-neutral do not roll this PR up -- it will mask other regressions or improvements in the roll up. @bors rollup=never
The difference seems to be within noise.
I'm not expecting a big result from this. But eliminating 500 duplicates of cold panic code seems worth it, right?
Indeed. Do you think we'll get the same benefit even when keeping the original
Possibly. I can check the generated assembly. I've seen Ideally, we would notice that a particular basic block always leads to a
I tried switching the code back to using Maybe a better approach would be for the
Hmm... it seems a bit wrong to store the bits if we can't actually encode the bit size, but must always encode a byte size. One thing we could try is to make the What do you think of that idea? We can also open a compiler MCP (major change proposal) to make sure that we get the input of the team on this, since it technically could be observed by users, but I don't think in any case in which some code currently compiles and won't compile after this change.
As far as I know LLVM is not told about
I think using I think the key thing is to move the validation to the constructor of Me personally, I think this is below the threshold of the MCP process; the compiler team has much bigger issues facing them. If we make this change (verifying the lower maximum value for
I agree on all points. Do you want to implement this as a replacement for this PR or do you have other steps in mind?
I'll implement this and submit an update to this PR. Sorry for the delay in responding -- holidays, etc.
I noticed that the `Size::bits` function is called in many places, and is inlined into them. On x86_64-pc-windows-msvc, this function is inlined 527 times, and compiled separately (non-inlined) 3 times. Each of those inlined calls contains code that panics. This commit moves the `panic!` call into a separate function and marks that function with `#[cold]`. This reduces binary size by 24 KB. By itself, that's not a substantial reduction. However, changes like this often reduce pressure on instruction caches, since they reduce the amount of code that is inlined into hot code paths; more precisely, they remove cold code from hot cache lines. The change also removes all conditionals from `Size::bits()`, which is called in many places.
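A minimal sketch of the hot/cold split described above, assuming the overflow check moves out of `bits()` and into the constructor (as discussed earlier in the thread). The `Size` here is a simplified, hypothetical stand-in for the compiler's type; `from_bits`, `bytes`, and the helper name `overflow` are illustrative, and the threshold is the one quoted in the diff excerpt on this page.

```rust
/// Simplified, hypothetical model of a byte-granular size type.
#[derive(Copy, Clone, Debug, PartialEq, Eq)]
pub struct Size {
    // Byte count. `from_bits` keeps the top 3 bits zero,
    // so converting back to bits cannot overflow.
    raw: u64,
}

impl Size {
    /// Rounds `bits` up to whole bytes, rejecting values whose byte
    /// count could not be converted back to bits.
    pub fn from_bits(bits: u64) -> Size {
        // Cold path: the panic lives in its own function, so callers that
        // inline `from_bits` only get a compare and a call.
        #[cold]
        fn overflow(bits: u64) -> ! {
            panic!("Size::from_bits({}) has overflowed", bits)
        }

        if bits > 0xffff_ffff_ffff_fff8 {
            overflow(bits);
        }
        // Round up without computing `bits + 7`.
        Size { raw: bits / 8 + ((bits % 8) + 7) / 8 }
    }

    pub fn bytes(self) -> u64 {
        self.raw
    }

    /// Hot path: no branch and no panic, just a shift.
    #[inline]
    pub fn bits(self) -> u64 {
        self.raw << 3
    }
}

fn main() {
    let s = Size::from_bits(13);
    assert_eq!(s.bytes(), 2);
    assert_eq!(s.bits(), 16);
    // Out-of-range input takes the cold path and panics.
    assert!(std::panic::catch_unwind(|| Size::from_bits(u64::MAX)).is_err());
}
```

The point of this shape is that `bits()`, the function inlined hundreds of times, carries no branch at all; the only remaining panic is reached through a single out-of-line call in the constructor.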
@oli-obk I think this is ready now.
Thanks! @bors r+ rollup
📌 Commit 4721b65 has been approved by
Rollup of 9 pull requests

Successful merges:

- rust-lang#79997 (Emit a reactor for cdylib target on wasi)
- rust-lang#79998 (Use correct ABI for wasm32 by default)
- rust-lang#80042 (Split a func into cold/hot parts, reducing binary size)
- rust-lang#80324 (Explain method-call move errors in loops)
- rust-lang#80864 (std/core docs: fix wrong link in PartialEq)
- rust-lang#80870 (resolve: Simplify built-in macro table)
- rust-lang#80885 (rustdoc: Resolve `&str` as `str`)
- rust-lang#80904 (Fix small typo)
- rust-lang#80923 (Merge different function exits)

Failed merges:

r? `@ghost`
`@rustbot` modify labels: rollup
```
@@ -238,22 +238,38 @@ pub enum Endian {
#[derive(Copy, Clone, PartialEq, Eq, PartialOrd, Ord, Hash, Debug, Encodable, Decodable)]
#[derive(HashStable_Generic)]
pub struct Size {
    // The top 3 bits are ALWAYS zero.
    raw: u64,
```
I am concerned by this comment that has been added here.

- It doesn't seem actually enforced -- with `Size::from_bytes` I can easily construct a `Size` that violates this.
- I would like to use this type to represent the size of any possible Rust object in Miri (including for things like `size_of_val_raw`, where the object does not actually have to exist in the address space). However, the size limit for that is `isize::MAX`. Something seems off if the compiler's `Size` type cannot be used to represent the size of all objects in the language...

Currently Miri enforces `dl.obj_size_bound()` as the size limit, but that is probably wrong -- it does not match what the reference and `size_of_val_raw` say the limit should be (they both say it is `isize::MAX`).
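To make the first point concrete, here is a hypothetical model (not the compiler's actual code) in which `bits()` trusts the top-3-bits comment and uses a plain shift. Because `from_bytes` accepts any `u64`, a size of `isize::MAX` bytes, the limit cited above, is constructible, and the shift silently returns a wrong value instead of failing. The sketch assumes a 64-bit target and writes `i64::MAX` for `isize::MAX`; the method names `bits_unchecked` and `bits_checked` are illustrative.

```rust
/// Hypothetical model of a `Size` whose `bits()` relies on the
/// "top 3 bits are ALWAYS zero" comment instead of checking.
#[derive(Copy, Clone, Debug)]
struct Size {
    raw: u64, // byte count; the comment claims raw < 2^61
}

impl Size {
    /// Nothing here enforces the claimed invariant.
    fn from_bytes(bytes: u64) -> Size {
        Size { raw: bytes }
    }

    /// Branch-free conversion: `<< 3` multiplies by 8 and silently
    /// discards any bits shifted out of the top (no panic, even in debug).
    fn bits_unchecked(self) -> u64 {
        self.raw << 3
    }

    /// Overflow-aware conversion, for comparison.
    fn bits_checked(self) -> Option<u64> {
        self.raw.checked_mul(8)
    }
}

fn main() {
    // `isize::MAX` bytes on a 64-bit target.
    let max_object = Size::from_bytes(i64::MAX as u64);
    // The unchecked shift wraps to a wrong answer...
    assert_eq!(max_object.bits_unchecked(), u64::MAX - 7);
    // ...while the checked version reports that no answer fits.
    assert_eq!(max_object.bits_checked(), None);
}
```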
```
        // This is the largest value of `bits` that does not cause overflow
        // during rounding, and guarantees that the resulting number of bytes
        // cannot cause overflow when multiplied by 8.
        if bits > 0xffff_ffff_ffff_fff8 {
```
I am also very confused by this check. `bits / 8 + ((bits % 8) + 7) / 8` will never overflow I think, so what does this have to do with "overflow during rounding"?
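A quick check of the arithmetic in question, as a sketch with hypothetical function names: the formula `bits / 8 + ((bits % 8) + 7) / 8` rounds up without ever forming `bits + 7`, so it is defined for every `u64`. The threshold `0xffff_ffff_ffff_fff8` equals `u64::MAX - 7`, which is exactly where the naive `(bits + 7) / 8` would overflow; it is also the largest input whose rounded byte count still survives multiplication by 8.

```rust
/// Round a bit count up to whole bytes without computing `bits + 7`,
/// so it cannot overflow for any `u64` input.
fn bits_to_bytes_round_up(bits: u64) -> u64 {
    bits / 8 + ((bits % 8) + 7) / 8
}

/// The naive formula; returns `None` where `bits + 7` would overflow.
fn bits_to_bytes_naive(bits: u64) -> Option<u64> {
    bits.checked_add(7).map(|b| b / 8)
}

fn main() {
    const THRESHOLD: u64 = 0xffff_ffff_ffff_fff8;
    assert_eq!(THRESHOLD, u64::MAX - 7);

    // The overflow-free formula is fine even at u64::MAX...
    assert_eq!(bits_to_bytes_round_up(u64::MAX), 1u64 << 61);
    // ...whereas the naive one overflows just past the threshold.
    assert_eq!(bits_to_bytes_naive(THRESHOLD), Some((1u64 << 61) - 1));
    assert_eq!(bits_to_bytes_naive(THRESHOLD + 1), None);

    // What the threshold does guarantee: bytes * 8 still fits.
    assert!(bits_to_bytes_round_up(THRESHOLD).checked_mul(8).is_some());
    assert!(bits_to_bytes_round_up(THRESHOLD + 1).checked_mul(8).is_none());
}
```

So the check is meaningful for the multiply-by-8 guarantee, while the "overflow during rounding" part only applies to the naive formula.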
allow large Size again

This basically reverts most of rust-lang#80042, and instead does the panic in `bits()` with a `#[cold]` function to make sure it does not get inlined. rust-lang#80042 added a comment about an invariant ("The top 3 bits are ALWAYS zero") that is not actually enforced, and if it were enforced that would be a problem for rust-lang#95388. So I think we should not have that invariant, and I adjusted the code accordingly.

r? `@oli-obk` Cc `@sivadeilra`
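A minimal sketch of the approach this follow-up describes, assuming a simplified, hypothetical `Size` that accepts any `u64` byte count: `bits()` keeps a single checked multiply and delegates the panic to a nested `#[cold]` function. The panic message and the helper name `overflow` are illustrative, not quoted from the actual change.

```rust
/// Hypothetical, simplified `Size`: any `u64` byte count is allowed,
/// so converting to bits has to check for overflow.
#[derive(Copy, Clone, Debug, PartialEq, Eq)]
pub struct Size {
    raw: u64, // size in bytes, no invariant on the top bits
}

impl Size {
    pub fn from_bytes(bytes: u64) -> Size {
        Size { raw: bytes }
    }

    pub fn bytes(self) -> u64 {
        self.raw
    }

    /// Hot path: one checked multiply; the panic stays out of line.
    #[inline]
    pub fn bits(self) -> u64 {
        // `#[cold]` marks this path as unlikely, which in practice keeps
        // the panic and its formatting code out of every inlined caller.
        #[cold]
        fn overflow(bytes: u64) -> ! {
            panic!("Size::bits: {} bytes in bits doesn't fit in u64", bytes)
        }

        let bytes = self.bytes();
        bytes.checked_mul(8).unwrap_or_else(|| overflow(bytes))
    }
}

fn main() {
    assert_eq!(Size::from_bytes(4).bits(), 32);
    // A size of `isize::MAX` bytes (64-bit target) is representable...
    let big = Size::from_bytes(i64::MAX as u64);
    assert_eq!(big.bytes(), i64::MAX as u64);
    // ...but its bit count does not fit in u64, so `bits()` panics.
    assert!(std::panic::catch_unwind(|| big.bits()).is_err());
}
```

Compared with validating in the constructor, this keeps every byte count representable (which is what Miri needs for `isize::MAX`-sized objects) while still keeping the panic path out of the inlined code.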
I noticed that the Size::bits function is called in many places,
and is inlined into them. On x86_64-pc-windows-msvc, this function
is inlined 527 times, and compiled separately (non-inlined) 3 times.
Each of those inlined calls contains code that panics. This commit
moves the `panic!` call into a separate function and marks that
function with `#[cold]`.

This reduces binary size by 24 KB. Not much, but it's something.
Changes like this often reduce pressure on instruction-caches,
since it reduces the amount of code that is inlined into hot code
paths. Or more precisely, it removes cold code from hot cache lines.