Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Inline most of the code paths for conversions with boxed slices #49555

Merged
merged 3 commits into from
Apr 16, 2018

Conversation

nox
Copy link
Contributor

@nox nox commented Apr 1, 2018

This helps with the specific problem described in #49541, obviously without making any large change to how inlining works in the general case.

Everything involved in the conversions is made #[inline], except for the <Vec<T>>::into_boxed_slice entry point which is made #[inline(always)] after checking that duplicating the function mentioned in the issue prevented its inlining if I only annotate it with
#[inline].

For the record, that function was:

pub fn foo() -> Box<[u8]> {
    vec![0].into_boxed_slice()
}

To help the inliner's job, we also hoist a self.capacity() != self.len check in <Vec<T>>::shrink_to_fit and mark it as #[inline] too.

This helps with the specific problem described in rust-lang#49541, obviously without
making any large change to how inlining works in the general case.

Everything involved in the conversions is made `#[inline]`, except for the
`<Vec<T>>::into_boxed_slice` entry point which is made `#[inline(always)]`
after checking that duplicating the function mentioned in the issue prevented
its inlining if I only annotate it with `#[inline]`.

For the record, that function was:

```rust
pub fn foo() -> Box<[u8]> {
    vec![0].into_boxed_slice()
}
```

To help the inliner's job, we also hoist a `self.capacity() != self.len` check
in `<Vec<T>>::shrink_to_fit` and mark it as `#[inline]` too.
@eddyb
Copy link
Member

eddyb commented Apr 1, 2018

r? @alexcrichton

@rust-highfive rust-highfive assigned alexcrichton and unassigned KodrAus Apr 1, 2018
self.buf.shrink_to_fit(self.len);
if self.capacity() != self.len {
self.buf.shrink_to_fit(self.len);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that these don't need the #[inline] attribute because they're already generic, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll check and report back.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, this wasn't necessary. I removed it and compiled a module with foo duplicated, here is the result of compiling that to assembly:

	.section	__TEXT,__text,regular,pure_instructions
	.p2align	4, 0x90
__ZN55_$LT$alloc..heap..Heap$u20$as$u20$core..heap..Alloc$GT$3oom17h2c0e1b8273a8e186E:
	.cfi_startproc
	pushq	%rbp
	.cfi_def_cfa_offset 16
	.cfi_offset %rbp, -16
	movq	%rsp, %rbp
	.cfi_def_cfa_register %rbp
	callq	___rust_oom
	ud2
	.cfi_endproc

	.p2align	4, 0x90
__ZN5alloc4heap15exchange_malloc28_$u7b$$u7b$closure$u7d$$u7d$17h1d897acb095d410dE:
	.cfi_startproc
	pushq	%rbp
	.cfi_def_cfa_offset 16
	.cfi_offset %rbp, -16
	movq	%rsp, %rbp
	.cfi_def_cfa_register %rbp
	subq	$32, %rsp
	movq	16(%rdi), %rax
	movq	%rax, -8(%rbp)
	movq	(%rdi), %rax
	movq	8(%rdi), %rcx
	movq	%rcx, -16(%rbp)
	movq	%rax, -24(%rbp)
	leaq	-24(%rbp), %rdi
	callq	__ZN55_$LT$alloc..heap..Heap$u20$as$u20$core..heap..Alloc$GT$3oom17h2c0e1b8273a8e186E
	ud2
	.cfi_endproc

	.globl	_foo
	.p2align	4, 0x90
_foo:
	.cfi_startproc
	pushq	%rbp
	.cfi_def_cfa_offset 16
	.cfi_offset %rbp, -16
	movq	%rsp, %rbp
	.cfi_def_cfa_register %rbp
	subq	$48, %rsp
	leaq	-40(%rbp), %rdx
	movl	$1, %edi
	movl	$1, %esi
	callq	___rust_alloc
	testq	%rax, %rax
	je	LBB2_1
	movb	$0, (%rax)
	movl	$1, %edx
	addq	$48, %rsp
	popq	%rbp
	retq
LBB2_1:
	movq	-32(%rbp), %rax
	movq	-24(%rbp), %rcx
	movq	%rcx, -8(%rbp)
	movq	%rax, -16(%rbp)
	movq	-16(%rbp), %rax
	movq	-8(%rbp), %rcx
	movq	%rcx, -24(%rbp)
	movq	%rax, -32(%rbp)
	leaq	-40(%rbp), %rdi
	callq	__ZN5alloc4heap15exchange_malloc28_$u7b$$u7b$closure$u7d$$u7d$17h1d897acb095d410dE
	ud2
	.cfi_endproc

	.globl	_bar
	.p2align	4, 0x90
_bar:
	.cfi_startproc
	pushq	%rbp
	.cfi_def_cfa_offset 16
	.cfi_offset %rbp, -16
	movq	%rsp, %rbp
	.cfi_def_cfa_register %rbp
	subq	$48, %rsp
	leaq	-40(%rbp), %rdx
	movl	$1, %edi
	movl	$1, %esi
	callq	___rust_alloc
	testq	%rax, %rax
	je	LBB3_1
	movb	$0, (%rax)
	movl	$1, %edx
	addq	$48, %rsp
	popq	%rbp
	retq
LBB3_1:
	movq	-32(%rbp), %rax
	movq	-24(%rbp), %rcx
	movq	%rcx, -8(%rbp)
	movq	%rax, -16(%rbp)
	movq	-16(%rbp), %rax
	movq	-8(%rbp), %rcx
	movq	%rcx, -24(%rbp)
	movq	%rax, -32(%rbp)
	leaq	-40(%rbp), %rdi
	callq	__ZN5alloc4heap15exchange_malloc28_$u7b$$u7b$closure$u7d$$u7d$17h1d897acb095d410dE
	ud2
	.cfi_endproc


.subsections_via_symbols

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was the code change here also still necessary to get the same size reductions?

@alexcrichton
Copy link
Member

I'm personally ok on adding #[inline] to functions that otherwise wouldn't be inlined across crates (e.g. the str-related shims here) but I'm hesitant to add #[inline] to annotations that are already inlined across crates (aka generics). Optimizations like ThinLTO should already take care of inlining there when necessary, and if something is determined to not be inlined then I'd err on the side of trusting LLVM rather than forcing it to inline which can run the risk of producing suboptimal code in terms of size and such.

@pietroalbini pietroalbini added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Apr 2, 2018
@shepmaster shepmaster added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Apr 7, 2018
@@ -636,6 +638,7 @@ impl<T> Vec<T> {
/// assert_eq!(slice.into_vec().capacity(), 3);
/// ```
#[stable(feature = "rust1", since = "1.0.0")]
#[inline(always)]
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OTOH, removing this one makes the into_boxed_slice calls not be optimised anymore when the function is duplicated. Even just #[inline] makes rustc not inline the calls anymore.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm so in general #[inline(always)] is an antipattern as it can cause compile time to baloon very quickly. Mind if I poke around a bit at the examples you were looking at to see what's going on?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure!

#![crate_type = "lib"]

#[no_mangle]
pub fn foo() -> Box<[u8]> {
    vec![0].into_boxed_slice()
}

#[no_mangle]
pub fn bar() -> Box<[u8]> {
    vec![0].into_boxed_slice()
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm ok I'm tempted though to leave this as-is and let LLVM decide these sorts of things, we've had bad experiences in the past optimizing too much for microbenchmarks unfortunately

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you mean this specific #[inline(always)] attribute, or the whole PR? Either is fine with me, to be heard.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh just this one particular instance of the attribute. The function is generic so it's already candidate to be inlined everywhere, and it sounds like #[inline] doesn't otherwise work here so #[inline(always)] is going directly against LLVM's heuristics, which has caused a lot of problems for us historically

@nox nox added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Apr 8, 2018
@alexcrichton
Copy link
Member

@bors: r+ rollup

Thanks!

@bors
Copy link
Contributor

bors commented Apr 16, 2018

📌 Commit b59fa0d has been approved by alexcrichton

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Apr 16, 2018
kennytm added a commit to kennytm/rust that referenced this pull request Apr 16, 2018
Inline most of the code paths for conversions with boxed slices

This helps with the specific problem described in rust-lang#49541, obviously without making any large change to how inlining works in the general case.

Everything involved in the conversions is made `#[inline]`, except for the `<Vec<T>>::into_boxed_slice` entry point which is made `#[inline(always)]` after checking that duplicating the function mentioned in the issue prevented its inlining if I only annotate it with
`#[inline]`.

For the record, that function was:

```rust
pub fn foo() -> Box<[u8]> {
    vec![0].into_boxed_slice()
}
```

To help the inliner's job, we also hoist a `self.capacity() != self.len` check in `<Vec<T>>::shrink_to_fit` and mark it as `#[inline]` too.
bors added a commit that referenced this pull request Apr 16, 2018
Rollup of 8 pull requests

Successful merges:

 - #49555 (Inline most of the code paths for conversions with boxed slices)
 - #49606 (Prevent broken pipes causing ICEs)
 - #49646 (Use box syntax instead of Box::new in Mutex::remutex on Windows)
 - #49647 (Remove `underscore_lifetimes` and `match_default_bindings` from active feature list)
 - #49931 (Fix incorrect span in `&mut` suggestion)
 - #49959 (rustbuild: allow building tools with debuginfo)
 - #49965 (Remove warning about f64->f32 cast being potential UB)
 - #49994 (Remove unnecessary indentation in rustdoc book codeblock.)

Failed merges:
@bors bors merged commit b59fa0d into rust-lang:master Apr 16, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants