Optimize `array::IntoIter` #100214

scottmcm · 2022-08-06T23:47:09Z

`.into_iter()` on arrays was slower than it needed to be (especially compared to the slice iterator) since it uses `Range<usize>`, which needs to handle degenerate ranges like `10..4`.

This PR adds an internal `IndexRange` type that's like `Range<usize>` but with a safety invariant that means it doesn't need to worry about those cases -- it only handles `start <= end` -- and thus can give LLVM more information to optimize better.

I added one simple demonstration of the improvement as a codegen test.

(`vec::IntoIter` uses pointers instead of indexes, so it doesn't have this problem, but that only works because its elements are boxed. `array::IntoIter` can't use pointers because that would keep it from being movable.)
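
To make the invariant concrete, here is a minimal sketch of the idea in safe Rust. The name `IndexRange` comes from the PR, but the checked `new` constructor, the field layout, and the method bodies are assumptions for illustration; the real type is internal to `core` and relies on an unsafe invariant rather than a checked constructor.

```rust
/// Illustrative stand-in for the PR's internal `IndexRange`; not the actual `core` code.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct IndexRange {
    start: usize,
    end: usize, // invariant: start <= end
}

impl IndexRange {
    /// Hypothetical checked constructor: refuses degenerate ranges such as `10..4`.
    pub fn new(start: usize, end: usize) -> Option<Self> {
        if start <= end {
            Some(Self { start, end })
        } else {
            None
        }
    }

    /// Cannot underflow because of the invariant, so no saturating logic is needed.
    #[inline]
    pub fn len(&self) -> usize {
        self.end - self.start
    }
}

impl Iterator for IndexRange {
    type Item = usize;

    #[inline]
    fn next(&mut self) -> Option<usize> {
        if self.start < self.end {
            let i = self.start;
            self.start += 1; // still <= end, so the invariant is preserved
            Some(i)
        } else {
            None
        }
    }

    #[inline]
    fn size_hint(&self) -> (usize, Option<usize>) {
        let n = self.len();
        (n, Some(n))
    }
}

impl DoubleEndedIterator for IndexRange {
    #[inline]
    fn next_back(&mut self) -> Option<usize> {
        if self.start < self.end {
            self.end -= 1; // still >= start, so the invariant is preserved
            Some(self.end)
        } else {
            None
        }
    }
}
```

Because every constructed value satisfies `start <= end`, `len` is a plain subtraction, whereas `Range<usize>` has to account for `start > end` when computing its length.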

rust-highfive · 2022-08-06T23:47:12Z

r? @thomcc

(rust-highfive has picked a reviewer for you, use r? to override)

timvermeulen · 2022-08-07T02:25:50Z

This regresses the vec::bench_flat_map_collect benchmark by a lot for me, over 10x.

When looking into this I noticed that array::IntoIter::fold calls fold on iter::ByRefSized(&mut self.alive), which as far as I can see doesn't call the actual fold implementation of IndexRange because ByRefSized would have to pass the iterator by value, which it can't. So using ByRefSized seems to be pointless here. However, it doesn't look like this is what's causing the slowdown.
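
The point about `fold` taking the iterator by value can be illustrated with a small by-reference adapter. `ByRef` below is a hypothetical stand-in for the internal `iter::ByRefSized`, and forwarding through `try_fold` with an infallible `Result` is an assumption about one way to reach the inner iterator's own loop through `&mut`; it is not claimed to be the exact fix.

```rust
use std::convert::Infallible;

/// Hypothetical stand-in for the internal `ByRefSized` adapter.
pub struct ByRef<'a, I>(pub &'a mut I);

impl<I: Iterator> Iterator for ByRef<'_, I> {
    type Item = I::Item;

    fn next(&mut self) -> Option<I::Item> {
        self.0.next()
    }

    // `fold` consumes `self`, but only a `&mut I` is available here, so
    // `I::fold` cannot be called on the inner iterator itself. `try_fold`
    // takes `&mut self`, so it can be forwarded; using an error type with
    // no values means the fold never short-circuits.
    fn fold<B, F>(self, init: B, mut f: F) -> B
    where
        F: FnMut(B, I::Item) -> B,
    {
        let ByRef(inner) = self;
        let result: Result<B, Infallible> = inner.try_fold(init, |acc, x| Ok(f(acc, x)));
        match result {
            Ok(acc) => acc,
            Err(never) => match never {},
        }
    }
}
```

With this shape, `ByRef(&mut it).fold(0, |a, x| a + x)` drives the inner iterator's `try_fold` and leaves `it` exhausted but still owned by the caller.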

scottmcm · 2022-08-07T03:14:18Z

Thanks, @timvermeulen, I'll look at that one.

I've sent PR #100220 to fix ByRefSized::fold, and will consider this PR blocked until that one goes in and I can confirm it solves the problem you raised.

@rustbot author

Update 2022-08-24: I've rebased atop the other PR, but I still haven't investigated that perf test, so it's not really review-ready yet.

…riplett Properly forward `ByRefSized::fold` to the inner iterator cc `@timvermeulen,` who noticed this mistake in rust-lang#100214 (comment)

scottmcm · 2022-09-18T03:17:23Z

@timvermeulen My mistake turned out to be an obvious one: I'd forgotten to inline some methods, and because it's a non-generic type that meant the benches ended up needing to call it instead of inline it, with the obvious terrible consequences.
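
As a rough illustration of the inlining issue: generic code is instantiated in the crate that uses it, while a non-generic method has generally needed `#[inline]` to be eligible for inlining into other crates (absent LTO). The `Cursor` type and its methods below are hypothetical and only show where the attribute goes.

```rust
/// Hypothetical non-generic cursor type; the names are illustrative only.
pub struct Cursor {
    pos: usize,
    end: usize,
}

impl Cursor {
    #[inline]
    pub fn new(end: usize) -> Self {
        Cursor { pos: 0, end }
    }

    /// Small, hot, and non-generic: the kind of method that benefits from
    /// `#[inline]` so callers in other crates can fold it into their loops.
    #[inline]
    pub fn advance(&mut self) -> Option<usize> {
        if self.pos < self.end {
            let i = self.pos;
            self.pos += 1;
            Some(i)
        } else {
            None
        }
    }
}
```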

I now get exactly the same inner loop for that .flat_map(|color| color.rotate_left(8).to_be_bytes()) bench as without this PR:

.LBB0_5:
	mov	edx, dword ptr [rbx + rcx]
	rol	edx, 8
	bswap	edx
	mov	dword ptr [rax + rcx], edx
	add	rcx, 4
	cmp	rdi, rcx
	jne	.LBB0_5
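
For reference, the iterator chain behind that loop can be sketched as a standalone function. The closure is taken from the comment above; the function name, signature, and input are assumptions, and the actual benchmark harness is not reproduced here.

```rust
/// Sketch of the chain under test; the function name and signature are illustrative.
pub fn rotate_to_bytes(colors: &[u32]) -> Vec<u8> {
    colors
        .iter()
        // Each closure call returns a `[u8; 4]`, which `flat_map` consumes
        // through `array::IntoIter`, the type this PR optimizes.
        .flat_map(|color| color.rotate_left(8).to_be_bytes())
        .collect()
}
```

The `rol`/`bswap` pair in the listing corresponds to `rotate_left(8)` followed by `to_be_bytes()` on a little-endian target.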

@rustbot ready

thomcc · 2022-09-18T03:33:25Z

@bors try @rust-timer queue

rust-timer · 2022-09-18T03:33:26Z

Awaiting bors try build completion.

@rustbot label: +S-waiting-on-perf

thomcc · 2022-09-19T20:24:36Z

@rustbot author

scottmcm · 2022-09-20T07:22:03Z

Oh, it's failing on the UB trap check. I've ignored the codegen test for debug, since it's a -O test.

@bors r=thomcc

bors · 2022-09-20T07:22:05Z

📌 Commit 6dbd9a2 has been approved by thomcc

It is now in the queue for this repository.

bors · 2022-09-20T15:58:07Z

⌛ Testing commit 6dbd9a2 with merge da70c0543ba1f77142453c5c82e0a1c1e4e327d3...

oli-obk · 2022-09-20T16:01:05Z

@bors retry (cycling a higher priority PR)

#102028

bors · 2022-09-20T16:05:36Z

⌛ Testing commit 6dbd9a2 with merge ebbb1c6a0f71f7a95483d2fef3bed8c804b21cbe...

bors · 2022-09-20T16:10:09Z

💔 Test failed - checks-actions

scottmcm · 2022-09-20T22:28:27Z

Looks like the retry didn't put it back in the queue right? I'll try it again.

@bors retry

rust-log-analyzer · 2022-09-21T00:09:28Z

A job failed! Check out the build log: (web) (plain)

bors · 2022-09-21T00:41:36Z

⌛ Testing commit 6dbd9a2 with merge 4ecfdfa...

rust-log-analyzer · 2022-09-21T03:18:01Z

A job failed! Check out the build log: (web) (plain)

bors · 2022-09-21T03:22:40Z

☀️ Test successful - checks-actions
Approved by: thomcc
Pushing 4ecfdfa to master...

rust-timer · 2022-09-21T04:41:22Z

Finished benchmarking commit (4ecfdfa): comparison URL.

Overall result: ✅ improvements - no action needed

@rustbot label: -perf-regression

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

| | mean¹ | range | count² |
|---|---|---|---|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | - | - | 0 |
| Improvements ✅ (primary) | -0.4% | [-0.4%, -0.4%] | 1 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | -0.4% | [-0.4%, -0.4%] | 1 |

Max RSS (memory usage)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

| | mean¹ | range | count² |
|---|---|---|---|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | 1.7% | [0.9%, 2.5%] | 2 |
| Improvements ✅ (primary) | - | - | 0 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | - | - | 0 |

Cycles

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

| | mean¹ | range | count² |
|---|---|---|---|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | - | - | 0 |
| Improvements ✅ (primary) | -2.0% | [-2.0%, -2.0%] | 1 |
| Improvements ✅ (secondary) | -3.6% | [-3.6%, -3.6%] | 1 |
| All ❌✅ (primary) | -2.0% | [-2.0%, -2.0%] | 1 |

¹ the arithmetic mean of the percent change

² number of relevant changes

scottmcm · 2022-09-21T19:25:35Z

Huh, interesting that the post-merge perf run is so different (thankfully better!) from the original: #100214 (comment)

rust-highfive assigned thomcc Aug 6, 2022

rustbot added the T-libs Relevant to the library team, which will review and decide on the PR/issue. label Aug 6, 2022

rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Aug 6, 2022

scottmcm mentioned this pull request Aug 7, 2022

Properly forward ByRefSized::fold to the inner iterator #100220

Merged

rustbot added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Aug 7, 2022

scottmcm added the S-blocked Status: Marked as blocked ❌ on something else such as an RFC or other implementation work. label Aug 23, 2022

scottmcm removed the S-blocked Status: Marked as blocked ❌ on something else such as an RFC or other implementation work. label Aug 24, 2022

scottmcm force-pushed the strict-range branch 2 times, most recently from 84f0eaf to 14e0e7b Compare August 25, 2022 00:55

timvermeulen reviewed Aug 25, 2022

the8472 mentioned this pull request Aug 27, 2022

Codegen weirdness for sum of count_ones over an array #101060

Open

scottmcm force-pushed the strict-range branch from 14e0e7b to c08ec39 Compare September 17, 2022 21:34

scottmcm force-pushed the strict-range branch 2 times, most recently from 4104932 to c43b960 Compare September 18, 2022 03:07

rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Sep 18, 2022

rustbot added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Sep 19, 2022

scottmcm force-pushed the strict-range branch from fb3a7d8 to 6dbd9a2 Compare September 20, 2022 06:24

bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Sep 20, 2022

bors added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. labels Sep 20, 2022

bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Sep 20, 2022

bors added the merged-by-bors This PR was explicitly merged by bors. label Sep 21, 2022

bors merged commit 4ecfdfa into rust-lang:master Sep 21, 2022

rustbot added this to the 1.66.0 milestone Sep 21, 2022

scottmcm deleted the strict-range branch September 21, 2022 04:05
