A more efficient slice comparison implementation for T: !BytewiseEq #116846

krtab · 2023-10-17T14:49:54Z

(This is a follow-up PR to #113654)

This PR changes the implementation of [T] slice comparison when T: !BytewiseEq. The previous implementation, which used zip, was not optimized properly by the compiler: it didn't leverage the fact that both lengths were equal. For example, performance improves by 20% when testing that [Some(0_u64); 4096].as_slice() == [Some(0_u64); 4096].as_slice().
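
For readers who want to see the loop shape being discussed, here is a minimal sketch of a zip-based comparison. It is an illustration only, not the code from library/core/src/slice/cmp.rs; the names `eq_zip` and `description_example` are hypothetical.

```rust
/// Hypothetical helper (not the standard library's code): compares two
/// slices element by element using `zip`, after checking the lengths.
pub fn eq_zip<T: PartialEq>(a: &[T], b: &[T]) -> bool {
    if a.len() != b.len() {
        return false;
    }
    // `zip` stops at the shorter of the two iterators. The lengths are already
    // known to be equal at this point, which is the fact the description says
    // the optimizer did not exploit.
    a.iter().zip(b.iter()).all(|(x, y)| x == y)
}

/// The comparison quoted in the description, routed through `eq_zip`.
pub fn description_example() -> bool {
    let a = [Some(0_u64); 4096];
    let b = [Some(0_u64); 4096];
    eq_zip(a.as_slice(), b.as_slice())
}
```

`Option<u64>` is a natural test type here because it cannot simply be compared as raw bytes, so it exercises the element-by-element path rather than a memcmp.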

rustbot · 2023-10-17T14:50:03Z

r? @joshtriplett

(rustbot has picked a reviewer for you, use r? to override)

asquared31415 · 2023-10-17T17:42:34Z

The following code seems to generate identical code as this PR for most types and better code for float types. (also it's safe!) However, I haven't benchmarked it, so maybe there's something I'm not seeing.

if a.len() != b.len() {
    return false;
}

for idx in 0..a.len() {
    if a[idx] != b[idx] {
        return false;
    }
}
true
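
A self-contained version of the loop above, wrapped in a generic function so it can be pasted into a playground or godbolt. The function name `eq_indexed` and the `T: PartialEq` bound are assumptions made for the sketch, not taken from the PR.

```rust
/// Hypothetical wrapper around the suggested loop: safe, index-based
/// comparison of two slices.
pub fn eq_indexed<T: PartialEq>(a: &[T], b: &[T]) -> bool {
    // Slices of different lengths can never be equal.
    if a.len() != b.len() {
        return false;
    }

    // Both indexing operations use the same range `0..a.len()`, and the
    // lengths are equal, so every index is in bounds for both slices.
    for idx in 0..a.len() {
        if a[idx] != b[idx] {
            return false;
        }
    }
    true
}
```

Because the loop uses `!=` on the elements, float slices containing NaN compare unequal, which matches what `==` on slices does.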

the8472 · 2023-10-17T19:35:43Z

library/core/src/slice/cmp.rs

+        // SAFETY:
+        // This is sound because:
+        // - self.len == other.len
+        // - self.len <= isize::MAX


That isn't true for ZSTs. The result still happens to work, though, because bumping ZST pointers does nothing.
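
To make the ZST point concrete, here is a small sketch showing a slice whose length exceeds isize::MAX. It relies on the documented requirement of `std::slice::from_raw_parts` that the total size in bytes, not the element count, stay within isize::MAX. The function name `huge_zst_slice` is hypothetical.

```rust
use std::ptr::NonNull;

/// Builds a slice of zero-sized `()` values whose length exceeds `isize::MAX`.
pub fn huge_zst_slice() -> &'static [()] {
    // SAFETY: `()` is zero-sized, so the slice spans 0 bytes in total no
    // matter how long it is. A dangling pointer is non-null and aligned,
    // which is all that a slice of zero-sized values requires.
    unsafe { std::slice::from_raw_parts(NonNull::<()>::dangling().as_ptr(), usize::MAX) }
}
```

The sketch only inspects the length; it does not iterate over the slice.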

Thanks for the review!

krtab · 2023-10-17T21:38:03Z

> The following code seems to generate identical code as this PR for most types and better code for float types. (also it's safe!) However, I haven't benchmarked it, so maybe there's something I'm not seeing.
>
> if a.len() != b.len() {
>     return false;
> }
>
> for idx in 0..a.len() {
>     if a[idx] != b[idx] {
>         return false;
>     }
> }
> true

Oh! Very nice catch! A quick check on godbolt shows me that this has only been properly optimized since 1.73.0, and I didn't recheck this code from my previous PR before resubmitting.

I'll update this soon. Thanks.

asquared31415 · 2023-10-18T00:06:07Z

Wow, I'm shocked that this was not as well optimized for that long; this should have been easy enough for the optimizer. Oh well, sometimes they're tricky like that!

I dug through the history, and of note is that the zip implementation seems to have been created as a more concise way of writing the loop, not for any specific performance reasons: #61665 (comment)

the8472 · 2024-01-05T10:00:59Z

@bors try @rust-timer queue

bors · 2024-01-05T10:03:19Z

⌛ Trying commit a70613b with merge c12f891...

…=<try> A more efficient slice comparison implementation for T: !BytewiseEq (This is a follow up PR on rust-lang#113654) This PR changes the implementation for `[T]` slice comparison when `T: !BytewiseEq`. The previous implementation using zip was not optimized properly by the compiler, which didn't leverage the fact that both length were equal. Performance improvements are for example 20% when testing that `[Some(0_u64); 4096].as_slice() == [Some(0_u64); 4096].as_slice()`.

bors · 2024-01-05T11:29:52Z

☀️ Try build successful - checks-actions
Build commit: c12f891 (c12f8910a3463d1e5fa69bd857e9253878a9a990)

rust-timer · 2024-01-05T13:40:04Z

Finished benchmarking commit (c12f891): comparison URL.

Overall result: ✅ improvements - no action needed

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

@bors rollup=never
@rustbot label: -S-waiting-on-perf -perf-regression

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | - | - | 0 |
| Improvements ✅ (primary) | -0.5% | [-1.1%, -0.2%] | 12 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | -0.5% | [-1.1%, -0.2%] | 12 |

Max RSS (memory usage)

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 2.1% | [1.9%, 2.4%] | 2 |
| Regressions ❌ (secondary) | - | - | 0 |
| Improvements ✅ (primary) | -1.7% | [-3.2%, -0.3%] | 2 |
| Improvements ✅ (secondary) | -2.3% | [-2.3%, -2.3%] | 1 |
| All ❌✅ (primary) | 0.2% | [-3.2%, 2.4%] | 4 |

Cycles

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | - | - | 0 |
| Improvements ✅ (primary) | -1.3% | [-1.6%, -1.1%] | 3 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | -1.3% | [-1.6%, -1.1%] | 3 |

Binary size

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 0.2% | [0.0%, 0.4%] | 5 |
| Regressions ❌ (secondary) | - | - | 0 |
| Improvements ✅ (primary) | -0.5% | [-2.1%, -0.0%] | 24 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | -0.4% | [-2.1%, 0.4%] | 29 |

Bootstrap: 669.803s -> 668.21s (-0.24%)
Artifact size: 311.12 MiB -> 311.14 MiB (0.01%)

the8472 · 2024-01-07T08:49:59Z

> The previous implementation using zip was not optimized properly by the compiler, which didn't leverage the fact that both length were equal.

That in itself seems like an issue... ah yes, #100124 last attempted to fix this but that stalled.
In the meantime this does seem fine.

Would you be willing to work out the critical difference in LLVM IR and add a codegen test? That's optional though, I can accept the PR without that.

krtab · 2024-01-08T15:38:21Z

I had a look but couldn't figure out a way to characterize the difference between the two versions of the IR.
I added a comment in the hope of preventing an accidental regression.

the8472 · 2024-01-09T20:31:24Z

@bors r+

bors · 2024-01-09T20:31:26Z

📌 Commit 5b041ab has been approved by the8472

It is now in the queue for this repository.

bors · 2024-01-09T20:52:37Z

⌛ Testing commit 5b041ab with merge 190f4c9...

bors · 2024-01-09T22:49:43Z

☀️ Test successful - checks-actions
Approved by: the8472
Pushing 190f4c9 to master...

rust-timer · 2024-01-10T00:02:35Z

Finished benchmarking commit (190f4c9): comparison URL.

Overall result: ✅ improvements - no action needed

@rustbot label: -perf-regression

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | - | - | 0 |
| Improvements ✅ (primary) | -0.5% | [-0.9%, -0.2%] | 15 |
| Improvements ✅ (secondary) | -0.6% | [-0.6%, -0.6%] | 1 |
| All ❌✅ (primary) | -0.5% | [-0.9%, -0.2%] | 15 |

Max RSS (memory usage)

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 3.6% | [2.9%, 4.6%] | 3 |
| Regressions ❌ (secondary) | 1.8% | [1.8%, 1.8%] | 1 |
| Improvements ✅ (primary) | -1.5% | [-3.6%, -0.4%] | 3 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | 1.0% | [-3.6%, 4.6%] | 6 |

Cycles

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | - | - | 0 |
| Improvements ✅ (primary) | -1.0% | [-1.1%, -0.9%] | 2 |
| Improvements ✅ (secondary) | -3.1% | [-3.5%, -2.3%] | 4 |
| All ❌✅ (primary) | -1.0% | [-1.1%, -0.9%] | 2 |

Binary size

This benchmark run did not return any relevant results for this metric.

Bootstrap: 667.74s -> 666.209s (-0.23%)
Artifact size: 308.59 MiB -> 308.59 MiB (0.00%)

krtab · 2024-01-10T13:02:53Z

Thanks @the8472 👍

rustbot assigned joshtriplett Oct 17, 2023

rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-libs Relevant to the library team, which will review and decide on the PR/issue. labels Oct 17, 2023

krtab mentioned this pull request Oct 17, 2023

Non memcmp slice comparison optimization #113654

Closed

krtab force-pushed the slice_compare_no_memcmp_opt branch from 9348c33 to 0cc5c97 Compare October 17, 2023 15:15

the8472 reviewed Oct 17, 2023

krtab force-pushed the slice_compare_no_memcmp_opt branch from 0cc5c97 to a70613b Compare October 18, 2023 09:32

the8472 assigned the8472 and unassigned joshtriplett Jan 5, 2024

rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Jan 5, 2024

rustbot removed the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Jan 5, 2024

A more efficient slice comparison implementation for T: !BytewiseEq

5b041ab

The previous implementation was not optimized properly by the compiler, which didn't leverage the fact that both length were equal.

krtab force-pushed the slice_compare_no_memcmp_opt branch from a70613b to 5b041ab Compare January 8, 2024 15:37

bors removed the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Jan 9, 2024

bors added the S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. label Jan 9, 2024

bors added the merged-by-bors This PR was explicitly merged by bors. label Jan 9, 2024

bors merged commit 190f4c9 into rust-lang:master Jan 9, 2024
12 checks passed

rustbot added this to the 1.77.0 milestone Jan 9, 2024
