A more efficient slice comparison implementation for T: !BytewiseEq #116846

Merged
merged 1 commit into from
Jan 9, 2024

Conversation

krtab
Contributor

@krtab krtab commented Oct 17, 2023

(This is a follow-up PR to #113654.)

This PR changes the implementation of `[T]` slice comparison when `T: !BytewiseEq`. The previous implementation, which used `zip`, was not optimized properly by the compiler: it did not exploit the fact that both lengths were equal. As an example of the performance gain, testing that `[Some(0_u64); 4096].as_slice() == [Some(0_u64); 4096].as_slice()` is about 20% faster.
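
For readers unfamiliar with the shape of code being discussed, here is a minimal sketch of a `zip`-based slice comparison. It is an illustration only: `eq_zip` is a hypothetical name, and this is not claimed to be the exact `core` source. The point is that the length check happens before the loop, yet the `zip` iterator still has to track two independent iterators, and the optimizer apparently did not always exploit the fact that they have the same length.

```rust
/// Hypothetical stand-in for a zip-based slice comparison
/// (a sketch, not the actual library implementation).
fn eq_zip<T: PartialEq>(a: &[T], b: &[T]) -> bool {
    // Slices of different lengths can never be equal.
    if a.len() != b.len() {
        return false;
    }
    // From here on both lengths are known to be equal, but `zip`
    // still walks two separate iterators in lockstep.
    a.iter().zip(b.iter()).all(|(x, y)| x == y)
}

fn main() {
    // The input mentioned in the PR description.
    let a = [Some(0_u64); 4096];
    let b = [Some(0_u64); 4096];
    assert!(eq_zip(a.as_slice(), b.as_slice()));
    // The sketch agrees with the built-in comparison.
    assert_eq!(eq_zip(a.as_slice(), b.as_slice()), a.as_slice() == b.as_slice());
}
```

Measuring the 20% figure would need a proper benchmark harness and an optimized build; the sketch only shows the code shape.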

@rustbot
Collaborator

rustbot commented Oct 17, 2023

r? @joshtriplett

(rustbot has picked a reviewer for you, use r? to override)

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-libs Relevant to the library team, which will review and decide on the PR/issue. labels Oct 17, 2023
@krtab krtab force-pushed the slice_compare_no_memcmp_opt branch from 9348c33 to 0cc5c97 Compare October 17, 2023 15:15
@asquared31415
Contributor

The following code seems to generate the same code as this PR for most types, and better code for float types. (It's also safe!) However, I haven't benchmarked it, so maybe there's something I'm not seeing.

if a.len() != b.len() {
    return false;
}

for idx in 0..a.len() {
    if a[idx] != b[idx] {
        return false;
    }
}
true
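
The fragment above can be wrapped into a self-contained function to try it out. This is a sketch under assumptions: `slice_eq` and its generic signature are hypothetical names chosen here, not part of the PR. Floats are a useful test input because their equality is not bitwise (`0.0 == -0.0`, and `NaN != NaN`), which is presumably why they cannot take a `memcmp`-style path.

```rust
/// Hypothetical wrapper around the index-based loop suggested above.
fn slice_eq<T: PartialEq>(a: &[T], b: &[T]) -> bool {
    if a.len() != b.len() {
        return false;
    }
    // Both slices have the same length, so every `idx` is in bounds for both.
    for idx in 0..a.len() {
        if a[idx] != b[idx] {
            return false;
        }
    }
    true
}

fn main() {
    // 0.0 and -0.0 have different bit patterns but compare equal.
    assert!(slice_eq(&[0.0_f64][..], &[-0.0_f64][..]));
    // NaN never compares equal to itself, so the slices differ.
    assert!(!slice_eq(&[f64::NAN][..], &[f64::NAN][..]));
    // The wrapper agrees with the built-in slice comparison.
    let xs = [1.5_f32, 2.5];
    let ys = [1.5_f32, 2.5];
    assert_eq!(slice_eq(&xs[..], &ys[..]), xs[..] == ys[..]);
}
```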

// SAFETY:
// This is sound because:
// - self.len == other.len
// - self.len <= isize::MAX
Member

That isn't true for ZSTs (a slice of zero-sized types can be longer than isize::MAX), though the result still happens to work because bumping ZST pointers does nothing.
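
The ZST remark can be checked with a small experiment. This is only an illustrative sketch (`zst_slice_facts` is a hypothetical helper name): a slice of `()` can have more than `isize::MAX` elements because it occupies zero bytes, and offsetting a pointer to a zero-sized type leaves the address unchanged.

```rust
use std::mem::size_of_val;

/// Hypothetical helper: returns the byte size of the slice and whether
/// offsetting its data pointer by 5 elements left the pointer unchanged.
fn zst_slice_facts(slice: &[()]) -> (usize, bool) {
    let p = slice.as_ptr();
    // For a zero-sized element type the byte offset is 5 * 0 = 0.
    (size_of_val(slice), p.wrapping_add(5) == p)
}

fn main() {
    // An array of unit values takes no memory, whatever its length.
    let units = [(); usize::MAX];
    // The length exceeds isize::MAX, unlike slices of non-zero-sized types.
    assert!(units.len() > isize::MAX as usize);
    assert_eq!(zst_slice_facts(&units), (0, true));
}
```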

Contributor Author

Thanks for the review!

@krtab
Contributor Author

krtab commented Oct 17, 2023

The following code seems to generate the same code as this PR for most types, and better code for float types. (It's also safe!) However, I haven't benchmarked it, so maybe there's something I'm not seeing.

if a.len() != b.len() {
    return false;
}

for idx in 0..a.len() {
    if a[idx] != b[idx] {
        return false;
    }
}
true

Oh! Very nice catch! A quick check on godbolt shows me that this has only been properly optimized since 1.73.0, and I didn't recheck this code from my previous PR before resubmitting.

I'll update this soon. Thanks.

@asquared31415
Contributor

Wow, I'm shocked that this was not optimized as well for that long; this should have been easy enough for the optimizer. Oh well, sometimes they're tricky like that!

I looked into the history, and notably the zip implementation seems to have been written as a more concise way of expressing the loop, not for any specific performance reason: #61665 (comment)

@krtab krtab force-pushed the slice_compare_no_memcmp_opt branch from 0cc5c97 to a70613b Compare October 18, 2023 09:32
@the8472 the8472 assigned the8472 and unassigned joshtriplett Jan 5, 2024
@the8472
Member

the8472 commented Jan 5, 2024

@bors try @rust-timer queue

@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Jan 5, 2024
@bors
Contributor

bors commented Jan 5, 2024

⌛ Trying commit a70613b with merge c12f891...

bors added a commit to rust-lang-ci/rust that referenced this pull request Jan 5, 2024
@bors
Contributor

bors commented Jan 5, 2024

☀️ Try build successful - checks-actions
Build commit: c12f891 (c12f8910a3463d1e5fa69bd857e9253878a9a990)

@rust-timer
Collaborator

Finished benchmarking commit (c12f891): comparison URL.

Overall result: ✅ improvements - no action needed

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

@bors rollup=never
@rustbot label: -S-waiting-on-perf -perf-regression

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | - | - | 0 |
| Improvements ✅ (primary) | -0.5% | [-1.1%, -0.2%] | 12 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | -0.5% | [-1.1%, -0.2%] | 12 |

Max RSS (memory usage)

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 2.1% | [1.9%, 2.4%] | 2 |
| Regressions ❌ (secondary) | - | - | 0 |
| Improvements ✅ (primary) | -1.7% | [-3.2%, -0.3%] | 2 |
| Improvements ✅ (secondary) | -2.3% | [-2.3%, -2.3%] | 1 |
| All ❌✅ (primary) | 0.2% | [-3.2%, 2.4%] | 4 |

Cycles

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | - | - | 0 |
| Improvements ✅ (primary) | -1.3% | [-1.6%, -1.1%] | 3 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | -1.3% | [-1.6%, -1.1%] | 3 |

Binary size

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 0.2% | [0.0%, 0.4%] | 5 |
| Regressions ❌ (secondary) | - | - | 0 |
| Improvements ✅ (primary) | -0.5% | [-2.1%, -0.0%] | 24 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | -0.4% | [-2.1%, 0.4%] | 29 |

Bootstrap: 669.803s -> 668.21s (-0.24%)
Artifact size: 311.12 MiB -> 311.14 MiB (0.01%)

@rustbot rustbot removed the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Jan 5, 2024
@the8472
Member

the8472 commented Jan 7, 2024

The previous implementation using zip was not optimized properly by the compiler, which didn't leverage the fact that both lengths were equal.

That in itself seems like an issue... ah yes, #100124 last attempted to fix this but that stalled.
In the meantime this does seem fine.

Would you be willing to work out the critical difference in LLVM IR and add a codegen test? That's optional though, I can accept the PR without that.

The previous implementation was not optimized properly by the compiler,
which didn't leverage the fact that both lengths were equal.
@krtab krtab force-pushed the slice_compare_no_memcmp_opt branch from a70613b to 5b041ab Compare January 8, 2024 15:37
@krtab
Contributor Author

krtab commented Jan 8, 2024

I had a look but couldn't figure out a way to characterize the difference between the two versions of the LLVM IR.
I added a comment in the code, hoping to prevent an accidental regression.

@the8472
Member

the8472 commented Jan 9, 2024

@bors r+

@bors
Contributor

bors commented Jan 9, 2024

📌 Commit 5b041ab has been approved by the8472

It is now in the queue for this repository.

@bors bors removed the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Jan 9, 2024
@bors bors added the S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. label Jan 9, 2024
@bors
Contributor

bors commented Jan 9, 2024

⌛ Testing commit 5b041ab with merge 190f4c9...

@bors
Contributor

bors commented Jan 9, 2024

☀️ Test successful - checks-actions
Approved by: the8472
Pushing 190f4c9 to master...

@bors bors added the merged-by-bors This PR was explicitly merged by bors. label Jan 9, 2024
@bors bors merged commit 190f4c9 into rust-lang:master Jan 9, 2024
12 checks passed
@rustbot rustbot added this to the 1.77.0 milestone Jan 9, 2024
@rust-timer
Collaborator

Finished benchmarking commit (190f4c9): comparison URL.

Overall result: ✅ improvements - no action needed

@rustbot label: -perf-regression

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | - | - | 0 |
| Improvements ✅ (primary) | -0.5% | [-0.9%, -0.2%] | 15 |
| Improvements ✅ (secondary) | -0.6% | [-0.6%, -0.6%] | 1 |
| All ❌✅ (primary) | -0.5% | [-0.9%, -0.2%] | 15 |

Max RSS (memory usage)

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 3.6% | [2.9%, 4.6%] | 3 |
| Regressions ❌ (secondary) | 1.8% | [1.8%, 1.8%] | 1 |
| Improvements ✅ (primary) | -1.5% | [-3.6%, -0.4%] | 3 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | 1.0% | [-3.6%, 4.6%] | 6 |

Cycles

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | - | - | 0 |
| Improvements ✅ (primary) | -1.0% | [-1.1%, -0.9%] | 2 |
| Improvements ✅ (secondary) | -3.1% | [-3.5%, -2.3%] | 4 |
| All ❌✅ (primary) | -1.0% | [-1.1%, -0.9%] | 2 |

Binary size

This benchmark run did not return any relevant results for this metric.

Bootstrap: 667.74s -> 666.209s (-0.23%)
Artifact size: 308.59 MiB -> 308.59 MiB (0.00%)

@krtab
Contributor Author

krtab commented Jan 10, 2024

Thanks @the8472 👍

Labels
merged-by-bors This PR was explicitly merged by bors. S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-libs Relevant to the library team, which will review and decide on the PR/issue.
8 participants