Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Suboptimal generated code for testing that a &[u8; 4] contains all zeros #71602

Closed
dbdr opened this issue Apr 27, 2020 · 6 comments
Closed

Suboptimal generated code for testing that a &[u8; 4] contains all zeros #71602

dbdr opened this issue Apr 27, 2020 · 6 comments
Assignees
Labels
C-enhancement Category: An issue proposing an enhancement or a PR with one. E-easy Call for participation: Easy difficulty. Experience needed to fix: Not much. Good first issue. I-slow Issue: Problems and improvements with respect to performance of generated code. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.

Comments

@dbdr
Copy link

dbdr commented Apr 27, 2020

I expected comparing a [u8; 4] to [0; 4] to generate code identical to testing a u32 is 0, and indeed it is the case when comparing values. However when dereferencing is involved the code is longer:

pub fn test_u32_ref(data: &u32) -> bool {
    *data == 0
}

pub fn test_u8array_ref(data: &[u8; 4]) -> bool {
    *data == [0; 4]
}

gives:

example::test_u32_ref:
        cmp     dword ptr [rdi], 0
        sete    al
        ret

example::test_u8array_ref:
        lea     rax, [rip + .L__unnamed_1]
        cmp     rdi, rax
        je      .LBB1_1
        cmp     dword ptr [rdi], 0
        sete    al
        ret
.LBB1_1:
        mov     al, 1
        ret

.L__unnamed_1:
        .zero   4

https://godbolt.org/z/3GaUfy

Transmuting to u32 gives the optimized code:

pub fn test_u8array_ref_transmute(data: &[u8; 4]) -> bool {
    let data = unsafe { std::mem::transmute::<[u8; 4], u32>(*data) };
    data == 0
}

This issue has been assigned to @samrat via this comment.

@steffahn
Copy link
Member

As far as I can tell, this comes from an “optimization” in the source code of comparing slices:
https://doc.rust-lang.org/src/core/slice/mod.rs.html#5833

impl<A> SlicePartialEq<A> for [A]
where
    A: PartialEq<A> + BytewiseEquality,
{
    fn equal(&self, other: &[A]) -> bool {
        if self.len() != other.len() {
            return false;
        }
// **** THIS PART vv
        if self.as_ptr() == other.as_ptr() {
            return true;
        }
// **** THIS PART ^^
        unsafe {
            let size = mem::size_of_val(self);
            memcmp(self.as_ptr() as *const u8, other.as_ptr() as *const u8, size) == 0
        }
    }
}

@steffahn
Copy link
Member

If you call the function with (&[0;4]) (godbolt) this saves the need to read anything from memory (not that that’s really worth it, although it technically will be an unaligned read, I don’t know how slow those can be; also with an [u32; 1] it will even get rid of the conditional jump and always execute both things: https://godbolt.org/z/2KwUeQ, perhaps because of better alignment, who knows ^^).

@jonas-schievink jonas-schievink added C-enhancement Category: An issue proposing an enhancement or a PR with one. I-slow Issue: Problems and improvements with respect to performance of generated code. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. labels Apr 27, 2020
@Mark-Simulacrum
Copy link
Member

FWIW, you do not need a transmute, https://doc.rust-lang.org/nightly/std/primitive.u32.html#method.from_ne_bytes will do the same (and is safe).

I'm uncertain how much the ptr-equality check there buys us - obviously, in some cases, it can be a big win, but I suspect the common case in most code isn't comparing literally the same slice.

This was added in #61665 with libs team approval; unfortunately the benchmark results are gone now -- I would expect it to have little impact on the vast majority of cases though, since it's a really cheap comparison in general. In this case though it's obviously suboptimal. Personally I'd probably be in favor of dropping that case from std, if there's cases where it's important for people (e.g. interning slices or so) they can likely wrap things in a custom struct.

@ecstatic-morse ecstatic-morse added the E-easy Call for participation: Easy difficulty. Experience needed to fix: Not much. Good first issue. label Apr 29, 2020
@ecstatic-morse
Copy link
Contributor

ecstatic-morse commented Apr 29, 2020

Adding E-easy since the implementation is simple and explained above. We'll want to benchmark the change before merging it, however. I suspect rustc benefits more than most ecosystem crates from this because it interns slices so frequently.

@samrat
Copy link
Contributor

samrat commented Apr 30, 2020

@rustbot claim

@tesuji
Copy link
Contributor

tesuji commented May 1, 2020

Does rust have const prop for slice?

bors added a commit to rust-lang-ci/rust that referenced this issue Dec 26, 2020
Remove pointer comparison from slice equality

This resurrects rust-lang#71735.

Fixes rust-lang#71602, helps with rust-lang#80140.

r? `@Mark-Simulacrum`
@bors bors closed this as completed in 733cb54 Dec 26, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-enhancement Category: An issue proposing an enhancement or a PR with one. E-easy Call for participation: Easy difficulty. Experience needed to fix: Not much. Good first issue. I-slow Issue: Problems and improvements with respect to performance of generated code. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.
Projects
None yet
9 participants