Merge pull request #471 from Demindiro/x86_64-fix-recursive-memcmp
Amanieu authored Jun 12, 2022
2 parents d79fa6e + 3c67f04 commit 364af45
Showing 1 changed file with 12 additions and 1 deletion.
13 changes: 12 additions & 1 deletion src/mem/x86_64.rs
@@ -143,5 +143,16 @@ pub unsafe fn compare_bytes(a: *const u8, b: *const u8, n: usize) -> i32 {
     let c8 = |a: *const u64, b, n| cmp(a, b, n, c4);
     let c16 = |a: *const u128, b, n| cmp(a, b, n, c8);
     let c32 = |a: *const [u128; 2], b, n| cmp(a, b, n, c16);
-    c32(a.cast(), b.cast(), n)
+    // [u128; 2] internally uses raw_eq for comparisons, which may emit a call to memcmp
+    // above a certain size threshold. When SSE2 is enabled this threshold does not seem
+    // to be reached but without SSE2 a call is emitted, leading to infinite recursion.
+    //
+    // While replacing [u128; 2] with (u128, u128) fixes the issues it degrades performance
+    // severely. Likewise, removing c32() has a lesser but still significant impact. Instead the
+    // [u128; 2] case is only enabled when SSE2 is present.
+    if cfg!(target_feature = "sse2") {
+        c32(a.cast(), b.cast(), n)
+    } else {
+        c16(a.cast(), b.cast(), n)
+    }
 }
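
The dispatch in this hunk can be tried outside the crate with a minimal standalone sketch. It is not the crate's code: `cmp_chunks` is a hypothetical stand-in for the `cmp` helper (whose body lies outside the hunk), `compare` and `compare_bytes_sketch` are illustrative names, and the chain is shortened to 1, 8, 16 and 32 bytes. Each level skips equal chunks and hands the first differing chunk, or the leftover tail, to the next narrower level. The 32-byte level is used only when `cfg!(target_feature = "sse2")` is true at compile time. In an ordinary program a compiler-emitted `memcmp` call resolves to the platform's libc, so the recursion described in the comment cannot occur here. The sketch shows only the shape of the fallback.

```rust
use core::mem::size_of;

/// Hypothetical stand-in for the `cmp` helper referenced in the diff.
/// Walks both buffers in `T`-sized chunks and hands the first differing
/// chunk, or the leftover tail, to `narrower`.
///
/// Safety: `a` and `b` must each be valid for reads of `n` bytes.
unsafe fn cmp_chunks<T, U, F>(mut a: *const T, mut b: *const T, n: usize, narrower: F) -> i32
where
    T: Copy + PartialEq,
    F: FnOnce(*const U, *const U, usize) -> i32,
{
    unsafe {
        let end = a.add(n / size_of::<T>());
        while a != end {
            if a.read_unaligned() != b.read_unaligned() {
                // Re-examine only this chunk at a finer granularity.
                return narrower(a.cast(), b.cast(), size_of::<T>());
            }
            a = a.add(1);
            b = b.add(1);
        }
        // All whole chunks matched; compare the remaining tail bytes.
        narrower(a.cast(), b.cast(), n % size_of::<T>())
    }
}

/// Illustrative variant of `compare_bytes` with a shortened chain.
///
/// Safety: `a` and `b` must each be valid for reads of `n` bytes.
unsafe fn compare_bytes_sketch(a: *const u8, b: *const u8, n: usize) -> i32 {
    // Byte-wise base case: returns the difference of the first mismatching bytes.
    let c1 = |mut a: *const u8, mut b: *const u8, n: usize| -> i32 {
        for _ in 0..n {
            // SAFETY: the caller guarantees `n` readable bytes behind both pointers.
            unsafe {
                if a.read() != b.read() {
                    return i32::from(a.read()) - i32::from(b.read());
                }
                a = a.add(1);
                b = b.add(1);
            }
        }
        0
    };
    let c8 = |a: *const u64, b, n| unsafe { cmp_chunks(a, b, n, c1) };
    let c16 = |a: *const u128, b, n| unsafe { cmp_chunks(a, b, n, c8) };
    let c32 = |a: *const [u128; 2], b, n| unsafe { cmp_chunks(a, b, n, c16) };
    // `cfg!` expands to a compile-time `true` or `false`, so the unused
    // branch is still type-checked but can be removed by the optimizer.
    if cfg!(target_feature = "sse2") {
        c32(a.cast(), b.cast(), n)
    } else {
        c16(a.cast(), b.cast(), n)
    }
}

/// Safe wrapper for experimentation: compares the common prefix of two slices.
pub fn compare(a: &[u8], b: &[u8]) -> i32 {
    let n = a.len().min(b.len());
    // SAFETY: both pointers are valid for reads of `n` bytes.
    unsafe { compare_bytes_sketch(a.as_ptr(), b.as_ptr(), n) }
}
```

Calling `compare(&x, &y)` on two buffers that first differ at some index returns the difference of the bytes at that index, whichever branch is compiled in. On typical x86_64 targets SSE2 is enabled by default, so exercising the `else` branch would probably need a target or flag that disables the feature.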
