Move some x86 intrinsics code to helper functions in `shims::x86` #3214

eduardosm · 2023-12-07T11:52:41Z

To make them reusable for intrinsics of other x86 features.

Split from #3192

RalfJung · 2023-12-08T07:12:59Z

src/shims/x86/mod.rs

+
+/// Conditionally multiplies the packed floating-point elements in
+/// `left` and `right` using the high 4 bits in `imm`, sums the four
+/// products, and conditionally stores the sum in `dest` using the low


Since this is "conditional", I assume these are not actually four products but up to four products?

That's right
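To make the "up to four products" reading concrete, here is a minimal sketch of the conditional dot product on plain arrays. It is not the helper added in this PR: `dpps_model` is a hypothetical name, it works on `[f32; 4]` instead of Miri's operand types, and the left-to-right summation order is an assumption.

```rust
/// Plain-array model of a `dpps`-style conditional dot product.
/// Hypothetical illustration, not the Miri helper from this PR.
fn dpps_model(left: [f32; 4], right: [f32; 4], imm: u8) -> [f32; 4] {
    // High 4 bits of `imm`: which products take part in the sum.
    // A deselected product contributes 0.0.
    let mut sum = 0.0f32;
    for i in 0..4 {
        let product = if imm & (1 << (4 + i)) != 0 { left[i] * right[i] } else { 0.0 };
        // Assumption: products are summed left to right.
        sum += product;
    }

    // Low 4 bits of `imm`: which destination elements receive the sum.
    // A deselected element is set to 0.0.
    let mut dest = [0.0f32; 4];
    for i in 0..4 {
        if imm & (1 << i) != 0 {
            dest[i] = sum;
        }
    }
    dest
}
```

With `imm = 0b0011_0101`, only elements 0 and 1 are multiplied, and the sum is written to elements 0 and 2 of the result.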

RalfJung · 2023-12-08T07:14:12Z

src/shims/x86/mod.rs

+        let op = this.read_scalar(&op)?.to_uint(op.layout.size)?;
+        let mask = this.read_scalar(&mask)?.to_uint(mask.layout.size)?;
+        all_zero &= op & mask == 0;
+        masked_set &= op & mask == mask;


Suggested change

-        masked_set &= op & mask == mask;
+        masked_set &= (op & mask) == mask;

I first read this as `op & (mask == mask)` and was quite confused...
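For reference, Rust gives `&` higher precedence than `==` (unlike C), so the two spellings compute the same value and the parentheses are purely for readability. A small sketch, with hypothetical function names and `u128` standing in for the element type:

```rust
/// Spelling from the original diff. In Rust this parses as
/// `(op & mask) == mask`, because `&` binds tighter than `==`.
fn masked_set_without_parens(op: u128, mask: u128) -> bool {
    op & mask == mask
}

/// Spelling from the suggested change: same value, but the grouping
/// is explicit for the reader.
fn masked_set_with_parens(op: u128, mask: u128) -> bool {
    (op & mask) == mask
}
```

The other reading, `op & (mask == mask)`, would not type-check for integer operands, since it would apply `&` to an integer and a `bool`.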

src/shims/x86/sse41.rs

RalfJung · 2023-12-08T14:02:28Z

src/shims/x86/mod.rs

+        all_zero &= (op & mask) == 0;
+        masked_set &= (op & mask) == mask;
+    }
+
+    Ok(f(all_zero, masked_set))


No, that's not what I meant... what I meant was to do `acc = f(acc, op, mask)` inside the loop. That way the caller then decides how the results get accumulated.

Basically, this is folding in some way over the two vectors, and so the natural general interface is to allow any kind of folding.
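A minimal sketch of the kind of interface being suggested here, on plain slices. The name `fold_lanes`, the `u64` element type, and the generic accumulator are assumptions made for illustration, not the signature used in Miri:

```rust
/// Folds over two equally long vectors element by element, letting the
/// caller decide how each `(op, mask)` pair updates the accumulator.
/// Hypothetical helper for illustration only.
fn fold_lanes<A>(
    op: &[u64],
    mask: &[u64],
    init: A,
    mut f: impl FnMut(A, u64, u64) -> A,
) -> A {
    assert_eq!(op.len(), mask.len());
    let mut acc = init;
    for (&o, &m) in op.iter().zip(mask) {
        acc = f(acc, o, m);
    }
    acc
}
```

A `testz`-style check could then be written as `fold_lanes(op, mask, true, |acc, o, m| acc && (o & m) == 0)`, and a `testc`-style check as `fold_lanes(op, mask, true, |acc, o, m| acc && (o & m) == m)`.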

> No, that's not what I meant... what I meant was to do `acc = f(acc, op, mask)` inside the loop. That way the caller then decides how the results get accumulated.

That would be a `fn(bool, Scalar, Scalar) -> bool`, wouldn't it? Not a `fn(bool, bool) -> bool`.

Ah yeah -- I might have gotten the signature wrong.

Basically, whatever the type of f is in the current code. Or maybe something a bit more general.

This isn't working correctly for the `(op & mask) != 0 && (op & mask) != mask` case. We need to first calculate the `all_zero` and `masked_set` booleans (for the whole SIMD vector) and then compute `!all_zero && !masked_set`.
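The difference can be reproduced on plain slices. The sketch below is an illustration only: both function names are hypothetical and `u64` stands in for the element type. It compares the whole-vector computation with a per-element `bool` fold:

```rust
/// Whole-vector version: accumulate `all_zero` and `masked_set` over
/// every element first, then combine them once at the end.
fn testnzc_whole_vector(op: &[u64], mask: &[u64]) -> bool {
    let mut all_zero = true;
    let mut masked_set = true;
    for (&o, &m) in op.iter().zip(mask) {
        all_zero &= (o & m) == 0;
        masked_set &= (o & m) == m;
    }
    !all_zero && !masked_set
}

/// Per-element version: evaluate the combined condition for each element
/// and AND the results. This is the variant that gives wrong answers.
fn testnzc_per_element_fold(op: &[u64], mask: &[u64]) -> bool {
    op.iter()
        .zip(mask)
        .fold(true, |acc, (&o, &m)| acc && (o & m) != 0 && (o & m) != m)
}
```

For `op = [0b100, 0b010]` and `mask = [0b100, 0b110]`, the whole-vector version returns `true`. The per-element fold returns `false`, because element 0 satisfies `(op & mask) == mask` on its own even though the vector as a whole does not.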

RalfJung · 2023-12-08T17:43:32Z

src/shims/x86/sse41.rs

+                            |acc, op, mask| {
+                                let op = op.to_scalar().to_uint(op.layout.size)?;
+                                let mask = mask.to_scalar().to_uint(mask.layout.size)?;
+                                Ok(acc && (op & mask) != 0 && (op & mask) != mask)


I think the code will be overall shorter if you move the `match` down inside the closure; then the `let op` and `let mask` can be outside the `match`.

RalfJung · 2023-12-08T19:36:05Z

Thanks a lot. :)
@bors r+

bors · 2023-12-08T19:36:07Z

📌 Commit 44bf5fc has been approved by RalfJung

It is now in the queue for this repository.

bors · 2023-12-08T19:37:18Z

⌛ Testing commit 44bf5fc with merge a5b9f54...

bors · 2023-12-08T20:20:50Z

☀️ Test successful - checks-actions
Approved by: RalfJung
Pushing a5b9f54 to master...

Fix x86 SSE4.1 ptestnzc

Fixes ptestnzc by bringing back the original implementation of #3214.

`(op & mask) != 0 && (op & mask) != mask` needs to be calculated for the whole vector. It cannot be calculated for each element and then folded.

For example, given

* `op = [0b100, 0b010]`
* `mask = [0b100, 0b110]`

The correct result would be:

* `op & mask = [0b100, 0b010]`

Comparisons are done on the vector as a whole:

* `all_zero = (op & mask) == [0, 0] = false`
* `masked_set = (op & mask) == mask = false`
* `!all_zero && !masked_set = true` (the correct result)

The previous method:

* `op & mask = [0b100, 0b010]`

Comparisons are done element-wise:

* `all_zero = (op & mask) == [0, 0] = [false, false]`
* `masked_set = (op & mask) == mask = [true, false]`
* `!all_zero && !masked_set = [false, true]`

After folding with AND, the final result would be `false`, which is incorrect.

Implement x86 AVX intrinsics ~Blocked on <https://github.com/rust-lang/miri/pull/3214>~

eduardosm added 2 commits December 7, 2023 12:35

Move unary_op_* functions from shims::x86::sse module to shims::x86

38d5aed

Move round_* functions from shims::x86::sse41 module to shims::x86

18f9bbd

eduardosm mentioned this pull request Dec 7, 2023

Implement x86 AVX intrinsics #3192

Merged

RalfJung reviewed Dec 8, 2023

eduardosm force-pushed the move-x86-code branch from ebb0c97 to ad5739d Compare December 8, 2023 12:41

RalfJung reviewed Dec 8, 2023

eduardosm force-pushed the move-x86-code branch from ad5739d to fd860bd Compare December 8, 2023 16:11

RalfJung reviewed Dec 8, 2023

eduardosm added 2 commits December 8, 2023 19:09

Move implementation of SSE4.1 ptest* into a helper function

b1fcba4

Move implementation of SSE4.1 dpps/dppd to helper function

44bf5fc

eduardosm force-pushed the move-x86-code branch from fd860bd to 44bf5fc Compare December 8, 2023 18:10

bors merged commit a5b9f54 into rust-lang:master Dec 8, 2023
7 of 8 checks passed

eduardosm mentioned this pull request Dec 8, 2023

Fix x86 SSE4.1 ptestnzc #3216

Merged

eduardosm deleted the move-x86-code branch December 9, 2023 11:55

bors referenced this pull request Feb 16, 2024

Auto merge of #3192 - eduardosm:x86-avx-intrinsics, r=RalfJung

454f054

Implement x86 AVX intrinsics ~Blocked on <https://github.com/rust-lang/miri/pull/3214>~

RalfJung referenced this pull request in RalfJung/rust Feb 17, 2024

Auto merge of rust-lang#3192 - eduardosm:x86-avx-intrinsics, r=RalfJung

d2a4ef3

Implement x86 AVX intrinsics ~Blocked on <https://github.com/rust-lang/miri/pull/3214>~
