Move some x86 intrinsics code to helper functions in shims::x86
#3214
src/shims/x86/mod.rs
Outdated
/// Conditionally multiplies the packed floating-point elements in
/// `left` and `right` using the high 4 bits in `imm`, sums the four
/// products, and conditionally stores the sum in `dest` using the low
Since this is "conditional", I assume these are not actually four products but up to four products?
That's right
src/shims/x86/mod.rs
Outdated
let op = this.read_scalar(&op)?.to_uint(op.layout.size)?;
let mask = this.read_scalar(&mask)?.to_uint(mask.layout.size)?;
all_zero &= op & mask == 0;
masked_set &= op & mask == mask;
Suggested change:
- masked_set &= op & mask == mask;
+ masked_set &= (op & mask) == mask;
I first read this as `op & (mask == mask)` and was quite confused...
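For readers more used to C: in Rust, `&` binds tighter than `==`, so the two spellings in the suggestion behave identically and the parentheses only help readability. A minimal sketch, with hypothetical function names and plain `u64` values standing in for the scalars read by the shim:

```rust
/// Unparenthesized form. Rust parses this as `(op & mask) == mask`,
/// because `&` has higher precedence than `==`.
fn masked_set_unparenthesized(op: u64, mask: u64) -> bool {
    op & mask == mask
}

/// Parenthesized form from the suggestion. Same behavior, but the
/// grouping is visible without knowing the precedence table.
fn masked_set_parenthesized(op: u64, mask: u64) -> bool {
    (op & mask) == mask
}
```

The other reading, `op & (mask == mask)`, would not type-check here, since it would try to AND an integer with a `bool`.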
ebb0c97 to ad5739d
src/shims/x86/mod.rs
Outdated
all_zero &= (op & mask) == 0;
masked_set &= (op & mask) == mask;
}

Ok(f(all_zero, masked_set))
No, that's not what I meant... what I meant was to do `acc = f(acc, op, mask)` inside the loop. That way the caller decides how the results get accumulated.
Basically, this is a fold over the two vectors, so the natural general interface is to allow any kind of folding.
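The fold-style interface described here can be sketched roughly as follows. This is only an illustration: `fold_pairs` is a hypothetical name, and plain `u64` slices stand in for the SIMD operands that the actual helper in `shims::x86` would read through the interpreter.

```rust
/// Hypothetical stand-in for the helper under review: walks two equally
/// long vectors in lockstep and lets the caller decide how each pair of
/// elements is folded into the accumulator.
fn fold_pairs<A, E>(
    left: &[u64],
    right: &[u64],
    init: A,
    mut f: impl FnMut(A, u64, u64) -> Result<A, E>,
) -> Result<A, E> {
    assert_eq!(left.len(), right.len());
    let mut acc = init;
    for (&l, &r) in left.iter().zip(right) {
        // The caller-supplied closure owns the accumulation logic.
        acc = f(acc, l, r)?;
    }
    Ok(acc)
}
```

With this shape, an "all masked bits are zero" check is `fold_pairs(op, mask, true, |acc, op, mask| Ok(acc && (op & mask) == 0))`, and the accumulator type is not limited to `bool`.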
> No that's not what I meant... what I mean was to do acc = f(acc, op, mask) inside the loop. That way the caller then decides how the results get accumulated.
That would be a `fn(bool, Scalar, Scalar) -> bool`, wouldn't it? Not a `fn(bool, bool) -> bool`.
Ah yeah -- I might have gotten the signature wrong.
Basically, whatever the type of `f` is in the current code. Or maybe something a bit more general.
This isn't working correctly for the `(op & mask) != 0 && (op & mask) != mask` case. We need to first calculate the `all_zero` and `masked_set` booleans (for the whole SIMD vector) and then compute `!all_zero && !masked_set`.
ad5739d to fd860bd
src/shims/x86/sse41.rs
Outdated
|acc, op, mask| {
let op = op.to_scalar().to_uint(op.layout.size)?;
let mask = mask.to_scalar().to_uint(mask.layout.size)?;
Ok(acc && (op & mask) != 0 && (op & mask) != mask)
I think the code will be overall shorter if you move the match down inside the closure; then the `let op` and `let mask` can be outside the `match`.
fd860bd to 44bf5fc
Thanks a lot. :)
☀️ Test successful - checks-actions
Fix x86 SSE4.1 ptestnzc
Fixes ptestnzc by bringing back the original implementation of #3214.
`(op & mask) != 0 && (op & mask) != mask` needs to be calculated for the whole vector. It cannot be calculated for each element and then folded.
For example, given
* `op = [0b100, 0b010]`
* `mask = [0b100, 0b110]`
The correct result would be:
* `op & mask = [0b100, 0b010]`
Comparisons are done on the vector as a whole:
* `all_zero = (op & mask) == [0, 0] = false`
* `masked_set = (op & mask) == mask = false`
* `!all_zero && !masked_set = true` (correct result)
The previous method:
* `op & mask = [0b100, 0b010]`
Comparisons are done element-wise:
* `all_zero = (op & mask) == [0, 0] = [false, false]`
* `masked_set = (op & mask) == mask = [true, false]`
* `!all_zero && !masked_set = [false, true]`
After folding with AND, the final result would be `false`, which is incorrect.
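The difference between the two evaluation orders can be reproduced with a small standalone sketch. The function names are hypothetical and plain `u64` slices stand in for the SIMD vectors; this is not the code from the PR, only an illustration of why the per-element fold gives the wrong answer.

```rust
/// Whole-vector evaluation: accumulate `all_zero` and `masked_set`
/// across every lane first, then combine the two flags once.
fn test_nzc_whole_vector(op: &[u64], mask: &[u64]) -> bool {
    let mut all_zero = true;
    let mut masked_set = true;
    for (&o, &m) in op.iter().zip(mask) {
        all_zero &= (o & m) == 0;
        masked_set &= (o & m) == m;
    }
    !all_zero && !masked_set
}

/// Per-element evaluation folded with AND, as in the intermediate
/// revision: every lane must individually be "neither zero nor fully set".
fn test_nzc_per_element(op: &[u64], mask: &[u64]) -> bool {
    op.iter()
        .zip(mask)
        .all(|(&o, &m)| (o & m) != 0 && (o & m) != m)
}
```

For `op = [0b100, 0b010]` and `mask = [0b100, 0b110]`, the whole-vector version returns `true` while the per-element fold returns `false`, matching the example in the commit message.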
Implement x86 AVX intrinsics ~Blocked on <https://github.com/rust-lang/miri/pull/3214>~
To make them reusable for intrinsics of other x86 features.
Split from #3192