Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reject OOB shufflevector intrinsics #73542

Closed
RalfJung opened this issue Jun 20, 2020 · 12 comments · Fixed by #76768
Closed

Reject OOB shufflevector intrinsics #73542

RalfJung opened this issue Jun 20, 2020 · 12 comments · Fixed by #76768
Labels
A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. A-MIR Area: Mid-level IR (MIR) - https://blog.rust-lang.org/2016/04/19/MIR.html A-SIMD Area: SIMD (Single Instruction Multiple Data) E-needs-test Call for participation: An issue has been fixed and does not reproduce, but no test has been added.

Comments

@RalfJung
Copy link
Member

Looks like there are some open questions regarding the semantics of shufflevector intrinsics in LLVM:

When the mask is out of bounds, the current semantics are that the resulting element is undef. If shufflevector is used to remove an element from the input vector, then the instruction cannot be removed, as the input might have been poison. The solution is to switch to give poison instead.

So my question is, what are the semantics of those intrinsics in MIR? We had some discussion on Zulip and it seems like it should be one of the two:

  • OOB shufflevector should be statically ruled out. We could even have the new MIR validation pass check that.
  • OOB shufflevector returns "uninitialized memory" for the affected elements. (Following this paper, MIR does not have a poison vs undef distinction; our "uninit" is closest to LLVM's poison.)

The former seems definitely the safest ;) but currently this is not enforced. @hanna-kruppe was concerned that stdarch might rely on OOB shufflevector somewhere. If we have consensus that for now, we should statically rule out OOB indices to avoid UB/uninit here, I guess the main open issue is (a) implementing that check and (b) fixing the fallout, if any.

Cc @rust-lang/lang @rust-lang/libs @gnzlbg @Lokathor (not sure who else to ping for stdarch)

@jonas-schievink jonas-schievink added A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. A-MIR Area: Mid-level IR (MIR) - https://blog.rust-lang.org/2016/04/19/MIR.html A-SIMD Area: SIMD (Single Instruction Multiple Data) labels Jun 20, 2020
@Lokathor
Copy link
Contributor

Another option is that OOB values wrap to be inbounds.

I think that we should use static denial of bad values, but wrapping is at least possible I suppose.

@RalfJung
Copy link
Member Author

If the indices are always known statically then yes, we could statically wrap them.

@Lokathor
Copy link
Contributor

I mentioned it because that's how intel shuffles are defined to work by intel. The intrinsic looks at particular bits within the control value and ignores the rest of the bits, so they end up being wrapping.

I'm not yet familiar enough with ARM intrinsics to say how the hardware handles it.

@hanna-kruppe
Copy link
Contributor

We can restrict attention to constant shuffle indices. Architectures with "variable" shuffles exist but those capabilities are accessed by separate intrinsics (certainly today, and I think this makes sense in the long term too). However, I don't see any reason to build wrapping into the intrinsics. Hardware behavior is not a good reason IMO, since:

  1. The general shufflevector operation is different from and quite a lot more flexible than many real shuffle instructions, so there's no 1:1 correspondence anyway.
  2. The rustc intrinsics in question are purely internal and perma-unstable, so if index wraparound is desirable in code targeting a particular platform, it can just as well be built into the core::arch functions and other wrappers, without affecting all other platforms.

In addition, although I just said we can focus on constant shuffle indices, I do think it would be nice to have some consistency between the intrinsics with constant indices and those with variable indices (current or future), which is easier to achieve with "hard error"/UB semantics than with implicit wrapping.

@RalfJung
Copy link
Member Author

I mentioned it because that's how intel shuffles are defined to work by intel. The intrinsic looks at particular bits within the control value and ignores the rest of the bits, so they end up being wrapping.

That is not immediately helpful though as LLVM makes this undef/poison. We'd still have to specify and perform the wrapping ourselves.

Architectures with "variable" shuffles exist but those capabilities are accessed by separate intrinsics (certainly today, and I think this makes sense in the long term too).

Do we support something like that in Rust already, and are there any open questions around OOB semantics?

@hanna-kruppe
Copy link
Contributor

hanna-kruppe commented Jun 20, 2020

Do we support something like that in Rust already, and are there any open questions around OOB semantics?

I am not sure. Certainly LLVM does not have a target-independent representation of such shuffles today, only target-specific intrinsics whose semantics may be unclear but probably follows the hardware in question (which generally have some defined behavior for those cases, e.g. ARM SVE and RISC-V V make those elements zero). If a target-independent representation is ever introduced, I expect the preferable behavior for LLVM will be "poison in the lanes where the index is OOB" and for Rust/MIR, anything that's at least as strict should be fine.

@Lokathor
Copy link
Contributor

So I'm not familiar with LLVM's details here, is it the case that if a shuffle constant is "OOB" then all SIMD lanes become poison after the operation?

@hanna-kruppe
Copy link
Contributor

hanna-kruppe commented Jun 20, 2020

No, it is lane-wise undef (at least according to the quote in the first comment; I haven't verified this).

@Lokathor
Copy link
Contributor

Well, I guess I'm a little lost on when you could have one lane be OOB but another not be OOB.

For example, _mm_shuffle_ps uses two bits per lane index and lets you pick from among 4 lanes when shuffling. It's fairly plain that a two bit value can't be OOB within a 4 lane selection. You could of course set bits above the 8th bit, but that doesn't affect the bits used to pick the source lanes for output lanes 0 through 3.

So am I just looking at the wrong intrinsic when trying to come up with an example? Is there a different intrinsic I should be looking at? or put another way, could you show an example call to any intrinsic with a partial OOB control value?

@hanna-kruppe
Copy link
Contributor

hanna-kruppe commented Jun 20, 2020

This is about the simd_shuffleN (for N = 2, 4, 8, ...) intrinsics, which is used in the implementation of many core::arch shuffle/permute intrinsics (including _mm_shuffle_ps). Like I said before, simd_shuffleN is a fair bit more general than many specific instructions (such as shufps), representing the shuffle indices as N i32 constants. See the example linked in the first post for a small self-contained example.

@hanna-kruppe
Copy link
Contributor

Oops, the example I constructed is actually wrong because I forgot about the consecutive numbering of lanes of the two input vectors. Indices 4 and 5 (in that example) aren't OOB, they just refer to the other input vector. Indices that are actually OOB are already rejected: https://play.rust-lang.org/?version=nightly&mode=debug&edition=2018&gist=c44648853090164522811e89c4937abc

@RalfJung
Copy link
Member Author

Do we have tests that ensure this? I could not find any.

@RalfJung RalfJung reopened this Jun 20, 2020
@RalfJung RalfJung added the E-needs-test Call for participation: An issue has been fixed and does not reproduce, but no test has been added. label Jun 20, 2020
@bors bors closed this as completed in 65e4488 Oct 4, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. A-MIR Area: Mid-level IR (MIR) - https://blog.rust-lang.org/2016/04/19/MIR.html A-SIMD Area: SIMD (Single Instruction Multiple Data) E-needs-test Call for participation: An issue has been fixed and does not reproduce, but no test has been added.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants