Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

_MM_SHUFFLE has incorrect return type #522

Open
Paul-E opened this issue Jul 12, 2018 · 25 comments
Open

_MM_SHUFFLE has incorrect return type #522

Paul-E opened this issue Jul 12, 2018 · 25 comments

Comments

@Paul-E
Copy link

Paul-E commented Jul 12, 2018

_MM_SHUFFLE was added recently in #479 . Currently _MM_SHUFFLE returns a u32, but the functions it's output is used with take an i32, eg _mm_shuffle_epi32.

Although normally this isn't an issue as you could just transmute the results, the input to _mm_shuffle_epi32 requires the shuffle argument to be constant.

@gnzlbg
Copy link
Contributor

gnzlbg commented Jul 12, 2018

cc @bitshifter

@bitshifter
Copy link
Contributor

Oops. Will look into it.

@alexcrichton
Copy link
Member

Hm it also looks like this isn't unsafe and it's also const, could it follow the existing intrisnics and be both unsafe and non-const?

@gnzlbg
Copy link
Contributor

gnzlbg commented Jul 12, 2018

@alexcrichton this is a helper macro in C. I think that making it const (or a macro in Rust) is not only fine, but necessary, since its result needs to be passed to other intrinsics taking immediate mode arguments, which only accept constant expressions (so if we don't make this const, or a macro, this becomes unusable).

Also, this is not an intrinsic per se, in the sense that it does not require target features (it is a helper macro in C), so I don't know why it should be unsafe. We map other helper macros to just const variables, the difference here is that this helper macro takes some arguments, but that's about it.

@alexcrichton
Copy link
Member

Hm ok if that's the case can it be marked as unstable for now? I'd prefer to not stabilize this behavior immediately personally

@gnzlbg
Copy link
Contributor

gnzlbg commented Jul 12, 2018

I'd prefer to not stabilize this behavior immediately personally

Sure.


FWIW, I think we screwed up. _MM_SHUFFLE is sometimes the recommended way of using the SSE and AVX shuffle intrinsics, but some of the Rust intrinsics take an u32 (e.g. _mm_shuffle_ps, which is a bug, this should be i32) , and all others take an i32 (_mm256_shuffle_ps, _mm_shuffle_pi{16,32}, ...).

Ideally, we should do a breaking change and fix the argument type of _mm_shuffle_ps. The alternative is to make _MM_SHUFFLE! a macro, so that the argument type can be infered from the expansion...

If we don't fix the argument type of _mm_shuffle_ps, changing the return type of _MM_SHUFFLE would just mean that one still would need transmute for using it with _mm_shuffle_ps (that's better than needing a transmute with all other intrinsics though).

@gnzlbg
Copy link
Contributor

gnzlbg commented Jul 12, 2018

FYI, the bug of _mm_shuffle_ps taking an u32 instead of i32 is not a Rust bug, but a bug in the intel intrinsics guide, as in, all other shuffle intrinsics take int, but that one takes unsigned int for some reason. In C that wouldn't really matter because implicit conversions, but in Rust that doesn't work...

Making _MM_SHUFFLE a macro appears to be the best way forward to me.

@alexcrichton
Copy link
Member

For now I'd leave this as unstable and maybe publish a crate on crates.io with the fixed intrisnics and/or a macro? I think we'll want to avoid for now exporting more macros from the standard library (stability is a hard thing there)

@bitshifter
Copy link
Contributor

This did start out as a macro as it is in C but I couldn't work out how to declare it in coresimd and rexport it to stdsimd.

So I'm not sure what the solution should be here.

@alexcrichton
Copy link
Member

I think let's originally start out with destabilizing the function and go from there? @bitshifter would you be up for sending that PR?

@bitshifter
Copy link
Contributor

Yep sure.

@Paul-E
Copy link
Author

Paul-E commented Aug 9, 2018

I think we'll want to avoid for now exporting more macros from the standard library (stability is a hard thing there)

I don't understand why we want to limit the number of macros in the standard library. Is the issue that the same macro's interface might not be stable from one rust release to the another?

@alexcrichton
Copy link
Member

@Paul-E oh it's mostly related to stability where macros have had to be insta-stable in the past and we can't have an unstable macro exported from libstd. Times may have changed though!

@gnzlbg
Copy link
Contributor

gnzlbg commented Aug 10, 2018

@alexcrichton so we can't mark macros as requiring a feature flag? Do you know who might know the state of support for this? (nrc maybe ?)

@Paul-E
Copy link
Author

Paul-E commented Aug 11, 2018

@nrc

@gnzlbg
Copy link
Contributor

gnzlbg commented Aug 12, 2018

I think we should just add _MM_SHUFFLE with a -> i32 return type. The inconvenience of doing a transmute for shuffle_ps is minimal, and adding a wrapper that does it for you is trivial and can be done in any stable crate.

A macro is a particular bad hack for this, the typed_simd crate contains wrappers about most simd intrinsics that use the much nicer portable vector types, we can fix _MM_SHUFFLE and the types of the other intrinsics there.

@Paul-E
Copy link
Author

Paul-E commented Aug 12, 2018

In the case _MM_SHUFFLE stays a function it would be nice to have rust-lang/rust/issues/49450 eventually.

@gnzlbg
Copy link
Contributor

gnzlbg commented Aug 12, 2018 via email

@TheDan64
Copy link

TheDan64 commented Nov 2, 2018

Are there still plans to fix _mm_shuffle_ps's mask param to an i32? It is stable, but it seems like it's worth making a breaking change here.

Also, _MM_SHUFFLE seems to be a function returning a u32, despite almost all of the mask fields being a const i32s, making it a bit annoying to use having to always add a cast

@gnzlbg
Copy link
Contributor

gnzlbg commented Nov 3, 2018

@alexcrichton could we do a crater run of changing the argument type of _mm_shuffle_ps from u32 to i32 ? That would allow us to add _MM_SHUFFLE as a const fn instead of a macro, and would make the arguments of all shuffle intrinsics consistent.

@gnzlbg
Copy link
Contributor

gnzlbg commented Nov 5, 2018

So we did a crater run (#586) and it appears that nothing breaks due to changing the argument type of _mm_shuffle_ps.

That does not mean that we should do that change. To summarize the changes, there is a bug in the Intel intrinsics guide: the _mm_shuffle_ps's mask argument is an u32 instead of an i32 (like for all other shuffle intrinsics). Currently, core::arch is "correct" here in the sense that we match the types of the Intel Intrinsic guide exactly, even when these are "buggy".

There are other intrinsics in the library that are buggy too, e.g., @GabrielMajeri pointed out that _rdtsc returns signed integers (which does not make sense at all) in #559 and Intel has acknowledge this as another bug in the spec.

We haven't fixed any of these bugs because:

  • they are API breaking changes
  • they fix no soundness issues - we have, however, performed breaking changes to fix argument types that do fix soundness issues (e.g. ptr write intrinsics taking a *const T instead of a *mut T).
  • they are easy to work around: as i32 would do the trick for the mask argument of _mm_shuffle_ps here

Our stance here has been that if these bugs annoy you enough, then just write a wrapper, put it in a crate, and use that instead.

However, this does not mean that these are not bugs, these bugs are annoying, they cost time to our users and to us, etc.

So I see a couple of ways to proceed here:

  • do nothing: stick to "no bugfixes unless soundness bug" and recommend users using third party wrappers to work around these - we could at some point go as far as deprecating std::arch and offer an officially supported and properly versioned wrapper in the nursery / rust-lang org, etc. for people to use that contains all of these bug fixes.
  • fix these bugs: if Intel has acknowledge them as bugs or some other policy. This means that code targeting a certain Rust version that uses these intrinsics will break when upgrading, build.rs scripts will be needed to work around this, etc. so a properly-versioned wrapper would be required here anyways even if we do fix these bugs.

So honestly, I don't think we should fix these bugs, breaking Rust users is not worth the value these fixes add, and a wrapper can add the same value if done right.

in particular, the intent of the RFC was never for core::arch to be used everywhere, but for people to use it through safe wrappers. Arguably, most usage of core::arch right now is people writing their own kind of wrappers over it, so the RFC wasn't really wrong about that, and those writing wrappers can live with these annoyances. Breaking Rust across versions is IMO not worth fixing these.

In the particular case of _MM_SHUFFLE, I think we should make it a const fn that returns i32 so that it works with most _mm_shuffle intrinsics as is, even if using it with _mm_shuffle_ps requires an as i32 cast (this is easy to discover and easy to work around), and to just go and stabilize that. EDIT: there is now a PR that does just this #588

In general, we should open an issue to track these types of bugs (EDIT: I've opened #587 for that), and add new ones there as they pop up so that anyone that wants to write a wrapper knows where to start.

@alexcrichton
Copy link
Member

Thanks for writing that up @gnzlbg! I think I agree with everything, although I'd perhaps personally take a less hard stance on not fixing bugs. I think that, like soundness issues, fixing bugs in intrinsics is worth it and should be pursued if there's enough motivation. There's thousands of intrinsics and likely more bugs in the spec than we've already discovered, so this is quite likely to keep coming up.

I think if we do crater runs, message changes, and work quickly with any unexpected fallout we can continue to make minor tweaks where necessary.

@gnzlbg
Copy link
Contributor

gnzlbg commented Nov 5, 2018

although I'd perhaps personally take a less hard stance on not fixing bugs.

Maybe we can have a mini-FCP on this with the libs team? If they sign off on fixing these bugs I'd be fine with it. Maybe we can fix them as long as crater doesn't return any breakage, or minor breakage that we can send PRs to workaround (e.g. if its a single crate, and the author accepts PRs, then we might be able to avoid breakage, etc.).

@alexcrichton
Copy link
Member

Certainly! I do think it's good to have some requirements for breakage of course, and those could be something like:

  • A PR on stdsimd showing what the breaking change is (passing all CI)
  • A crater run using this PR showing either:
    • No breakage
    • Very small breakage and active communication and buy-in from crates which broke
  • (maybe) A thread on Intel's forums asking about the bug and seeing if there's a clarification
    • (maybe) Confirmation from Intel it's a bug

(something like that)

netbsd-srcmastr pushed a commit to NetBSD/pkgsrc that referenced this issue Jul 5, 2019
Pkgsrc changes:
 * NetBSD/sparc64 disabling of "packed" removed ("packed" removed upstream)
 * Adapt src_libstd_build.rs patch, update sed'ing of .cargo-checksum.json

Build verified on NetBSD 8.0/amd64.

Upstream changes:

Version 1.36.0 (2019-07-04)
==========================

Language
--------
- [Non-Lexical Lifetimes are now enabled on the 2015 edition.][59114]
- [The order of traits in trait objects no longer affects the semantics of that
  object.][59445] e.g. `dyn Send + fmt::Debug` is now equivalent to
  `dyn fmt::Debug + Send`, where this was previously not the case.

Libraries
---------
- [`HashMap`'s implementation has been replaced with `hashbrown::HashMap` implem
entation.][58623]
- [`TryFromSliceError` now implements `From<Infallible>`.][60318]
- [`mem::needs_drop` is now available as a const fn.][60364]
- [`alloc::Layout::from_size_align_unchecked` is now available as a const fn.][6
0370]
- [`String` now implements `BorrowMut<str>`.][60404]
- [`io::Cursor` now implements `Default`.][60234]
- [Both `NonNull::{dangling, cast}` are now const fns.][60244]
- [The `alloc` crate is now stable.][59675] `alloc` allows you to use a subset
  of `std` (e.g. `Vec`, `Box`, `Arc`) in `#![no_std]` environments if the
  environment has access to heap memory allocation.
- [`String` now implements `From<&String>`.][59825]
- [You can now pass multiple arguments to the `dbg!` macro.][59826] `dbg!` will
  return a tuple of each argument when there is multiple arguments.
- [`Result::{is_err, is_ok}` are now `#[must_use]` and will produce a warning if
  not used.][59648]

Stabilized APIs
---------------
- [`VecDeque::rotate_left`]
- [`VecDeque::rotate_right`]
- [`Iterator::copied`]
- [`io::IoSlice`]
- [`io::IoSliceMut`]
- [`Read::read_vectored`]
- [`Write::write_vectored`]
- [`str::as_mut_ptr`]
- [`mem::MaybeUninit`]
- [`pointer::align_offset`]
- [`future::Future`]
- [`task::Context`]
- [`task::RawWaker`]
- [`task::RawWakerVTable`]
- [`task::Waker`]
- [`task::Poll`]

Cargo
-----
- [Cargo will now produce an error if you attempt to use the name of a required
dependency as a feature.][cargo/6860]
- [You can now pass the `--offline` flag to run cargo without accessing the netw
ork.][cargo/6934]

You can find further change's in [Cargo's 1.36.0 release notes][cargo-1-36-0].

Clippy
------
There have been numerous additions and fixes to clippy, see [Clippy's 1.36.0 rel
ease notes][clippy-1-36-0] for more details.

Misc
----

Compatibility Notes
-------------------
- [`std::arch::x86::_rdtsc` returns `u64` instead of `i64`][stdsimd/559]
- [`std::arch::x86_64::_mm_shuffle_ps` takes an `i32` instead of `u32` for `mask
`][stdsimd/522]
- With the stabilisation of `mem::MaybeUninit`, `mem::uninitialized` use is no
  longer recommended, and will be deprecated in 1.38.0.

[60318]: rust-lang/rust#60318
[60364]: rust-lang/rust#60364
[60370]: rust-lang/rust#60370
[60404]: rust-lang/rust#60404
[60234]: rust-lang/rust#60234
[60244]: rust-lang/rust#60244
[58623]: rust-lang/rust#58623
[59648]: rust-lang/rust#59648
[59675]: rust-lang/rust#59675
[59825]: rust-lang/rust#59825
[59826]: rust-lang/rust#59826
[59445]: rust-lang/rust#59445
[59114]: rust-lang/rust#59114
[cargo/6860]: rust-lang/cargo#6860
[cargo/6934]: rust-lang/cargo#6934
[`VecDeque::rotate_left`]: https://doc.rust-lang.org/std/collections/struct.VecDeque.html#method.rotate_left
[`VecDeque::rotate_right`]: https://doc.rust-lang.org/std/collections/struct.VecDeque.html#method.rotate_right
[`Iterator::copied`]: https://doc.rust-lang.org/std/iter/trait.Iterator.html#tymethod.copied
[`io::IoSlice`]: https://doc.rust-lang.org/std/io/struct.IoSlice.html
[`io::IoSliceMut`]: https://doc.rust-lang.org/std/io/struct.IoSliceMut.html
[`Read::read_vectored`]: https://doc.rust-lang.org/std/io/trait.Read.html#method.read_vectored
[`Write::write_vectored`]: https://doc.rust-lang.org/std/io/trait.Write.html#method.write_vectored
[`str::as_mut_ptr`]: https://doc.rust-lang.org/std/primitive.str.html#method.as_mut_ptr
[`mem::MaybeUninit`]: https://doc.rust-lang.org/std/mem/union.MaybeUninit.html
[`pointer::align_offset`]: https://doc.rust-lang.org/std/primitive.pointer.html#method.align_offset
[`future::Future`]: https://doc.rust-lang.org/std/future/trait.Future.html
[`task::Context`]: https://doc.rust-lang.org/beta/std/task/struct.Context.html
[`task::RawWaker`]: https://doc.rust-lang.org/beta/std/task/struct.RawWaker.html
[`task::RawWakerVTable`]: https://doc.rust-lang.org/beta/std/task/struct.RawWakerVTable.html
[`task::Waker`]: https://doc.rust-lang.org/beta/std/task/struct.Waker.html
[`task::Poll`]: https://doc.rust-lang.org/beta/std/task/enum.Poll.html
[clippy-1-36-0]: https://github.com/rust-lang/rust-clippy/blob/master/CHANGELOG.md#rust-136
[cargo-1-36-0]: https://github.com/rust-lang/cargo/blob/master/CHANGELOG.md#cargo-136-2019-07-04
[stdsimd/522]: rust-lang/stdarch#522
[stdsimd/559]: rust-lang/stdarch#559
@aloucks
Copy link

aloucks commented May 16, 2020

The arguments should probably also be renamed to match xmmintrin.h.

pub fn _MM_SHUFFLE(z: u32, y: u32, x: u32, w: u32) -> i32
#define _MM_SHUFFLE(fp3,fp2,fp1,fp0) \
 (((fp3) << 6) | ((fp2) << 4) | ((fp1) << 2) | (fp0))

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

7 participants