Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

pub use core::simd; #89167

Merged
merged 307 commits into from
Nov 13, 2021
Merged

pub use core::simd; #89167

merged 307 commits into from
Nov 13, 2021

Conversation

workingjubilee
Copy link
Member

@workingjubilee workingjubilee commented Sep 22, 2021

A portable abstraction over SIMD has been a major pursuit in recent years for several programming languages. In Rust, std::arch offers explicit SIMD acceleration via compiler intrinsics, but it does so at the cost of having to individually maintain each and every single such API, and is almost completely unsafe to use. core::simd offers safe abstractions that are resolved to the appropriate SIMD instructions by LLVM during compilation, including scalar instructions if that is all that is available.

core::simd is enabled by the #![portable_simd] nightly feature tracked in #86656 and is introduced here by pulling in the https://github.com/rust-lang/portable-simd repository as a subtree. We built the repository out-of-tree to allow faster compilation and a stochastic test suite backed by the proptest crate to verify that different targets, features, and optimizations produce the same result, so that using this library does not introduce any surprises. As these tests are technically non-deterministic, and thus can introduce overly interesting Heisenbugs if included in the rustc CI, they are visible in the commit history of the subtree but do nothing here. Some tests are introduced via the documentation, but these use deterministic asserts.

There are multiple unsolved problems with the library at the current moment, including a want for better documentation, technical issues with LLVM scalarizing and lowering to libm, room for improvement for the APIs, and so far I have not added the necessary plumbing for allowing the more experimental or libm-dependent APIs to be used. However, I thought it would be prudent to open this for review in its current condition, as it is both usable and it is likely I am going to learn something else needs to be fixed when bors tries this out.

The major types are

  • core::simd::Simd<T, N>
  • core::simd::Mask<T, N>

There is also the LaneCount struct, which, together with the SimdElement and SupportedLaneCount traits, limit the implementation's maximum support to vectors we know will actually compile and provide supporting logic for bitmasks. I'm hoping to simplify at least some of these out of the way as the compiler and library evolve.

workingjubilee and others added 30 commits October 12, 2020 14:31
This reverts commit 3ad356d.
There are no platform intrinsics in the rustc for these functions yet,
so this removes them as a distinct commit for later reversion.
This PR removes the direct linkage to LLVM for trunc and round intrinsics, while replacing that link with rustc's platform intrinsics for floor and ceil functions, namely simd_floor and simd_ceil. Tests that are no longer testable are removed. In doing so it resolves the riscv64gc compilation problems.
min_const_generics ride the train to stable
@rust-log-analyzer

This comment has been minimized.

@workingjubilee
Copy link
Member Author

It looks like we pass thumbv6m-none-eabi and then... fail on thumbv7m-none-eabi?
...what.
And it's specifically on the int16x2_t ones... there is no way out of this except adding a check job for every single Arm target, huh.

…d1f0491954a99'

git-subtree-dir: library/portable-simd
git-subtree-mainline: efd0483
git-subtree-split: 1ce1c64
This enables programmers to use a safe alternative to the current
`extern "platform-intrinsics"` API for writing portable SIMD code.
This is `#![feature(portable_simd)]` as tracked in rust-lang#86656
These tests just verify some basic APIs of core::simd function, and
guarantees that attempting to access the wrong things doesn't work.
The majority of tests are stochastic, and so remain upstream, but
a few deterministic tests arrive in the subtree as doc tests.
@workingjubilee
Copy link
Member Author

This time the conversions available should match exactly with the types in std::arch. 🙈
@bors r=MarkSimulacrum

@bors
Copy link
Contributor

bors commented Nov 13, 2021

📌 Commit 7c3d72d has been approved by MarkSimulacrum

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Nov 13, 2021
@bors
Copy link
Contributor

bors commented Nov 13, 2021

⌛ Testing commit 7c3d72d with merge 032dfe4...

@bors
Copy link
Contributor

bors commented Nov 13, 2021

☀️ Test successful - checks-actions
Approved by: MarkSimulacrum
Pushing 032dfe4 to master...

@bors bors added the merged-by-bors This PR was explicitly merged by bors. label Nov 13, 2021
@bors bors merged commit 032dfe4 into rust-lang:master Nov 13, 2021
@rustbot rustbot added this to the 1.58.0 milestone Nov 13, 2021
@rust-timer
Copy link
Collaborator

Finished benchmarking commit (032dfe4): comparison url.

Summary: This change led to very large relevant regressions 😿 in compiler performance.

  • Very large regression in instruction counts (up to 11.8% on full builds of helloworld)

If you disagree with this performance assessment, please file an issue in rust-lang/rustc-perf.

Next Steps: If you can justify the regressions found in this perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please open an issue or create a new PR that fixes the regressions, add a comment linking to the newly created issue or PR, and then add the perf-regression-triaged label to this PR.

@rustbot label: +perf-regression

@therealprof
Copy link
Contributor

It looks like we pass thumbv6m-none-eabi and then... fail on thumbv7m-none-eabi? ...what. And it's specifically on the int16x2_t ones... there is no way out of this except adding a check job for every single Arm target, huh.

Probably because v6m only has scalar operations while v7m has DSP instructions which provide SIMD support for (u)int8x2_t and (u)int16x2_t.

@workingjubilee
Copy link
Member Author

workingjubilee commented Nov 16, 2021

Probably because v6m only has scalar operations

That is not correct. Arm v6-M processors can potentially include DSP instructions. They were introduced as early as Arm v5TE (the Nintendo DS being a somewhat famous example).

while v7m has DSP instructions which provide SIMD support for (u)int8x2_t and (u)int16x2_t.

That is also not the reason why the build failed.

While I mused aloud at first, there is no reason to speculate at this stage in the game. The target definition rustc uses for thumbv7m-none-eabi does not include the "dsp" feature by default, which is the problem, and the #[cfg] used in core::arch only exposes the typedefs for "M-class" Arm processors which also enable the DSP flag.

My first attempt was to simply restrict all the core::arch type conversions to v7 processors or later, which, at my first reading (and testing the build for a few Arm targets) seemed correct and sufficient, and involved the minimum possible change. On our side, there was still some debate as to what we should do. As I did not exhaust the possibility space, I missed the Arm v7-M case, where the architecture version was sufficient to pass the cfg I used and attempt to build the conversions but failed on not actually having them present (which is why it suggested the hypothetically-available Neon typedefs instead).

The second try which resolved the problem required more rigorously examining the core::arch code in question, and we have developed an inclination to move the relevant type conversion code out of core::simd in the future, since it is clear the code is at risk of becoming incorrect if anything in core::arch changes anyways.

@therealprof
Copy link
Contributor

That is not correct. Arm v6-M processors can potentially include DSP instructions. They were introduced as early as Arm v5TE (the Nintendo DS being a somewhat famous example).

Not sure how ARM v6-M and ARM v5TE are related. v6-m definitely can not have the DSP extension. I was wrong about v7m being able to use DSP instructions, too: For some reason I thought it would be optional in that architecture but actually it's only in Cortex-M4 and up (which is v7-EM) and there it is a standard feature.

LLVM can resolve SIMD types internally quite nicely, if the target doesn't support any operations on them it will simply turn them into scalar code and there're also some cases where it will use DSP (or other SIMD) instructions (if available) on scalar types if benefitial, e.g. on targets with the DSP instruction it can turn saturating operations on i8/u8/i16/u16 types into SIMD types ignoring remainder of the register.

This is all a bit shaky though and I totally expect that using core::simd will potentially expose some fancy bugs there.

Glad you hear you managed to get it sorted out and thanks for working on this important topic. ❤️

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-SIMD Area: SIMD (Single Instruction Multiple Data) merged-by-bors This PR was explicitly merged by bors. perf-regression Performance regression. perf-regression-triaged The performance regression has been triaged. PG-portable-simd Project group: Portable SIMD (https://github.com/rust-lang/project-portable-simd) S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-libs Relevant to the library team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.