Add more ARM SIMD intrinsics #792
Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @gnzlbg (or someone else) soon. If any changes to this PR are deemed necessary, please add them as extra commits. This ensures that the reviewer can see what has changed since they last reviewed the code. Due to the way GitHub handles out-of-date commits, this should also make it reasonably obvious what issues have or haven't been addressed. Large or tricky changes may require several passes of review and changes. Please see the contribution instructions for more information.
This is in relation to #148, to catch up a bit on the delta.
I believe it is:
That does fail with quite a few errors, sadly, but it did catch two syntax errors! :D
Did it before this PR? Those errors may be caused by the recent LLVM update.
Something goes wrong before that; I get errors like:
despite rustup saying I have the toolchain installed:
Don't get me wrong, I know I won't be able to run the tests locally, but I was hoping to be able to compile it before tossing it on something slow to test.
Ah, I got that part; it was:
A question: I'm planning to add some more intrinsics, namely this list (with its variants): simd-lite/simd-json#32 (comment). Is it preferable to do this in one large PR or in smaller, topic-based ones (like
Force-pushed from 09d1d12 to e31f55a.
As you wish.
The following tests are still failing in my test runs, and I'm entirely unsure why. All of them have one thing in common: they have neither a simd_* intrinsic nor an LLVM intrinsic to link to. That means it's up to the programmer to figure out which combination of code makes the compiler emit the right instruction, based on the ARM docs. I will go through them section by section below, with some details.
The lane get commands: I brought them up earlier, and every combination of code I try ends up with something other than
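For reference, the semantics the lane-get intrinsics have to reproduce are simple; the hard part discussed here is only getting codegen to pick the expected instruction. The following is a minimal scalar sketch, not the stdarch implementation: it assumes 64-bit and 128-bit vectors are modelled as plain Rust arrays (`[u8; 8]` standing in for `uint8x8_t`, `[u64; 2]` for `uint64x2_t`), and the `*_model` functions and the `LaneCheck` helper are hypothetical names. The lane index is a const generic, so an out-of-range lane fails the build instead of panicking at run time, mirroring the fact that the hardware lane index is an immediate.

```rust
/// Hypothetical compile-time lane check: evaluating `OK` fails the build
/// when `LANE >= LANES`, much like an out-of-range NEON lane immediate.
struct LaneCheck<const LANE: usize, const LANES: usize>;

impl<const LANE: usize, const LANES: usize> LaneCheck<LANE, LANES> {
    const OK: () = assert!(LANE < LANES, "lane index out of range");
}

/// Scalar model of `vget_lane_u8`: `[u8; 8]` stands in for `uint8x8_t`.
pub fn vget_lane_u8_model<const LANE: usize>(v: [u8; 8]) -> u8 {
    // Forces the const assertion above to be evaluated for this LANE.
    let () = LaneCheck::<LANE, 8>::OK;
    v[LANE]
}

/// Scalar model of `vgetq_lane_u64`: `[u64; 2]` stands in for `uint64x2_t`.
pub fn vgetq_lane_u64_model<const LANE: usize>(v: [u64; 2]) -> u64 {
    let () = LaneCheck::<LANE, 2>::OK;
    v[LANE]
}
```

A call such as `vget_lane_u8_model::<8>(v)` is rejected when the crate is compiled, which is the property the real intrinsics need their lane argument to have before codegen can use an immediate-form move.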
This one is quite the bugger: it combines two vectors, but in simd* the vectors are structs. I tried a big match block, but it doesn't like that much.
This translates to
For reasons unknown, LLVM changes
Experimenting around, I made another interesting observation: using an array instead of a struct results in different (fewer) instructions.
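Assuming the vector-combining intrinsic in question is an extract-from-pair operation such as `vextq_u8` (later commits in this PR add `vextq_s8` and `vextq_u8`), its semantics can be sketched as follows: conceptually concatenate `a` (low half) and `b` (high half) into 32 bytes and take the 16 consecutive bytes starting at `N`. This is a hypothetical scalar model on plain arrays, not the stdarch code. The `ext_indices` helper illustrates why a big match shows up in practice: a shuffle-based lowering needs its index list as a compile-time constant, and every value of `N` corresponds to a different constant index list.

```rust
/// Hypothetical scalar model of `vextq_u8(a, b, N)`: bytes N..16 of `a`,
/// followed by bytes 0..N of `b`.
pub fn vextq_u8_model<const N: usize>(a: [u8; 16], b: [u8; 16]) -> [u8; 16] {
    assert!(N < 16, "N must be in 0..=15");
    let mut out = [0u8; 16];
    for i in 0..16 {
        // Index into the virtual 32-byte concatenation `a:b`.
        let j = i + N;
        out[i] = if j < 16 { a[j] } else { b[j - 16] };
    }
    out
}

/// Shuffle indices (into the 32 lanes of `a:b`) that a shuffle-based
/// lowering would need for a given `n`; each `n` yields a distinct constant.
pub const fn ext_indices(n: u32) -> [u32; 16] {
    let mut idx = [0u32; 16];
    let mut i = 0;
    while i < 16 {
        idx[i] = n + i as u32;
        i += 1;
    }
    idx
}
```

With a runtime `n`, each of the 16 possible index lists has to be spelled out as a separate constant arm, which is where the big match block comes from.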
iximeow on Twitter came to the rescue with shr: https://twitter.com/iximeow/status/1159935494202908672. It looks like shr only takes immediate values, not registers, which clears up why the code compiles to what it does. Sadly, I'm still unsure how to fix that.
The same goes for
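Because the shift amount of the shift-by-immediate instructions is encoded in the instruction itself, the Rust-side signature has to make it a compile-time constant. A minimal scalar sketch of `vshrq_n_u8` and `vshlq_n_u8` semantics, assuming `[u8; 16]` stands in for `uint8x16_t` (the `*_model` names are hypothetical): the unsigned right shift accepts 1..=8 and the left shift 0..=7, and shifting a `u8` right by 8 must give 0 rather than tripping Rust's shift-overflow check.

```rust
/// Hypothetical scalar model of `vshrq_n_u8::<N>`, with N in 1..=8.
pub fn vshrq_n_u8_model<const N: u32>(a: [u8; 16]) -> [u8; 16] {
    assert!((1..=8).contains(&N), "shift amount must be in 1..=8");
    // `checked_shr` returns None for N == 8 on a u8; the hardware yields 0.
    a.map(|x| x.checked_shr(N).unwrap_or(0))
}

/// Hypothetical scalar model of `vshlq_n_u8::<N>`, with N in 0..=7.
/// Bits shifted out of the top of each lane are discarded.
pub fn vshlq_n_u8_model<const N: u32>(a: [u8; 16]) -> [u8; 16] {
    assert!(N <= 7, "shift amount must be in 0..=7");
    a.map(|x| x << N)
}
```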
You can take a look at stdarch/crates/core_arch/src/x86/sse.rs, lines 2409 to 2416 (in 933a5e0), for a fix for the constant value problem.
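The general idea behind that kind of fix is a "constify" macro: an immediate that is only known at the call site is matched against every legal value, and each match arm passes a literal constant onward, so codegen always sees a true immediate. The sketch below is modelled on that idea, not copied from the x86 code; `constify_imm3!`, `shl_const` and `shl_by_imm` are hypothetical names, and the 3-bit immediate is chosen only to keep the example short.

```rust
/// Hypothetical constify-style macro: dispatches a 3-bit immediate to one of
/// eight arms, each of which invokes `$expand!` with a literal constant.
macro_rules! constify_imm3 {
    ($imm:expr, $expand:ident) => {
        match $imm & 0b111 {
            0 => $expand!(0),
            1 => $expand!(1),
            2 => $expand!(2),
            3 => $expand!(3),
            4 => $expand!(4),
            5 => $expand!(5),
            6 => $expand!(6),
            _ => $expand!(7),
        }
    };
}

/// An operation that only accepts a compile-time shift amount.
fn shl_const<const N: u32>(x: u8) -> u8 {
    x << N
}

/// Wrapper taking a runtime-looking `imm`; every arm of the expanded match
/// calls `shl_const` with a literal, so each call has a constant N.
pub fn shl_by_imm(x: u8, imm: u32) -> u8 {
    macro_rules! call {
        ($n:expr) => {
            shl_const::<{ $n }>(x)
        };
    }
    constify_imm3!(imm, call)
}
```

Masking with `& 0b111` silently wraps out-of-range values; a real intrinsic would rather reject them, which is what const generics later make possible directly.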
I've been experimenting with different approaches to the vget_lane code, and what I get out of it is still quite odd: with the same code, different targets return different results. Am I missing something?
The CI currently fails with:
Can someone shed light on what that means?
The type (see stdarch/crates/stdarch-verify/src/lib.rs, line 178 in de9a8ae) is not supported by stdarch-verify, so it can't verify that the signatures of the intrinsics you declared are correct. As a side note, I noticed a spelling error in the error message: unspported.
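To make the failure mode concrete, here is a hypothetical sketch of the kind of lookup a signature verifier performs; it is not stdarch-verify's actual code, and `rust_type_for` and its table are illustrative assumptions. Each C type name from the vendor's intrinsic listing is mapped to the Rust type the verifier expects, and a name missing from the table (for example a newly used polynomial type such as `poly64_t`) makes verification stop with an "unsupported type" error. Adding an arm for the new type is the shape of the fix; which Rust type it should map to is up to the crate.

```rust
/// Hypothetical type-name lookup in a signature verifier: C type from the
/// intrinsic listing -> Rust type name expected in the declared signature.
pub fn rust_type_for(c_type: &str) -> Result<&'static str, String> {
    match c_type {
        "uint8_t" => Ok("u8"),
        "int32_t" => Ok("i32"),
        "uint64_t" => Ok("u64"),
        "uint8x16_t" => Ok("uint8x16_t"),
        // Any type not listed above cannot be checked.
        other => Err(format!("unsupported type: {}", other)),
    }
}
```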
Force-pushed from 70e8146 to 3a04eea.
Note: I've removed the following (failing) intrinsics, as I'm stuck on them and don't want to hold up the PR unnecessarily; I think there are already some good and useful additions in it as it stands :)
* feat: neon support
* feat: temp stub replacements for neon intrinsics (pending rust-lang/stdarch#792)
* fix: drone CI rustup nightly
* feat: fix guards, use rust stdlib for bit count operations
* fix: remove double semicolon
* feat: fancy generic generator functions, thanks @Licenser
Looking at the CI errors, it seems that you're not handling the difference properly:
Here you are looking for the AArch64 instruction (
The code generation has changed since the original PR. There were problems with wrong code being generated, so I suspect something fixed it; don't ask me what or why, no clue :) But I'll update the list.
Okay, I've got some kind of local reproduction. I'll dig through them :) It might take a bit.
😭
By the way, you should include the code generator in the repository. Otherwise it will be difficult to modify the generated intrinsics or add new ones.
I agree it would be nice to have the code generator in the repo; I had it all set up, including a … Since I've been burned by that a few times in this PR already (see the 200-something comments above :/), I want to make sure that what I put in is what is desired. Can you take a look at the generator (https://github.com/simd-lite/simd-lite) and say if you're OK with it? Do you prefer it as a build.rs script that generates the file at compile time, as a sub-crate for the generator that is called manually to update the generated code, or something entirely different?
I would prefer avoiding build.rs because
Will do! Not sure how much time I'll get before the weekend, but I'll start cleaning things up then; having the generator in the crate will make it easier!
☔ The latest upstream changes (presumably 1a577bd) made this pull request unmergeable. Please resolve the merge conflicts.
Force-pushed from 9020b83 to 2a9aa70.
Can you add a header to the generated code saying something along the lines of:
// This code is automatically generated. DO NOT MODIFY.
//
// Instead, modify <path to neon.spec> and run the following command to re-generate this file:
// <command to re-generate>
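A heavily simplified, hypothetical sketch of what such a generator does: it takes spec entries and emits Rust source that begins with the requested header. The `Entry` struct, the `generate` function and the two-operand signature are illustrative assumptions; the real spec format and generator (`crates/stdarch-gen/neon.spec` and its generator) are not reproduced here, and the body string is just example text that gets pasted into the output.

```rust
/// One hypothetical spec entry: intrinsic name, vector type and body text.
struct Entry {
    name: &'static str, // e.g. "vand_u8"
    ty: &'static str,   // e.g. "uint8x8_t"
    body: &'static str, // Rust expression emitted verbatim as the body
}

/// Emit a generated Rust file: the DO-NOT-MODIFY header first, then one
/// two-operand function per entry.
fn generate(entries: &[Entry], spec_path: &str, regen_cmd: &str) -> String {
    let mut out = String::new();
    out.push_str("// This code is automatically generated. DO NOT MODIFY.\n");
    out.push_str("//\n");
    out.push_str(&format!(
        "// Instead, modify {} and run the following command to re-generate this file:\n",
        spec_path
    ));
    out.push_str(&format!("// {}\n", regen_cmd));
    for e in entries {
        out.push_str(&format!(
            "\npub unsafe fn {name}(a: {ty}, b: {ty}) -> {ty} {{\n    {body}\n}}\n",
            name = e.name,
            ty = e.ty,
            body = e.body
        ));
    }
    out
}
```

Writing the header from the same code that writes the functions keeps it from being lost when the file is regenerated.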
So I'm a bit stumped on this one:

    pub unsafe fn vget_lane_u64(v: uint64x1_t, imm5: i32) -> u64 {
        if imm5 != 0 {
            unreachable_unchecked()
        }
        simd_extract(v, 0)
    }

On aarch64 this should (http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0491c/BABJFCGC.html) generate a vmov instruction, but it seems to use fmov. I'm not sure there is an actual difference here, given that we move a 64-bit value into a 64-bit register, but I wanted to double-check.
The reference you are using only gives you the names of the ARM instructions, not the AArch64 ones (which are completely different). As a general rule, you can tell the difference by looking at the first letter of the instruction: on ARM, all NEON/VFP instructions start with a
In summary:
For aarch64: which states
Yes, basically
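Since A32 (ARMv7) NEON/VFP mnemonics begin with `v` (vmov, vadd, vld1) while A64 uses different names (fmov, umov, add, ld1), any test that checks the emitted instruction needs a separate expectation per target. This is roughly what per-target `assert_instr` attributes encode. A hypothetical sketch of that idea: `expected_vget_lane_u64` only records what this thread reports (vmov on ARM per the ARM docs, fmov observed on AArch64); on A64, `fmov Xd, Dn` and `umov Xd, Vn.d[0]` copy the same 64 bits, so either is functionally correct for a one-lane u64 vector.

```rust
/// A32 NEON/VFP mnemonics start with 'v'; A64 SIMD mnemonics do not.
pub fn is_a32_simd_mnemonic(mnemonic: &str) -> bool {
    mnemonic.starts_with('v')
}

/// Hypothetical per-target expectation for `vget_lane_u64`, based on the
/// discussion above. `target_arch` uses the same spelling as
/// `std::env::consts::ARCH` ("arm", "aarch64", ...).
pub fn expected_vget_lane_u64(target_arch: &str) -> Option<&'static str> {
    match target_arch {
        "arm" => Some("vmov"),
        "aarch64" => Some("fmov"),
        _ => None, // no NEON expectation for other targets
    }
}
```

Calling `expected_vget_lane_u64(std::env::consts::ARCH)` picks the expectation for the host the test runs on.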
Awesome, thanks :) I was hoping it was something along those lines :D
It looks like the issues are fixed :) I'd like to rebase and squash this so we don't blow up the repo with 110 commits; is that an accepted practice for stdarch?
Sure, that's fine. How many of the ARM intrinsics are still left to implement?
Doing some quick math based on https://developer.arm.com/architectures/instruction-sets/simd-isas/neon/intrinsics?page=146, there are about … A quick check on exported functions:
so I think about 4000 😭
A arm simd and and orr Improve data for test cses Fix numbers picked for test cases Remove boilerplate of over and over repeated impl Add exclusive or operation Add bitwise equality operations Add gt and lt Add lte and gte Add vmul_p64 Add some uget intrinsics Add some vdup commands adding reinterpret and updating vget_lane Add vld1q u8 and s8 Add vmovq_n_u8 add vpaddq_u8 Add vextq_s8 Add vqmovn_u64 add vqsubq_u8 add vshrq_n_u8 and vshlq_n_u8 add vst1q_u8 Add vscode to git ignore Fix shr using constify Move macros Improve guard Use imm5 for vget_lane - this solves vgetq_lane_u64 Fix incorrect types for compairiso operators Fix poly64_t Rmove vst1q_u8 Add poly64_t to stdarch-verify Fix typo in unsupported type check Add poly128_t to stdarch-verify Update vextq_s8 Come cleanup Fix up const values Fix vsh*q_n_u8 Remove unused import Remove failing intrinsics Remove extra line Remove now unused import Add vextq_u8 and vextq_s8 Add vextq_u8 add vgetq_lane_u16 Add vget_lane_u8 Add missing documentaiton Try using u32 for parameters return arguments to i32 Fix test for vpaddq_u8 Update docs in macros Add vget_lane_u64 Add code generation for neon intrinsics Add vgetq_lane_u32 (fmov) Skip generated modules for rustfmt Add dummy files for cargo fmt Don't re-generate files unless required. 
Add documentation to spec file and update syntax Add more docs for test variables in sepc Add generation for vqsub* intrinsics to demonstrate use of links Add vqadd Add hadd Fix missing test Fix unused imports and test Add a number of additional intrinsics adn move generation to an example tag vgetq_lane_u32 as fmov instead of umov Remove generator, it's all writen by hand, promised Remove comment and unused example Remove comments Format generated files Remove don't edit comment Improve tests for vmul_f Work around bug in simdarch-verify Remove quadd for the time being Add tests for vreinterpret Fix bug in stdarch-test and nop intrinsics feat: additional tests for comparison operations feat: additional tests, move tests to non-generated file chore: rustfmt, move tests to neon/mod.rs feat: tests for conditionals and bitwise operators feat: improved test coverage for ARM intrinsics fix: removing 64-bit comparison ops (noticed they're in AARCH64) fix: fix tests for removed comparison operators feat: move test support into own module feat: implementation of checks and test support for aarch64 Revert changes to generated files Re-add tests that got lost in the merge Fmt and fix test values Add some negatives Only run test_support for v7 and aarch cpus Fix mul intrinsics Include code generator Fix first hive of intrinsic changes escape intrinsics fix more generated code Update crates/stdarch-gen/neon.spec Co-Authored-By: bjorn3 Update crates/stdarch-gen/neon.spec Co-Authored-By: bjorn3 Update crates/stdarch-gen/neon.spec Co-Authored-By: bjorn3 escape all intriniscs w/ a dot Fix typo Fix unsigned prefix i -> s regenerate code differentiate between signed and unsinged intriniscs Start cleaning up aarch64 Fix bad spec Fix imm passing Fix more aarch intriniscs Update more aarch64 intrinsics Fix last aarch intriniscs, hopefully Fix last armv7 intriniscs, hopefully Fix last armv7 intriniscs, hopefully Fix unused import in stdarch-gen
Force-pushed from 2dbeb2e to 277c041.
Hm, the number is probably wrong; in x86 and x86_64 there are only 1122.
http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0491c/BABJFCGC.html seems to be a good list. It might be a good idea to do it per topic; that way it comes in smaller, more digestible chunks rather than as one mammoth task. The topics would be (given the headlines in the link above):
Can we re-trigger the BSD build? It timed out while creating the instance.
Could you please open one or more issues about it?
GCC 9's
That figure is correct; here's a list of all NEON intrinsics and which ones Rust currently supports
I'm trying to add some more SIMD intrinsics for ARM. It's still very much WIP; I'm also not sure how to test them locally.