Implement ARM intrinsics for thumbv6 / thumbv7 #437
@jcsoo Could you propose a module structure that makes sense? Currently we have:
but it has evolved in an ad-hoc way (see issue #139 ). It might be better to have:
Having a module hierarchy consistent with the ISA extensions will help with automatic verification later on, where the tools to automatically verify ACLE might be different than the tools to verify |
This is a pretty tricky question because there are a couple of different things that are getting mixed together. NEON is an architecture extension that includes new instructions. ACLE is a set of C language extensions that provide a standardized way to access architecture features and extensions, including NEON. CMSIS is a software standard that defines a C language API that is a superset of ACLE. The Core Function / Instruction Header File is what we are really interested in. So the neon / acle / cmsis organization makes sense but is a little strange because it's really
There's also the issue that there are a lot of different Rust targets that will fit in that hierarchy in different ways. Much of this has been covered in #139, as you mention. My thinking on this has evolved as I've learned more about the details of the various ARM architectures. I think the API that we present (the module organization) should be based on the underlying instruction sets and extensions, not on the documents that we use for reference. Each instruction set module should have a complete set of intrinsics, except for major extensions like NEON, which will have their own module. CMSIS is a bit strange because it's sort of an instruction set extension for the "-M" variants but not officially treated as such. It also has a different style than everything else. Because of that, I think it makes sense to add them as a separate module. We could also do the same with the "-A" and "-R" variants, but I don't know enough about them to say whether it would be worth it. So we would have something that roughly matches up with ARM's architecture and instruction set hierarchy:
|
@jcsoo That sounds good to me in general. The only thing I find a bit strange is that the top-level names do not really match with Rust's
|
To be honest, I don't know all that much about 64-bit ARM and NEON/SIMD. If they are different enough, then maybe we should eliminate
The problem is that those names themselves are inconsistent and confusing. Here is the list of ARM target_archs that Rust currently supports:
So there is a mix of instruction sets, architecture versions, and something that is neither. Because of that, I think we're best off defining the public api using the ARM architecture versions and then documenting how Rust target architectures map to ARM architectures. For implementation we could have some kind of facade if that makes things easier from a #[cfg] standpoint. |
Do you know where the full list is? For
So right now, So maybe we should just rename
where I've used If something like that is fine, I think we are actually pretty close to it already, and what we need is better documentation explaining what goes where, but that should become clearer as we add more intrinsics. Also, all these modules are there only to help us fight |
I believe this is where to find the list of all rustc targets, including the arch and available features for each target: https://github.com/rust-lang/rust/tree/master/src/librustc_target/spec I was assuming the first element of the target triple (quad?) was the internal "arch" definition, so some of what I wrote earlier might not apply. |
This is really not quite as clear as it should be, and unfortunately it'll become even messier in the future thanks to ARM taking a cue from Intel's SSE book. In a nutshell, there are two main instruction sets/encodings/architectures, now called aarch32 and aarch64, and in addition there are two compressed ones called thumb1 and thumb2, plus a number of extensions like DSP, VFP, TrustZone, Jazelle, Neon... Those instruction sets are the superset of all available instructions and are constantly expanded. Then there are various profiles (ARMv5, ARMv6, ARMv6M, ARMv7, ARMv7M, ARMv8...) that require the implementation of a subset of the various instructions/encoding schemes while still allowing more, or even custom, ones to be implemented. My idea would be to group them by main instruction set and then gate individual instructions by profile. The extensions are not part of most profiles (exceptions are e.g. ARMv5TE) and can always be added separately, so they have their own feature gates. So maybe a structure like the following makes the most sense:
NB: The assumption that ARM is 32-bit is flawed. I'm not sure whether there are already 64-bit-only implementations, but for some applications (e.g. server and high-end mobile) there's a strong desire to drop 32-bit altogether. |
We are currently discussing amending the
Ideally, the architecture folders would re-export everything relevant for the current target being compiled to, so that from coresimd we can just re-export the |
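A minimal sketch of that re-export idea, assuming rustc exposes an `mclass` target feature for Cortex-M builds. The module names and the `profile` function are hypothetical placeholders standing in for the real per-architecture intrinsics; only the module matching the current target is compiled, and a single facade module re-exports it.

```rust
// Each private module stands in for one architecture folder. Only the one
// matching the current compilation target is compiled in.
#[cfg(all(target_arch = "arm", target_feature = "mclass"))]
mod arm_m {
    /// Hypothetical marker for M-profile (Cortex-M) builds.
    pub fn profile() -> &'static str {
        "arm-m"
    }
}

#[cfg(all(target_arch = "arm", not(target_feature = "mclass")))]
mod arm_ar {
    /// Hypothetical marker for A/R-profile 32-bit ARM builds.
    pub fn profile() -> &'static str {
        "arm-a/r"
    }
}

#[cfg(not(target_arch = "arm"))]
mod other {
    /// Fallback for every target that is not 32-bit ARM (x86, aarch64, ...).
    pub fn profile() -> &'static str {
        "other"
    }
}

// The public facade re-exports whatever the current target provides, so the
// crate above it only needs one `pub use`.
pub mod arch {
    #[cfg(all(target_arch = "arm", target_feature = "mclass"))]
    pub use crate::arm_m::*;
    #[cfg(all(target_arch = "arm", not(target_feature = "mclass")))]
    pub use crate::arm_ar::*;
    #[cfg(not(target_arch = "arm"))]
    pub use crate::other::*;
}
```

The `#[cfg]` conditions are then the only place that knows about Rust target names, which keeps the mapping from targets to ARM architectures in one spot.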
@gnzlbg And feature gates for the various supported profiles? Works for me. Still not sure what acle and cmsis mean in that context; they're really just C API naming conventions we shouldn't have to care about in a Rust world. If anything, they should be implemented separately from the intrinsics. |
When possible yes. This kind of depends on whether LLVM currently supports these or not, and what we want to do about the ones without one feature gate (or a combination thereof), but I am sure we can work sensible things out as we go along (e.g. like defining feature gates that enable multiple features if that is required).
|
It's a C API, not the C API.
I am not so sure about that. Also what we're talking about here is a tiny subset CMSIS-Core which itself is a subset of CMSIS so if anything it should have a more concise name than just CMSIS. It is also rather esoteric; I've worked a lot with CMSIS but never had to use the special intrinsics before. NB: ARM is describing them as "Intrinsic functions used to generate CPU instructions that are not supported by standard C functions." and I thought the plan was to provide intrinsics for all relevant opcodes anyway in which case this would mean that we have them twice? |
I used the reference to CMSIS-CORE because that seemed like the most authoritative source that I could find. It's also listed in the ARM Compiler toolchain Compiler Reference. As far as I can tell, each intrinsic corresponds to a single instruction with an appropriate flag, for instance We certainly could rename the intrinsics to I definitely could believe that you haven't ever needed to use these intrinsics directly, but is it possible that you were either calling higher-level abstractions that use them under the hood or using inline assembly? How else would you implement a critical section? |
Well, ARM maintains a lot of stuff around their ARM IP, including cross-compiler toolchains and even whole operating systems.
Using inline assembly, same as plenty of other guys (except for ARM mbedOS, of course). cf. https://github.com/zephyrproject-rtos/zephyr/blob/master/arch/arm/core/cpu_idle.S |
At least for If you want to expose intrinsics not part of the vendor C APIs, then
Note that you can keep doing that on nightly, but AFAICT there is no concrete proposal yet (nor is anybody putting in the work) for landing this on stable any time soon. |
That feels weird and wrong on a number of levels. If we really want to do this, then you can basically strike neon and dsp from the list, because there are only CMSIS Core and ACLE to play with here in terms of C APIs. |
The motivation for including these intrinsics in stdsimd is to allow their eventual usage on stable: rust-embedded/wg#63 Stabilizing the existing |
I know and fully agree. But exposing some weirdo intrinsic API loosely based on some existing C APIs that only a few people use really doesn't fill me with any kind of joy... |
These are the canonical C compiler intrinsics, defined by the vendor, and implemented by all of the major open source and commercial C compilers for the platform. I can understand why developers aren't familiar with the intrinsics if they are using inline assembly, but I've tried as hard as possible to find an authoritative (i.e. vendor-provided) source for them. I'm not aware of any alternative intrinsics. |
neon and dsp from the list because there're only CMSIS Core and ACLE to
play with here in terms of C APIs.
ARM provides a specification of the NEON API for A32, A64, and v7, both in
human-readable and in machine-readable form (which is good for automatic
verification).
API loosely based on some existing C APIs only few people use really
doesn't fill me with any kind of joy...
The only C APIs that can be provided in stdsimd are those provided by ARM
because the APIs provided in std::arch must match the vendor specification,
and in this case the vendor is ARM. This is the case for acle, cmsis, neon,
and probably for the dsp intrinsics as well if they are exposed by arm
compilers.
Occasional exceptions can be made, for example, if most compilers already
provide a useful intrinsic that is not covered by the spec. But out of ~1500 x86
intrinsics we have < 5 of these, so these really are exceptions.
|
SIMD? Most of the stuff has nothing to do with SIMD at all; these are low-level building blocks (or not so low-level in the case of NOPs) for system-level programming. I'd very much prefer providing a good Rust mapping around the native assembler instructions rather than following inconsistent C APIs just "because they exist". Have you looked at the 74 pages of ACLE documentation? Most of it is general blah, then some C-only explanation of how to use the provided include files, some grouping explanation and architecture differentiation, and a couple of tables. I don't see anything of relevance for a Rust user in there; even with my 20+ years of C experience and quite some time working with Cortex-M MCUs, I didn't even know ACLE existed until this issue and never missed it... |
‘stdsimd’ is named that way for legacy reasons; it implements “vendor
APIs”, which are APIs defined by hardware vendors (often in the form of C
APIs).
I'd very much prefer providing a good Rust mapping around the native
assembler instructions rather than following inconsistent C APIs just
"because they exist".
That’s a noble goal and anybody is free to pursue this in a different crate
(many have done so). The purpose of ‘stdsimd’ is to allow those crates to
work on stable Rust, not to do their jobs.
The main reason ‘stdsimd’ is actually the first crate implementing vendor
intrinsics that has been able to ship something to stable Rust users is
because of its policy against “inventing new APIs”. This policy went
through the RFC process and was merged, so arguing about it here is kind of
moot.
If you want to change it, you need to submit an RFC.
even with my 20+ years and of C experience and quite some time working
with Cortex-M MCUs I didn't even know ACLE existed until this issue and
never missed it...
If you don’t need it, don’t use it? I don’t know what your point is here. If
you prefer inline assembly to intrinsics, submit an RFC for that, push it
until it’s merged, and implement it. But that doesn’t help those who want
to use vendor APIs in Rust.
|
I'm trying to help getting embedded Rust to stable. This requires a handful of mnemonics to be available as Rust intrinsics and I could not care less about "vendor APIs in Rust" and am not even quite sure what the topic of this issue has to do with that. |
I could not care less about "vendor APIs in Rust" and am not even quite
sure what the topic of this issue has to do with that.
It’s an issue about implementing a vendor’s (ARM’s) C API in a
library that exposes vendors’ C APIs.
|
@gnzlbg Well, it seems so. I fail to read that from the topic and I also disagree that this is the best way to solve the issue at hand. |
I don’t know exactly what the issue at hand is, but exposing the C vendor
APIs is something that some of us want to do anyways. If what you want is
full inline assembly it won’t get you exactly where you want to be, but
maybe it will give you access to at least some part of the instruction set
in stable Rust pretty quickly. This might allow you to pursue a more
minimal solution somewhere else, for example adding fewer intrinsics to
‘core::intrinsics’ or something.
If what you need is full inline assembly, this won’t give you that, but it
might reduce the places in which you need inline assembly.
Also, just because the vendor API isn’t very idiomatic to use in Rust, that
does not mean that you can’t write a thin Rusty wrapper on top that is
idiomatic. Such wrappers only have to be written once.
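As an illustration of a wrapper that only has to be written once, here is a minimal sketch of a critical section built on three of the Cortex-M operations from the issue description (read PRIMASK, CPSID i, CPSIE i). The `IrqControl` trait, `critical_section` and `MockCpu` are hypothetical names; the mock replaces the real intrinsics so the logic can run off-target.

```rust
use core::cell::Cell;

/// Hypothetical minimal interface over three Cortex-M operations:
/// reading PRIMASK, `CPSID i` and `CPSIE i`.
pub trait IrqControl {
    /// True if interrupts are currently masked (PRIMASK = 1).
    fn irqs_masked(&self) -> bool;
    /// `CPSID i`
    fn disable_irq(&self);
    /// `CPSIE i`
    fn enable_irq(&self);
}

/// Runs `f` with interrupts masked, then restores the previous state.
/// Re-enabling only if interrupts were enabled on entry keeps a nested
/// section from unmasking interrupts while the outer one is still running.
pub fn critical_section<C: IrqControl, R>(cpu: &C, f: impl FnOnce() -> R) -> R {
    let was_masked = cpu.irqs_masked();
    cpu.disable_irq();
    let result = f();
    if !was_masked {
        cpu.enable_irq();
    }
    result
}

/// Host-side stand-in that only records the PRIMASK bit.
pub struct MockCpu {
    pub primask: Cell<bool>,
}

impl IrqControl for MockCpu {
    fn irqs_masked(&self) -> bool {
        self.primask.get()
    }
    fn disable_irq(&self) {
        self.primask.set(true);
    }
    fn enable_irq(&self) {
        self.primask.set(false);
    }
}
```

On real hardware the disable/enable implementations must also act as compiler barriers, so that memory accesses inside `f` cannot be moved out of the section.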
The main advantage of vendor APIs is that they stabilize quickly because
there is little to nothing to discuss about them. They are just the APIs
that vendors define, and that is the way it is. They might suck, but we did
not define them. This lack of controversy makes the RFCs for them “trivial”.
|
Regarding the module structure. ARMv8-M is a 32-bit architecture. The product page on the ARM
AFAICT it's just the 32-bit ARMv7-M (Cortex-M) architecture plus some security features (the The "ARM and Thumb-2 Instruction Set Quick Reference Card" may provide some guidance on
And among the instructions that will have intrinsics in stdsimd (we have functions for these in the
This reference card totally leaves out NEON instructions though. Other things to consider: Also, there's a silicon bug around the BASEPRI register that affects the Cortex-M7 processor Is progress on this issue mainly blocked on deciding on a module hierarchy? To me the flat structure from #437 (comment) makes most sense. |
I believe the contents of the module, |
Yes, like @alexcrichton said, this is just an internal detail, and the more instructions for more architectures are implemented, the clearer it will become. The constraints for the internal organization are a mixture of 1) what makes sense from the hardware point of view, 2) what makes sense from an LLVM point of view, 3) what makes sense for reusing code between As more intrinsics are added, the internal structure will evolve to make our lives easier. So IMO the best way to proceed is to just send PRs with newer intrinsics, newer targets, CI for those, ... and we'll figure it out as we go along. We can use what has been discussed here as a "guide", but as more intrinsics and ARM targets are added, new constraints will be introduced and we'll need to adapt. |
PR #518 adds these Cortex-M intrinsics |
ARM: expose the "mclass" target feature
This lets us differentiate, in a conditional compilation context, between ARM Cortex-M targets, like the `thumbv*` targets, and other ARM targets, like the ARM Cortex-A Linux targets. r? @alexcrichton cc @gnzlbg cc rust-lang/stdarch#437
I think we should clarify whether just ACLE, or ACLE+CMSIS, or just CMSIS should be implemented in While clarifying this might generate extra work, this work won't be lost because it can be reused for the RFC that stabilizes these intrinsics, which is something that should happen before Rust 2018 anyways. |
See Implement all x86 vendor intrinsics for more information about implementing intrinsics.
Also see rust-embedded/wg#63 for more discussion.
There are two groups of intrinsics that need to be implemented for thumbv6 / thumbv7.
Core Register Access functions
Documentation of the core register functions:
https://www.keil.com/pack/doc/CMSIS/Core/html/group__Core__Register__gr.html
ARM CMSIS header file:
https://github.com/ARM-software/CMSIS/blob/master/CMSIS/Include/cmsis_armcc.h
CPSID / CPSIE
fn disable_fault_irq() // CPSID f
fn disable_irq() // CPSID i
fn enable_fault_irq() // CPSIE f
fn enable_irq() // CPSIE i
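A minimal sketch of how these four functions could map onto single CPS instructions, where the operand selects PRIMASK (`i`) or FAULTMASK (`f`). It assumes M-profile builds can be detected with `target_feature = "mclass"`; on any other target a hypothetical host mock just records the two mask bits so the functions can be exercised in tests. FAULTMASK, and therefore `cpsid f`/`cpsie f`, does not exist on ARMv6-M.

```rust
#[cfg(all(target_arch = "arm", target_feature = "mclass"))]
mod imp {
    use core::arch::asm;

    // `nomem` is deliberately omitted so each asm block is also a compiler
    // barrier: memory accesses cannot be moved across it.
    #[inline(always)]
    pub unsafe fn disable_irq() {
        unsafe { asm!("cpsid i", options(nostack, preserves_flags)) } // PRIMASK = 1
    }
    #[inline(always)]
    pub unsafe fn enable_irq() {
        unsafe { asm!("cpsie i", options(nostack, preserves_flags)) } // PRIMASK = 0
    }
    #[inline(always)]
    pub unsafe fn disable_fault_irq() {
        unsafe { asm!("cpsid f", options(nostack, preserves_flags)) } // FAULTMASK = 1
    }
    #[inline(always)]
    pub unsafe fn enable_fault_irq() {
        unsafe { asm!("cpsie f", options(nostack, preserves_flags)) } // FAULTMASK = 0
    }
}

#[cfg(not(all(target_arch = "arm", target_feature = "mclass")))]
mod imp {
    // Host mock: `true` means "masked", mirroring PRIMASK.PM / FAULTMASK.FM.
    use std::sync::atomic::{AtomicBool, Ordering::SeqCst};

    pub static PRIMASK: AtomicBool = AtomicBool::new(false);
    pub static FAULTMASK: AtomicBool = AtomicBool::new(false);

    pub unsafe fn disable_irq() { PRIMASK.store(true, SeqCst) }
    pub unsafe fn enable_irq() { PRIMASK.store(false, SeqCst) }
    pub unsafe fn disable_fault_irq() { FAULTMASK.store(true, SeqCst) }
    pub unsafe fn enable_fault_irq() { FAULTMASK.store(false, SeqCst) }
}

// Following the `std::arch` convention these are `unsafe fn`s; `enable_irq`
// in particular can break an enclosing critical section.
pub use imp::{disable_fault_irq, disable_irq, enable_fault_irq, enable_irq};
```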
MRS
fn get_APSR() -> u32
fn get_BASEPRI() -> u32
fn get_CONTROL() -> u32
fn get_FAULTMASK() -> u32
fn get_FPSCR() -> u32 // M4, M7 only
fn get_IPSR() -> u32
fn get_MSP() -> u32
fn get_PRIMASK() -> u32
fn get_PSP() -> u32
fn get_xPSR() -> u32
MSR
fn set_APSR(u32)
fn set_BASEPRI(u32)
fn set_CONTROL(u32)
fn set_FAULTMASK(u32)
fn set_FPSCR(u32) // M4, M7 only
fn set_IPSR(u32)
fn set_MSP(u32)
fn set_PRIMASK(u32)
fn set_PSP(u32)
fn set_xPSR(u32)
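The MRS/MSR functions above are regular enough to generate with a macro. A minimal sketch, assuming M-profile builds are detectable via `target_feature = "mclass"` and instantiated only for PRIMASK and CONTROL, which exist on both ARMv6-M and ARMv7-M. `special_register!` and the host-side `mock_regs` store are hypothetical; off-target the functions read and write the mock so their shape can be tested.

```rust
// M-profile build: each getter is one MRS, each setter one MSR.
#[cfg(all(target_arch = "arm", target_feature = "mclass"))]
macro_rules! special_register {
    ($get:ident, $set:ident, $reg:literal) => {
        #[allow(non_snake_case)]
        #[inline(always)]
        pub unsafe fn $get() -> u32 {
            let value: u32;
            unsafe {
                core::arch::asm!(concat!("mrs {}, ", $reg), out(reg) value,
                    options(nomem, nostack, preserves_flags))
            };
            value
        }

        #[allow(non_snake_case)]
        #[inline(always)]
        pub unsafe fn $set(value: u32) {
            // No `nomem`/`preserves_flags`: a write such as PRIMASK must be a
            // compiler barrier, and writing APSR would change the flags.
            unsafe { core::arch::asm!(concat!("msr ", $reg, ", {}"), in(reg) value, options(nostack)) };
        }
    };
}

// Host build: a tiny register file; registers never written read as 0.
#[cfg(not(all(target_arch = "arm", target_feature = "mclass")))]
mod mock_regs {
    use std::collections::HashMap;
    use std::sync::Mutex;

    static REGS: Mutex<Option<HashMap<&'static str, u32>>> = Mutex::new(None);

    pub fn read(name: &'static str) -> u32 {
        let mut regs = REGS.lock().unwrap();
        let value = regs.get_or_insert_with(HashMap::new).get(name).copied().unwrap_or(0);
        value
    }

    pub fn write(name: &'static str, value: u32) {
        let mut regs = REGS.lock().unwrap();
        regs.get_or_insert_with(HashMap::new).insert(name, value);
    }
}

#[cfg(not(all(target_arch = "arm", target_feature = "mclass")))]
macro_rules! special_register {
    ($get:ident, $set:ident, $reg:literal) => {
        #[allow(non_snake_case)]
        pub unsafe fn $get() -> u32 {
            mock_regs::read($reg)
        }

        #[allow(non_snake_case)]
        pub unsafe fn $set(value: u32) {
            mock_regs::write($reg, value)
        }
    };
}

special_register!(get_PRIMASK, set_PRIMASK, "PRIMASK");
special_register!(get_CONTROL, set_CONTROL, "CONTROL");
```

ARM's documentation asks for an ISB after an MSR to CONTROL so that the change (e.g. a stack pointer switch) takes effect for the following instructions; a real `set_CONTROL` would need to include or document that.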
ARM ACLE Intrinsics
The ARM ACLE specification is here:
http://infocenter.arm.com/help/topic/com.arm.doc.ihi0053d/IHI0053D_acle_2_1.pdf
The Clang ARM ACLE header file is here:
https://github.com/llvm-mirror/clang/blob/master/lib/Headers/arm_acle.h
The compiler intrinsics available through LLVM can be found here:
https://github.com/llvm-mirror/llvm/blob/master/include/llvm/IR/IntrinsicsARM.td
8.3 Memory Barriers
fn dmb(u32) // Data Memory Barrier
fn dsb(u32) // Data Synchronization Barrier
fn isb(u32) // Instruction Synchronization Barrier
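The `u32` argument of these barriers is the 4-bit option field of the instruction. The sketch below decodes the ARMv7-A option values (on ARMv6-M and ARMv7-M only 0xF, SY, is defined; the other encodings are reserved there) and shows a full-system `dmb(0xF)` as a fixed instruction. `barrier_option_name` and `dmb_sy` are hypothetical names, and M-profile detection via `target_feature = "mclass"` is an assumption. Elsewhere it falls back to a sequentially consistent `fence`, which is only a portable stand-in, not the same operation.

```rust
/// ARMv7-A encodings of the 4-bit `option` field taken by DMB/DSB.
pub fn barrier_option_name(option: u32) -> Option<&'static str> {
    match option {
        0b1111 => Some("sy"),    // full system
        0b1110 => Some("st"),    // full system, stores only
        0b1011 => Some("ish"),   // inner shareable
        0b1010 => Some("ishst"), // inner shareable, stores only
        0b0111 => Some("nsh"),   // non-shareable
        0b0110 => Some("nshst"), // non-shareable, stores only
        0b0011 => Some("osh"),   // outer shareable
        0b0010 => Some("oshst"), // outer shareable, stores only
        _ => None,
    }
}

/// Full-system data memory barrier, i.e. `dmb(0xF)`.
#[inline(always)]
pub fn dmb_sy() {
    #[cfg(all(target_arch = "arm", target_feature = "mclass"))]
    unsafe {
        core::arch::asm!("dmb sy", options(nostack, preserves_flags))
    }
    #[cfg(not(all(target_arch = "arm", target_feature = "mclass")))]
    core::sync::atomic::fence(core::sync::atomic::Ordering::SeqCst);
}
```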
8.4 Hints
fn wfi() // Wait For Interrupt
fn wfe() // Wait for Event
fn sev() // Send Global Event
fn sevl() // Send Local Event
fn yield() // Yield
fn dbg(u32) // Debug
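The hint intrinsics are each a single instruction, so a macro can generate them. This sketch assumes `target_feature = "mclass"` identifies M-profile builds and uses `core::hint::spin_loop()` as a host stand-in; `hint!` and `wait_for` are hypothetical. It leaves out `sevl` and `dbg`, and it shows one real naming constraint: `yield` is a reserved keyword in Rust, so that function needs another name, here ACLE's own `__yield`.

```rust
use core::sync::atomic::{AtomicBool, Ordering};

macro_rules! hint {
    ($name:ident, $insn:literal) => {
        #[inline(always)]
        pub fn $name() {
            // One hint instruction on M-profile builds...
            #[cfg(all(target_arch = "arm", target_feature = "mclass"))]
            unsafe {
                core::arch::asm!($insn, options(nostack, preserves_flags))
            }
            // ...and a harmless spin-loop hint everywhere else.
            #[cfg(not(all(target_arch = "arm", target_feature = "mclass")))]
            core::hint::spin_loop();
        }
    };
}

hint!(wfi, "wfi"); // Wait For Interrupt
hint!(wfe, "wfe"); // Wait For Event
hint!(sev, "sev"); // Send Event
hint!(nop, "nop"); // No-op
hint!(__yield, "yield"); // `yield` is reserved in Rust

/// Sleeps with WFE until `flag` is set, e.g. from an interrupt handler.
pub fn wait_for(flag: &AtomicBool) {
    while !flag.load(Ordering::Acquire) {
        wfe();
    }
}
```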
8.5 Swap
fn swp(u32, *mut u32) // Swap
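In safe Rust, the behaviour of `swp` (atomically store a new value at the address and return the old one) is what `AtomicU32::swap` already provides. A sketch follows; the `SeqCst` ordering is a conservative choice on my part, not something taken from ACLE. Two caveats: SWP has no Thumb encoding, and on thumbv6m (ARMv6-M) `core` provides no atomic read-modify-write operations such as `swap`, so there this would have to be built on a critical section instead.

```rust
use std::sync::atomic::{AtomicU32, Ordering};

/// Safe-Rust analogue of ACLE `__swp`: atomically stores `x` into `*p`
/// and returns the value that was there before.
pub fn swp(x: u32, p: &AtomicU32) -> u32 {
    p.swap(x, Ordering::SeqCst)
}
```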
8.7 NOP
fn nop() // No-op
9.2 Miscellaneous data-processing intrinsics
Note: These may have equivalents in core.
fn ror(u32, u32) -> u32 // Rotate Right
fn clz(u32) -> u32 // Count Leading Zeros
fn cls(u32) -> u32 // Count Leading Sign Bits
fn rev(u32) -> u32 // Reverse Byte Order
fn rev16(u32) -> u32 // Reverse Byte Order (16 bit)
fn revsh(u32) -> u32 // Reverse Byte Order Signed (16 bit)
fn rbit(u32) -> u32 // Reverse Bits
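Most of these already have counterparts among the `core` methods on `u32`, so a host-testable sketch is short. The function names follow the list above; `cls`, `rev16` and `revsh` are composed from `core` methods by hand, and `revsh` follows the listed `u32 -> u32` shape (reverse the bytes of the low halfword, then sign-extend). On ARMv6-M, which has no CLZ or RBIT instruction, LLVM has to expand `leading_zeros` and `reverse_bits` into longer sequences.

```rust
/// ROR: rotate right; the amount is taken modulo 32.
pub fn ror(x: u32, y: u32) -> u32 {
    x.rotate_right(y)
}

/// CLZ: count leading zero bits (32 for an input of 0).
pub fn clz(x: u32) -> u32 {
    x.leading_zeros()
}

/// CLS: count the bits after the top bit that are equal to the top bit.
/// XOR-ing with the sign mask turns leading sign bits into leading zeros;
/// the top bit itself then always becomes 0 and is not counted.
pub fn cls(x: u32) -> u32 {
    let sign_mask = ((x as i32) >> 31) as u32;
    (x ^ sign_mask).leading_zeros() - 1
}

/// REV: reverse the byte order of the whole word.
pub fn rev(x: u32) -> u32 {
    x.swap_bytes()
}

/// REV16: reverse the byte order within each 16-bit halfword.
pub fn rev16(x: u32) -> u32 {
    x.swap_bytes().rotate_right(16)
}

/// REVSH: reverse the bytes of the low halfword, then sign-extend to 32 bits.
pub fn revsh(x: u32) -> u32 {
    ((x as u16).swap_bytes() as i16) as u32
}

/// RBIT: reverse the bit order.
pub fn rbit(x: u32) -> u32 {
    x.reverse_bits()
}
```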
10.1 Special register intrinsics
fn arm_rsr(special_register) -> u32 // Read System Register
fn arm_rsrp(special_register) -> *const () // Read System Register Containing Address
fn arm_wsr(special_register, u32) // Write System Register
fn arm_wsrp(special_register, *const ()) // Write System Register Containing Address