32 bit targets in Cranelift #5572
Cool, it's nice to hear from people working on more Cranelift backends! I'm curious what architecture you're targeting; glancing at your GitHub, I'm guessing it's your A32 RISC ISA? I'd also love to hear some about your goal: Do you want to run Wasmtime on this target, or compile Rust to it with cg-clif, or something else? As you noticed, we don't currently have backends implemented for any 32-bit targets, so I'm not particularly surprised that you're running into situations where we didn't quite think through the implications for a 32-bit target. There are existing cases where backends use what we refer to as "pseudo-instructions", which are what I think you're describing as "non-existing instructions". You can find a bunch of examples with For the most part I think you'll just need to do for 64-bit types what the existing backends do for 128-bit types: split the value across two registers and figure out how to combine register pairs as needed. For some relatively simple examples, look at how various backends implement The only case where you should need four registers in The only I hope this helps give you a starting point. We're happy to answer questions if we can! |
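To make the register-pair idea above concrete, here is a minimal sketch in plain Rust (not Cranelift code) of how a 64-bit add can be modelled on a 32-bit machine: add the low halves, then add the high halves plus the carry. The names `split64`, `join64`, and `add64_pair` are hypothetical and chosen only for illustration.

```rust
/// Split a 64-bit value into (low, high) 32-bit halves, the way a 32-bit
/// backend might hold an i64 in a pair of registers.
fn split64(v: u64) -> (u32, u32) {
    (v as u32, (v >> 32) as u32)
}

/// Recombine a (low, high) pair into a single 64-bit value.
fn join64(lo: u32, hi: u32) -> u64 {
    ((hi as u64) << 32) | lo as u64
}

/// 64-bit wrapping add expressed with 32-bit operations only:
/// an "add" on the low halves followed by an "add with carry" on the high halves.
fn add64_pair(a: (u32, u32), b: (u32, u32)) -> (u32, u32) {
    let (lo, carry) = a.0.overflowing_add(b.0);
    let hi = a.1.wrapping_add(b.1).wrapping_add(carry as u32);
    (lo, hi)
}
```

A real lowering would emit the target's own add and add-with-carry (or compare-and-add) instructions instead; the sketch only shows the data flow between the two halves.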
Hey, thanks for that detailed answer. Yes, you are right, I'm doing this for academic purposes at the moment. The ISA I'm targeting is similar to RV32I in a lot of ways, so if you don't want to read through it, just pretend it's RV32I. This is also the reason why I mostly used the RISC-V backend as a template. My goal is to compile Rust. I messed around with LLVM, but writing a backend for that turned out to be really difficult. And since I know my way around Rust a lot better than C++, I decided to give Cranelift a go. Since this is mostly about the learning experience, I don't really care that the generated code is not quite on par with LLVM. Alright, so I will ignore i128s for now. I guess as long as I don't use them in the code I want to compile they should never be emitted, right? Same with floats, probably. I'm going to read through the other backends' ISLE code and see how it's being done there. I'm still finding my way around ISLE. |
If you want the rust standard library to compile without patches you need both i128 and floats. It is possible to patch them out though with a bit of work. |
I’d suggest looking at the aarch64 and x64 backends when looking for inspiration. |
Ok, I got a very simple function to correctly compile now. |
Nice! Most testing is done via file tests (‘cranelift/filetests/filetests/isa/*’), fuzzing, and wasmtime exercising the codegen. |
Ah thanks, that's what I was looking for. |
I have implemented all of the arithmetic and bitwise operations now, except for division and remainder. But since none of the other backends do this I don't know how to go about it. I can manage to emit a call but where does the function come from, how do I define it? |
@Artentus we have a mechanism for this with |
I saw that in the memcpy lowering but couldn't figure out how to actually define new ones. |
The enum is defined here. If you grep for one of the existing ones, e.g. |
(Forgot to finish: then you can follow that example to define a new one; also in general, when adding a new option to an enum, the compile errors will usually lead you to where you need to make changes.) |
Thanks! However I assume this only works within the runtime correct? |
Yes, you would need to somehow provide implementations of the libcalls that the relocation resolution can find. cc @bjorn3 for the best way to do that... |
Yes I mostly copied the code from x64 |
You could add it to the compiler-builtins rust crate I think. This crate is included in everything rustc compiles. |
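For a target without a hardware divide instruction, the routine behind such a libcall could look roughly like the following shift-and-subtract (restoring) division. This is a sketch under stated assumptions, not the code that compiler-builtins actually uses; `soft_udiv32` is a hypothetical name, and returning `None` for a zero divisor is an illustrative choice.

```rust
/// Unsigned 32-bit division using only shifts, compares, and subtracts.
/// Returns (quotient, remainder), or None when the divisor is zero.
fn soft_udiv32(n: u32, d: u32) -> Option<(u32, u32)> {
    if d == 0 {
        return None;
    }
    let mut q = 0u32;
    let mut r = 0u32;
    // Walk the dividend from its most significant bit down.
    for i in (0..32).rev() {
        // Before this shift r holds the remainder of at most the top 31 bits
        // of n, so it is below 2^31 and the shift cannot lose a bit.
        r = (r << 1) | ((n >> i) & 1);
        if r >= d {
            r -= d;
            q |= 1 << i;
        }
    }
    Some((q, r))
}
```

Signed division and remainder can be built on top of this by dividing the absolute values and fixing up the signs afterwards.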
I am currently working on lowering for loads and stores, and there are these two ir instructions I'm unsure about: I tried experimenting and defined this test function:
But this yields the following error: So do I just completely ignore these? |
uload and sload generally return a 64bit value by zero/sign extending a 32bit loaded value. |
Well,
|
Looks like
|
Ah, I guess that makes sense. So I should probably implement all of them for both 32 bit and 64 bit right? |
uload8 and sload8 should be implemented for i16, i32 and i64 (i16 can be skipped for wasm, I think). uload16 and sload16 should be implemented for i32 and i64. uload32 and sload32 should be implemented for i64. |
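The semantics of these extending loads can be modelled in plain Rust as follows. This is only an illustration, assuming little-endian memory represented as a byte slice; the function names mirror the CLIF opcodes but are hypothetical helpers, and the `_pair` variants show how an i64 result might be split into (low, high) registers on a 32-bit target.

```rust
/// Load one byte and zero-extend it to 32 bits.
fn uload8(mem: &[u8], addr: usize) -> u32 {
    mem[addr] as u32
}

/// Load one byte and sign-extend it to 32 bits.
fn sload8(mem: &[u8], addr: usize) -> i32 {
    mem[addr] as i8 as i32
}

/// Load two bytes (little-endian) and zero-extend to 32 bits.
fn uload16(mem: &[u8], addr: usize) -> u32 {
    u16::from_le_bytes([mem[addr], mem[addr + 1]]) as u32
}

/// Load two bytes (little-endian) and sign-extend to 32 bits.
fn sload16(mem: &[u8], addr: usize) -> i32 {
    i16::from_le_bytes([mem[addr], mem[addr + 1]]) as i32
}

/// Load four bytes as the low half of an i64; the high half is zero.
fn uload32_pair(mem: &[u8], addr: usize) -> (u32, u32) {
    let lo = u32::from_le_bytes([mem[addr], mem[addr + 1], mem[addr + 2], mem[addr + 3]]);
    (lo, 0)
}

/// Load four bytes as the low half of an i64; the high half is filled
/// with copies of the sign bit (arithmetic shift right by 31).
fn sload32_pair(mem: &[u8], addr: usize) -> (u32, u32) {
    let lo = u32::from_le_bytes([mem[addr], mem[addr + 1], mem[addr + 2], mem[addr + 3]]);
    (lo, ((lo as i32) >> 31) as u32)
}
```

On an RV32I-like ISA the narrow cases typically map onto the byte and halfword load instructions directly, while the i64 results need one extra instruction to produce the high register.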
Unfortunately it seems like this is true even for the core library. I tried to compile it but am getting loads of errors related to invalid SSA types (f32 and i128). So how do I go about patching that stuff out? |
I know there are usages in |
Thank you very much, I actually managed to have it compile the (modified) core library without errors. However I must say it is very awkward having to inject a custom core library. 128 bit integers on the other hand do pose a problem, I'm not sure what to do about that. |
For rustc_codegen_cranelift the patches directory contains various kinds of patches. Those with |
I think adding support for 128 bit integers in the backend is the better solution than patching the core library. |
I'm successfully compiling code now, including the entire Rust core and alloc libraries. However after inspecting the generated code I am seeing some very weird things being generated. I compiled this very basic hello world main function:
And this is the generated assembly (manually commented):
As you can see what should be loading two simple arguments into |
@Artentus I wonder if it would make sense to move some of this discussion to a Zulip thread? We're interested in helping you support your ISA, but normally we like to use GitHub issues more for targeted single issues. In any case, it's hard to say much without actually playing with your new backend. Are you enabling optimizations? How did you implement the ABI? What does the CLIF look like? The redundant ops could be coming from any one of several layers. |
Yeah, rustc generates terrible MIR with a lot of copies in it. I try to remove some of the copies in cg_clif, but many are passed through to Cranelift. Cranelift doesn't have the optimization passes necessary to remove them. If you enable optimizations for cg_clif, this will enable some optimizations working on MIR, which may tidy it up a bit. |
I just checked it and this is indeed caused by the MIR containing copies:
The first store is This seems to be a case where cg_clif itself should be able to avoid stack allocation, but for some reason it doesn't. I'm currently investigating. |
Found it. This was a regression from bjorn3/rustc_codegen_cranelift@777d473. I've fixed it in bjorn3/rustc_codegen_cranelift@df04fd6 for a 12% runtime perf win on a benchmark I use. Thanks for pointing this out! |
That's great news! 12% from such a small change is huge. This patch did get rid of one of the unnecessary copies. I also figured out how to optimize those add+load/store sequences into a single instruction. Since everything is working I believe I can now close this. Thanks to everyone who helped me with this, I got way more and way faster responses than I ever expected. The codebase was also very pleasant to work with, especially compared to the mess that is LLVM. If you're interested I can PR the changes that are required to fully support 32 bit targets in cranelift_codegen. |
We actually had exactly this code before, when we had a (partial) arm32 backend in-tree! I think it's probably best to wait until we add a new target to bring this back, as we are also considering other ways now of supporting wide values (namely: legalization/narrowing rules in the mid-end, as Cranelift used to do actually) that might be simpler in the end. I'm happy you got this working though, it's a really neat demo of portability! And if you want to contribute any documentation regarding rough edges or confusing bits you hit, that might also be a good outcome from this :-) |
A piece of documentation I would have loved to have is a listing of all CLIF instructions, just to know what needs to be implemented instead of trying to compile something and then reading error messages about missing lowering rules. If you tell me where to put it I can start working on it. |
The closest we have to that is the InstBuilder trait docs that are generated from the source (that ultimately come from the doc-strings in We have a cranelift/docs/ directory as well that could use some attention in general; if you come up with any more general thoughts from your experience please do feel free to start a |
Hello,
I am currently trying to write a Cranelift backend that targets a 32 bit architecture but am encountering some issues.
Since all the existing backends are 64 bit, I am not sure whether this is due to missing support or my lack of understanding.
For example, `codegen::machinst::valueregs::ValueRegs` states in its description "we cap the capacity of this at four (when any 32-bit target is enabled) or two (otherwise)", however the capacity is hardcoded to 2.
Also, inside the `codegen::machinst::isle::isle_lower_prelude_methods` macro, the function `temp_writable_reg` assumes only a single register is returned. However, this function is called in `IsleContext<'_, '_, MInst, XYZBackend>::imm`, which takes 64 bit values, so on a 32 bit target two registers are required.
In general, the `Context` trait generated by ISLE requires 64 bit quantities in most places. How is one expected to implement these for a 32 bit target? Do you invent non-existing instructions that operate on 64 bit and then lower them into equivalent real instructions?