"Relocation truncated to fit" from branches in long functions #44
Here's a full example IR file:
The corresponding
leading to, when linking it with more stuff:
If I cut down on the IR file, the linking error can go away because it is the interaction of these branch-by-0 ops and overall function sizes that causes them. However, just to trigger these branch-by-0 ops, the following IR is enough:
Note the instruction at 0x08:
I looked at this last example where linking succeeds, and it looks like those branch-by-0 instructions are actually just placeholders to be filled in by the linker; for example, with the code in #44 (comment), the linked
(note the branch at 0xd8) So I guess these are not NOP branches, they are just to be filled in by the linker. The problem still remains though: the offset to fill in is sometimes too large. |
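To make the placeholder idea concrete, here is a minimal sketch of how the 7-bit offset field of an AVR conditional branch could be decoded and patched. It assumes the BRBS/BRBC encoding layout `1111 0Bkk kkkk ksss` with the taken target at PC + k + 1 words, as I understand it from the AVR instruction set manual. The function names are made up for illustration and are not taken from any linker or from LLVM.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Extract the signed 7-bit word offset k from a BRBS/BRBC-family opcode
// (assumed layout: 1111 0Bkk kkkk ksss).
int decodeBranchOffsetWords(uint16_t opcode) {
    int k = (opcode >> 3) & 0x7F;
    if (k & 0x40) k -= 0x80;  // sign-extend from 7 bits
    return k;
}

// Byte address reached when the branch located at byte address `addr` is taken.
uint32_t branchTargetBytes(uint32_t addr, uint16_t opcode) {
    assert(addr % 2 == 0 && "AVR instructions are word-aligned");
    int64_t target = static_cast<int64_t>(addr) + 2 + 2 * decodeBranchOffsetWords(opcode);
    return static_cast<uint32_t>(target);
}

// Hypothetical fixup: fill in the offset field the way a linker might.
// Returns std::nullopt when the distance does not fit in 7 bits, which is
// the situation reported as "relocation truncated to fit".
std::optional<uint16_t> patchBranch(uint16_t opcode, uint32_t addr, uint32_t target) {
    int64_t deltaBytes = static_cast<int64_t>(target) - (static_cast<int64_t>(addr) + 2);
    if (deltaBytes % 2 != 0) return std::nullopt;   // targets must be word-aligned
    int64_t k = deltaBytes / 2;
    if (k < -64 || k > 63) return std::nullopt;     // out of range for a short branch
    uint16_t field = static_cast<uint16_t>((static_cast<uint16_t>(k) & 0x7F) << 3);
    return static_cast<uint16_t>((opcode & ~0x03F8) | field);
}
```

With these assumptions, an unpatched `brne` (opcode 0xF401, k = 0) at 0x432 decodes to a target of 0x434, i.e. the next instruction, which is why the placeholders look like NOP branches before linking.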
I would've assumed that the linker would see that the branch is out of range and automatically insert a trampoline with a longer-range branch instruction. It's difficult to do that before link time because it's unclear where each section will be placed, so we cannot really deduce the final target offset of a lot of the branches. We should investigate what GCC does here - we may need to implement some sort of trampolining ourselves.
I'll look into reproducing a similar function in GCC and checking the resulting code later, as time permits. However, given that these are intra-function branches represented by relative addresses, why do they need any link-time fixup at all? Is it just to implement a poor man's two-stage assembler? Or does linking sometimes move parts of a function around, relative to its other parts?
For this specific case, I'm not entirely sure. In the past, I made the backend very conservative about what it promoted to relocations, which led to us performing most fixups at compile time. I changed it a while ago so that the compiler output is more like GCC's, which lowers almost all fixups to relocations. This allowed me to run a script to compile all of our machine code tests and compare the generated machine code to GCC's.
GCC generates a two-step branch exactly when needed. The following code compiles to a single
That's interesting, it looks like we'll have to implement that behaviour too. Now that I think about it, we don't do any kind of branch relaxation. I think we used to have a custom pass that did it, but I couldn't upstream it because the reviewer wanted us to use the 'generic LLVM branch relaxation' pass instead. |
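For reference, the kind of relaxation being discussed can be sketched at the encoding level: when the target is too far for the 7-bit offset, emit the inverted condition branching over a single `rjmp`. This is only a sketch under assumptions. It uses the BRBS/BRBC layout `1111 0Bkk kkkk ksss` and the RJMP layout `1100 kkkk kkkk kkkk` as I understand them from the AVR instruction set manual. `emitCondBranch` is a made-up name, and this is not necessarily the exact sequence GCC or the LLVM pass produces.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

constexpr uint16_t kOffsetMask = 0x03F8;  // k field of BRBS/BRBC, bits 3..9
constexpr uint16_t kInvertBit  = 0x0400;  // flips BRBS <-> BRBC (e.g. brne <-> breq)

// Encode a conditional branch located at word address `pc` that should reach
// word address `target`. Returns one word if the short form fits, two words
// (inverted branch + rjmp) if relaxation is needed, and an empty vector if
// even rjmp cannot reach (a two-word jmp would be needed; not modelled here).
std::vector<uint16_t> emitCondBranch(uint16_t opcode, int32_t pc, int32_t target) {
    assert((opcode & 0xF800) == 0xF000 && "expects a BRBS/BRBC-family opcode");

    int32_t k = target - pc - 1;  // taken target is PC + k + 1
    if (k >= -64 && k <= 63) {
        return { static_cast<uint16_t>((opcode & ~kOffsetMask) | ((k & 0x7F) << 3)) };
    }

    // The rjmp sits one word after the branch, so its offset is relative to pc + 1.
    int32_t kJump = target - (pc + 1) - 1;
    if (kJump < -2048 || kJump > 2047) return {};

    // Inverted condition with k = 1: when the original condition does not
    // hold, skip over the one-word rjmp and fall through.
    uint16_t inverted = static_cast<uint16_t>(((opcode ^ kInvertBit) & ~kOffsetMask) | (1 << 3));
    uint16_t rjmp = static_cast<uint16_t>(0xC000 | (kJump & 0x0FFF));
    return { inverted, rjmp };
}
```

Under these assumptions, a `brne` (0xF401) at word 0 targeting word 200 becomes `breq .+2` followed by `rjmp` to word 200, while a nearby target keeps the single-word form. A real pass also has to account for the fact that growing one branch shifts everything after it, which can push other branches out of range.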
Here's the patch where we decided to use the generic relaxation pass. |
Yeah, I'm looking at that right now. Being a total LLVM noob, I'm tripping up on how to define a "virtual" / "synthetic" AVR opcode that consists of a branch + an rjmp, just so I have something to put in
I'm not too sure of the background, but from my understanding, you want to define a synthetic instruction which gets expanded into a branch plus a jump? If so, LLVM formally calls these 'pseudo' instructions. In order to define them, you can add new entries to
In order to write code to expand these pseudos, you can add some handling code to |
Yes I think that's exactly what I was looking for. |
Aaaaaaaargh! It seems |
There are a few places you could put it. One thing I am curious about though - why do you need a pseudo instruction in this case? Normally you'd want one if you needed something that the instruction selector could pattern match during instruction selection so you can write a custom pass to expand it into In this specific case, you're free build |
Yes, but the type of |
I'm looking into For context, here is how operations/instructions are represented at different parts of the pipeline.
The further along the pipeline you go, the less information you have. Pseudo instructions exist during the After pseudos are expanded and just before we're actually done generating the output, everything is lowered to It looks like the I haven't looked into it yet, but that doesn't sound like enough for the AVR target. I think we will always need to insert a trampoline in these cases, which is a minimum two instructions. It also doesn't really fit inside what Given that this looks to be the case, I don't see any problem with reinstating the old branch selection pass. Given that you've been looking at this problem @gergoerdi, does that sound correct to you? |
Yes exactly, that's what I was alluding to with "the type of This allowed me to make further progress on what I'm doing; however, I'll need to clean it up a bit before posting it. Also, I'll need to convince myself first that my patch is correct -- now that I can compile and link it fully, I'm seeing some strange behaviour from my program, and I'm not yet sure whether it is caused by something I screwed up in the relaxation code or by some other, unrelated LLVM AVR or Rust AVR bug.
For now, these relaxed branches are used unconditionally, even when the branch target is nearby. avr-rust/rust-legacy-fork#44
I'm looking at this again, and I'm not sure why we need pseudo instructions for this. Here is the original patch which partly added the relaxation framework. The main part is this function:
virtual unsigned insertUnconditionalBranch(MachineBasicBlock &MBB,
                                           MachineBasicBlock &NewDestBB,
                                           const DebugLoc &DL,
                                           int64_t BrOffset = 0,
                                           RegScavenger *RS = nullptr) const {
  llvm_unreachable("target did not implement");
}
Because we have access to the basic block, we should be able to insert as many instructions as we need. Also, here is a WIP implementation of it that I wrote a year ago: link.
I can't seem to properly replicate this. I believe I can see the out of range relocations in the executable
But
I'm going to reinstate the |
Upstreamed in r307109 and cherry-picked in cd29cc0e8d264f3000bbbe39465e7a44bb78dbf4. |
That didn't seem to fix it, so I will revert.
@gergoerdi Can you please comment the output of |
I've got a patch which fully implements the target-independent branch relaxer for AVR. |
As mentioned in #36 (comment), I got around #36 by inlining lots of functions. However, the resulting code cannot be linked because, to quote avr-gcc:
However, looking at those places, all the branches that the linker tries to rewrite to a long address look quite superfluous to me. For example, the first one at 0x432:
Isn't that a conditional branch to exactly the next instruction? A roundabout, two-byte NOP?
I looked at all the locations mentioned in the linker error messages, and they are all of this form (some brne, some brcs, some brge, but all jump to exactly the next instruction).