-
Notifications
You must be signed in to change notification settings - Fork 14
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
LLVM greedy register allocator does not know how to split live variables across register classes #37
Comments
And the machine code when we are doing register allocation
|
This is almost certainly related to avr-llvm/llvm#1. The bug was really hard to fix (I had to figure out how the greedy register allocator worked and make a change to it so that it would attempt to split unspillable live ranges. I'm surprised to see this again. Hopefully it's easier to solve this time lol |
One thing I notice but don't know if it's important: When |
That may be because |
One thing I haven't been able to figure out yet: why would something be unspillable? With 32 registers, it seems like it should be conceptually pretty hard to actually run out... |
There may be 32 general purpose registers, but all of the IO instructions
only support using pointers from the X, Y, and Z registers.
If a function has at least one stack spill, the Y register becomes the
frame pointer, and that means we can only have two pointers in registers at
a time in some cases.
Pointer-heavy code will quite quickly require a lot of spilling and loading
of registers.
…On 30/04/2017 11:55 am, "Jake Goulding" ***@***.***> wrote:
One thing I haven't been able to figure out yet: *why* would something be
unspillable? With 32 registers, it seems like it should be conceptually
pretty hard to actually run out...
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
<#37 (comment)>, or mute
the thread
<https://github.com/notifications/unsubscribe-auth/AHXUr86nYMMyPanp69dZtBkX1Z76i5a2ks5r084MgaJpZM4NLe-4>
.
|
But in this case, it looks like we are using |
Nah, it looks like it's |
I've been trying to comment out parts of libcore to get to the next error, but it feels like this is occurring for every implementation of |
I've got an unfinished patch locally for #38 which may reduce the number of these (reserve the frame pointer in less cases). |
The |
I'm continuing to try to drive down the example size. This is what I have now: source_filename = "bugpoint-output-87a64d6.bc"
target datalayout = "e-p:16:16:16-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-n8"
target triple = "avr"
%"fmt::Formatter" = type { i32, { i8*, void (i8*)** } }
@str.1b = external constant [0 x i8]
define void @"TryFromIntError::Debug"(%"fmt::Formatter"* dereferenceable(32)) unnamed_addr #0 personality i32 (...)* @rust_eh_personality {
start:
%builder = alloca i8, align 8
%1 = getelementptr inbounds %"fmt::Formatter", %"fmt::Formatter"* %0, i16 0, i32 1
%2 = bitcast { i8*, void (i8*)** }* %1 to {}**
%3 = load {}*, {}** %2, align 2
%4 = getelementptr inbounds %"fmt::Formatter", %"fmt::Formatter"* %0, i16 0, i32 1, i32 1
%5 = load void (i8*)**, void (i8*)*** %4, align 2
%6 = getelementptr inbounds void (i8*)*, void (i8*)** %5, i16 3
%7 = bitcast void (i8*)** %6 to i8 ({}*, i8*, i16)**
%8 = load i8 ({}*, i8*, i16)*, i8 ({}*, i8*, i16)** %7, align 2
%9 = tail call i8 %8({}* nonnull %3, i8* noalias nonnull readonly getelementptr inbounds ([0 x i8], [0 x i8]* @str.1b, i16 0, i16 0), i16 15)
unreachable
}
declare i32 @rust_eh_personality(...) unnamed_addr
attributes #0 = { uwtable } Things I've noticed:
|
Continuing to minimize: source_filename = "bugpoint-output-87a64d6.bc"
target datalayout = "e-p:16:16:16-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-n8"
target triple = "avr"
%"fmt::Formatter" = type { i32, {}*, i8 ({}*)** }
define void @"TryFromIntError::Debug"(%"fmt::Formatter"*) {
start:
%builder = alloca i8
%1 = getelementptr %"fmt::Formatter", %"fmt::Formatter"* %0, i8 0, i32 1
%2 = load {}*, {}** %1
%3 = getelementptr %"fmt::Formatter", %"fmt::Formatter"* %0, i8 0, i32 2
%4 = load i8 ({}*)**, i8 ({}*)*** %3
%5 = load i8 ({}*)*, i8 ({}*)** %4
%6 = tail call i8 %5({}* %2)
unreachable
}
|
@dylanmckay yo, why does Changing that locally allows this example code to compile (but I'm not sure it's correct as I get a rogue |
Also, I'm pretty sure that even with one register, the example LLVM IR should be able to be compiled to machine code. I feel like something's not quite right with spilling / reordering algorithm. I was trying to write out a proposed register allocation when I stumbled on the |
I'll have a look in a few hours and get back to you.
Yeah, it is a bug in LLVM. I'll get more details in a bit but from memory, when LLVM runs out of registers it isn't smart enough to perform a cross-class copy (i.e. copy a |
Unfortunately, the real code still fails; guess that that one extra register wasn't enough. |
How scary would implementing something like that be? It feels like changing the register allocator is A Big Deal. Beyond scariness, how complicated would you guess? |
Guys, I have been following changes. I have removed |
It's not too scary - there are a lot of test cases for the register allocator and so it is hard to break. I've worked on it before, but funnily enough I also caused a small regression that was discovered a few weeks later.
It depends on how much support there is for it. I'm sure there's some sort of mechanism for finding free registers in a register class and performing temporary copies to it, it shouldn't be too hard to make it work across class boundaries famous last words |
@dylanmckay did you want to put this on #44 ?
That does seem like the same root cause; too bad the earlier fix didn't solve it. 😿 |
@jackpot51 there's no easy way to know which Rust constructs will trigger a given LLVM issue. Essentially, my process has been:
You are welcome to try to comment out all the |
I was mixing up eviction and splitting |
As far as I understand the problem occurs when
Sure we could fix up the register allocator but we don't have the resources. We should just disallow machine instructions like 48B and live with the extra copy instruction here and there. Comments? |
Explained very well - I will edit the description to include this Perhaps the
That sounds like a reasonable tradeoff to me. I'm not sure what you mean by "like 48B" - what properties distinguish 48B from other wide load instructions? I wonder how the |
The target is |
Here is my workaround for this bug and #95: diff --git a/lib/Target/AVR/AVRInstrInfo.td b/lib/Target/AVR/AVRInstrInfo.td
index a2129cc0e2e..601cfa8d861 100644
--- a/lib/Target/AVR/AVRInstrInfo.td
+++ b/lib/Target/AVR/AVRInstrInfo.td
@@ -1222,7 +1222,7 @@ isReMaterializable = 1 in
// ldd Rd, P+q
// ldd Rd+1, P+q+1
let Constraints = "@earlyclobber $dst" in
- def LDDWRdPtrQ : Pseudo<(outs DREGS:$dst),
+ def LDDWRdPtrQ : Pseudo<(outs DREGS_WITHOUT_Z:$dst),
(ins memri:$memri),
"lddw\t$dst, $memri",
[(set i16:$dst, (load addr:$memri))]>,
diff --git a/lib/Target/AVR/AVRRegisterInfo.td b/lib/Target/AVR/AVRRegisterInfo.td
index 8162f12052b..c139ca6863d 100644
--- a/lib/Target/AVR/AVRRegisterInfo.td
+++ b/lib/Target/AVR/AVRRegisterInfo.td
@@ -157,6 +157,18 @@ def DREGS : RegisterClass<"AVR", [i16], 8,
R9R8, R7R6, R5R4, R3R2, R1R0
)>;
+// Main 16-bit pair register class.
+def DREGS_WITHOUT_Z : RegisterClass<"AVR", [i16], 8,
+ (
+ // Return value and arguments.
+ add R25R24, R19R18, R21R20, R23R22,
+ // Scratch registers.
+ R27R26,
+ // Callee saved registers.
+ R29R28, R17R16, R15R14, R13R12, R11R10,
+ R9R8, R7R6, R5R4, R3R2, R1R0
+ )>;
+
// 16-bit register class for immediate instructions.
def DLDREGS : RegisterClass<"AVR", [i16], 8,
(
diff --git a/test/CodeGen/AVR/pseudo/LDDWRdPtrQ-same-src-dst.mir b/test/CodeGen/AVR/pseudo/LDDWRdPtrQ-same-src-dst.mir
index df69f5fffa5..72b20d39d68 100644
--- a/test/CodeGen/AVR/pseudo/LDDWRdPtrQ-same-src-dst.mir
+++ b/test/CodeGen/AVR/pseudo/LDDWRdPtrQ-same-src-dst.mir
@@ -25,11 +25,11 @@ body: |
; CHECK-LABEL: test_lddwrdptrq
- ; CHECK: ldd [[SCRATCH:r[0-9]+]], Z+10
+ ; CHECK: ldd [[SCRATCH:r[0-9]+]], Y+10
; CHECK-NEXT: push [[SCRATCH]]
- ; CHECK-NEXT: ldd [[SCRATCH]], Z+11
- ; CHECK-NEXT: mov r31, [[SCRATCH]]
- ; CHECK-NEXT: pop r30
+ ; CHECK-NEXT: ldd [[SCRATCH]], Y+11
+ ; CHECK-NEXT: mov r29, [[SCRATCH]]
+ ; CHECK-NEXT: pop r28
- early-clobber $r31r30 = LDDWRdPtrQ undef $r31r30, 10
+ early-clobber $r29r28 = LDDWRdPtrQ undef $r29r28, 10
...
diff --git a/test/CodeGen/AVR/pseudo/LDDWRdPtrQ.mir b/test/CodeGen/AVR/pseudo/LDDWRdPtrQ.mir
index 59b3ce8b602..96d3809ed2d 100644
--- a/test/CodeGen/AVR/pseudo/LDDWRdPtrQ.mir
+++ b/test/CodeGen/AVR/pseudo/LDDWRdPtrQ.mir
@@ -18,8 +18,8 @@ body: |
; CHECK-LABEL: test_lddwrdptrq
- ; CHECK: ldd r30, Y+10
- ; CHECK-NEXT: ldd r31, Y+11
+ ; CHECK: ldd r28, Z+10
+ ; CHECK-NEXT: ldd r29, Z+11
- early-clobber $r31r30 = LDDWRdPtrQ undef $r29r28, 10
+ early-clobber $r29r28 = LDDWRdPtrQ undef $r31r30, 10
...
diff --git a/test/CodeGen/AVR/rust-avr-bug-37.ll b/test/CodeGen/AVR/rust-avr-bug-37.ll
new file mode 100644
index 00000000000..9c269d3dab1
--- /dev/null
+++ b/test/CodeGen/AVR/rust-avr-bug-37.ll
@@ -0,0 +1,25 @@
+; RUN: llc < %s -march=avr | FileCheck %s
+
+%"fmt::Formatter" = type { i32, { i8*, void (i8*)** } }
+
+@str.1b = external constant [0 x i8]
+
+define void @"TryFromIntError::Debug"(%"fmt::Formatter"* dereferenceable(32)) unnamed_addr #0 personality i32 (...)* @rust_eh_personality {
+; CHECK-LABEL: "TryFromIntError::Debug"
+start:
+ %builder = alloca i8, align 8
+ %1 = getelementptr inbounds %"fmt::Formatter", %"fmt::Formatter"* %0, i16 0, i32 1
+ %2 = bitcast { i8*, void (i8*)** }* %1 to {}**
+ %3 = load {}*, {}** %2, align 2
+ %4 = getelementptr inbounds %"fmt::Formatter", %"fmt::Formatter"* %0, i16 0, i32 1, i32 1
+ %5 = load void (i8*)**, void (i8*)*** %4, align 2
+ %6 = getelementptr inbounds void (i8*)*, void (i8*)** %5, i16 3
+ %7 = bitcast void (i8*)** %6 to i8 ({}*, i8*, i16)**
+ %8 = load i8 ({}*, i8*, i16)*, i8 ({}*, i8*, i16)** %7, align 2
+ %9 = tail call i8 %8({}* nonnull %3, i8* noalias nonnull readonly getelementptr inbounds ([0 x i8], [0 x i8]* @str.1b, i16 0, i16 0), i16 15)
+ unreachable
+}
+
+declare i32 @rust_eh_personality(...) unnamed_addr
+
+attributes #0 = { uwtable }
\ No newline at end of file
diff --git a/test/CodeGen/AVR/rust-avr-bug-95.ll b/test/CodeGen/AVR/rust-avr-bug-95.ll
new file mode 100644
index 00000000000..9534ceb26e7
--- /dev/null
+++ b/test/CodeGen/AVR/rust-avr-bug-95.ll
@@ -0,0 +1,37 @@
+; RUN: llc < %s -march=avr | FileCheck %s
+
+%"fmt::Formatter.1.77.153.229.305.381.1673" = type { [0 x i8], i32, [0 x i8], i32, [0 x i8], i8, [0 x i8], %"option::Option<usize>.0.76.152.228.304.380.1672", [0 x i8], %"option::Option<usize>.0.76.152.228.304.380.1672", [0 x i8], { {}*, {}* }, [0 x i8], { i8*, i8* }, [0 x i8], { [0 x { i8*, i8* }]*, i16 }, [0 x i8] }
+%"option::Option<usize>.0.76.152.228.304.380.1672" = type { [0 x i8], i8, [2 x i8] }
+
+@str.4S = external constant [5 x i8]
+
+; Function Attrs: uwtable
+define void @"_ZN65_$LT$lib..str..Chars$LT$$u27$a$GT$$u20$as$u20$lib..fmt..Debug$GT$3fmt17h76a537e22649f739E"(%"fmt::Formatter.1.77.153.229.305.381.1673"* dereferenceable(27) %__arg_0) unnamed_addr #0 personality i32 (...)* @rust_eh_personality {
+; CHECK-LABEL: "_ZN65_$LT$lib..str..Chars$LT$$u27$a$GT$$u20$as$u20$lib..fmt..Debug$GT$3fmt17h76a537e22649f739E"
+start:
+ %0 = getelementptr inbounds %"fmt::Formatter.1.77.153.229.305.381.1673", %"fmt::Formatter.1.77.153.229.305.381.1673"* %__arg_0, i16 0, i32 11, i32 0
+ %1 = load {}*, {}** %0, align 1, !noalias !0, !nonnull !9
+ %2 = getelementptr inbounds %"fmt::Formatter.1.77.153.229.305.381.1673", %"fmt::Formatter.1.77.153.229.305.381.1673"* %__arg_0, i16 0, i32 11, i32 1
+ %3 = bitcast {}** %2 to i1 ({}*, [0 x i8]*, i16)***
+ %4 = load i1 ({}*, [0 x i8]*, i16)**, i1 ({}*, [0 x i8]*, i16)*** %3, align 1, !noalias !0, !nonnull !9
+ %5 = getelementptr inbounds i1 ({}*, [0 x i8]*, i16)*, i1 ({}*, [0 x i8]*, i16)** %4, i16 3
+ %6 = load i1 ({}*, [0 x i8]*, i16)*, i1 ({}*, [0 x i8]*, i16)** %5, align 1, !invariant.load !9, !noalias !0, !nonnull !9
+ %7 = tail call zeroext i1 %6({}* nonnull %1, [0 x i8]* noalias nonnull readonly bitcast ([5 x i8]* @str.4S to [0 x i8]*), i16 5), !noalias !10
+ unreachable
+}
+
+declare i32 @rust_eh_personality(...) unnamed_addr
+
+attributes #0 = { uwtable }
+
+!0 = !{!1, !3, !5, !6, !8}
+!1 = distinct !{!1, !2, !"_ZN3lib3fmt9Formatter9write_str17ha1a9656fc66ccbe5E: %data.0"}
+!2 = distinct !{!2, !"_ZN3lib3fmt9Formatter9write_str17ha1a9656fc66ccbe5E"}
+!3 = distinct !{!3, !4, !"_ZN3lib3fmt8builders16debug_struct_new17h352a1de8f89c2bc3E: argument 0"}
+!4 = distinct !{!4, !"_ZN3lib3fmt8builders16debug_struct_new17h352a1de8f89c2bc3E"}
+!5 = distinct !{!5, !4, !"_ZN3lib3fmt8builders16debug_struct_new17h352a1de8f89c2bc3E: %name.0"}
+!6 = distinct !{!6, !7, !"_ZN3lib3fmt9Formatter12debug_struct17ha1ff79f633171b68E: argument 0"}
+!7 = distinct !{!7, !"_ZN3lib3fmt9Formatter12debug_struct17ha1ff79f633171b68E"}
+!8 = distinct !{!8, !7, !"_ZN3lib3fmt9Formatter12debug_struct17ha1ff79f633171b68E: %name.0"}
+!9 = !{}
+!10 = !{!3, !6}
\ No newline at end of file |
Is there a reason the above couldn't be merged? |
I'm happy with merging it, so long as we file an AVR backend bug at https://bugs.llvm.org/ |
I've written a bug report here. I should get around to merging this tomorrow. |
Should I convert this to a PR? What branch should I do it on? I've only compiled master, but I don't have time to test anyway (compiling is so slow...). |
Friendly ping :) @brainlag's patch something that could be submitted upstream or something to be kept around in Note that after applying this patch to fix one assertion (and already having some others applied for different issues) I'm getting the following assertion when compiling stock-
Could that be related to the proposed patch? |
I don't think this issue is related since after minimization it reproduces with an unpatched |
Upstreamed in r346114. Great work @brainlag, you even added tests Also raised LLVM bug PR39553 to track the workaround and a proper fix. |
This is an AVR-specific workaround for a limitation of the register allocator that only exposes itself on targets with high register contention like AVR, which only has three pointer registers. The three pointer registers are X, Y, and Z. In most nontrivial functions, Y is reserved for the frame pointer, as per the calling convention. This leaves X and Z. Some instructions, such as LPM ("load program memory"), are only defined for the Z register. Sometimes this just leaves X. When the backend generates a LDDWRdPtrQ instruction with Z as the destination pointer, it usually trips up the register allocator with this error message: LLVM ERROR: ran out of registers during register allocation This patch is a hacky workaround. We ban the LDDWRdPtrQ instruction from ever using the Z register as an operand. This gives the register allocator a bit more space to allocate, fixing the regalloc exhaustion error. Here is a description from the patch author Peter Nimmervoll As far as I understand the problem occurs when LDDWRdPtrQ uses the ptrdispregs register class as target register. This should work, but the allocator can't deal with this for some reason. So from my testing, it seams like (and I might be totally wrong on this) the allocator reserves the Z register for the ICALL instruction and then the register class ptrdispregs only has 1 register left and we can't use Y for source and destination. Removing the Z register from DREGS fixes the problem but removing Y register does not. More information about the bug can be found on the avr-rust issue tracker at avr-rust/rust-legacy-fork#37. A bug has raised to track the removal of this workaround and a proper fix; PR39553 at https://bugs.llvm.org/show_bug.cgi?id=39553. Patch by Peter Nimmervoll git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@346114 91177308-0d34-0410-b5e6-96231b3b80d8
This is an AVR-specific workaround for a limitation of the register allocator that only exposes itself on targets with high register contention like AVR, which only has three pointer registers. The three pointer registers are X, Y, and Z. In most nontrivial functions, Y is reserved for the frame pointer, as per the calling convention. This leaves X and Z. Some instructions, such as LPM ("load program memory"), are only defined for the Z register. Sometimes this just leaves X. When the backend generates a LDDWRdPtrQ instruction with Z as the destination pointer, it usually trips up the register allocator with this error message: LLVM ERROR: ran out of registers during register allocation This patch is a hacky workaround. We ban the LDDWRdPtrQ instruction from ever using the Z register as an operand. This gives the register allocator a bit more space to allocate, fixing the regalloc exhaustion error. Here is a description from the patch author Peter Nimmervoll As far as I understand the problem occurs when LDDWRdPtrQ uses the ptrdispregs register class as target register. This should work, but the allocator can't deal with this for some reason. So from my testing, it seams like (and I might be totally wrong on this) the allocator reserves the Z register for the ICALL instruction and then the register class ptrdispregs only has 1 register left and we can't use Y for source and destination. Removing the Z register from DREGS fixes the problem but removing Y register does not. More information about the bug can be found on the avr-rust issue tracker at avr-rust/rust-legacy-fork#37. A bug has raised to track the removal of this workaround and a proper fix; PR39553 at https://bugs.llvm.org/show_bug.cgi?id=39553. Patch by Peter Nimmervoll
This is an AVR-specific workaround for a limitation of the register allocator that only exposes itself on targets with high register contention like AVR, which only has three pointer registers. The three pointer registers are X, Y, and Z. In most nontrivial functions, Y is reserved for the frame pointer, as per the calling convention. This leaves X and Z. Some instructions, such as LPM ("load program memory"), are only defined for the Z register. Sometimes this just leaves X. When the backend generates a LDDWRdPtrQ instruction with Z as the destination pointer, it usually trips up the register allocator with this error message: LLVM ERROR: ran out of registers during register allocation This patch is a hacky workaround. We ban the LDDWRdPtrQ instruction from ever using the Z register as an operand. This gives the register allocator a bit more space to allocate, fixing the regalloc exhaustion error. Here is a description from the patch author Peter Nimmervoll As far as I understand the problem occurs when LDDWRdPtrQ uses the ptrdispregs register class as target register. This should work, but the allocator can't deal with this for some reason. So from my testing, it seams like (and I might be totally wrong on this) the allocator reserves the Z register for the ICALL instruction and then the register class ptrdispregs only has 1 register left and we can't use Y for source and destination. Removing the Z register from DREGS fixes the problem but removing Y register does not. More information about the bug can be found on the avr-rust issue tracker at avr-rust/rust-legacy-fork#37. A bug has raised to track the removal of this workaround and a proper fix; PR39553 at https://bugs.llvm.org/show_bug.cgi?id=39553. Patch by Peter Nimmervoll git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@346114 91177308-0d34-0410-b5e6-96231b3b80d8
This is an AVR-specific workaround for a limitation of the register allocator that only exposes itself on targets with high register contention like AVR, which only has three pointer registers. The three pointer registers are X, Y, and Z. In most nontrivial functions, Y is reserved for the frame pointer, as per the calling convention. This leaves X and Z. Some instructions, such as LPM ("load program memory"), are only defined for the Z register. Sometimes this just leaves X. When the backend generates a LDDWRdPtrQ instruction with Z as the destination pointer, it usually trips up the register allocator with this error message: LLVM ERROR: ran out of registers during register allocation This patch is a hacky workaround. We ban the LDDWRdPtrQ instruction from ever using the Z register as an operand. This gives the register allocator a bit more space to allocate, fixing the regalloc exhaustion error. Here is a description from the patch author Peter Nimmervoll As far as I understand the problem occurs when LDDWRdPtrQ uses the ptrdispregs register class as target register. This should work, but the allocator can't deal with this for some reason. So from my testing, it seams like (and I might be totally wrong on this) the allocator reserves the Z register for the ICALL instruction and then the register class ptrdispregs only has 1 register left and we can't use Y for source and destination. Removing the Z register from DREGS fixes the problem but removing Y register does not. More information about the bug can be found on the avr-rust issue tracker at avr-rust/rust-legacy-fork#37. A bug has raised to track the removal of this workaround and a proper fix; PR39553 at https://bugs.llvm.org/show_bug.cgi?id=39553. Patch by Peter Nimmervoll llvm-svn=346114
The root cause has been fixed in upstream LLVM. Raise #126 to track removal of the hack. |
(See below for a further minimized example)
Error message
Reproduction
I'm not actually sure if Rust or LLVM should be handling this...
The text was updated successfully, but these errors were encountered: