poor codegen for rtio_output #667

Closed
whitequark opened this issue Feb 3, 2017 · 6 comments
Comments

@whitequark
Contributor

whitequark commented Feb 3, 2017

l.ori r12,r8,0x4c 
l.sw 0(r12),r6 

instead of

l.sw 0x4c(r8),r6
@whitequark
Contributor Author

Here's the LLVM patch that fixes this:

diff --git a/lib/Target/OR1K/OR1KISelDAGToDAG.cpp b/lib/Target/OR1K/OR1KISelDAGToDAG.cpp
index 0f4820db2ad..10a39e86d6a 100644
--- a/lib/Target/OR1K/OR1KISelDAGToDAG.cpp
+++ b/lib/Target/OR1K/OR1KISelDAGToDAG.cpp
@@ -143,6 +143,18 @@ SelectAddr(SDValue Addr, SDValue &Base, SDValue &Offset) {
     }
   }
 
+  // Fold the ORI in MOVHI->ORI->LW/SW chains into LW/SW, if possible.
+  if (ConstantSDNode *CN = dyn_cast<ConstantSDNode>(Addr.getNode())) {
+    uint64_t AddrImm = CN->getZExtValue();
+    // The LW/SW offset is sign-extended, and we want to avoid subtraction.
+    if ((AddrImm & 0x8000) == 0) {
+      SDValue AddrHigh = CurDAG->getTargetConstant(AddrImm >> 16, dl, MVT::i32);
+      Base = SDValue(CurDAG->getMachineNode(OR1K::MOVHI, dl, MVT::i32, AddrHigh), 0);
+      Offset = CurDAG->getTargetConstant(AddrImm & 0x7FFF, dl, MVT::i32);
+      return true;
+    }
+  }
+
   Base   = Addr;
   Offset = CurDAG->getTargetConstant(0, dl, MVT::i32);
   return true;

Unfortunately, although it changes the codegen as intended, it pessimizes test_pulse_rate_dds for unknown reasons and has a negligible effect on test_pulse_rate. Therefore it doesn't seem worth integrating.
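To make the address arithmetic behind the patch concrete, here is a minimal standalone sketch of the same split, outside of LLVM. The names `splitAddr` and `effectiveAddr` are hypothetical helpers invented for this illustration; they are not part of the OR1K backend. The sketch assumes 32-bit addresses, a `l.movhi` that places its 16-bit immediate in the upper half of the register, and a load/store offset field that is sign-extended from 16 bits.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (illustration only, not LLVM code): split a constant
// address into the l.movhi immediate and the l.lwz/l.sw offset immediate.
// Returns false when bit 15 is set, because the sign-extended offset would
// then be negative and the high half would need a compensating adjustment.
bool splitAddr(uint32_t addr, uint32_t &high, int32_t &offset) {
    // Parentheses matter here: in C++ `==` binds tighter than `&`,
    // so `addr & 0x8000 == 0` would parse as `addr & (0x8000 == 0)`.
    if ((addr & 0x8000) != 0)
        return false;
    high = addr >> 16;                             // operand for l.movhi
    offset = static_cast<int32_t>(addr & 0x7FFF);  // operand for the offset field
    return true;
}

// Address the load/store actually accesses: (high << 16) + sext(offset).
uint32_t effectiveAddr(uint32_t high, int32_t offset) {
    return (high << 16) + static_cast<uint32_t>(offset);
}
```

For the address 0xA000004C, `splitAddr` yields `high == 0xA000` and `offset == 0x4C`, which corresponds to `l.movhi r8, 0xa000` followed by `l.sw 0x4c(r8), ...`. For an address such as 0xA0008000 it returns false, which in the patch corresponds to falling through to the existing `Base = Addr; Offset = 0` path.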

@jordens
Member

jordens commented Feb 4, 2017

Doesn't this indicate that there is a bigger bug somewhere else?

@whitequark
Contributor Author

whitequark commented Feb 4, 2017

@jordens Maybe, depending on how you define "bug". As @sbourdeauducq has mentioned elsewhere, the root cause of this could be cache aliasing. Unfortunately, the MiSoC CPU and other cores do not currently provide any insight into their operation: there are no performance counters or anything similar. The most I could do is a sampling profiler based on libunwind, but I don't think that will help with this issue.

Arguably we should look into it. However, I'm not sure how this fits into the overall roadmap, and this is definitely a nontrivial change--we're looking, at the very least, at a fork of mor1kx and accompanying infrastructure changes, which is annoying enough already.

@whitequark whitequark reopened this Feb 5, 2017
@whitequark
Contributor Author

whitequark commented Feb 5, 2017

Per discussion with @jordens, the fix should be merged and the test condition relaxed (until the other bug is fixed).

@whitequark
Contributor Author

Before:

_ZN8ksupport4rtio6output17h8c7b66fd3869315aE:
        l.sw    -4(r1), r9
        l.sw    -8(r1), r2
        l.addi  r2, r1, 0
        l.addi  r1, r1, -8
        l.movhi r8, 40960
        l.ori   r12, r8, 8
        l.sw    0(r12), r5
        l.ori   r12, r8, 80
        l.sw    0(r12), r3
        l.ori   r12, r8, 84
        l.sw    0(r12), r4
        l.ori   r12, r8, 76
        l.sw    0(r12), r6
        l.ori   r6, r8, 72
        l.sw    0(r6), r7
        l.ori   r6, r8, 88
        l.addi  r7, r0, 1
        l.sw    0(r6), r7
        l.ori   r6, r8, 92
        l.lwz   r6, 0(r6)
        l.sfeqi r6, 0
        l.bf    .LBB0_2
        l.nop   0                       # in delay slot
        l.jal   _ZN8ksupport4rtio26process_exceptional_status17h3e4c7308429223f3E
        l.nop   0                       # in delay slot
        l.addi  r1, r2, 0
        l.lwz   r9, -4(r1)
        l.jr    r9
        l.lwz   r2, -8(r1)              # in delay slot

After:

_ZN8ksupport4rtio6output17h8c7b66fd3869315aE:
        l.sw    -4(r1), r9
        l.sw    -8(r1), r2
        l.addi  r2, r1, 0
        l.addi  r1, r1, -8
        l.movhi r8, 40960
        l.sw    8(r8), r5
        l.sw    80(r8), r3
        l.sw    84(r8), r4
        l.sw    76(r8), r6
        l.sw    72(r8), r7
        l.addi  r6, r0, 1
        l.sw    88(r8), r6
        l.lwz   r6, 92(r8)
        l.sfeqi r6, 0
        l.bf    .LBB0_2
        l.nop   0                       # in delay slot
        l.jal   _ZN8ksupport4rtio26process_exceptional_status17h3e4c7308429223f3E
        l.nop   0                       # in delay slot
        l.addi  r1, r2, 0
        l.lwz   r9, -4(r1)
        l.jr    r9
        l.lwz   r2, -8(r1)              # in delay slot

@sbourdeauducq sbourdeauducq added this to the ARTIQ-7 milestone May 17, 2022
@sbourdeauducq
Member

RISC-V code seems fine.

4500ae88 <_ZN8ksupport4rtio3imp6output17he428221a06ac3e71E>:
4500ae88: 37 06 00 a0   lui     a2, 655360
4500ae8c: 23 20 a6 00   sw      a0, 0(a2)
4500ae90: 23 28 b6 08   sw      a1, 144(a2)
4500ae94: 83 25 86 09   lw      a1, 152(a2)
4500ae98: 13 f6 f5 0f   andi    a2, a1, 255
4500ae9c: 63 08 06 00   beqz    a2, 16 <_ZN8ksupport4rtio3imp6output17he428221a06ac3e71E+0x24>
4500aea0: 13 55 85 40   srai    a0, a0, 8
4500aea4: 17 03 00 00   auipc   t1, 0
4500aea8: 67 00 c3 00   jr      12(t1)
4500aeac: 67 80 00 00   ret
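As a quick cross-check of the two listings, the upper-immediate instructions can be decoded by hand. The sketch below uses two hypothetical helpers, `or1kMovhi` and `riscvLui`, written only for this illustration; it assumes `l.movhi` shifts its 16-bit immediate left by 16 and `lui` shifts its 20-bit immediate left by 12.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: value left in the register by OR1K `l.movhi rD, imm`.
constexpr uint32_t or1kMovhi(uint32_t imm16) {
    return (imm16 & 0xFFFFu) << 16;
}

// Hypothetical helper: value left in the register by RISC-V `lui rd, imm`.
constexpr uint32_t riscvLui(uint32_t imm20) {
    return (imm20 & 0xFFFFFu) << 12;
}

// `l.movhi r8, 40960` and `lui a2, 655360` materialize the same constant.
static_assert(or1kMovhi(40960) == 0xA0000000u, "OR1K base constant");
static_assert(riscvLui(655360) == 0xA0000000u, "RISC-V base constant");
```

Both instructions produce 0xA0000000, and in the RISC-V listing every subsequent `sw`/`lw` folds its small offset (0, 144, 152) directly into the memory instruction, which is the same shape the OR1K patch produces.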
