[AMDGPU] Fix negative immediate offset for unbuffered smem loads #79553
Conversation
@llvm/pr-subscribers-llvm-globalisel @llvm/pr-subscribers-backend-amdgpu Author: None (vangthao95)

Changes

For unbuffered smem loads, it is illegal and undefined for the immediate offset to be negative if the resulting IOFFSET + (SGPR[Offset] or M0 or zero) is negative. As a workaround for this issue, if there is no SGPR[Offset] and the immediate offset is negative, subtract the absolute value of the immediate offset from the base address, then change the immediate offset to 0.

Patch is 41.48 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/79553.diff

10 Files Affected:
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUGISel.td b/llvm/lib/Target/AMDGPU/AMDGPUGISel.td
index 152f495a452ba2..0017f51ee5d925 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUGISel.td
+++ b/llvm/lib/Target/AMDGPU/AMDGPUGISel.td
@@ -116,6 +116,10 @@ def gi_smrd_sgpr_imm :
GIComplexOperandMatcher<s64, "selectSmrdSgprImm">,
GIComplexPatternEquiv<SMRDSgprImm>;
+def gi_smrd_prefetch_imm :
+ GIComplexOperandMatcher<s64, "selectSmrdPrefetchImm">,
+ GIComplexPatternEquiv<SMRDPrefetchImm>;
+
def gi_flat_offset :
GIComplexOperandMatcher<s64, "selectFlatOffset">,
GIComplexPatternEquiv<FlatOffset>;
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp b/llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp
index 4f7bf3f7d35e71..d57314310184e3 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp
@@ -2072,13 +2072,16 @@ SDValue AMDGPUDAGToDAGISel::Expand32BitAddress(SDValue Addr) const {
// true, match only 32-bit immediate offsets available on CI.
bool AMDGPUDAGToDAGISel::SelectSMRDBaseOffset(SDValue Addr, SDValue &SBase,
SDValue *SOffset, SDValue *Offset,
- bool Imm32Only,
- bool IsBuffer) const {
+ bool Imm32Only, bool IsBuffer,
+ bool IsPrefetch,
+ bool HasSOffset) const {
if (SOffset && Offset) {
assert(!Imm32Only && !IsBuffer);
SDValue B;
- return SelectSMRDBaseOffset(Addr, B, nullptr, Offset) &&
- SelectSMRDBaseOffset(B, SBase, SOffset, nullptr);
+ return SelectSMRDBaseOffset(Addr, B, nullptr, Offset, false, false,
+ IsPrefetch, true) &&
+ SelectSMRDBaseOffset(B, SBase, SOffset, nullptr, false, false,
+ IsPrefetch, true);
}
// A 32-bit (address + offset) should not cause unsigned 32-bit integer
@@ -2097,12 +2100,39 @@ bool AMDGPUDAGToDAGISel::SelectSMRDBaseOffset(SDValue Addr, SDValue &SBase,
}
if (!N0 || !N1)
return false;
+
+ bool Selected = false;
if (SelectSMRDOffset(N1, SOffset, Offset, Imm32Only, IsBuffer)) {
SBase = N0;
- return true;
+ Selected = true;
}
+
if (SelectSMRDOffset(N0, SOffset, Offset, Imm32Only, IsBuffer)) {
SBase = N1;
+ Selected = true;
+ }
+
+ if (Selected) {
+ // For unbuffered smem loads, it is illegal and undefined for the Immediate
+ // Offset to be negative if the resulting (Offset + (M0 or SOffset or zero)
+ // is negative. Handle the case where the Immediate Offset is negative and
+ // there is no SOffset.
+ //
+ // FIXME: Also handle M0 or SOffset case?
+ if (Offset && !HasSOffset && !IsBuffer && !IsPrefetch &&
+ Subtarget->getGeneration() >= AMDGPUSubtarget::GFX11) {
+ if (ConstantSDNode *C = dyn_cast<ConstantSDNode>(*Offset)) {
+ if (C->getSExtValue() < 0) {
+ SDLoc SL(SBase);
+ *Offset = CurDAG->getTargetConstant(std::abs(C->getSExtValue()), SL,
+ MVT::i32);
+ const SDValue Ops[] = {SBase, *Offset};
+ SBase = SDValue(
+ CurDAG->getMachineNode(AMDGPU::S_SUB_U64, SL, MVT::i64, Ops), 0);
+ *Offset = CurDAG->getTargetConstant(0, SL, MVT::i32);
+ }
+ }
+ }
return true;
}
return false;
@@ -2110,8 +2140,8 @@ bool AMDGPUDAGToDAGISel::SelectSMRDBaseOffset(SDValue Addr, SDValue &SBase,
bool AMDGPUDAGToDAGISel::SelectSMRD(SDValue Addr, SDValue &SBase,
SDValue *SOffset, SDValue *Offset,
- bool Imm32Only) const {
- if (SelectSMRDBaseOffset(Addr, SBase, SOffset, Offset, Imm32Only)) {
+ bool Imm32Only, bool IsPrefetch) const {
+ if (SelectSMRDBaseOffset(Addr, SBase, SOffset, Offset, Imm32Only, IsPrefetch)) {
SBase = Expand32BitAddress(SBase);
return true;
}
@@ -2170,6 +2200,11 @@ bool AMDGPUDAGToDAGISel::SelectSMRDBufferSgprImm(SDValue N, SDValue &SOffset,
/* IsBuffer */ true);
}
+bool AMDGPUDAGToDAGISel::SelectSMRDPrefetchImm(SDValue Addr, SDValue &SBase,
+ SDValue &Offset) const {
+ return SelectSMRD(Addr, SBase, /* SOffset */ nullptr, &Offset, false, true);
+}
+
bool AMDGPUDAGToDAGISel::SelectMOVRELOffset(SDValue Index,
SDValue &Base,
SDValue &Offset) const {
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.h b/llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.h
index 3b42d88df0c246..5328ba985474dd 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.h
+++ b/llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.h
@@ -194,11 +194,13 @@ class AMDGPUDAGToDAGISel : public SelectionDAGISel {
SDValue *Offset, bool Imm32Only = false,
bool IsBuffer = false) const;
SDValue Expand32BitAddress(SDValue Addr) const;
- bool SelectSMRDBaseOffset(SDValue Addr, SDValue &SBase, SDValue *SOffset,
- SDValue *Offset, bool Imm32Only = false,
- bool IsBuffer = false) const;
- bool SelectSMRD(SDValue Addr, SDValue &SBase, SDValue *SOffset,
- SDValue *Offset, bool Imm32Only = false) const;
+ bool SelectSMRDBaseOffset(SDValue Addr, SDValue & SBase, SDValue * SOffset,
+ SDValue * Offset, bool Imm32Only = false,
+ bool IsBuffer = false, bool IsPrefetch = false,
+ bool HasSOffset = false) const;
+ bool SelectSMRD(SDValue Addr, SDValue & SBase, SDValue * SOffset,
+ SDValue * Offset, bool Imm32Only = false,
+ bool IsPrefetch = false) const;
bool SelectSMRDImm(SDValue Addr, SDValue &SBase, SDValue &Offset) const;
bool SelectSMRDImm32(SDValue Addr, SDValue &SBase, SDValue &Offset) const;
bool SelectSMRDSgpr(SDValue Addr, SDValue &SBase, SDValue &SOffset) const;
@@ -208,6 +210,8 @@ class AMDGPUDAGToDAGISel : public SelectionDAGISel {
bool SelectSMRDBufferImm32(SDValue N, SDValue &Offset) const;
bool SelectSMRDBufferSgprImm(SDValue N, SDValue &SOffset,
SDValue &Offset) const;
+ bool SelectSMRDPrefetchImm(SDValue Addr, SDValue & SBase, SDValue & Offset)
+ const;
bool SelectMOVRELOffset(SDValue Index, SDValue &Base, SDValue &Offset) const;
bool SelectVOP3ModsImpl(SDValue In, SDValue &Src, unsigned &SrcMods,
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp b/llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp
index f255d098b631c7..c7cc701c63dc54 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp
@@ -4221,7 +4221,8 @@ AMDGPUInstructionSelector::selectVINTERPModsHi(MachineOperand &Root) const {
bool AMDGPUInstructionSelector::selectSmrdOffset(MachineOperand &Root,
Register &Base,
Register *SOffset,
- int64_t *Offset) const {
+ int64_t *Offset,
+ bool IsPrefetch) const {
MachineInstr *MI = Root.getParent();
MachineBasicBlock *MBB = MI->getParent();
@@ -4257,6 +4258,27 @@ bool AMDGPUInstructionSelector::selectSmrdOffset(MachineOperand &Root,
if (Offset && GEPI.SgprParts.size() == 1 && EncodedImm) {
Base = GEPI.SgprParts[0];
*Offset = *EncodedImm;
+ // For unbuffered smem loads, it is illegal and undefined for the Immediate
+ // Offset to be negative if the resulting (Offset + (M0 or SOffset or zero)
+ // is negative. Handle the case where the Immediate Offset is negative and
+ // there is no SOffset.
+ //
+ // FIXME: Also handle M0 or SOffset case?
+ if (!IsPrefetch && *Offset < 0 &&
+ STI.getGeneration() >= AMDGPUSubtarget::GFX11) {
+ // Subtract the absolute value of the offset from the base register and
+ // set the immediate offset to 0.
+ Register SubtractReg =
+ MRI->createVirtualRegister(&AMDGPU::SReg_64RegClass);
+
+ BuildMI(*MBB, MI, MI->getDebugLoc(), TII.get(AMDGPU::S_SUB_U64),
+ SubtractReg)
+ .addReg(Base)
+ .addImm(std::abs(*Offset));
+ Base = SubtractReg;
+ *Offset = 0;
+ }
+
return true;
}
@@ -4339,6 +4361,17 @@ AMDGPUInstructionSelector::selectSmrdSgprImm(MachineOperand &Root) const {
[=](MachineInstrBuilder &MIB) { MIB.addImm(Offset); }}};
}
+InstructionSelector::ComplexRendererFns
+AMDGPUInstructionSelector::selectSmrdPrefetchImm(MachineOperand &Root) const {
+ Register Base;
+ int64_t Offset;
+ if (!selectSmrdOffset(Root, Base, /* SOffset= */ nullptr, &Offset, true))
+ return std::nullopt;
+
+ return {{[=](MachineInstrBuilder &MIB) { MIB.addReg(Base); },
+ [=](MachineInstrBuilder &MIB) { MIB.addImm(Offset); }}};
+}
+
std::pair<Register, int>
AMDGPUInstructionSelector::selectFlatOffsetImpl(MachineOperand &Root,
uint64_t FlatVariant) const {
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.h b/llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.h
index ef7630f137aca6..573bc9260c765b 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.h
+++ b/llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.h
@@ -220,8 +220,10 @@ class AMDGPUInstructionSelector final : public InstructionSelector {
InstructionSelector::ComplexRendererFns
selectVINTERPModsHi(MachineOperand &Root) const;
- bool selectSmrdOffset(MachineOperand &Root, Register &Base, Register *SOffset,
- int64_t *Offset) const;
+ bool selectSmrdOffset(MachineOperand & Root, Register & Base,
+ Register * SOffset, int64_t * Offset,
+ bool IsPrefetch = false) const;
+
InstructionSelector::ComplexRendererFns
selectSmrdImm(MachineOperand &Root) const;
InstructionSelector::ComplexRendererFns
@@ -230,6 +232,8 @@ class AMDGPUInstructionSelector final : public InstructionSelector {
selectSmrdSgpr(MachineOperand &Root) const;
InstructionSelector::ComplexRendererFns
selectSmrdSgprImm(MachineOperand &Root) const;
+ InstructionSelector::ComplexRendererFns
+ selectSmrdPrefetchImm(MachineOperand &Root) const;
std::pair<Register, int> selectFlatOffsetImpl(MachineOperand &Root,
uint64_t FlatVariant) const;
diff --git a/llvm/lib/Target/AMDGPU/SMInstructions.td b/llvm/lib/Target/AMDGPU/SMInstructions.td
index f082de35b6ae9a..8c9e3fa5b9b738 100644
--- a/llvm/lib/Target/AMDGPU/SMInstructions.td
+++ b/llvm/lib/Target/AMDGPU/SMInstructions.td
@@ -859,6 +859,7 @@ def SMRDSgprImm : ComplexPattern<iPTR, 3, "SelectSMRDSgprImm">;
def SMRDBufferImm : ComplexPattern<iPTR, 1, "SelectSMRDBufferImm">;
def SMRDBufferImm32 : ComplexPattern<iPTR, 1, "SelectSMRDBufferImm32">;
def SMRDBufferSgprImm : ComplexPattern<iPTR, 2, "SelectSMRDBufferSgprImm">;
+def SMRDPrefetchImm : ComplexPattern<iPTR, 2, "SelectSMRDPrefetchImm">;
multiclass SMRD_Pattern <string Instr, ValueType vt, bit immci = true> {
@@ -1080,7 +1081,7 @@ def i32imm_one : TImmLeaf <i32, [{
multiclass SMPrefetchPat<string type, TImmLeaf cache_type> {
def : GCNPat <
- (smrd_prefetch (SMRDImm i64:$sbase, i32:$offset), timm, timm, cache_type),
+ (smrd_prefetch (SMRDPrefetchImm i64:$sbase, i32:$offset), timm, timm, cache_type),
(!cast<SM_Prefetch_Pseudo>("S_PREFETCH_"#type) $sbase, $offset, (i32 SGPR_NULL), (i8 0))
>;
diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/inst-select-load-constant.mir b/llvm/test/CodeGen/AMDGPU/GlobalISel/inst-select-load-constant.mir
index c44477273dad09..b7010e4c65beb9 100644
--- a/llvm/test/CodeGen/AMDGPU/GlobalISel/inst-select-load-constant.mir
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/inst-select-load-constant.mir
@@ -3,7 +3,7 @@
# RUN: llc -amdgpu-global-isel-new-legality -mtriple=amdgcn -mcpu=hawaii -run-pass=instruction-select -verify-machineinstrs -global-isel-abort=0 -o - %s | FileCheck -check-prefix=GFX7 %s
# RUN: llc -amdgpu-global-isel-new-legality -mtriple=amdgcn -mcpu=fiji -run-pass=instruction-select -verify-machineinstrs -global-isel-abort=0 -o - %s | FileCheck -check-prefix=GFX8 %s
# RUN: llc -amdgpu-global-isel-new-legality -mtriple=amdgcn -mcpu=gfx1010 -run-pass=instruction-select -verify-machineinstrs -global-isel-abort=0 -o - %s | FileCheck -check-prefix=GFX10 %s
-# RUN: llc -amdgpu-global-isel-new-legality -mtriple=amdgcn -mcpu=gfx1100 -run-pass=instruction-select -verify-machineinstrs -global-isel-abort=0 -o - %s | FileCheck -check-prefix=GFX10 %s
+# RUN: llc -amdgpu-global-isel-new-legality -mtriple=amdgcn -mcpu=gfx1100 -run-pass=instruction-select -verify-machineinstrs -global-isel-abort=0 -o - %s | FileCheck -check-prefix=GFX11 %s
---
@@ -44,6 +44,13 @@ body: |
; GFX10-NEXT: [[COPY:%[0-9]+]]:sreg_64 = COPY $sgpr0_sgpr1
; GFX10-NEXT: [[S_LOAD_DWORD_IMM:%[0-9]+]]:sreg_32_xm0_xexec = S_LOAD_DWORD_IMM [[COPY]], 0, 0 :: (load (s32), addrspace 4)
; GFX10-NEXT: $sgpr0 = COPY [[S_LOAD_DWORD_IMM]]
+ ;
+ ; GFX11-LABEL: name: load_constant_s32_from_4
+ ; GFX11: liveins: $sgpr0_sgpr1
+ ; GFX11-NEXT: {{ $}}
+ ; GFX11-NEXT: [[COPY:%[0-9]+]]:sreg_64 = COPY $sgpr0_sgpr1
+ ; GFX11-NEXT: [[S_LOAD_DWORD_IMM:%[0-9]+]]:sreg_32_xm0_xexec = S_LOAD_DWORD_IMM [[COPY]], 0, 0 :: (load (s32), addrspace 4)
+ ; GFX11-NEXT: $sgpr0 = COPY [[S_LOAD_DWORD_IMM]]
%0:sgpr(p4) = COPY $sgpr0_sgpr1
%1:sgpr(s32) = G_LOAD %0 :: (load (s32), align 4, addrspace 4)
$sgpr0 = COPY %1
@@ -89,6 +96,13 @@ body: |
; GFX10-NEXT: [[COPY:%[0-9]+]]:sreg_64 = COPY $sgpr0_sgpr1
; GFX10-NEXT: [[S_LOAD_DWORD_IMM:%[0-9]+]]:sreg_32_xm0_xexec = S_LOAD_DWORD_IMM [[COPY]], 0, 0 :: (load (<2 x s16>), addrspace 4)
; GFX10-NEXT: $sgpr0 = COPY [[S_LOAD_DWORD_IMM]]
+ ;
+ ; GFX11-LABEL: name: load_constant_v2s16_from_4
+ ; GFX11: liveins: $sgpr0_sgpr1
+ ; GFX11-NEXT: {{ $}}
+ ; GFX11-NEXT: [[COPY:%[0-9]+]]:sreg_64 = COPY $sgpr0_sgpr1
+ ; GFX11-NEXT: [[S_LOAD_DWORD_IMM:%[0-9]+]]:sreg_32_xm0_xexec = S_LOAD_DWORD_IMM [[COPY]], 0, 0 :: (load (<2 x s16>), addrspace 4)
+ ; GFX11-NEXT: $sgpr0 = COPY [[S_LOAD_DWORD_IMM]]
%0:sgpr(p4) = COPY $sgpr0_sgpr1
%1:sgpr(<2 x s16>) = G_LOAD %0 :: (load (<2 x s16>), align 4, addrspace 4)
$sgpr0 = COPY %1
@@ -133,6 +147,13 @@ body: |
; GFX10-NEXT: [[COPY:%[0-9]+]]:sreg_64 = COPY $sgpr0_sgpr1
; GFX10-NEXT: [[S_LOAD_DWORDX2_IMM:%[0-9]+]]:sreg_64_xexec = S_LOAD_DWORDX2_IMM [[COPY]], 0, 0 :: (load (<2 x s32>), addrspace 4)
; GFX10-NEXT: $sgpr0_sgpr1 = COPY [[S_LOAD_DWORDX2_IMM]]
+ ;
+ ; GFX11-LABEL: name: load_constant_v2s32
+ ; GFX11: liveins: $sgpr0_sgpr1
+ ; GFX11-NEXT: {{ $}}
+ ; GFX11-NEXT: [[COPY:%[0-9]+]]:sreg_64 = COPY $sgpr0_sgpr1
+ ; GFX11-NEXT: [[S_LOAD_DWORDX2_IMM:%[0-9]+]]:sreg_64_xexec = S_LOAD_DWORDX2_IMM [[COPY]], 0, 0 :: (load (<2 x s32>), addrspace 4)
+ ; GFX11-NEXT: $sgpr0_sgpr1 = COPY [[S_LOAD_DWORDX2_IMM]]
%0:sgpr(p4) = COPY $sgpr0_sgpr1
%1:sgpr(<2 x s32>) = G_LOAD %0 :: (load (<2 x s32>), align 8, addrspace 4)
$sgpr0_sgpr1 = COPY %1
@@ -176,6 +197,13 @@ body: |
; GFX10-NEXT: [[COPY:%[0-9]+]]:sreg_64 = COPY $sgpr0_sgpr1
; GFX10-NEXT: [[S_LOAD_DWORDX2_IMM:%[0-9]+]]:sreg_64_xexec = S_LOAD_DWORDX2_IMM [[COPY]], 0, 0 :: (load (<2 x s32>), align 4, addrspace 4)
; GFX10-NEXT: $sgpr0_sgpr1 = COPY [[S_LOAD_DWORDX2_IMM]]
+ ;
+ ; GFX11-LABEL: name: load_constant_v2s32_align4
+ ; GFX11: liveins: $sgpr0_sgpr1
+ ; GFX11-NEXT: {{ $}}
+ ; GFX11-NEXT: [[COPY:%[0-9]+]]:sreg_64 = COPY $sgpr0_sgpr1
+ ; GFX11-NEXT: [[S_LOAD_DWORDX2_IMM:%[0-9]+]]:sreg_64_xexec = S_LOAD_DWORDX2_IMM [[COPY]], 0, 0 :: (load (<2 x s32>), align 4, addrspace 4)
+ ; GFX11-NEXT: $sgpr0_sgpr1 = COPY [[S_LOAD_DWORDX2_IMM]]
%0:sgpr(p4) = COPY $sgpr0_sgpr1
%1:sgpr(<2 x s32>) = G_LOAD %0 :: (load (<2 x s32>), align 4, addrspace 4)
$sgpr0_sgpr1 = COPY %1
@@ -219,6 +247,13 @@ body: |
; GFX10-NEXT: [[COPY:%[0-9]+]]:sreg_64 = COPY $sgpr0_sgpr1
; GFX10-NEXT: [[S_LOAD_DWORDX2_IMM:%[0-9]+]]:sreg_64_xexec = S_LOAD_DWORDX2_IMM [[COPY]], 0, 0 :: (load (<4 x s16>), align 4, addrspace 4)
; GFX10-NEXT: $sgpr0_sgpr1 = COPY [[S_LOAD_DWORDX2_IMM]]
+ ;
+ ; GFX11-LABEL: name: load_constant_v4s16_align4
+ ; GFX11: liveins: $sgpr0_sgpr1
+ ; GFX11-NEXT: {{ $}}
+ ; GFX11-NEXT: [[COPY:%[0-9]+]]:sreg_64 = COPY $sgpr0_sgpr1
+ ; GFX11-NEXT: [[S_LOAD_DWORDX2_IMM:%[0-9]+]]:sreg_64_xexec = S_LOAD_DWORDX2_IMM [[COPY]], 0, 0 :: (load (<4 x s16>), align 4, addrspace 4)
+ ; GFX11-NEXT: $sgpr0_sgpr1 = COPY [[S_LOAD_DWORDX2_IMM]]
%0:sgpr(p4) = COPY $sgpr0_sgpr1
%1:sgpr(<4 x s16>) = G_LOAD %0 :: (load (<4 x s16>), align 4, addrspace 4)
$sgpr0_sgpr1 = COPY %1
@@ -263,6 +298,13 @@ body: |
; GFX10-NEXT: [[COPY:%[0-9]+]]:sreg_64 = COPY $sgpr0_sgpr1
; GFX10-NEXT: [[S_LOAD_DWORDX4_IMM:%[0-9]+]]:sgpr_128 = S_LOAD_DWORDX4_IMM [[COPY]], 0, 0 :: (load (<4 x s32>), align 4, addrspace 4)
; GFX10-NEXT: $sgpr0_sgpr1_sgpr2_sgpr3 = COPY [[S_LOAD_DWORDX4_IMM]]
+ ;
+ ; GFX11-LABEL: name: load_constant_v4s32_align4
+ ; GFX11: liveins: $sgpr0_sgpr1
+ ; GFX11-NEXT: {{ $}}
+ ; GFX11-NEXT: [[COPY:%[0-9]+]]:sreg_64 = COPY $sgpr0_sgpr1
+ ; GFX11-NEXT: [[S_LOAD_DWORDX4_IMM:%[0-9]+]]:sgpr_128 = S_LOAD_DWORDX4_IMM [[COPY]], 0, 0 :: (load (<4 x s32>), align 4, addrspace 4)
+ ; GFX11-NEXT: $sgpr0_sgpr1_sgpr2_sgpr3 = COPY [[S_LOAD_DWORDX4_IMM]]
%0:sgpr(p4) = COPY $sgpr0_sgpr1
%1:sgpr(<4 x s32>) = G_LOAD %0 :: (load (<4 x s32>), align 4, addrspace 4)
$sgpr0_sgpr1_sgpr2_sgpr3 = COPY %1
@@ -307,6 +349,13 @@ body: |
; GFX10-NEXT: [[COPY:%[0-9]+]]:sreg_64 = COPY $sgpr0_sgpr1
; GFX10-NEXT: [[S_LOAD_DWORDX2_IMM:%[0-9]+]]:sreg_64_xexec = S_LOAD_DWORDX2_IMM [[COPY]], 0, 0 :: (load (s64), addrspace 4)
; GFX10-NEXT: $sgpr0_sgpr1 = COPY [[S_LOAD_DWORDX2_IMM]]
+ ;
+ ; GFX11-LABEL: name: load_constant_s64
+ ; GFX11: liveins: $sgpr0_sgpr1
+ ; GFX11-NEXT: {{ $}}
+ ; GFX11-NEXT: [[COPY:%[0-9]+]]:sreg_64 = COPY $sgpr0_sgpr1
+ ; GFX11-NEXT: [[S_LOAD_DWORDX2_IMM:%[0-9]+]]:sreg_64_xexec = S_LOAD_DWORDX2_IMM [[COPY]], 0, 0 :: (load (s64), addrspace 4)
+ ; GFX11-NEXT: $sgpr0_sgpr1 = COPY [[S_LOAD_DWORDX2_IMM]]
%0:sgpr(p4) = COPY $sgpr0_sgpr1
%1:sgpr(s64) = G_LOAD %0 :: (load (s64), align 8, addrspace 4)
$sgpr0_sgpr1 = COPY %1
@@ -351,6 +400,13 @@ body: |
; GFX10-NEXT: [[COPY:%[0-9]+]]:sreg_64 = COPY $sgpr0_sgpr1
; GFX10-NEXT: [[S_LOAD_DWORDX2_IMM:%[0-9]+]]:sreg_64_xexec = S_LOAD_DWORDX2_IMM [[COPY]], 0, 0 :: (load (s64), align 4, addrspace 4)
; GFX10-NEXT: $sgpr0_sgpr1 = COPY [[S_LOAD_DWORDX2_IMM]]
+ ;
+ ; GFX11-LABEL: name: load_constant_s64_align4
+ ; GFX11: liveins: $sgpr0_sgpr1
+ ; GFX11-NEXT: {{ $}}
+ ; GFX11-NEXT: [[COPY:%[0-9]+]]:sreg_64 = COPY $sgpr0_sgpr1
+ ; GFX11-NEXT: [[S_LOAD_DWORDX2_IMM:%[0-9]+]]:sreg_64_xexec = S_LOAD_DWORDX2_IMM [[COPY]], 0, 0 :: (load (s64), align 4, addrspace 4)
+ ; GFX11-NEXT: $sgpr0_sgpr1 = COPY [[S_LOAD_DWORDX2_IMM]]
%0:sgpr(p4) = COPY $sgpr0_sgpr1
%1:sgpr(s64) = G_LOAD %0 :: (load (s64), align 4, addrspace 4)
$sgpr0_sgpr1 = COPY %1
@@ -395,6 +451,13 @@ body: |
; GFX10-NEXT: [[COPY:%[0-9]+]]:sreg_64 = COPY $sgpr0_sgpr1
; GFX10-NEXT: [[S_LOAD_DWORDX4_IMM:%[0-9]+]]:sgpr_128 = S_LOAD_DWORDX4_IMM [[COPY]], 0, 0 :: (load (<2 x s64>), align 4, addrspace 4)
; GFX10-NEXT: $sgpr0_sgpr1_sgpr2_sgpr3 = COPY [[S_LOAD_DWORDX4_IMM]]
+ ;
+ ; GFX11-LABEL: name: load_constant_v2s64
+ ; GFX11: liveins: $sgpr0_sgpr1
+ ; GFX11-NEXT: {{ $}}
+ ; GFX11-NEXT: [[COPY:%[0-9]+]]:sreg_64 = COPY $sgpr0_sgpr1
+ ; GFX11-NEXT: [[S_LOAD_DWORDX4_IMM:%[0-9]+]]:sgpr_128 = S_LOAD_DWORDX4_IMM [[COPY]], 0, 0 :: (load (<2 x s64>), align 4, addrspace 4)
+ ; GFX11-NEXT: $sgpr0_sgpr1_sgpr2_sgpr3 = COPY [[S_LOAD_...
[truncated]
✅ With the latest revision this PR passed the C/C++ code formatter.
Force-pushed from 0880df2 to 8194ef1
ping
def gi_smrd_prefetch_imm :
 GIComplexOperandMatcher<s64, "selectSmrdPrefetchImm">,
 GIComplexPatternEquiv<SMRDPrefetchImm>;
This looks like new GlobalISel support for prefetch; this should be done separately.
This is splitting the handling of prefetch from the general handling of SMRD selection. There wasn't a distinction between prefetch and s_load instructions, so I had to create a way to distinguish between the two. It should be an NFC change. Should this still be done separately?
Yes
Hi @jayfoad @Pierre-vh @rampitec, Matt is on vacation this week; may I get some help reviewing this patch? Thanks
ping
  Selected = true;
}

if (Selected) {
This is confusing: you're allowing the invalid selection and then later trying to adjust the offset. It would be clearer if you adjusted the initial selection to reject cases where the immediate cannot be used.
I moved this code inside SelectSMRDOffset(), where the offset is selected, to make it clearer. However, when SGPR[Offset] (SOffset) is present, the immediate offset is selected before SOffset, so in that case I still had to adjust it afterwards.
ping
Hi @arsenm or @Pierre-vh, can you help take a look at the latest changes? Thanks
KnownBits SKnown = CurDAG->computeKnownBits(*SOffset);
if (ByteOffset + SKnown.getMinValue().getSExtValue() < 0)
  return subtractOffsetFromBase(&SBase, Offset);
This is still trying to fix up illegal offsets. Can you have the pattern fail to select in that case, and allow the standalone pointer operands to select independently?
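A minimal sketch of the reject-instead-of-fixup approach being suggested here, assuming the ByteOffset/SOffset variables from the quoted revision and the surrounding AMDGPUISelDAGToDAG selection context; the exact guard would depend on the final shape of the patch:

```cpp
// Sketch only: instead of patching SBase, refuse the (base + negative
// immediate) match, so selection falls back to the plain pointer-operand
// patterns.
KnownBits SKnown = CurDAG->computeKnownBits(*SOffset);
if (ByteOffset + SKnown.getMinValue().getSExtValue() < 0)
  return false; // Combined offset could still be negative: fail to select.
```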
if (EncodedOffset >= 0 || IsBuffer || HasSOffset ||
    !Subtarget->hasSignedSMRDImmOffset())
  return true;
// For unbuffered smem loads, it is illegal and undefined for the Immediate
Just say "illegal", not "illegal and undefined".
unsigned Opc;

if (Subtarget->hasScalarAddSub64())
  Opc = AMDGPU::S_SUB_U64;
else
  Opc = AMDGPU::S_SUB_U64_PSEUDO;
Initialize Opc with a ternary.
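For reference, a minimal sketch of the suggested form, using the opcode names from the quoted patch:

```cpp
// Pick the subtract opcode with a ternary instead of an if/else chain.
unsigned Opc = Subtarget->hasScalarAddSub64() ? AMDGPU::S_SUB_U64
                                              : AMDGPU::S_SUB_U64_PSEUDO;
```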
Co-authored-by: Wang Pengcheng <[email protected]>
Co-authored-by: Wang Pengcheng <[email protected]>
Co-authored-by: Wang Pengcheng <[email protected]>
…nstructions by SEW Co-authored-by: Wang Pengcheng <[email protected]>
…uctions by SEW Co-authored-by: Wang Pengcheng <[email protected]>
Co-authored-by: Wang Pengcheng <[email protected]>
…l_for specialization (llvm#85611) This pull request implements LWG3736: move_iterator missing disable_sized_sentinel_for specialization.
…hes with Objective-C types (llvm#87807) Depends on llvm#87767
…ing::SimplifySetCC These test cases show some miscompiles for big-endian when dealing with non byte-sized loads. One part of the problem is that LLVM IR isn't really telling where the padding goes for non byte-sized loads/stores, so currently TargetLowering::SimplifySetCC can't assume anything about it. But the implementation also does not consider that the TypeStoreSize could be larger than the TypeSize, resulting in the offset calculation being wrong for big-endian. Pre-commit for llvm#87646
) The load narrowing part of TargetLowering::SimplifySetCC is updated according to this: 1) The offset calculation (for big endian) did not work properly for non byte-sized types. This is basically solved by an early exit if the memory type isn't byte-sized. But the code is also corrected to use the store size when calculating the offset. 2) To still allow some optimizations for non-byte-sized types the TargetLowering::isPaddedAtMostSignificantBitsWhenStored hook is added. By default it assumes that scalar integer types are padded starting at the most significant bits, if the type needs padding when being stored to memory. 3) Allow optimizing when isPaddedAtMostSignificantBitsWhenStored is true, as that hook makes it possible for TargetLowering to know how the non byte-sized value is aligned in memory. 4) Update the algorithm to always search for a narrowed load with a power-of-2 byte-sized type. In the past the algorithm started with the width of the original load, and then divided it by two for each iteration. But for a type such as i48 that would just end up trying to narrow the load into an i24 or i12 load, and then we would fail sooner or later due to not finding a newVT that fulfilled newVT.isRound(). With this new approach we can narrow the i48 load into either an i8, i16 or i32 load. By checking if such a load is allowed (e.g. alignment wise) for any "multiple of 8 offset", then we can find more opportunities for the optimization to trigger. So even for a byte-sized type such as i32 we may now end up narrowing the load into loading the 16 bits starting at offset 8 (if that is allowed by the target). The old algorithm did not even consider that case. 5) Also start using getObjectPtrOffset instead of getMemBasePlusOffset when creating the new ptr. This way we get "nsw" on the add.
Move bitfield access clipping to bitfield access computation.
In llvm#88317, the clang resource headers was converted to an interface library. Update LLDB and fix the Xcode standalone build. Thanks Evan for the help!
llvm#79940 put calls to recomputeLiveIns into a loop, to repeatedly call the function until the computation converges. However, this repeats a lot of code. This changes moves the loop into a function to simplify the handling. Note that this changes the order in which recomputeLiveIns is called. For example, ``` bool anyChange = false; do { anyChange = recomputeLiveIns(*ExitMBB) || recomputeLiveIns(*LoopMBB); } while (anyChange); ``` only begins to recompute the live-ins for LoopMBB after the computation for ExitMBB has converged. With this change, all basic blocks have a recomputation of the live-ins for each loop iteration. This can result in less or more calls, depending on the situation.
Test, as expected, fails with Asan on system with 5lvl page tables. Disabling the test to migrate buildbot.
…ommonInvocationForModuleBuild` into its own function (llvm#88447) The new function is about clearing out benign codegen options and can be applied for PCH invocations as well.
Links to `llvm.mlir.global` and `llvm.mlir.addressof` in the ["Globals" section of LLVM dialect documentation](https://mlir.llvm.org/docs/Dialects/LLVM/#globals) are broken.
… explicit instructions
Rebase got messed up. Sorry for the noise.
For unbuffered smem loads, it is illegal for the immediate offset to be negative if the resulting IOFFSET + (SGPR[Offset] or M0 or zero) is negative. New PR of llvm#79553.
…lim) This is a cherry-pick of an unmerged upstream change llvm#89165 Once the upstream change is finished and merged, this will need reverting. Original commit message: For unbuffered smem loads, it is illegal for the immediate offset to be negative if the resulting IOFFSET + (SGPR[Offset] or M0 or zero) is negative. New PR of llvm#79553. Change-Id: I235ac5d0de5da2a1544760ab3c9749665340310d
) For unbuffered smem loads, it is illegal for the immediate offset to be negative if the resulting IOFFSET + (SGPR[Offset] or M0 or zero) is negative. New PR of #79553.
…m#89165) For unbuffered smem loads, it is illegal for the immediate offset to be negative if the resulting IOFFSET + (SGPR[Offset] or M0 or zero) is negative. New PR of llvm#79553. Change-Id: Ic920b497c378cbc6a7cc2bbca300e9391893b3a0
…m#89165) For unbuffered smem loads, it is illegal for the immediate offset to be negative if the resulting IOFFSET + (SGPR[Offset] or M0 or zero) is negative. New PR of llvm#79553.
…m#89165) For unbuffered smem loads, it is illegal for the immediate offset to be negative if the resulting IOFFSET + (SGPR[Offset] or M0 or zero) is negative. New PR of llvm#79553. Change-Id: Ieb5681777d517fbc6ea59014af8babc9980c2b1e
For unbuffered smem loads, it is illegal and undefined for the immediate offset to be negative if the resulting IOFFSET + (SGPR[Offset] or M0 or zero) is negative. As a workaround for this issue, if there is no SGPR[Offset] and the immediate offset is negative, subtract the absolute value of the immediate offset from the base address, then change the immediate offset to 0.
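For reference, a condensed sketch of the SelectionDAG-side fixup shown in the diff above (names follow the patch). Since SBase + Offset == (SBase - |Offset|) + 0, folding the negative immediate into the base preserves the address while keeping the encoded immediate non-negative:

```cpp
// Condensed from SelectSMRDBaseOffset in this PR: when the selected immediate
// offset is negative and there is no SOffset, subtract |Offset| from the base
// and zero out the immediate.
if (ConstantSDNode *C = dyn_cast<ConstantSDNode>(*Offset)) {
  if (C->getSExtValue() < 0) {
    SDLoc SL(SBase);
    SDValue Abs =
        CurDAG->getTargetConstant(std::abs(C->getSExtValue()), SL, MVT::i32);
    const SDValue Ops[] = {SBase, Abs};
    SBase = SDValue(
        CurDAG->getMachineNode(AMDGPU::S_SUB_U64, SL, MVT::i64, Ops), 0);
    *Offset = CurDAG->getTargetConstant(0, SL, MVT::i32);
  }
}
```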