-
Notifications
You must be signed in to change notification settings - Fork 4.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Expand indirection cell calls in IR instead of doing ad-hoc optimizations in codegen #65388
Expand indirection cell calls in IR instead of doing ad-hoc optimizations in codegen #65388
Conversation
When the JIT calls a function that requires an indirection cell argument it is beneficial to load the call target through the already placed arg. Previously this was done as an ad-hoc optimization during codegen (with LSRA making sure to allocate an extra register). Instead of this, expand it as regular IR and remove the ad-hoc optimization. This change makes the optimization that were previously done work for CFG as well and generally seems to have (albeit small) positive diffs. Fix dotnet#65076
Tagging subscribers to this area: @JulieLeeMSFT Issue DetailsWhen the JIT calls a function that requires an indirection cell argument This change makes the optimization that were previously done work for Fix #65076
|
Codegen diff for a simple interface call with CFG enabled: [MethodImpl(MethodImplOptions.NoInlining)]
public static void M(I i)
{
i.Foo();
} Arm64BeforeG_M10449_IG02: ;; offset=0008H
D280010F movz x15, #8
F2ADACCF movk x15, #0x6d66 LSL #16
F2CFFF2F movk x15, #0x7ff9 LSL #32
F94001EF ldr x15, [x15]
94000000 bl CORINFO_HELP_VALIDATE_INDIRECT_CALL
AA0F03E1 mov x1, x15
D280010B movz x11, #8
F2ADACCB movk x11, #0x6d66 LSL #16
F2CFFF2B movk x11, #0x7ff9 LSL #32
D63F0020 blr x1
AfterG_M10449_IG02: ;; offset=0008H
D2800101 movz x1, #8
F2ADACA1 movk x1, #0x6d65 LSL #16
F2CFFF21 movk x1, #0x7ff9 LSL #32
F940002F ldr x15, [x1]
94000000 bl CORINFO_HELP_VALIDATE_INDIRECT_CALL
AA0F03E2 mov x2, x15
AA0103EB mov x11, x1
D63F0040 blr x2 x64BeforeG_M10449_IG02: ;; offset=0009H
488B0D3833EDFF mov rcx, qword ptr [(reloc 0x7ff96d650008)]
E8EBAC9F5F call CORINFO_HELP_VALIDATE_INDIRECT_CALL
488BC1 mov rax, rcx
488B4C2430 mov rcx, gword ptr [rsp+30H]
49BB0800656DF97F0000 mov r11, 0x7FF96D650008
FFD0 call rax ; I:Foo():this
90 nop AfterG_M10449_IG02: ;; offset=0009H
49BA0800646DF97F0000 mov r10, 0x7FF96D640008
498B0A mov rcx, qword ptr [r10]
E8E5ACA05F call CORINFO_HELP_VALIDATE_INDIRECT_CALL
488BC1 mov rax, rcx
488B4C2430 mov rcx, gword ptr [rsp+30H]
4D8BDA mov r11, r10
FFD0 call rax ; I:Foo():this
90 nop (I will need to benchmark this to see if it is better to disable the expansion in this case) |
// | ||
assert(genIsValidIntReg(targetAddrReg)); | ||
assert(addr != nullptr); | ||
|
||
// Non-virtual direct call to known addresses | ||
#ifdef TARGET_ARM | ||
if (!validImmForBL((ssize_t)addr)) | ||
{ | ||
regNumber tmpReg = call->GetSingleTempReg(); | ||
instGen_Set_Reg_To_Imm(EA_HANDLE_CNS_RELOC, tmpReg, (ssize_t)addr); | ||
// clang-format off | ||
genEmitCall(emitter::EC_INDIR_R, | ||
methHnd, | ||
INDEBUG_LDISASM_COMMA(sigInfo) | ||
nullptr, // addr | ||
retSize | ||
MULTIREG_HAS_SECOND_GC_RET_ONLY_ARG(secondRetSize), | ||
NULL, | ||
retSize, | ||
di, | ||
targetAddrReg, | ||
tmpReg, | ||
call->IsFastTailCall()); | ||
// clang-format on | ||
} | ||
else | ||
{ | ||
// Generate a direct call to a non-virtual user defined or helper method | ||
assert(call->gtCallType == CT_HELPER || call->gtCallType == CT_USER_FUNC); | ||
|
||
void* addr = nullptr; | ||
#ifdef FEATURE_READYTORUN | ||
if (call->gtEntryPoint.addr != NULL) | ||
{ | ||
assert(call->gtEntryPoint.accessType == IAT_VALUE); | ||
addr = call->gtEntryPoint.addr; | ||
} | ||
else | ||
#endif // FEATURE_READYTORUN | ||
if (call->gtCallType == CT_HELPER) | ||
{ | ||
CorInfoHelpFunc helperNum = compiler->eeGetHelperNum(methHnd); | ||
noway_assert(helperNum != CORINFO_HELP_UNDEF); | ||
|
||
void* pAddr = nullptr; | ||
addr = compiler->compGetHelperFtn(helperNum, (void**)&pAddr); | ||
assert(pAddr == nullptr); | ||
} | ||
else | ||
{ | ||
// Direct call to a non-virtual user function. | ||
addr = call->gtDirectCallAddress; | ||
} | ||
|
||
assert(addr != nullptr); | ||
|
||
// Non-virtual direct call to known addresses | ||
#ifdef TARGET_ARM | ||
if (!validImmForBL((ssize_t)addr)) | ||
{ | ||
regNumber tmpReg = call->GetSingleTempReg(); | ||
instGen_Set_Reg_To_Imm(EA_HANDLE_CNS_RELOC, tmpReg, (ssize_t)addr); | ||
// clang-format off | ||
genEmitCall(emitter::EC_INDIR_R, | ||
methHnd, | ||
INDEBUG_LDISASM_COMMA(sigInfo) | ||
NULL, | ||
retSize, | ||
di, | ||
tmpReg, | ||
call->IsFastTailCall()); | ||
// clang-format on | ||
} | ||
else | ||
#endif // TARGET_ARM | ||
{ | ||
// clang-format off | ||
genEmitCall(emitter::EC_FUNC_TOKEN, | ||
methHnd, | ||
INDEBUG_LDISASM_COMMA(sigInfo) | ||
addr, | ||
retSize | ||
MULTIREG_HAS_SECOND_GC_RET_ONLY_ARG(secondRetSize), | ||
di, | ||
REG_NA, | ||
call->IsFastTailCall()); | ||
// clang-format on | ||
} | ||
{ | ||
// clang-format off | ||
genEmitCall(emitter::EC_FUNC_TOKEN, | ||
methHnd, | ||
INDEBUG_LDISASM_COMMA(sigInfo) | ||
addr, | ||
retSize | ||
MULTIREG_HAS_SECOND_GC_RET_ONLY_ARG(secondRetSize), | ||
di, | ||
REG_NA, | ||
call->IsFastTailCall()); | ||
// clang-format on |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The diff is hard to read but the only change here is that I removed the first if case (with the ad-hoc optimization) and unindented the rest.
- Helpers can use indir cell args so we do not only see CT_USER_FUNC - Fix an incorrect redefinition
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
Seems like arm is not yet happy though.
…ion-cell-calls-as-ir
The callee-saved register used for the indir cell arg was not being killed at the call site leaving the arg register marked as busy.
After previous fix I believe this should no longer be necessary.
The ARM issue seemed to be an LSRA bug: for R2R calls we were not killing the register containing the indirection cell address at the call site which led to that register being marked busy for the remainder of the function. ARM is a bit special there since it uses a callee-preserved register. There was a hack in morph to work around this problem (I assume) that I have removed as well. |
…ion-cell-calls-as-ir
On x64 with CFG it looks like it's better to use the constant in both places: ; MyBenchmarks.SimpleIfaceCall.Call()
push rax
xor eax,eax
mov [rsp],rax
mov rcx,[rcx+8]
mov [rsp],rcx
- mov r10,7FF972ED0510
- mov rcx,[r10]
+ mov rcx,[7FF9736E0510]
call CORINFO_HELP_VALIDATE_INDIRECT_CALL
mov rax,rcx
mov rcx,[rsp]
- mov r11,r10
+ mov r11,7FF972EA0510
cmp [rcx],ecx
add rsp,8
jmp rax
-; Total bytes of code 52
+; Total bytes of code 53 we get BenchmarkDotNet=v0.13.1, OS=Windows 10.0.19044.1526 (21H2)
AMD Ryzen 9 5950X, 1 CPU, 32 logical and 16 physical cores
.NET SDK=7.0.100-alpha.1.21558.2
[Host] : .NET 6.0.0 (6.0.21.48005), X64 RyuJIT
Job-DBANTR : .NET 7.0.0 (42.42.42.42424), X64 RyuJIT
Job-BFNMJK : .NET 7.0.0 (42.42.42.42424), X64 RyuJIT
EnvironmentVariables=COMPlus_JitForceControlFlowGuard=1
|
Not totally sure if this change can replace the old ad-hoc optimization. A few observations:
|
Closing this, don't think we can teach RA/liveness to deal well with the extra locals required for this, and the regressions seem too large to ignore. |
When the JIT calls a function that requires an indirection cell argument
it is beneficial to load the call target through the already placed arg.
Previously this was done as an ad-hoc optimization during codegen (with
LSRA making sure to allocate an extra register). Instead of this, expand
it as regular IR and remove the ad-hoc optimization.
This change makes the optimization that were previously done work for
CFG as well and generally seems to have (albeit small) positive diffs.
Fix #65076