-
Notifications
You must be signed in to change notification settings - Fork 4.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
JIT: "Scaled" addressing mode on ARM #60808
Conversation
Tagging subscribers to this area: @JulieLeeMSFT Issue DetailsAddressing modes on ARM are tricky, it only supports:
while on x64 we can do things like Codegen improvement example: void GetSet(int* data, int i, int j)
=> data[i] = data[j]; diff: ; Method Prog:GetSet(long,int,int):this
G_M13801_IG01:
stp fp, lr, [sp,#-16]!
mov fp, sp
G_M13801_IG02:
sxtw x0, w3
- lsl x0, x0, #2
- ldr w0, [x1, x0]
+ ldr w0, [x1, x0, LSL #2]
sxtw x2, w2
- lsl x2, x2, #2
- str w0, [x1, x2]
+ str w0, [x1, x2, LSL #2]
G_M13801_IG03:
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code: 40 Also, I removed coreclr_tests.pmi.Linux.arm64.checked.mch:
Detail diffs
libraries.crossgen2.Linux.arm64.checked.mch:
Detail diffs
libraries.pmi.Linux.arm64.checked.mch:
Detail diffs
libraries_tests.pmi.Linux.arm64.checked.mch:
Detail diffs
I have a more complicated improvement with bigger diffs but it's not ready yet.
|
It doesn't handle "array access via index" yet, e.g.: int a = array[i]; // int[] array, int i currently emits: mov w1, w1
lsl x1, x1, #2
add x1, x1, #16
ldr w0, [x0, x1] while could be: add x8, x0, w1, uxtw #2
ldr w0, [x8, #16] |
/azp run runtime-coreclr jitstressregs, runtime-coreclr outerloop, runtime-coreclr jitstress2-jitstressregs, runtime-coreclr gcstress0x3-gcstress0xc |
Azure Pipelines successfully started running 4 pipeline(s). |
PTAL @dotnet/jit-contrib |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
@@ -3176,12 +3174,12 @@ bool Compiler::gtMarkAddrMode(GenTree* addr, int* pCostEx, int* pCostSz, var_typ | |||
assert(op1 != op1Save); | |||
assert(op2 != nullptr); | |||
|
|||
#if defined(TARGET_XARCH) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add a comment here explaining why arm64 doesn't/shouldn't do this walk?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added
#ifdef TARGET_ARMARCH | ||
if ((scale > 0) && (genTypeSize(targetType) != scale)) | ||
{ | ||
return false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are there cases now where we look for and find a scale, but it is not the appropriate number, and so we bail out on creating an addressing mode that we would previously find?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Example: https://godbolt.org/z/xG4TsYYTG - in case of Test2
we bail out
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, but in that case we would also bail out in the baseline as well as with this change, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From my understanding genCreateAddrMode is called from two places (gtSetEvalOrder and in Lower) and these checks are needed in both otherwise it leads to asserts.
@BruceForstall could you please take a look one more time? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Long-term, I'm worried about one comment you added:
"For now we only handle MUL and LSH because
arm doesn't support both scale and offset at the same. Offset is handled
at the emitter as a peephole optimization."
There's been controversy over the years about what GenTreeAddrMode should represent; it is a fully general x86 addressing mode, as the comments seem to indicate? Or is it "target-specific" as the comments also say? Should it be extended to include the arm64 "sxt/uxt" shifters, for arm? It seems like handling "offset as a peephole optimization" is the wrong place for that; the full address mode should be properly represented in the GenTree.
Also, can you verify there are no spmi asm diffs on x86/x64? |
Just checked, no diffs.
I agree, I want to try to refactor it as it looks like a proper fix here will have ~200kb diffs on arm64 (for many indirect loads we emit up to 4 instructions where we could do it in 1 or 2), this PR was just a low-hanging fruit. |
@EgorBo I think this has regressed rolling builds. See i.e.: essentially every rolling build since then is consistently failing with that assert. If fix is unclear please consider temporarily reverting. Note: I'm not 100% sure this PR is responsible but looking at the list of commits: https://dev.azure.com/dnceng/public/_traceability/runview/changes?currentRunId=1446655 this looks most probable. |
improvements in linux-arm64 dotnet/perf-autofiling-issues#2148 and dotnet/perf-autofiling-issues#2141 and dotnet/perf-autofiling-issues#2253 |
Addressing modes on ARM are tricky, it only supports:
while on x64 we can do things like
[reg1 + 8 * reg2 + icon]
Codegen improvement example:
diff:
Also, I removed
SCALED_ADDR_MODES
- it's always defined for all targets and I don't think it simplifies porting to new platforms as before this PR this flag was set but scaled addr modes were disabled anyway.coreclr_tests.pmi.Linux.arm64.checked.mch:
Detail diffs
libraries.crossgen2.Linux.arm64.checked.mch:
Detail diffs
libraries.pmi.Linux.arm64.checked.mch:
Detail diffs
libraries_tests.pmi.Linux.arm64.checked.mch:
Detail diffs
I have a more complicated improvement with bigger diffs but it's not ready yet (see #60813 (comment))