JIT: checked/release asm diff in GitHub_17777 #64793
Tagging subscribers to this area: @JulieLeeMSFT
The Release build is hitting this NO_WAY assert and dropping back to MinOpts:
with:
The igNum is an unsigned int, so we're not overflowing that. So maybe there's some corruption somewhere?
Actually, this does look like #64162, where we're overflowing the single allowed prolog IG. We just need to increase this:
Perhaps some of the additional instructions generated recently for zero initialization, stack probing, etc., are eating up the static size that was available. Because this size is a byte count and instruction descriptors are not the same size in DEBUG and non-DEBUG builds, it doesn't ensure that the same number of instructions fit in both, which is why DEBUG might not be failing.
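The point that a fixed byte budget does not translate into the same instruction count in every build flavor can be illustrated with a small model. It is a sketch under loudly stated assumptions: the buffer is modeled as a fixed number of full-descriptor "slots", and the descriptor sizes used in the example (24/12 bytes for "Release", 40/16 for "DEBUG") are made-up numbers, not the real JIT's instrDesc layouts. All names (BuildFlavor, scratchBufferBytes, instructionsThatFit, prologFits) are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical per-build descriptor sizes. Only the idea matters: both sizes,
// and the ratio between them, can differ between DEBUG and Release builds.
struct BuildFlavor
{
    std::size_t fullDescBytes;  // bytes for a full instruction descriptor
    std::size_t smallDescBytes; // bytes for a compact instruction descriptor
};

// Hypothetical sizing rule: the scratch buffer holds a fixed number of
// full-descriptor slots, so its byte size scales with the full descriptor.
std::size_t scratchBufferBytes(const BuildFlavor& flavor, std::size_t fullSlots)
{
    return fullSlots * flavor.fullDescBytes;
}

// Counts how many instructions of `prolog` fit, in order, in `bufferBytes`.
// Each entry is true if that instruction needs a full descriptor.
std::size_t instructionsThatFit(const BuildFlavor& flavor,
                                std::size_t bufferBytes,
                                const std::vector<bool>& prolog)
{
    assert(flavor.fullDescBytes > 0 && flavor.smallDescBytes > 0);
    std::size_t used = 0;
    std::size_t count = 0;
    for (bool needsFull : prolog)
    {
        std::size_t bytes = needsFull ? flavor.fullDescBytes : flavor.smallDescBytes;
        if (used + bytes > bufferBytes)
        {
            break; // the next instruction would overflow the buffer
        }
        used += bytes;
        ++count;
    }
    return count;
}

// True if the whole prolog fits in one buffer for this build flavor.
bool prologFits(const BuildFlavor& flavor, std::size_t fullSlots, const std::vector<bool>& prolog)
{
    std::size_t bytes = scratchBufferBytes(flavor, fullSlots);
    return instructionsThatFit(flavor, bytes, prolog) == prolog.size();
}
```

With these made-up sizes, a prolog of 210 compact instructions overflows the 100-slot "Release" buffer (2400 bytes, room for 200) but fits the 100-slot "DEBUG" buffer (4000 bytes), and doubling the Release slots to 200 makes it fit. That mirrors the symptom in this issue; the real sizes, and therefore the real direction of the mismatch, depend on the actual descriptor layouts.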
This also impacts OSR (on-stack replacement) on this test; a fix would be nice.
Do you have any guidance on how many additional instructions OSR might require in the prolog, in the worst case?
Current IG buffer stats for ARM64:
DEBUG build:
Release build:
I propose to basically double the arm32/arm64 insGroup buffer, to:
This gives the following (for this test case, per-test stats):
DEBUG build:
Release build:
There is nothing scientific about this number, other than that it fixes this problem and leaves significant additional headroom for OSR or other scenarios.
What's the downside of doing this? Is it that we need a new IG per "superblock" (join-free span of blocks) during regular emission, so we may have a lot of wasted IG space in very branchy methods?
The downside is that the single, global IG buffer is larger. So in the example above, we allocate 3200 bytes instead of 1712. If we only ever use 2000 bytes, we "waste" 1200 bytes. All individual IGs are precisely sized when they are "saved" (typically when we reach a label). On the plus side, we will have fewer "extension" / "overflow" groups, because fewer blocks will reach the global buffer size.
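The trade-off described above can be illustrated with a toy model; it is not the JIT's emitter code, and all names (EmitStats, emitBlocks, unusedScratchBytes) are hypothetical. Instructions accumulate in one global scratch buffer of fixed capacity; reaching a label copies the current group out with exactly the bytes it used; filling the buffer mid-block forces an extra extension group. The 8-byte descriptor size in the example is made up.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

struct EmitStats
{
    std::size_t groups = 0;           // all saved groups
    std::size_t extensionGroups = 0;  // extra groups forced by a full buffer
    std::size_t savedBytes = 0;       // total bytes of precisely sized groups
    std::size_t peakScratchBytes = 0; // most scratch bytes ever in use
};

// blocks[i] lists the descriptor sizes (in bytes) of block i's instructions;
// every block is assumed to start at a label.
EmitStats emitBlocks(const std::vector<std::vector<std::size_t>>& blocks,
                     std::size_t scratchCapacity)
{
    EmitStats stats;
    std::size_t used = 0; // bytes currently held in the scratch buffer

    // Copy the current group out with exactly the bytes it used.
    auto saveGroup = [&]() {
        if (used == 0)
        {
            return;
        }
        stats.groups++;
        stats.savedBytes += used;
        used = 0;
    };

    for (const auto& block : blocks)
    {
        saveGroup(); // the label at the block start ends the previous group
        for (std::size_t bytes : block)
        {
            assert(bytes <= scratchCapacity);
            if (used + bytes > scratchCapacity)
            {
                stats.extensionGroups++; // buffer full mid-block
                saveGroup();
            }
            used += bytes;
            stats.peakScratchBytes = std::max(stats.peakScratchBytes, used);
        }
    }
    saveGroup(); // flush the final group
    return stats;
}

// Scratch bytes that were allocated but never used.
std::size_t unusedScratchBytes(const EmitStats& stats, std::size_t scratchCapacity)
{
    return scratchCapacity - stats.peakScratchBytes;
}
```

For a single block of 250 instructions with 8-byte descriptors (2000 bytes), a 1712-byte buffer needs one extension group, while a 3200-byte buffer needs none and leaves 1200 bytes unused. The saved groups total 2000 bytes either way, since each one is sized exactly.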
Doesn't sound too bad then.
We require that the maximum number of prolog instructions all fit in one instruction group. Recent changes appear to have increased the number of instructions we are generating in the prolog, leading to a NOWAY assert on Release builds and test failure on linux-arm64. Bump up the number to avoid this problem, and leave some headroom for possible additional needs. Fixes dotnet#64162, dotnet#64793.
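A toy sketch of the constraint in this commit message, combined with the Release behavior described earlier in the thread (a NO_WAY failure causing a retry with MinOpts): because the prolog may not spill into an extension group, an overflowing prolog is a hard failure. Every name here (NoWayFailure, emitPrologIntoSingleGroup, CodeQuality, compile) is hypothetical, prologs are modeled only as lists of descriptor sizes, and the 8-byte sizes in the example are made up.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Stand-in for a NO_WAY-style failure; not the JIT's real mechanism.
struct NoWayFailure : std::runtime_error
{
    using std::runtime_error::runtime_error;
};

// The prolog must fit in a single group: there is no extension group to
// spill into, so overflowing the scratch buffer throws.
void emitPrologIntoSingleGroup(const std::vector<std::size_t>& prologDescBytes,
                               std::size_t scratchCapacity)
{
    assert(scratchCapacity > 0);
    std::size_t used = 0;
    for (std::size_t bytes : prologDescBytes)
    {
        used += bytes;
        if (used > scratchCapacity)
        {
            throw NoWayFailure("prolog does not fit in one instruction group");
        }
    }
}

enum class CodeQuality
{
    Optimized,
    MinOpts
};

// Models the Release fallback: on failure, retry with a MinOpts prolog
// (assumed smaller here, e.g. no callee-saved register saves). If the retry
// also overflows, the exception propagates to the caller.
CodeQuality compile(const std::vector<std::size_t>& optimizedProlog,
                    const std::vector<std::size_t>& minOptsProlog,
                    std::size_t scratchCapacity)
{
    try
    {
        emitPrologIntoSingleGroup(optimizedProlog, scratchCapacity);
        return CodeQuality::Optimized;
    }
    catch (const NoWayFailure&)
    {
        emitPrologIntoSingleGroup(minOptsProlog, scratchCapacity);
        return CodeQuality::MinOpts;
    }
}
```

In this model, a 230-instruction optimized prolog (1840 bytes) falls back to MinOpts with a 1712-byte buffer but compiles optimized with a 3200-byte one. In this test the real MinOpts retry evidently fit, since Release produced code that does not save x19-x28, which is consistent with the failure surfacing as an asm diff rather than a crash.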
Fixed by #65153
From #61335 (comment):
The pipeline caught two asm diffs - Link.
-- ISSUE: <ASM_DIFF> main method 246505 of size 2418 differs
-- ISSUE: <ASM_DIFF> main method 249554 of size 2418 differs
It's the same test in each case:
Repro.Program:Test(int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int):int (MethodHash=8ad14d6d)
windows: ccb0c159-04b3-47f6-993e-79114c9cbef8.windows.arm64\coreclr_tests.pmi.windows.arm64.checked.mch
Linux: ccb0c159-04b3-47f6-993e-79114c9cbef8.Linux.arm64\coreclr_tests.pmi.Linux.arm64.checked.mch
Maybe related to #64162?
One difference is the Checked build saves and restores all the callee-saved registers x19-x28 (and allocates arguments/locals to them, used as w19-w28), whereas Release doesn't save/restore or use them.
Unfortunately, even though it's a small test, it generates almost 300,000 arm64 instructions.