JIT: checked/release asm diff in GitHub_17777 #64793

BruceForstall · 2022-02-04T06:18:50Z

The pipeline caught two asm diffs - Link.

unix-arm64
-- ISSUE: <ASM_DIFF> main method 246505 of size 2418 differs
win-arm64
-- ISSUE: <ASM_DIFF> main method 249554 of size 2418 differs

It's the same test in each case:

Repro.Program:Test(int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int):int (MethodHash=8ad14d6d)

windows: ccb0c159-04b3-47f6-993e-79114c9cbef8.windows.arm64\coreclr_tests.pmi.windows.arm64.checked.mch
Linux: ccb0c159-04b3-47f6-993e-79114c9cbef8.Linux.arm64\coreclr_tests.pmi.Linux.arm64.checked.mch

Maybe related to #64162?

One difference is the Checked build saves and restores all the callee-saved registers x19-x28 (and allocates arguments/locals to them, used as w19-w28), whereas Release doesn't save/restore or use them.

Unfortunately, even though it's a small test, it generates almost 300,000 arm64 instructions.

The text was updated successfully, but these errors were encountered:

ghost · 2022-02-04T06:18:55Z

Tagging subscribers to this area: @JulieLeeMSFT
See info in area-owners.md if you want to be subscribed.

Issue Details

From #61335 (comment):

The pipeline caught two asm diffs - Link.

unix-arm64
-- ISSUE: <ASM_DIFF> main method 246505 of size 2418 differs
win-arm64
-- ISSUE: <ASM_DIFF> main method 249554 of size 2418 differs

It's the same test in each case:

Repro.Program:Test(int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int):int (MethodHash=8ad14d6d)

windows: ccb0c159-04b3-47f6-993e-79114c9cbef8.Linux.arm64\coreclr_tests.pmi.Linux.arm64.checked.mch
Linux: ccb0c159-04b3-47f6-993e-79114c9cbef8.Linux.arm64\coreclr_tests.pmi.Linux.arm64.checked.mch

Maybe related to #64162?

One difference is the Checked build saves and restores all the callee-saved registers x19-x28 (but never uses them), whereas Release doesn't save/restore or use them.

Unfortunately, even though it's a small test, it generates almost 300,000 arm64 instructions.

Author:	BruceForstall
Assignees:	BruceForstall
Labels:	`arch-arm64`, `area-CodeGen-coreclr`
Milestone:	7.0.0

BruceForstall · 2022-02-04T08:03:33Z

The Release build is hitting this NO_WAY assert and dropping back to MinOpts:

        /* Are we overflowing? */
        if (ig->igNext && (ig->igNum + 1 != ig->igNext->igNum))
        {
            NO_WAY("Too many instruction groups");
        }

with:

		ig->igNum	1	unsigned int
		ig->igNext->igNum	4237	unsigned int

the igNum is an unsigned int, so we're not overflowing that. So maybe there's some corruption somewhere?

BruceForstall · 2022-02-04T08:16:35Z

Actually, this does look like #64162, where we're overflowing the single allowed prolog IG.

We just need to increase this:

#ifdef TARGET_ARMARCH
// The only place where this limited instruction group size is a problem is
// in the prolog, where we only support a single instruction group. We should really fix that.
// ARM32 and ARM64 both can require a bigger prolog instruction group. One scenario is where
// a function uses all the incoming integer and single-precision floating-point arguments,
// and must store them all to the frame on entry. If the frame is very large, we generate
// ugly code like "movw r10, 0x488; add r10, sp; vstr s0, [r10]" for each store, which
// eats up our insGroup buffer.
#define SC_IG_BUFFER_SIZE (100 * sizeof(emitter::instrDesc) + 14 * SMALL_IDSC_SIZE)
#else // !TARGET_ARMARCH
#define SC_IG_BUFFER_SIZE (50 * sizeof(emitter::instrDesc) + 14 * SMALL_IDSC_SIZE)
#endif // !TARGET_ARMARCH

Perhaps some of the additional instructions generated recently for zero initialization, stack probing, etc., are eating up the static size that was available.

This number doesn't ensure the same number of instructions fit it both DEBUG and non-DEBUG builds, which is why DEBUG might not be failing.

AndyAyersMS · 2022-02-05T23:45:44Z

we only support a single instruction group. We should really fix that.

This also impacts OSR on this test ... a fix would be nice.

BruceForstall · 2022-02-07T18:21:54Z

This also impacts OSR on this test ... a fix would be nice.

Do you have any guidance on how many additional instructions OSR might require in the prolog, in the worst case?

BruceForstall · 2022-02-10T19:23:02Z

Current IG buffer stats for ARM64:

DEBUG build:

SC_IG_BUFFER_SIZE             = 2624
SMALL_IDSC_SIZE per IG buffer = 164
instrDesc per IG buffer       = 109
instrDescJmp per IG buffer    = 46
instrDescCns per IG buffer    = 82
instrDescDsp per IG buffer    = 82
instrDescCnsDsp per IG buffer = 65
instrDescCGCA per IG buffer   = 41

Release build:

SC_IG_BUFFER_SIZE             = 1712
SMALL_IDSC_SIZE per IG buffer = 214
instrDesc per IG buffer       = 107
instrDescJmp per IG buffer    = 35
instrDescCns per IG buffer    = 71
instrDescDsp per IG buffer    = 71
instrDescCnsDsp per IG buffer = 53
instrDescCGCA per IG buffer   = 30

BruceForstall · 2022-02-10T19:57:44Z

I propose to basically double the arm32/arm64 insGroup buffer, to:

#define SC_IG_BUFFER_SIZE (200 * sizeof(emitter::instrDesc))

Which gives (for this test case, for the per-test stats):

DEBUG build:

SC_IG_BUFFER_SIZE             = 4800
SMALL_IDSC_SIZE per IG buffer = 300
instrDesc per IG buffer       = 200
instrDescJmp per IG buffer    = 85
instrDescCns per IG buffer    = 150
instrDescDsp per IG buffer    = 150
instrDescCnsDsp per IG buffer = 120
instrDescCGCA per IG buffer   = 75
...
Max prolog instrDesc count:       83
Max prolog insGroup size  :     2448

Release Build:

SC_IG_BUFFER_SIZE             = 3200
SMALL_IDSC_SIZE per IG buffer = 400
instrDesc per IG buffer       = 200
instrDescJmp per IG buffer    = 66
instrDescCns per IG buffer    = 133
instrDescDsp per IG buffer    = 133
instrDescCnsDsp per IG buffer = 100
instrDescCGCA per IG buffer   = 57
...
Max prolog instrDesc count:       83
Max prolog insGroup size  :     1784

There is nothing scientific about this number, other than it fixes this problem, and leaves significant additional headroom for OSR or other scenarios.

AndyAyersMS · 2022-02-10T20:00:42Z

What's the downside of doing this? Is it that we need a new IG per "superblock" (join-free span of blocks) during regular emission, so we may have a lot of wasted IG space in very branchy methods?

BruceForstall · 2022-02-10T20:05:25Z

The downside is the single, global IG buffer is larger. So in the example above, we allocate 3200 bytes instead of 1712. If we only ever use 2000 bytes, we "waste" 1200 bytes.

All individual IGs are precisely sized when they are "saved" (when we reach a label, typically).

On the plus side, we will have fewer "extension" / "overflow" groups, because fewer blocks will reach the global buffer size.

AndyAyersMS · 2022-02-10T20:20:50Z

Doesn't sound too bad then.

We require that the maximum number of prolog instructions all fit in one instruction group. Recent changes appear to have increased the number of instructions we are generating the prolog, leading to NOWAY assert on Release builds and test failure on linux-arm64. Bump up the number to avoid this problem, and leave some headroom for possible additional needs. Fixes dotnet#64162, dotnet#64793.

We require that the maximum number of prolog instructions all fit in one instruction group. Recent changes appear to have increased the number of instructions we are generating the prolog, leading to NOWAY assert on Release builds and test failure on linux-arm64. Bump up the number to avoid this problem, and leave some headroom for possible additional needs. Fixes #64162, #64793.

BruceForstall · 2022-03-01T05:50:58Z

Fixed by #65153

BruceForstall added arch-arm64 area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI labels Feb 4, 2022

BruceForstall added this to the 7.0.0 milestone Feb 4, 2022

BruceForstall self-assigned this Feb 4, 2022

dotnet-issue-labeler bot added the untriaged New issue has not been triaged by the area owner label Feb 4, 2022

BruceForstall removed the untriaged New issue has not been triaged by the area owner label Feb 4, 2022

BruceForstall mentioned this issue Feb 4, 2022

SPMI no ASMDiff between Checked and Release #61335

Merged

BruceForstall mentioned this issue Feb 10, 2022

Increase arm32/arm64 maximum instruction group size #65153

Merged

BruceForstall closed this as completed Mar 1, 2022

ghost locked as resolved and limited conversation to collaborators Mar 31, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

JIT: checked/release asm diff in GitHub_17777 #64793

JIT: checked/release asm diff in GitHub_17777 #64793

BruceForstall commented Feb 4, 2022 •

edited

Loading

ghost commented Feb 4, 2022

BruceForstall commented Feb 4, 2022

BruceForstall commented Feb 4, 2022

AndyAyersMS commented Feb 5, 2022

BruceForstall commented Feb 7, 2022

BruceForstall commented Feb 10, 2022

BruceForstall commented Feb 10, 2022

AndyAyersMS commented Feb 10, 2022

BruceForstall commented Feb 10, 2022

AndyAyersMS commented Feb 10, 2022

BruceForstall commented Mar 1, 2022

JIT: checked/release asm diff in GitHub_17777 #64793

JIT: checked/release asm diff in GitHub_17777 #64793

Comments

BruceForstall commented Feb 4, 2022 • edited Loading

ghost commented Feb 4, 2022

BruceForstall commented Feb 4, 2022

BruceForstall commented Feb 4, 2022

AndyAyersMS commented Feb 5, 2022

BruceForstall commented Feb 7, 2022

BruceForstall commented Feb 10, 2022

BruceForstall commented Feb 10, 2022

AndyAyersMS commented Feb 10, 2022

BruceForstall commented Feb 10, 2022

AndyAyersMS commented Feb 10, 2022

BruceForstall commented Mar 1, 2022

BruceForstall commented Feb 4, 2022 •

edited

Loading