JIT: contained memory safety analysis fixes and improvements #64843

Merged

@AndyAyersMS AndyAyersMS commented Feb 5, 2022

Fixes a couple of issues exposed by forward sub, where containment analysis
was allowing unsafe reordering of operands. More checks of this sort may be
needed.

See issues in #64828.

Generalize the safety check so that a store to a local not live into a handler
can be reordered with respect to a node that may cause an exception. Happily, this
leads to almost uniformly better code despite the more stringent checking added above.

Add special handling for "transparent" nodes like CreateScalarUnsafe that allow
their children to be contained by their parent.

Add a workaround for the late callbacks into the containment checker made on
unlinked nodes. Assume these are always safe.
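
To make the rule in the description concrete, here is a minimal sketch of what such a safety check could look like on a toy IR. This is not the JIT's `IsSafeToContainMem`; the `Node` struct, its flags, and `isSafeToContainLoad` are hypothetical names invented for illustration, and the load being contained is assumed to be able to throw (for example on a null address).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, heavily simplified stand-in for an IR node in execution order.
struct Node
{
    bool mayThrow           = false; // node can raise an exception
    bool writesMemory       = false; // node stores to memory the load might alias
    bool storesToLocal      = false; // node is a store to a local variable
    bool localLiveInHandler = false; // that local is observable from a handler
};

// Returns true if the load at index `child` may be evaluated as part of the
// node at index `parent`, i.e. moved past every node strictly in between.
// Assumption: the load itself may throw.
bool isSafeToContainLoad(const std::vector<Node>& order, std::size_t child, std::size_t parent)
{
    for (std::size_t i = child + 1; i < parent; ++i)
    {
        const Node& n = order[i];
        if (n.writesMemory)
        {
            return false; // the delayed load could observe a different value
        }
        if (n.mayThrow)
        {
            return false; // two exceptions would swap order
        }
        if (n.storesToLocal && n.localLiveInHandler)
        {
            return false; // a handler would see a store that should not have happened yet
        }
        // A store to a local that is not live into any handler is harmless:
        // nobody can observe it if the delayed load throws.
    }
    return true;
}
```

In this model a store to an ordinary local between the load and its user does not block containment, while the same store to a handler-visible local does.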

@ghost ghost assigned AndyAyersMS Feb 5, 2022
@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Feb 5, 2022

@AndyAyersMS
Member Author

cc @dotnet/jit-contrib

@@ -4949,7 +4949,7 @@ void Lowering::ContainCheckCompare(GenTreeOp* cmp)
         // we can treat the MemoryOp as contained.
         if (op1Type == op2Type)
         {
-            if (IsContainableMemoryOp(op1))
+            if (IsContainableMemoryOp(op1) && IsSafeToContainMem(cmp, op1))
Member

@tannergooding tannergooding Feb 5, 2022

Member Author

Thanks, added checks.

I'll probably leave that last bit as is; IsContainableMemoryOp is not that expensive.

//
if (transparentParentNode != nullptr)
{
canBeContained = IsSafeToContainMem(containingNode, transparentParentNode, node);
Member

We don't need this for the Load* intrinsic cases below because it's known to always be safe, is that right?

Member Author

@AndyAyersMS AndyAyersMS Feb 5, 2022

I think so, yes. But not 100% sure.

By my understanding, for unary containing nodes, the safety check should immediately return safe, as the child's gtNext is the node, so there is nothing "in between" that can interfere. For higher arity operations the children can interfere and also there can be other interference in between them from COMMA expansion.

Member Author

I'm going to add a quick early out to IsSafeToContainMem for the unary node case, so we are less tempted to optimize out the safety check call and possibly miss something.

Contributor

> By my understanding, for unary containing nodes, the safety check should immediately return safe, as the child's gtNext is the node, so there is nothing "in between" that can interfere

Do you mean that for unary operators operand->gtNext == parent? But that's not true for the linear order.

t1 = IND(addr)
     STOREIND(addr, ...) // Updates [addr] to t2
     UNARY_USER(t1) // Should observe t1, not t2.

Is valid LIR.
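
The counterexample can be restated with a toy linear IR. This is only a sketch under assumed names: `LirNode`, `operandIsAdjacent`, and `walkIsSafe` are hypothetical and are not the JIT's data structures. It separates the data-flow edge (`operand`) from the execution-order link (`next`) and shows that a unary user need not directly follow its operand.

```cpp
#include <cassert>

// Hypothetical miniature of a linear IR node: `operand` is a data-flow edge,
// `next` is the execution-order link (in the spirit of gtNext).
struct LirNode
{
    const char* name;
    LirNode*    operand;      // value consumed by this node, or nullptr
    LirNode*    next;         // next node in execution order, or nullptr
    bool        storesMemory; // node writes memory the operand may have read
};

// The assumption under discussion: a unary user directly follows its operand.
bool operandIsAdjacent(const LirNode& user)
{
    return user.operand->next == &user;
}

// Walks execution order from the operand to its user and reports whether any
// node in between stores to memory.
bool walkIsSafe(const LirNode& user)
{
    for (const LirNode* n = user.operand->next; n != nullptr && n != &user; n = n->next)
    {
        if (n->storesMemory)
        {
            return false;
        }
    }
    return true;
}
```

Linking `IND -> STOREIND -> UNARY_USER` with `UNARY_USER.operand = IND` makes `operandIsAdjacent` false and `walkIsSafe` false, so an early out keyed only on arity would miss the interference in this model.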

Member Author

Do you know of any way we'd create such a pattern? It would be nice to have more examples that fail if we mess this up.

At any rate, best not to be too clever here. With the added safety calls and follow-up check in MakeSrcContained we should have things covered, I hope.

Contributor

> Do you know of any way we'd create such a pattern?

Yep:

private static uint Problem(uint a, uint* b)
{
    uint zero = 0;
    return *b + Bmi2.MultiplyNoFlags(a, a, b) * zero;
}

N003 (???,???) [000015] ------------                 IL_OFFSET void   INLRT @ 0x002[E-] REG NA
N005 (  1,  1) [000003] -----------Z         t3 =    LCL_VAR   int    V01 arg1         u:1 edx REG edx $81
                                                  /--*  t3     int
N007 (  3,  2) [000004] *--XG-------         t4 = *  IND       int    REG eax <l:$1c1, c:$1c0>
N009 (  1,  1) [000005] ------------         t5 =    LCL_VAR   int    V00 arg0         u:1 ecx REG ecx $80
N011 (  1,  1) [000006] ------------         t6 =    LCL_VAR   int    V00 arg0         u:1 ecx (last use) REG ecx $80
N013 (  1,  1) [000007] -----------z         t7 =    LCL_VAR   int    V01 arg1         u:1 esi (last use) REG esi $81
                                                  /--*  t5     int
                                                  +--*  t6     int
                                                  +--*  t7     int
N015 (  4,  4) [000008] ---XG-------         t8 = *  HWINTRINSIC int     MultiplyNoFlags REG edx $83
                                                  /--*  t4     int
N017 ( 10,  9) [000012] ---XG-------              *  RETURN    int    REG NA $181
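
The observable effect of containing `t4` into `RETURN` in the dump above can be modeled in plain C++. This is a sketch, not the intrinsic itself: `mulStoreLow` is a hypothetical stand-in, and it assumes the intrinsic returns the high 32 bits of the product and stores the low 32 bits through the pointer. The names `loadFirst` and `loadAtUse` are likewise made up for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the intrinsic in the repro. Assumption: returns
// the high half of the 64-bit product and stores the low half through `low`.
std::uint32_t mulStoreLow(std::uint32_t a, std::uint32_t b, std::uint32_t* low)
{
    const std::uint64_t product = static_cast<std::uint64_t>(a) * b;
    *low = static_cast<std::uint32_t>(product);
    return static_cast<std::uint32_t>(product >> 32);
}

// Source order: *b is read (t4) before the intrinsic (t8) overwrites it.
std::uint32_t loadFirst(std::uint32_t a, std::uint32_t* b)
{
    const std::uint32_t t4 = *b;
    const std::uint32_t t8 = mulStoreLow(a, a, b);
    return t4 + t8 * 0u;
}

// What unsafe containment amounts to: the load is folded into its user and
// therefore happens after the intrinsic's store.
std::uint32_t loadAtUse(std::uint32_t a, std::uint32_t* b)
{
    const std::uint32_t t8 = mulStoreLow(a, a, b);
    return *b + t8 * 0u;
}
```

With `a = 3` and `*b = 7`, `loadFirst` returns the original 7 while `loadAtUse` returns 9, the value the intrinsic just stored.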

@AndyAyersMS
Member Author

Given that Lowering::MakeSrcContained takes the parent node as an arg it seems like we can add memory safety verification there and catch cases where the caller missed checks.

Not sure if this will be super-costly for checked builds, but it seems worth trying.
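
One way to picture this kind of backstop is a wrapper that re-runs the safety predicate before marking an operand contained. The sketch below is hypothetical throughout: `Operand`, `SafetyCheck`, and `makeSrcContainedChecked` are invented names, and it throws where a checked JIT build would presumably assert, purely so the behavior is easy to exercise.

```cpp
#include <cassert>
#include <functional>
#include <stdexcept>

// Hypothetical operand with just enough state for the illustration.
struct Operand
{
    bool isMemoryOp = false;
    bool contained  = false;
};

// Predicate the caller is expected to have consulted already.
using SafetyCheck = std::function<bool(const Operand&)>;

// Marks `child` as contained, but first verifies the safety predicate again so
// that a caller that skipped the check is caught at the point of containment.
void makeSrcContainedChecked(Operand& child, const SafetyCheck& isSafe)
{
    if (child.isMemoryOp && !isSafe(child))
    {
        throw std::logic_error("memory operand contained without a passing safety check");
    }
    child.contained = true;
}
```

The cost concern raised above corresponds to `isSafe` being invoked a second time for every memory operand that reaches this function.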

@AndyAyersMS
Member Author

Small number of diffs from the extra checks. Here's one: we now think the null check of the this pointer must come before the bounds check, and this leads to some messy codegen.

;; BEFORE

; Assembly listing for method System.Runtime.Intrinsics.Vector128`1[Byte][System.Byte]:get_Item(int):ubyte:this
; Emitting BLENDED_CODE for X64 CPU with AVX - Windows
; optimized code
; rsp based frame
; partially interruptible
; No matching PGO data
; Final local variable assignments
;
;  V00 this         [V00,T01] (  3,  3   )   byref  ->  rcx         this single-def
;  V01 arg1         [V01,T00] (  4,  4   )     int  ->  rdx         single-def
;  V02 OutArgs      [V02    ] (  1,  1   )  lclBlk (32) [rsp+00H]   "OutgoingArgSpace"
;
; Lcl frame size = 40

G_M59860_IG01:        ; gcrefRegs=00000000 {}, byrefRegs=00000000 {}, byref, nogc <-- Prolog IG
       sub      rsp, 40
       vzeroupper 
						;; bbWeight=1    PerfScore 1.25
G_M59860_IG02:        ; gcrefRegs=00000000 {}, byrefRegs=00000002 {rcx}, byref, isz
       ; byrRegs +[rcx]
       cmp      edx, 16
       jae      SHORT G_M59860_IG04
       movzx    rax, byte  ptr [rcx+rdx]
						;; bbWeight=1    PerfScore 3.25
G_M59860_IG03:        ; , epilog, nogc, extend
       add      rsp, 40
       ret      
						;; bbWeight=1    PerfScore 1.25
G_M59860_IG04:        ; gcVars=0000000000000000 {}, gcrefRegs=00000000 {}, byrefRegs=00000000 {}, gcvars, byref
       ; byrRegs -[rcx]
       call     CORINFO_HELP_THROW_ARGUMENTOUTOFRANGEEXCEPTION
       ; gcr arg pop 0
       int3

;; AFTER

; Assembly listing for method System.Runtime.Intrinsics.Vector128`1[Byte][System.Byte]:get_Item(int):ubyte:this
; Emitting BLENDED_CODE for X64 CPU with AVX - Windows
; optimized code
; rsp based frame
; partially interruptible
; No matching PGO data
; Final local variable assignments
;
;  V00 this         [V00,T01] (  3,  3   )   byref  ->  rcx         this single-def
;  V01 arg1         [V01,T00] (  4,  4   )     int  ->  rdx         single-def
;  V02 OutArgs      [V02    ] (  1,  1   )  lclBlk (32) [rsp+00H]   "OutgoingArgSpace"
;  V03 rat0         [V03    ] (  1,  1   )  simd16  ->  [rsp+28H]   "SIMDInitTempVar"
;
; Lcl frame size = 56

G_M59860_IG01:        ; gcrefRegs=00000000 {}, byrefRegs=00000000 {}, byref, nogc <-- Prolog IG
       sub      rsp, 56
       vzeroupper 
						;; bbWeight=1    PerfScore 1.25
G_M59860_IG02:        ; gcrefRegs=00000000 {}, byrefRegs=00000002 {rcx}, byref, isz
       ; byrRegs +[rcx]
       vmovupd  xmm0, xmmword ptr [rcx]
       cmp      edx, 16
       jae      SHORT G_M59860_IG04
       vmovupd  xmmword ptr [rsp+28H], xmm0
       movzx    rax, byte  ptr [rsp+rdx+28H]
						;; bbWeight=1    PerfScore 8.25
G_M59860_IG03:        ; , epilog, nogc, extend
       add      rsp, 56
       ret      
						;; bbWeight=1    PerfScore 1.25
G_M59860_IG04:        ; gcVars=0000000000000000 {}, gcrefRegs=00000000 {}, byrefRegs=00000000 {}, gcvars, byref
       ; byrRegs -[rcx]
       call     CORINFO_HELP_THROW_ARGUMENTOUTOFRANGEEXCEPTION
       ; gcr arg pop 0
       int3  
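
The difference between the two listings is, at bottom, which fault is observed first. Here is a small C++ model of that ordering. It is a sketch only: `Outcome`, `getItemLoadFirst`, and `getItemBoundsFirst` are hypothetical names, and a null `self` stands in for the hardware fault that the real load would raise.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

enum class Outcome { Ok, NullReference, OutOfRange };

// Mirrors the AFTER listing: the whole 16-byte vector is loaded first (and can
// fault), then the index is checked, then the element is read from the copy.
Outcome getItemLoadFirst(const std::uint8_t* self, std::uint32_t index, std::uint8_t* result)
{
    if (self == nullptr)
    {
        return Outcome::NullReference; // models the fault from the full load
    }
    std::uint8_t copy[16];
    std::memcpy(copy, self, sizeof(copy)); // load to register, spill to stack
    if (index >= 16)
    {
        return Outcome::OutOfRange;
    }
    *result = copy[index];
    return Outcome::Ok;
}

// Mirrors the BEFORE listing: the load is contained in the indexing
// instruction, so the bounds check runs before memory is touched.
Outcome getItemBoundsFirst(const std::uint8_t* self, std::uint32_t index, std::uint8_t* result)
{
    if (index >= 16)
    {
        return Outcome::OutOfRange;
    }
    if (self == nullptr)
    {
        return Outcome::NullReference; // models the fault from the contained load
    }
    *result = self[index];
    return Outcome::Ok;
}
```

The two functions agree on every input except a null `self` combined with an out-of-range index, which is exactly the case the stricter check is protecting.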

The full set of regressions from the extra checks is:

aspnet.run.windows.x64.checked.mch:


Summary of Code Size diffs:
(Lower is better)

Total bytes of base: 19733334 (overridden on cmd)
Total bytes of diff: 19733350 (overridden on cmd)
Total bytes of delta: 16 (0.00 % of base)
    diff is a regression.
    relative diff is a regression.
Detail diffs


Top file regressions (bytes):
           2 : 46659.dasm (1.49% of base)
           2 : 50687.dasm (0.32% of base)
           2 : 51092.dasm (1.49% of base)
           2 : 51969.dasm (0.35% of base)
           2 : 45076.dasm (1.49% of base)
           2 : 55877.dasm (1.49% of base)
           2 : 48677.dasm (0.37% of base)
           2 : 44556.dasm (0.32% of base)

8 total files with Code Size differences (0 improved, 8 regressed), 0 unchanged.

Top method regressions (bytes):
           2 ( 1.49% of base) : 46659.dasm - PathString:StartsWithSegments(PathString,int):bool:this
           2 ( 1.49% of base) : 51092.dasm - PathString:StartsWithSegments(PathString,int):bool:this
           2 ( 1.49% of base) : 45076.dasm - PathString:StartsWithSegments(PathString,int):bool:this
           2 ( 1.49% of base) : 55877.dasm - PathString:StartsWithSegments(PathString,int):bool:this
           2 ( 0.32% of base) : 50687.dasm - RuntimeType:FilterApplyMethodBase(MethodBase,int,int,int,ref):bool
           2 ( 0.35% of base) : 51969.dasm - RuntimeType:FilterApplyMethodBase(MethodBase,int,int,int,ref):bool
           2 ( 0.37% of base) : 48677.dasm - RuntimeType:FilterApplyMethodBase(MethodBase,int,int,int,ref):bool
           2 ( 0.32% of base) : 44556.dasm - RuntimeType:FilterApplyMethodBase(MethodBase,int,int,int,ref):bool

Top method regressions (percentages):
           2 ( 1.49% of base) : 46659.dasm - PathString:StartsWithSegments(PathString,int):bool:this
           2 ( 1.49% of base) : 51092.dasm - PathString:StartsWithSegments(PathString,int):bool:this
           2 ( 1.49% of base) : 45076.dasm - PathString:StartsWithSegments(PathString,int):bool:this
           2 ( 1.49% of base) : 55877.dasm - PathString:StartsWithSegments(PathString,int):bool:this
           2 ( 0.37% of base) : 48677.dasm - RuntimeType:FilterApplyMethodBase(MethodBase,int,int,int,ref):bool
           2 ( 0.35% of base) : 51969.dasm - RuntimeType:FilterApplyMethodBase(MethodBase,int,int,int,ref):bool
           2 ( 0.32% of base) : 50687.dasm - RuntimeType:FilterApplyMethodBase(MethodBase,int,int,int,ref):bool
           2 ( 0.32% of base) : 44556.dasm - RuntimeType:FilterApplyMethodBase(MethodBase,int,int,int,ref):bool

8 total methods with Code Size differences (0 improved, 8 regressed), 0 unchanged.


coreclr_tests.pmi.windows.x64.checked.mch:


Summary of Code Size diffs:
(Lower is better)

Total bytes of base: 128321453 (overridden on cmd)
Total bytes of diff: 128321565 (overridden on cmd)
Total bytes of delta: 112 (0.00 % of base)
    diff is a regression.
    relative diff is a regression.
Detail diffs


Top file regressions (bytes):
          18 : 239319.dasm (3.01% of base)
          18 : 239317.dasm (2.95% of base)
          18 : 239318.dasm (2.85% of base)
          11 : 84869.dasm (42.31% of base)
          11 : 84868.dasm (39.29% of base)
          11 : 84867.dasm (40.74% of base)
          11 : 84896.dasm (40.74% of base)
          11 : 84870.dasm (39.29% of base)
           2 : 239321.dasm (0.32% of base)
           2 : 89419.dasm (0.09% of base)

Top file improvements (bytes):
          -1 : 239320.dasm (-0.16% of base)

11 total files with Code Size differences (1 improved, 10 regressed), 0 unchanged.

Top method regressions (bytes):
          18 ( 2.95% of base) : 239317.dasm - VectorArrayTest`1[Byte][System.Byte]:VectorArray(ubyte):int
          18 ( 2.85% of base) : 239318.dasm - VectorArrayTest`1[Int16][System.Int16]:VectorArray(short):int
          18 ( 3.01% of base) : 239319.dasm - VectorArrayTest`1[Int32][System.Int32]:VectorArray(int):int
          11 (40.74% of base) : 84867.dasm - System.Runtime.Intrinsics.Vector128`1[Byte][System.Byte]:get_Item(int):ubyte:this
          11 (39.29% of base) : 84870.dasm - System.Runtime.Intrinsics.Vector128`1[Double][System.Double]:get_Item(int):double:this
          11 (39.29% of base) : 84868.dasm - System.Runtime.Intrinsics.Vector128`1[Int16][System.Int16]:get_Item(int):short:this
          11 (42.31% of base) : 84869.dasm - System.Runtime.Intrinsics.Vector128`1[Int32][System.Int32]:get_Item(int):int:this
          11 (40.74% of base) : 84896.dasm - System.Runtime.Intrinsics.Vector128`1[Int64][System.Int64]:get_Item(int):long:this
           2 ( 0.09% of base) : 89419.dasm - ILGEN_0x537f7b0:Method_0x323f83b5(double,byte,int,short,ubyte,double,short,ushort,double,long,int,long,ushort,long,int):float
           2 ( 0.32% of base) : 239321.dasm - VectorArrayTest`1[Int64][System.Int64]:VectorArray(long):int

Top method improvements (bytes):
          -1 (-0.16% of base) : 239320.dasm - VectorArrayTest`1[Double][System.Double]:VectorArray(double):int

Top method regressions (percentages):
          11 (42.31% of base) : 84869.dasm - System.Runtime.Intrinsics.Vector128`1[Int32][System.Int32]:get_Item(int):int:this
          11 (40.74% of base) : 84867.dasm - System.Runtime.Intrinsics.Vector128`1[Byte][System.Byte]:get_Item(int):ubyte:this
          11 (40.74% of base) : 84896.dasm - System.Runtime.Intrinsics.Vector128`1[Int64][System.Int64]:get_Item(int):long:this
          11 (39.29% of base) : 84870.dasm - System.Runtime.Intrinsics.Vector128`1[Double][System.Double]:get_Item(int):double:this
          11 (39.29% of base) : 84868.dasm - System.Runtime.Intrinsics.Vector128`1[Int16][System.Int16]:get_Item(int):short:this
          18 ( 3.01% of base) : 239319.dasm - VectorArrayTest`1[Int32][System.Int32]:VectorArray(int):int
          18 ( 2.95% of base) : 239317.dasm - VectorArrayTest`1[Byte][System.Byte]:VectorArray(ubyte):int
          18 ( 2.85% of base) : 239318.dasm - VectorArrayTest`1[Int16][System.Int16]:VectorArray(short):int
           2 ( 0.32% of base) : 239321.dasm - VectorArrayTest`1[Int64][System.Int64]:VectorArray(long):int
           2 ( 0.09% of base) : 89419.dasm - ILGEN_0x537f7b0:Method_0x323f83b5(double,byte,int,short,ubyte,double,short,ushort,double,long,int,long,ushort,long,int):float

Top method improvements (percentages):
          -1 (-0.16% of base) : 239320.dasm - VectorArrayTest`1[Double][System.Double]:VectorArray(double):int

11 total methods with Code Size differences (1 improved, 10 regressed), 0 unchanged.


libraries.pmi.windows.x64.checked.mch:


Summary of Code Size diffs:
(Lower is better)

Total bytes of base: 46583443 (overridden on cmd)
Total bytes of diff: 46583502 (overridden on cmd)
Total bytes of delta: 59 (0.00 % of base)
    diff is a regression.
    relative diff is a regression.
Detail diffs


Top file regressions (bytes):
          11 : 20673.dasm (40.74% of base)
          11 : 20675.dasm (42.31% of base)
          11 : 20674.dasm (39.29% of base)
          11 : 20676.dasm (39.29% of base)
          11 : 20701.dasm (40.74% of base)
           2 : 58129.dasm (0.41% of base)
           2 : 58130.dasm (0.71% of base)

7 total files with Code Size differences (0 improved, 7 regressed), 0 unchanged.

Top method regressions (bytes):
          11 (40.74% of base) : 20673.dasm - System.Runtime.Intrinsics.Vector128`1[Byte][System.Byte]:get_Item(int):ubyte:this
          11 (39.29% of base) : 20676.dasm - System.Runtime.Intrinsics.Vector128`1[Double][System.Double]:get_Item(int):double:this
          11 (39.29% of base) : 20674.dasm - System.Runtime.Intrinsics.Vector128`1[Int16][System.Int16]:get_Item(int):short:this
          11 (42.31% of base) : 20675.dasm - System.Runtime.Intrinsics.Vector128`1[Int32][System.Int32]:get_Item(int):int:this
          11 (40.74% of base) : 20701.dasm - System.Runtime.Intrinsics.Vector128`1[Int64][System.Int64]:get_Item(int):long:this
           2 ( 0.71% of base) : 58130.dasm - Microsoft.CodeAnalysis.VisualBasic.LocalRewriter:AppendToBlock(Microsoft.CodeAnalysis.VisualBasic.BoundBlock,Microsoft.CodeAnalysis.VisualBasic.BoundStatement):Microsoft.CodeAnalysis.VisualBasic.BoundBlock:this
           2 ( 0.41% of base) : 58129.dasm - Microsoft.CodeAnalysis.VisualBasic.LocalRewriter:Concat(Microsoft.CodeAnalysis.VisualBasic.BoundStatement,Microsoft.CodeAnalysis.VisualBasic.BoundStatement):Microsoft.CodeAnalysis.VisualBasic.BoundStatement:this

Top method regressions (percentages):
          11 (42.31% of base) : 20675.dasm - System.Runtime.Intrinsics.Vector128`1[Int32][System.Int32]:get_Item(int):int:this
          11 (40.74% of base) : 20673.dasm - System.Runtime.Intrinsics.Vector128`1[Byte][System.Byte]:get_Item(int):ubyte:this
          11 (40.74% of base) : 20701.dasm - System.Runtime.Intrinsics.Vector128`1[Int64][System.Int64]:get_Item(int):long:this
          11 (39.29% of base) : 20676.dasm - System.Runtime.Intrinsics.Vector128`1[Double][System.Double]:get_Item(int):double:this
          11 (39.29% of base) : 20674.dasm - System.Runtime.Intrinsics.Vector128`1[Int16][System.Int16]:get_Item(int):short:this
           2 ( 0.71% of base) : 58130.dasm - Microsoft.CodeAnalysis.VisualBasic.LocalRewriter:AppendToBlock(Microsoft.CodeAnalysis.VisualBasic.BoundBlock,Microsoft.CodeAnalysis.VisualBasic.BoundStatement):Microsoft.CodeAnalysis.VisualBasic.BoundBlock:this
           2 ( 0.41% of base) : 58129.dasm - Microsoft.CodeAnalysis.VisualBasic.LocalRewriter:Concat(Microsoft.CodeAnalysis.VisualBasic.BoundStatement,Microsoft.CodeAnalysis.VisualBasic.BoundStatement):Microsoft.CodeAnalysis.VisualBasic.BoundStatement:this

7 total methods with Code Size differences (0 improved, 7 regressed), 0 unchanged.


libraries_tests.pmi.windows.x64.checked.mch:


Summary of Code Size diffs:
(Lower is better)

Total bytes of base: 119636782 (overridden on cmd)
Total bytes of diff: 119637034 (overridden on cmd)
Total bytes of delta: 252 (0.00 % of base)
    diff is a regression.
    relative diff is a regression.
Detail diffs


Top file regressions (bytes):
          33 : 305492.dasm (6.53% of base)
          27 : 305489.dasm (5.76% of base)
          27 : 305491.dasm (5.81% of base)
          27 : 305494.dasm (5.72% of base)
          27 : 305504.dasm (5.57% of base)
          27 : 305490.dasm (5.73% of base)
          21 : 305503.dasm (4.67% of base)
          21 : 305502.dasm (4.61% of base)
          21 : 305501.dasm (4.64% of base)
          21 : 305506.dasm (4.59% of base)

10 total files with Code Size differences (0 improved, 10 regressed), 0 unchanged.

Top method regressions (bytes):
          33 ( 6.53% of base) : 305492.dasm - <>c__DisplayClass309_0`1[Double][System.Double]:<TestAdditionOverflow>b__0(int,double):this
          27 ( 5.76% of base) : 305489.dasm - <>c__DisplayClass309_0`1[Byte][System.Byte]:<TestAdditionOverflow>b__0(int,ubyte):this
          27 ( 5.73% of base) : 305490.dasm - <>c__DisplayClass309_0`1[Int16][System.Int16]:<TestAdditionOverflow>b__0(int,short):this
          27 ( 5.81% of base) : 305491.dasm - <>c__DisplayClass309_0`1[Int32][System.Int32]:<TestAdditionOverflow>b__0(int,int):this
          27 ( 5.72% of base) : 305494.dasm - <>c__DisplayClass309_0`1[Int64][System.Int64]:<TestAdditionOverflow>b__0(int,long):this
          27 ( 5.57% of base) : 305504.dasm - <>c__DisplayClass329_0`1[Double][System.Double]:<TestSubtractionOverflow>b__0(int,double):this
          21 ( 4.64% of base) : 305501.dasm - <>c__DisplayClass329_0`1[Byte][System.Byte]:<TestSubtractionOverflow>b__0(int,ubyte):this
          21 ( 4.61% of base) : 305502.dasm - <>c__DisplayClass329_0`1[Int16][System.Int16]:<TestSubtractionOverflow>b__0(int,short):this
          21 ( 4.67% of base) : 305503.dasm - <>c__DisplayClass329_0`1[Int32][System.Int32]:<TestSubtractionOverflow>b__0(int,int):this
          21 ( 4.59% of base) : 305506.dasm - <>c__DisplayClass329_0`1[Int64][System.Int64]:<TestSubtractionOverflow>b__0(int,long):this

Top method regressions (percentages):
          33 ( 6.53% of base) : 305492.dasm - <>c__DisplayClass309_0`1[Double][System.Double]:<TestAdditionOverflow>b__0(int,double):this
          27 ( 5.81% of base) : 305491.dasm - <>c__DisplayClass309_0`1[Int32][System.Int32]:<TestAdditionOverflow>b__0(int,int):this
          27 ( 5.76% of base) : 305489.dasm - <>c__DisplayClass309_0`1[Byte][System.Byte]:<TestAdditionOverflow>b__0(int,ubyte):this
          27 ( 5.73% of base) : 305490.dasm - <>c__DisplayClass309_0`1[Int16][System.Int16]:<TestAdditionOverflow>b__0(int,short):this
          27 ( 5.72% of base) : 305494.dasm - <>c__DisplayClass309_0`1[Int64][System.Int64]:<TestAdditionOverflow>b__0(int,long):this
          27 ( 5.57% of base) : 305504.dasm - <>c__DisplayClass329_0`1[Double][System.Double]:<TestSubtractionOverflow>b__0(int,double):this
          21 ( 4.67% of base) : 305503.dasm - <>c__DisplayClass329_0`1[Int32][System.Int32]:<TestSubtractionOverflow>b__0(int,int):this
          21 ( 4.64% of base) : 305501.dasm - <>c__DisplayClass329_0`1[Byte][System.Byte]:<TestSubtractionOverflow>b__0(int,ubyte):this
          21 ( 4.61% of base) : 305502.dasm - <>c__DisplayClass329_0`1[Int16][System.Int16]:<TestSubtractionOverflow>b__0(int,short):this
          21 ( 4.59% of base) : 305506.dasm - <>c__DisplayClass329_0`1[Int64][System.Int64]:<TestSubtractionOverflow>b__0(int,long):this

10 total methods with Code Size differences (0 improved, 10 regressed), 0 unchanged.


Going to push this next batch of changes and will come back to addressing regressions when I have time.

@AndyAyersMS
Member Author

Seems plausible that for value type instance methods we could allow exceptions from an indir based on this to reorder with other exceptions in containment safety checks (at least for small offsets?).

@AndyAyersMS
Member Author

AndyAyersMS commented Feb 5, 2022

So at least per innerloop we're not missing any safety checks.

The checks still seem a bit heavy handed in places, like the regression noted above. Not sure it's worth trying to relax constraints for these isolated calls though; typically such methods will inline and the problem will likely go away.

The cases I've seen so far where we might want to relax:

  • two indir trees that can only cause NREs -- possibly allow them to reorder. This would fix the small regressions in the Test_Sse_CompareScalarOrderedLessThan_* tests, which are passed two byref arguments.
  • a byref this indir tree vs a bounds check for SIMD -- get_Item as noted above
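
The first relaxation in the list can be sketched as a rule over sets of possible exceptions. This is a hypothetical model, not something the PR implements: `ExceptionKind` and `canReorder` are invented names, and the rule covers only the "both sides can raise nothing but a null reference exception" case. The second bullet (a byref this indir against a bounds check) is deliberately not modeled.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical bit set of exception kinds a node may raise.
enum ExceptionKind : std::uint32_t
{
    None            = 0,
    NullReference   = 1u << 0,
    IndexOutOfRange = 1u << 1,
    Overflow        = 1u << 2
};

// Two nodes may swap order, as far as exceptions are concerned, if at most one
// of them can throw, or if both can only raise NullReference (so the kind of
// exception observed is the same whichever one faults first).
bool canReorder(std::uint32_t a, std::uint32_t b)
{
    if (a == None || b == None)
    {
        return true;
    }
    return a == NullReference && b == NullReference;
}
```

Under this rule two null-faulting loads of byref arguments could be reordered, while a load and a bounds check still could not.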

Am going to look at some of the other regressions in libraries_tests next.

Turns out <>c__DisplayClass309_0`1[Double][System.Double]:<TestAdditionOverflow>b__0(int,double):this and friends are methods with multiple get_Item calls, so same case as above.

Another thought on how to deal with the SIMD case -- split the INDIR into an upfront nullcheck and a containable non-faulting INDIR. That should avoid the need to "spill" the SIMD value like we're seeing. I have been tempted to do similar things for inlining args that could be forward subbed but may cause exceptions (but no other side effects) -- just evaluate them twice, the first time just for effect. I may give this a try since it seems to be specific to just this one intrinsic.

Or (even more far out there perhaps) for this particular case we could do the bounds check first, then when we branch on failure, we could do the null check before calling the bounds check helper.
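
Both ideas can be modeled in a few lines to see that they keep the original exception order without copying the vector. As before, this is a sketch with hypothetical names (`Outcome`, `getItemSplitNullCheck`, `getItemNullCheckOnFailure`), and a null `self` stands in for the hardware fault.

```cpp
#include <cassert>
#include <cstdint>

enum class Outcome { Ok, NullReference, OutOfRange };

// Idea 1: an upfront null check evaluated only for its effect, followed by a
// load that can no longer fault and is therefore free to be contained.
Outcome getItemSplitNullCheck(const std::uint8_t* self, std::uint32_t index, std::uint8_t* result)
{
    if (self == nullptr)
    {
        return Outcome::NullReference; // explicit null check, value unused
    }
    if (index >= 16)
    {
        return Outcome::OutOfRange;
    }
    *result = self[index]; // non-faulting load folded into its user
    return Outcome::Ok;
}

// Idea 2: bounds check first, but on the failure path perform the null check
// before reporting the range error.
Outcome getItemNullCheckOnFailure(const std::uint8_t* self, std::uint32_t index, std::uint8_t* result)
{
    if (index >= 16)
    {
        return (self == nullptr) ? Outcome::NullReference : Outcome::OutOfRange;
    }
    if (self == nullptr)
    {
        return Outcome::NullReference; // the contained load itself faults
    }
    *result = self[index];
    return Outcome::Ok;
}
```

In this model both variants report the null reference for a null `self` regardless of the index, matching the load-first order, and neither needs the 16-byte stack copy.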

@AndyAyersMS
Member Author

Also fixing an outerloop test I broke in #64828. Will next try running outerloop.

@AndyAyersMS
Member Author

/azp run runtime-coreclr outerloop

@AndyAyersMS
Member Author

Hmm, not clear where to try and "fix" the get_Item issues.

Perhaps better to live with the small regressions for now and close the correctness loophole.

@dotnet/jit-contrib PTAL

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

Member

@jakobbotsch jakobbotsch left a comment

LGTM. It would be nice to get this in before the fuzzers run tomorrow.

Comment on lines +122 to +133
//------------------------------------------------------------------------
// IsSafeToContainMem: Checks for conflicts between childNode and grandParentNode
// and returns 'true' iff memory operand childNode can be contained in ancestorNode
//
// Arguments:
// grandParentNode - any non-leaf node
// parentNode - parent of `childNode` and an input to `grandParentNode`
// childNode - some node that is an input to `parentNode`
//
// Return value:
// true if it is safe to make childNode a contained memory operand.
//
Member

Seems like there's a few argument names here that need to be updated, ancestorNode and grandParentNode => grandparentNode?

Member Author

Thanks -- will fix this subsequently so I don't retrigger all the tests.

@AndyAyersMS
Member Author

Libraries test failure (Linux x64 Debug):

Process terminated. Assertion failed.
   at System.Number.FormatHalf(ValueStringBuilder& sb, Half value, ReadOnlySpan`1 format, NumberFormatInfo info)

I have seen this before (#61359).

@AndyAyersMS
Member Author

AndyAyersMS commented Feb 6, 2022

x86 crossgen2 failure on windows, may be related?

      Fatal error. 0xC0000005
         at Internal.JitInterface.CorInfoImpl.JitCompileMethod(IntPtr ByRef, IntPtr, IntPtr, IntPtr, Internal.JitInterface.CORINFO_METHOD_INFO ByRef, UInt32, IntPtr ByRef, UInt32 ByRef)
         at Internal.JitInterface.CorInfoImpl.CompileMethodInternal(ILCompiler.DependencyAnalysis.IMethodNode, Internal.IL.MethodIL)
         at Internal.JitInterface.CorInfoImpl.CompileMethod(ILCompiler.DependencyAnalysis.ReadyToRun.MethodWithGCInfo, ILCompiler.Logger)
         at ILCompiler.ReadyToRunCodegenCompilation.<ComputeDependencyNodeDependencies>b__35_0(ILCompiler.DependencyAnalysisFramework.DependencyNodeCore`1<ILCompiler.DependencyAnalysis.NodeFactory>)
         at System.Threading.Tasks.Parallel+<>c__DisplayClass33_0`2[[System.__Canon, System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e],[System.__Canon, System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e]].<ForEachWorker>b__0(Int32)
         at System.Threading.Tasks.Parallel+<>c__DisplayClass19_0`1[[System.__Canon, System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e]].<ForWorker>b__1(System.Threading.Tasks.RangeWorker ByRef, Int32, Boolean ByRef)
         at System.Threading.Tasks.TaskReplicator+Replica.Execute()
         at System.Threading.Tasks.TaskReplicator+Replica+<>c.<.ctor>b__4_0(System.Object)
         at System.Threading.Tasks.Task.InnerInvoke()
         at System.Threading.Tasks.Task+<>c.<.cctor>b__271_0(System.Object)
         at System.Threading.ExecutionContext.RunFromThreadPoolDispatchLoop(System.Threading.Thread, System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object)
         at System.Threading.Tasks.Task.ExecuteWithThreadLocal(System.Threading.Tasks.Task ByRef, System.Threading.Thread)
         at System.Threading.Tasks.Task.ExecuteEntryUnsafe(System.Threading.Thread)
         at System.Threading.ThreadPoolWorkQueue.Dispatch()
         at System.Threading.PortableThreadPool+WorkerThread.WorkerThreadStart()
         at System.Threading.Thread.StartCallback()

Can't repro it locally. Looking at the 1GB "mini dump" from CI, we are OOMing while in the jit:

00 3007ddb0 728baa33     coreclr!EEPolicy::HandleFatalError+0x72
01 3007df30 727d1f0e     coreclr!CPFH_RealFirstPassHandler+0xe8b0c
02 (Inline) --------     coreclr!CPFH_FirstPassHandler+0xc6
03 3007df74 774d3272     coreclr!COMPlusFrameHandler+0x13e
04 3007df98 774d3244     ntdll!ExecuteHandler2+0x26
05 3007e060 774c0dbf     ntdll!ExecuteHandler+0x24
06 3007e538 6f7c149b     ntdll!KiUserExceptionDispatcher+0xf
07 3007e538 6f7c7168     clrjit_win_x86_x86!ArenaAllocator::allocateNewPage+0x55
08 3007e554 6f7c71ac     clrjit_win_x86_x86!ArenaAllocator::allocateMemory+0x65
09 3007e564 6f87594f     clrjit_win_x86_x86!ArenaAllocator::MemStatsAllocator::allocateMemory+0x20
0a 3007e574 6f877ebc     clrjit_win_x86_x86!CompAllocator::allocate<HashTableBase<int,jitstd::list<GenTree *,jitstd::allocator<GenTree *> > *,HashTableInfo<int>,CompAllocator>::Bucket>+0x19
0b 3007e598 6f875d99     clrjit_win_x86_x86!HashTableBase<int,jitstd::list<GenTree *,jitstd::allocator<GenTree *> > *,HashTableInfo<int>,CompAllocator>::Resize+0x25
0c 3007e5b4 6f87600d     clrjit_win_x86_x86!HashTableBase<int,jitstd::list<GenTree *,jitstd::allocator<GenTree *> > *,HashTableInfo<int>,CompAllocator>::AddOrUpdate+0x4d
0d 3007e61c 6f8766bd     clrjit_win_x86_x86!CheckLclVarSemanticsHelper::Check+0xed
0e 3007eb50 6f811304     clrjit_win_x86_x86!LIR::Range::CheckLIR+0x50a
0f 3007eb88 6f8825fd     clrjit_win_x86_x86!Compiler::fgUpdateFlowGraph+0x7da
10 3007ebac 6f8c3b38     clrjit_win_x86_x86!Lowering::DoPhase+0xed
11 3007ebe0 6f7e642d     clrjit_win_x86_x86!Phase::Run+0x38

so this is unrelated.

We might want to handle OOMing this way a bit more gracefully. Not sure if a minopts fallback is sensible but it may be an option to recover.

@AndyAyersMS
Member Author

AndyAyersMS commented Feb 6, 2022

I think all failures are unrelated.

Several timeouts and the two issues above.

@AndyAyersMS AndyAyersMS merged commit d936a66 into dotnet:main Feb 6, 2022
@ghost ghost locked as resolved and limited conversation to collaborators Mar 8, 2022