-
Notifications
You must be signed in to change notification settings - Fork 4.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
JIT: contained memory safety analysis fixes and improvements #64843
JIT: contained memory safety analysis fixes and improvements #64843
Conversation
Fixes a couple of issues exposed by forward sub, where containment analysis was allowing unsafe reordering of operands. More checks of this sort may be needed. See issues in dotnet#64828. Generalize the safety check so that a store to a local not live into a handler can be reordered with respect to node causing exceptions. Happily this leads to almost uniformly better code despite the more stringent checking added above. Add a workaround for the late callbacks into the containment checker made on unlinked nodes. Assume these are always safe.
Tagging subscribers to this area: @JulieLeeMSFT Issue DetailsFixes a couple of issues exposed by forward sub, where containment analysis See issues in #64828. Generalize the safety check so that a store to a local not live into a handler Add a workaround for the late callbacks into the containment checker made on
|
cc @dotnet/jit-contrib |
@@ -4949,7 +4949,7 @@ void Lowering::ContainCheckCompare(GenTreeOp* cmp) | |||
// we can treat the MemoryOp as contained. | |||
if (op1Type == op2Type) | |||
{ | |||
if (IsContainableMemoryOp(op1)) | |||
if (IsContainableMemoryOp(op1) && IsSafeToContainMem(cmp, op1)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There looked to be a couple outer places in lowerxarch that do IsContainableMemoryOp
followed by an MakeSrcContained
without checking IsSafeToContainMem
- https://github.com/dotnet/runtime/blob/main/src/coreclr/jit/lowerxarch.cpp#L4726-L4729
- https://github.com/dotnet/runtime/blob/main/src/coreclr/jit/lowerxarch.cpp#L4856-L4859
- https://github.com/dotnet/runtime/blob/main/src/coreclr/jit/lowerxarch.cpp#L5253-L5256
- https://github.com/dotnet/runtime/blob/main/src/coreclr/jit/lowerxarch.cpp#L5282-L5286
- https://github.com/dotnet/runtime/blob/main/src/coreclr/jit/lowerxarch.cpp#L6290-L6298
We also have https://github.com/dotnet/runtime/blob/main/src/coreclr/jit/lowerxarch.cpp#L5072-L5075, where its not clear if it doing IsContainableMemoryOp
is right and perhaps indirOpSource->isContained()
would be better?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, added checks.
I'll probably leave that last bit as is, IsContainableMemoryOp
is not that expensive.
// | ||
if (transparentParentNode != nullptr) | ||
{ | ||
canBeContained = IsSafeToContainMem(containingNode, transparentParentNode, node); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't need this for the Load*
intrinsic cases below because its known to always be safe, is that right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think so, yes. But not 100% sure.
By my understanding, for unary containing nodes, the safety check should immediately return safe, as the child's gtNext
is the node, so there is nothing "in between" that can interfere. For higher arity operations the children can interfere and also there can be other interference in between them from COMMA expansion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm going to add a quick early out to IsSafeToContainMem
for the unary node case, so we are less tempted to optimize out the safety check call and possibly miss something.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By my understanding, for unary containing nodes, the safety check should immediately return safe, as the child's gtNext is the node, so there is nothing "in between" that can interfere
Do you mean that for unary operators operand->gtNext == parent
? But that's not true for the linear order.
t1 = IND(addr)
STOREIND(addr, ...) // Updates [addr] to t2
UNARY_USER(t1) // Should observe t1, not t2.
Is valid LIR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you know of any way we'd create such a pattern? It would be nice to have more examples that fail if we mess this up.
At any rate, best not to be too clever here. With the added safety calls and follow-up check in MakeSrcContained
we should have things covered, I hope.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you know of any way we'd create such a pattern?
Yep:
private static uint Problem(uint a, uint* b)
{
uint zero = 0;
return *b + Bmi2.MultiplyNoFlags(a, a, b) * zero;
}
N003 (???,???) [000015] ------------ IL_OFFSET void INLRT @ 0x002[E-] REG NA
N005 ( 1, 1) [000003] -----------Z t3 = LCL_VAR int V01 arg1 u:1 edx REG edx $81
/--* t3 int
N007 ( 3, 2) [000004] *--XG------- t4 = * IND int REG eax <l:$1c1, c:$1c0>
N009 ( 1, 1) [000005] ------------ t5 = LCL_VAR int V00 arg0 u:1 ecx REG ecx $80
N011 ( 1, 1) [000006] ------------ t6 = LCL_VAR int V00 arg0 u:1 ecx (last use) REG ecx $80
N013 ( 1, 1) [000007] -----------z t7 = LCL_VAR int V01 arg1 u:1 esi (last use) REG esi $81
/--* t5 int
+--* t6 int
+--* t7 int
N015 ( 4, 4) [000008] ---XG------- t8 = * HWINTRINSIC int MultiplyNoFlags REG edx $83
/--* t4 int
N017 ( 10, 9) [000012] ---XG------- * RETURN int REG NA $181
Given that Not sure if this will be super-costly for checked builds, but it seems worth trying. |
Small number of diffs from the extra checks. Here's one: we think we should be null checking ;; BEFORE
; Assembly listing for method System.Runtime.Intrinsics.Vector128`1[Byte][System.Byte]:get_Item(int):ubyte:this
; Emitting BLENDED_CODE for X64 CPU with AVX - Windows
; optimized code
; rsp based frame
; partially interruptible
; No matching PGO data
; Final local variable assignments
;
; V00 this [V00,T01] ( 3, 3 ) byref -> rcx this single-def
; V01 arg1 [V01,T00] ( 4, 4 ) int -> rdx single-def
; V02 OutArgs [V02 ] ( 1, 1 ) lclBlk (32) [rsp+00H] "OutgoingArgSpace"
;
; Lcl frame size = 40
G_M59860_IG01: ; gcrefRegs=00000000 {}, byrefRegs=00000000 {}, byref, nogc <-- Prolog IG
sub rsp, 40
vzeroupper
;; bbWeight=1 PerfScore 1.25
G_M59860_IG02: ; gcrefRegs=00000000 {}, byrefRegs=00000002 {rcx}, byref, isz
; byrRegs +[rcx]
cmp edx, 16
jae SHORT G_M59860_IG04
movzx rax, byte ptr [rcx+rdx]
;; bbWeight=1 PerfScore 3.25
G_M59860_IG03: ; , epilog, nogc, extend
add rsp, 40
ret
;; bbWeight=1 PerfScore 1.25
G_M59860_IG04: ; gcVars=0000000000000000 {}, gcrefRegs=00000000 {}, byrefRegs=00000000 {}, gcvars, byref
; byrRegs -[rcx]
call CORINFO_HELP_THROW_ARGUMENTOUTOFRANGEEXCEPTION
; gcr arg pop 0
int3
;; AFTER
; Assembly listing for method System.Runtime.Intrinsics.Vector128`1[Byte][System.Byte]:get_Item(int):ubyte:this
; Emitting BLENDED_CODE for X64 CPU with AVX - Windows
; optimized code
; rsp based frame
; partially interruptible
; No matching PGO data
; Final local variable assignments
;
; V00 this [V00,T01] ( 3, 3 ) byref -> rcx this single-def
; V01 arg1 [V01,T00] ( 4, 4 ) int -> rdx single-def
; V02 OutArgs [V02 ] ( 1, 1 ) lclBlk (32) [rsp+00H] "OutgoingArgSpace"
; V03 rat0 [V03 ] ( 1, 1 ) simd16 -> [rsp+28H] "SIMDInitTempVar"
;
; Lcl frame size = 56
G_M59860_IG01: ; gcrefRegs=00000000 {}, byrefRegs=00000000 {}, byref, nogc <-- Prolog IG
sub rsp, 56
vzeroupper
;; bbWeight=1 PerfScore 1.25
G_M59860_IG02: ; gcrefRegs=00000000 {}, byrefRegs=00000002 {rcx}, byref, isz
; byrRegs +[rcx]
vmovupd xmm0, xmmword ptr [rcx]
cmp edx, 16
jae SHORT G_M59860_IG04
vmovupd xmmword ptr [rsp+28H], xmm0
movzx rax, byte ptr [rsp+rdx+28H]
;; bbWeight=1 PerfScore 8.25
G_M59860_IG03: ; , epilog, nogc, extend
add rsp, 56
ret
;; bbWeight=1 PerfScore 1.25
G_M59860_IG04: ; gcVars=0000000000000000 {}, gcrefRegs=00000000 {}, byrefRegs=00000000 {}, gcvars, byref
; byrRegs -[rcx]
call CORINFO_HELP_THROW_ARGUMENTOUTOFRANGEEXCEPTION
; gcr arg pop 0
int3 Full set of regressions from the extra checks is aspnet.run.windows.x64.checked.mch:
Detail diffs
coreclr_tests.pmi.windows.x64.checked.mch:
Detail diffs
libraries.pmi.windows.x64.checked.mch:
Detail diffs
libraries_tests.pmi.windows.x64.checked.mch:
Detail diffs
Going to push this next batch of changes and will come back to addressing regressions when I have time. |
Seems plausible that for value type instance methods we could allow exceptions from an indir based on |
So at least per innerloop we're not missing any safety checks. The checks still seem a bit heavy handed in places, like the regression noted above. Not sure it's worth trying to relax constraints for these isolated calls though; typically such methods will inline and the problem will likely go away. The cases I've seen so far where we might want to relax:
Am going to look at some of the other regressions in libraries_tests next. Turns out Another thought on how to deal with the SIMD case -- split the INDIR into an upfront nullcheck and a containable non-faulting INDIR. That should avoid the need to "spill" the SIMD value like we're seeing. I have been tempted to do similar things for inlining args that could be forward subbed but may cause exceptions (but no other side effects) -- just evaluate them twice, the first time just for effect. I may give this a try since it seems to be specific to just this one intrinsic. Or (even more far out there perhaps) for this particular case we could do the bounds check first, then when we branch on failure, we could do the null check before calling the bounds check helper. |
Also fixing an outerloop test I broke in #64828. Will next try running outerloop. |
/azp run runtime-coreclr outerloop |
Hmm, not clear where to try and "fix" the Perhaps better to live with the small regressions for now and close the correctness loophole. @dotnet/jit-contrib PTAL |
Azure Pipelines successfully started running 1 pipeline(s). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. It would be nice to get this in before the fuzzers run tomorrow.
//------------------------------------------------------------------------ | ||
// IsSafeToContainMem: Checks for conflicts between childNode and grandParentNode | ||
// and returns 'true' iff memory operand childNode can be contained in ancestorNode | ||
// | ||
// Arguments: | ||
// grandParentNode - any non-leaf node | ||
// parentNode - parent of `childNode` and an input to `grandParentNode` | ||
// childNode - some node that is an input to `parentNode` | ||
// | ||
// Return value: | ||
// true if it is safe to make childNode a contained memory operand. | ||
// |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like there's a few argument names here that need to be updated, ancestorNode
and grandParentNode
=> grandparentNode
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks -- will fix this subsequently so I don't retrigger all the tests.
Libraries test failure (Linux x64 Debug):
I have seen this before (#61359). |
x86 crossgen2 failure on windows, may be related?
Can't repro it locally. Looking at the 1GB "mini dump" from CI, we are OOMing while in the jit:
so this is unrelated. We might want to handle OOMing this way a bit more gracefully. Not sure if a minopts fallback is sensible but it may be an option to recover. |
I think all failures are unrelated. Several timeouts and the two issues above. |
Fixes a couple of issues exposed by forward sub, where containment analysis
was allowing unsafe reordering of operands. More checks of this sort may be
needed.
See issues in #64828.
Generalize the safety check so that a store to a local not live into a handler
can be reordered with respect to node causing exceptions. Happily this leads
to almost uniformly better code despite the more stringent checking added above.
Add special handling for "transparent" nodes like CreateScalarUnsafe that allow
their children to be contained by their parent.
Add a workaround for the late callbacks into the containment checker made on
unlinked nodes. Assume these are always safe.