Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix incorrect GTF_IND_NONFAULTING setting that leads to an incorrect execution. #13762

Closed
sandreenko opened this issue Nov 8, 2019 · 19 comments
Assignees
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI bug
Milestone

Comments

@sandreenko
Copy link
Contributor

For such a test case:

public sealed class TestClass<T>
{
    private const int MaxBuffersPerArraySizePerCore = 8;
    private T[] _arrays;

    [MethodImpl(MethodImplOptions.NoInlining)]
    public void TestNullRef()
    {
        _arrays = new T[GetSize()] ;
    }
}

we will generate a tree like:

               [000008] -ACXG+------              *  ASG       ref   
               [000007] ---XG+-N----              +--*  IND       ref   
               [000014] -----+------              |  \--*  ADD       byref 
               [000000] -----+------              |     +--*  LCL_VAR   ref    V00 this         
               [000013] -----+------              |     \--*  CNS_INT   long   8 field offset Fseq[_arrays]
               [000006] --CXG+------              \--*  CALL help ref    HELPER.CORINFO_HELP_NEWARR_1_R2R_DIRECT
               [000017] -ACXGO----L- arg0 SETUP      +--*  ASG       long  
               [000016] D------N----                 |  +--*  LCL_VAR   long   V02 tmp1         
               [000004] --CXG+------                 |  \--*  CALL help r2r_ind long   HELPER.CORINFO_HELP_READYTORUN_GENERIC_HANDLE
               [000003] #----+------ arg0 in rcx     |     \--*  IND       long  
               [000002] -----+------                 |        \--*  LCL_VAR   ref    V00 this         
               [000018] ------------ arg0 in rcx     +--*  LCL_VAR   long   V02 tmp1         
               [000001] -----+------ arg1 in rdx     \--*  CNS_INT   long   8

where 000003 INDIR doesn't have an exception flag and because clever morph can see that we have already dereferenced it in 000007 INDIR:

Morphing args for 4.CALL:

Non-null prop for index dotnet/coreclr#1 in BB01:
               [000003] #--X--------              *  IND       long  
argSlots=1, preallocatedArgCount=4, nextSlotNum=4, outgoingArgSpaceSize=32

Sorting the arguments:
Deferred argument ('rcx'):
               [000003] #----+------              *  IND       long  
               [000002] -----+------              \--*  LCL_VAR   ref    V00 this   

so far so good, but then fgSetBlockOrder could decide to revert 000008 and guess what now it can clear the exception flag from 000007 because now we execute it after 000003 that is known not to throw (because it got GTF_IND_NONFAULTING during the previous phase) so we end up proving that this indirection can't throw:

N015 ( 41, 22) [000008] -ACXGO--R---              *  ASG       ref    $VN.Void
N014 (  4,  4) [000007] n---GO-N----              +--*  IND       ref    $280
N013 (  2,  2) [000014] -------N----              |  \--*  ADD       byref  $2c0
N011 (  1,  1) [000000] ------------              |     +--*  LCL_VAR   ref    V00 this         u:1 $80
N012 (  1,  1) [000013] ------------              |     \--*  CNS_INT   long   8 field offset Fseq[_arrays] $200
N010 ( 36, 17) [000006] -ACXGO------              \--*  CALL help ref    HELPER.CORINFO_HELP_NEWARR_1_R2R_DIRECT $103
N006 ( 17,  8) [000017] -ACXGO--R-L- arg0 SETUP      +--*  ASG       long   $180
N005 (  1,  1) [000016] D------N----                 |  +--*  LCL_VAR   long   V02 tmp1         d:1 $c1
N004 ( 17,  8) [000004] --CXGO------                 |  \--*  CALL help r2r_ind long   HELPER.CORINFO_HELP_READYTORUN_GENERIC_HANDLE $180
N003 (  3,  2) [000003] #----O------ arg0 in rcx     |     \--*  IND       long   $100
N002 (  1,  1) [000002] ------------                 |        \--*  LCL_VAR   ref    V00 this         u:1 $80
N008 (  1,  1) [000018] ------------ arg0 in rcx     +--*  LCL_VAR   long   V02 tmp1         u:1 (last use) $c1
N009 (  1,  1) [000001] ------------ arg1 in rdx     \--*  CNS_INT   long   8 $200

when it is, of course, incorrect:
Update: that is correct in this case because this is never null (we check it before a call to an instance method).

TestClass obj = null;
obj.TestNullRef();

we set revert flag here:
https://github.com/dotnet/coreclr/blob/68043c6306a0449bb5ed10fbdea9f14dafc99df4/src/jit/gentree.cpp#L4004-L4023

under conditions that are not clear to me.

Summary: if we fix #13758 we will have GTF_ASG on 000006. Then 000008 won't be reverted (because of the strange condition that I showed earlier that checks GTF_ASG), so we won't incorrectly drop GTF_EXC from both indirections. However, it will lead to 0.07% code size regression in System.Private.Corelib and 0.31% on framework libraries and it is not a complete fix for the problem.

Link #10813 and dotnet/coreclr#19334.

category:correctness
theme:optimization
skill-level:intermediate
cost:medium

@sandreenko sandreenko self-assigned this Nov 8, 2019
@sandreenko
Copy link
Contributor Author

@CarolEidt could you please take a look and advise?

I will try to create an IL repo to show bad execution tomorrow, but I am not sure how that can be fixed in general (except by forbidding revert or Non-null prop in morph) preferably without a regression that big.

@CarolEidt
Copy link
Contributor

The code you show in gtSetEvalOrder is dancing around the fact that the address of an assignment must be evaluated first if it has any side-effects, but note that the assignment itself doesn't happen until both operands are evaluated. So a GT_IND on the lhs of an assignment is really not evaluated until the assignment actually happens. IMO the right fix for this is to eliminate GT_ASG early (ideally in the importer), using store variants such as GT_STORE_IND where the ordering of the actual indirection is clear, and then assertion prop would see the correct order of the indirections. But that's a big effort, and this should be fixed sooner than that.

I find it quite unfortunate that we are doing order-based assertions prior to actually setting the evaluation order, which seems like it has significant potential for issues like this. A couple of things come to mind:

  • The assertion prop that's setting the GTF_IND_NON_FAULTING on [000003] should be fixed to not process the GTF_IND on the lhs of an assignment (i.e. setting the assertion for [000007]) until after all the operands, since it is never evaluated first. There's a GTF_IND_ASG_LHS flag that I presume would be set on such a node.
  • This particular bug aside, it seems that when we have GTF_IND_NON_FAULTING on a GT_IND we should never reorder it, but I don't believe that's a flag we propagate, so I don't think there's anything to prevent us reordering an operation that has the same indirection on both sides such that the one that's marked GTF_IND_NON_FAULTING will now actually be the one to fault.

I know that @erozenfeld has spent time with the side-effect flags and their propagation, and he might have some insight into this.

@mikedn
Copy link
Contributor

mikedn commented Nov 8, 2019

But that's a big effort, and this should be fixed sooner than that.

Actually I hope I'll get to ASG elimination sooner rather than later. The more stuff like this I the less I believe that it can actually be fixed in a reliable manner.

But yes, in the meantime there are better ways to deal with ASG. I already started this with dotnet/coreclr#27445 and next on the list is liveness. Eventually all the JIT should follow the same approach.

There's a GTF_IND_ASG_LHS flag that I presume would be set on such a node.

Yes, but we'll need to set it in morph. It's currently set during SSA construction.

This particular bug aside, it seems that when we have GTF_IND_NON_FAULTING on a GT_IND we should never reorder it,

I need to take a closer look at this, it's not clear to me what stops this kind of reordering from occurring in other situations that do not involve ASG nodes. Anyone knows?

@mikedn
Copy link
Contributor

mikedn commented Nov 8, 2019

Ah, but assertion prop sets GTF_IND_NONFAULTING and then also sets GTF_ORDER_SIDEEFF so reordering should not occur. But I'm pretty sure GTF_ORDER_SIDEEFF is somehow broken, I remember noticing weird things in the code around it.

@mikedn
Copy link
Contributor

mikedn commented Nov 8, 2019

As far as I can tell there are 2 different problems here:

  • gtSetEvalOrder manages to set GTF_REVERSE_OPS on GT_ASG without checking gtCanSwapOrder and thus ignoring GTF_ORDER_SIDEEFF
  • The typical confusion about where the assignment store (and its potential exception side effect) takes place. If you look at the rationalization code it's clear as daylight that the stores happens at the point of the GT_ASG node and not at the point of its LHS. But for some odd reasons pretty much the entire frontend gets it backwards and considers the LHS to be the store point.

I'm guessing that the first problem may be easy to fix. The second one requires a bit more work :)

@CarolEidt
Copy link
Contributor

@mikedn - thanks for the excellent summary! Agree on the observation that the second issue will likely require more work.

@sandreenko
Copy link
Contributor Author

Thank you both, @CarolEidt and @mikedn.
I am trying to do a better repro now. The one in the header doesn't end up in an incorrect execution because there are exception flags on other calls, so the statement can't be deleted and there are no other exceptions that can be reordered.
Then I will work on fixing gtSetEvalOrder to respect gtCanSwapOrder.

@mikedn
Copy link
Contributor

mikedn commented Nov 8, 2019

Looking a bit more at gtSetEvalOrder:

// However, if the LHS is a lclVar address, SSA relies on using evaluation order for its
// renaming, and therefore the RHS must be evaluated first.

Hmm, dotnet/coreclr#27445 changed SSA so this is no longer necessary.

Though there are still cases where GTF_REVERSE_OPS gets set and I'm not sure why.

@mikedn
Copy link
Contributor

mikedn commented Nov 19, 2019

There's a GTF_IND_ASG_LHS flag that I presume would be set on such a node.

Yes, but we'll need to set it in morph. It's currently set during SSA construction.

BTW, memory liveness is messed up because of the way this flag is set. Since first liveness run is done before SSA the flag is not set so this code:
https://github.com/dotnet/coreclr/blob/a9f3fc16483eecfc47fb79c362811d870be02249/src/jit/liveness.cpp#L265-L279
will make a use out of every indir, including ones that are LHS and thus only defs.

Also, rationalization clears GTF_IND_ASG_LHS because it shares a bit with GTF_IND_REQ_ADDR_IN_REG. And then liveness runs after lowering and hits the same bad condition, except that this time the whole thing is pointless - memory liveness is not needed in LIR.

So...

Actually I hope I'll get to ASG elimination sooner rather than later. The more stuff like this I the less I believe that it can actually be fixed in a reliable manner.

@sandreenko
Copy link
Contributor Author

I have failed to create a repro that hits that bug and leads to an incorrect execution. However, I have found that gtCanSwapOrder(GenTree* firstNode, GenTree* secondNode) is also incorrect, because for a tree like:

*  MUL       int
+--*  LCL_VAR   int    V01 loc1
\--*  COMMA       int
   +--*  ASG   int    V01 loc1
   |  +--*  LCL_VAR   int    V01 loc1
   |  \--*  CNS_INT   int    1  
   \--*  LCL_VAR   int    V01 loc1

or res = V01 * (V01 = 1, V01) in normal language
gtCanSwapOrder(V01, (V01 = 1, V01)) will return true because V01 has no side-effects.

Lucky for us, most of the internal assignments that we have now could write only to temporary variables (TMP0NN) that are not used in other parts of a tree.
And importer tries to spill everything it can, so it hides these errors.
Note, that it won't be fixed by ASG elimination.

Also, gtCanSwapOrder doesn't check GTF_ORDER_SIDEEFF on the second node, but it could be ok, I have not checked that yet.

I think the solution to this that requires minimal changes could be to better describe these special internal assignments. Like prove and check that it is safe to do what we do now. Add checks that we don't have assignments that write to V0NN locals and then fix gtCanSwapOrder/gtSetEvalOrder.
These special assignments won't need to set ASG flag and we would be able to delete it.
An obvious issue could be VN and CSE that could replace TMP0NN with V0NN etc.

@mikedn
Copy link
Contributor

mikedn commented Nov 20, 2019

And importer tries to spill everything it can, so it hides these errors.

Yes, so how did you manage to produce that tree? :)

@sandreenko
Copy link
Contributor Author

Yes, so how did you manage to produce that tree? :)

I was not able to do that honestly.
I hacked an existing tree and changed lclVarNum in morph just before calling gtCanSwapOrder.
Before that, I also add a check that I ran when gtCanSwapOrder returned true. The check collected all assignments in the right subtree and all uses in the left subtree and checks that the intersection is empty. It has not failed in any pri1 tests that I was running.

@mikedn
Copy link
Contributor

mikedn commented Nov 25, 2019

FWIW CSE can create trees where swapping would lead to a use before def case:

return x * 3 + x * 3;

generates:

STMT00000 (IL 0x000...0x007)
N010 ( 18, 14) [000007] -A----------              *  RETURN    int    $140
N009 ( 17, 13) [000006] -A----------              \--*  ADD       int    $101
N007 ( 13, 10) [000011] -A----------                 +--*  COMMA     int    $100
N005 ( 10,  8) [000009] -A------R---                 |  +--*  ASG       int    $VN.Void
N004 (  3,  2) [000008] D------N----                 |  |  +--*  LCL_VAR   int    V03 cse0         d:1 $100
N003 (  6,  5) [000002] ------------                 |  |  \--*  MUL       int    $100
N001 (  1,  1) [000000] ------------                 |  |     +--*  LCL_VAR   int    V00 arg0         u:1 $80
N002 (  1,  1) [000001] ------------                 |  |     \--*  CNS_INT   int    3 $42
N006 (  3,  2) [000010] ------------                 |  \--*  LCL_VAR   int    V03 cse0         u:1 $100
N008 (  3,  2) [000012] ------------                 \--*  LCL_VAR   int    V03 cse0         u:1 $100

But CSE runs in linear order and at that point swapping no longer happens.

Comma trees produced by fgMorphArrayIndex may also have in-tree variable interference but they have all sorts of side effects on them that should prevent swapping.

@sandreenko
Copy link
Contributor Author

@mikedn thanks for that example. It means our current contract is less obvious then I thought.

So:

  • we can't get rid of internal assignments;
  • we can't check these type of conflicts honestly (it would require 2 new binary sets on each tree: defs and uses);
  • we can't forbid swapping every time when we see an assignment (too many regressions);
  • we can't state that internal ASG doesn't interfere with other nodes with the lowest common ancestor higher than its parent (your example);

Am I missing something?

@mikedn
Copy link
Contributor

mikedn commented Nov 25, 2019

we can't get rid of internal assignments;

We'd need to move the entire JIT to LIR or something similar. Maybe once we get rid of ASG, LIST & co. that would be easier but as is now moving to LIR would be too difficult. Practically impossible. It's also not 100% clear if moving to LIR would be a good thing. It has advantages but also downsides - tree transforms would need to manually maintain linear order. That's more work for the developer and it's also easy to make mistakes.

we can't forbid swapping every time when we see an assignment (too many regressions);

Have you tried that? For one thing, I would expect that there aren't that many internal assignments at the point where swapping is done. Keep in mind that once move to linear order (N.B FGOrderLinear, not LIR) swapping no longer happens so the problematic internal assignments are probably only those generated by global morph. And I don't know that many:

  • as mentioned, fgMorphArrayIndex generates such assignments but those trees have other side effects on them anyway
  • call arg morphing generates internal assignments. Those are rather special - they don't use commas, their order is not affected by swapping, they're temporaries that have no use (win-x64 struct args) or a single use that also occurs in the call arg list order (temporaries introduced by arg sorting)
  • copy/init block morphing generate internal assignments but those aren't supposed to interfere with each other since they involve fields of promoted structs
  • anything else?

Besides, gtCanSwapOrder already rejects the case where op1 has GTF_ASG and op2 is anything other than a constant. What appears to be missing is the opposite check - op2 has GTF_ASG and op1 is not a constant. That looks like an oversight to me, these checks should be symmetric.

@sandreenko
Copy link
Contributor Author

sandreenko commented Nov 25, 2019

Have you tried that?

I have tried to correctly propagate ASG and GTF_GLOB_REF flags dotnet/coreclr#27732 and it was 0.3% regression (framework assemblies crossgen x64 checked).

If you also fix gtCanSwapOrder it will be worse.

I would expect that there aren't that many internal assignments at the point where swapping is done.

I saw many assignments under call nodes for arg setup, so these nodes already had side effects but it was legal to swap the order if another subtree was free from them.

@mikedn
Copy link
Contributor

mikedn commented Nov 25, 2019

I have tried to correctly propagate ASG and GTF_GLOB_REF flags dotnet/coreclr#27732 and it was 0.3% regression (framework assemblies crossgen x64 checked).

Right, I've got lost in all this and forgot about that. So it seems that it's not a good idea to add GTF_ASG to call arg setup assignments. It's also not necessary since those temporaries won't ever interfere with anything else.

The trouble is that this kind of special casing can't be handled easily in gtUpdateSideEffects. Ultimately this is a long standing design issue of side effect flags - you can't easily tell if a side effect is generated by the node itself or if it's inherited. Maybe it's possible to teach gtUpdateNodeOperSideEffects that call arg setup assignments don't need GTF_ASG but it's not clear how.

If you also fix gtCanSwapOrder it will be worse.

Fixing only gtCanSwapOrder is fine, if I add just:

    if ((secondNode->gtFlags & GTF_ASG) != 0)
    {
        if (!firstNode->OperIsConst())
        {
            return false;
        }
    }

then the result is a very small diff and it is actually an improvement of ~80 bytes.

But yes, if call nodes also get GTF_ASG then I expect that it will be worse. It should have been obvious from the beginning - GTF_ASG is actually more conservative than GTF_CALL:

  • GTF_ASG - anything may be modified. Heap memory, stack memory - address exposed or not.
  • GTF_CALL - almost anything may be modified. The exception is stack memory that has not been address exposed.

@msftgits msftgits transferred this issue from dotnet/coreclr Jan 31, 2020
@msftgits msftgits added this to the 5.0 milestone Jan 31, 2020
@AndyAyersMS
Copy link
Member

Seems like there's a potential for a bug here, but we haven't found an actual test case.

Moving to future.

@AndyAyersMS AndyAyersMS modified the milestones: 5.0, Future May 1, 2020
@BruceForstall BruceForstall added the JitUntriaged CLR JIT issues needing additional triage label Oct 28, 2020
@BruceForstall BruceForstall removed the JitUntriaged CLR JIT issues needing additional triage label Nov 25, 2020
@BruceForstall BruceForstall modified the milestones: Future, 6.0.0 Nov 25, 2020
@sandreenko sandreenko modified the milestones: 6.0.0, Future May 21, 2021
@jakobbotsch
Copy link
Member

#13758 was fixed by #97409, and if I'm reading the comments above correctly that should have fixed this problem. So I'm going to close this.

@github-actions github-actions bot locked and limited conversation to collaborators Jul 1, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI bug
Projects
None yet
Development

No branches or pull requests

7 participants