Fix for number of used frame infos during crossgen2 compilation #64549

gbalykov · 2022-01-31T18:04:30Z

The failure in JIT/Regression/JitBlue/GitHub_17777/GitHub_17777 happens because the first compCompile invocation in jitNativeCode throws CORJIT_INTERNALERROR, but the unwind info it has already reserved (via reserveUnwindInfo) is never freed. The second attempt, without optimizations, succeeds, but it calls reserveUnwindInfo again, so twice as much unwind info is reserved as is needed.

It might be worth investigating the cause of CORJIT_INTERNALERROR on arm64 for GitHub_17777 separately. This change will still be needed, because a compilation retry can happen in other tests too, for one reason or another.
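The double reservation can be modeled with a small sketch. It is not the crossgen2 or JIT code: `HostModel`, `compileOnce`, and `compileWithRetry` are hypothetical names, and a single counter stands in for the frame-info bookkeeping on the host side.

```cpp
#include <cstdint>
#include <stdexcept>

// Hypothetical stand-in for the host side of the JIT/EE interface.
struct HostModel
{
    uint32_t numFrameInfos = 0; // bumped by every reserveUnwindInfo call

    void reserveUnwindInfo() { ++numFrameInfos; }
};

// Hypothetical stand-in for one compile attempt: it reserves unwind info
// once and then optionally fails, the way a NO_WAY assert would.
void compileOnce(HostModel& host, bool failAfterReserve)
{
    host.reserveUnwindInfo();
    if (failAfterReserve)
    {
        throw std::runtime_error("CORJIT_INTERNALERROR");
    }
}

// Models the retry: first attempt as requested, second attempt after a failure.
// Returns the number of frame infos the host believes were reserved.
uint32_t compileWithRetry(HostModel& host, bool firstAttemptFails)
{
    try
    {
        compileOnce(host, firstAttemptFails);
    }
    catch (const std::runtime_error&)
    {
        // Nothing resets host.numFrameInfos here, so the reservation
        // made by the failed attempt is counted a second time.
        compileOnce(host, /* failAfterReserve */ false);
    }
    return host.numFrameInfos;
}
```

In this model a clean compile leaves the counter at 1, while a failed first attempt followed by a successful retry leaves it at 2.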

Update: I've found the cause of CORJIT_INTERNALERROR. A NO_WAY assert fires inside the JIT during compilation, in emit.cpp:

        /* Are we overflowing? */
        if (ig->igNext && (ig->igNum + 1 != ig->igNext->igNum))
        {
            NO_WAY("Too many instruction groups");
        }

igNum here is unsigned on all platforms, i.e. 32 bits. I guess this is currently a limitation of the JIT.
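To make the quoted condition concrete, here is a standalone sketch. `InsGroup`, `checkGroupNumbering`, and `numberingIsConsecutive` are simplified, hypothetical stand-ins, and an exception stands in for NO_WAY. As written, the condition fires whenever a group and its successor are not numbered consecutively, whatever the reason for the gap.

```cpp
#include <stdexcept>

// Simplified, hypothetical stand-in for the JIT's instruction group node.
struct InsGroup
{
    unsigned  igNum;  // sequence number of this group
    InsGroup* igNext; // next group in the list, or nullptr
};

// Mirrors the quoted condition: a group and its successor must be
// numbered consecutively. Throwing here stands in for NO_WAY.
void checkGroupNumbering(const InsGroup* ig)
{
    if (ig->igNext && (ig->igNum + 1 != ig->igNext->igNum))
    {
        throw std::runtime_error("Too many instruction groups");
    }
}

// Returns true when every adjacent pair in the list passes the check.
bool numberingIsConsecutive(const InsGroup* head)
{
    try
    {
        for (const InsGroup* ig = head; ig != nullptr; ig = ig->igNext)
        {
            checkGroupNumbering(ig);
        }
    }
    catch (const std::runtime_error&)
    {
        return false;
    }
    return true;
}
```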

cc @alpencolt

gbalykov · 2022-02-22T11:14:17Z

@MichalStrehovsky could you please take a look?

MichalStrehovsky · 2022-02-22T12:09:16Z

@dotnet/crossgen-contrib could someone have a look? I'm not familiar with this.

jkotas · 2022-02-22T15:42:58Z

src/coreclr/tools/Common/JitInterface/CorInfoImpl.cs

@@ -422,6 +422,19 @@ private void PublishCode()
 #endif
                );

+            if (_usedFrameInfos != _numFrameInfos)
+            {
+                // jitNativeCode might retry itself when exception happens inside jit.


Would it be better to clear the already allocated frameinfos when jitNativeCode retries?

gbalykov · 2022-02-22

Yes, you are right, that would be a better solution. At first I did not see an existing JIT interface method for this; for some reason I missed reportFatalError. I've checked that cleanup in reportFatalError works, so I'll update this PR.

jitNativeCode may retry compilation when an exception is thrown inside the JIT, but reserveUnwindInfo may already have been called by then, so _numFrameInfos has to be reset.
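A minimal sketch of that cleanup, written as a C++ model rather than the actual C# in CorInfoImpl.cs. `HostWithCleanup`, `attemptCompile`, and `compileWithReportedRetry` are hypothetical names; the only behavior taken from this PR is that reportFatalError resets the frame-info counter before the retry.

```cpp
#include <cstdint>
#include <stdexcept>

// Hypothetical model of the host; only the counter reset mirrors this PR.
struct HostWithCleanup
{
    uint32_t numFrameInfos = 0;

    void reserveUnwindInfo() { ++numFrameInfos; }

    // Forget the reservations made by the attempt that just failed.
    void reportFatalError() { numFrameInfos = 0; }
};

// One compile attempt: reserve unwind info, then optionally fail.
void attemptCompile(HostWithCleanup& host, bool fail)
{
    host.reserveUnwindInfo();
    if (fail)
    {
        throw std::runtime_error("CORJIT_INTERNALERROR");
    }
}

// The failure is reported to the host before the second attempt starts.
uint32_t compileWithReportedRetry(HostWithCleanup& host, bool firstAttemptFails)
{
    try
    {
        attemptCompile(host, firstAttemptFails);
    }
    catch (const std::runtime_error&)
    {
        host.reportFatalError();
        attemptCompile(host, /* fail */ false);
    }
    return host.numFrameInfos;
}
```

With the reset in place, the counter ends at 1 whether or not the first attempt failed.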

gbalykov · 2022-03-10T13:07:17Z

@jkotas could you please take a look?

jkotas · 2022-03-14T21:37:44Z

src/coreclr/tools/Common/JitInterface/CorInfoImpl.cs

@@ -3430,6 +3430,7 @@ private void reportFatalError(CorJitResult result)
        {
            // We could add some logging here, but for now it's unnecessary.
            // CompileMethod is going to fail with this CorJitResult anyway.


This comment is not correct. CompileMethod is not always going to fail.

jkotas · 2022-03-14T21:40:52Z

src/coreclr/tools/Common/JitInterface/CorInfoImpl.cs

@@ -3430,6 +3430,7 @@ private void reportFatalError(CorJitResult result)
        {
            // We could add some logging here, but for now it's unnecessary.
            // CompileMethod is going to fail with this CorJitResult anyway.
+            _numFrameInfos = 0;


Is this all the state that needs to be cleaned up? From a cursory look, there seems to be more state that may cause problems.

I am wondering whether the retry logic should instead be moved from the JIT to the EE side. That would make the state cleanup more robust. It would also let us skip the retry in cases where it does not make sense. For example, it does not make sense to retry with no optimizations when generating Tier1 code.

@dotnet/jit-contrib Thoughts?

BruceForstall · 2022-03-15

> I am wondering whether the retry logic should be rather moved from the JIT to the EE side.

Maybe I'm missing something: the proposed change is in crossgen2, not the JIT (or EE).

In the JIT'ing case, we already have a backout function: CEEJitInfo::BackoutJitData() => EEJitManager::RemoveJitData(). It removes EH info, GC info, and maybe more. (Maybe not the debug info?) Seems like crossgen2 would need to do the same (i.e., remove GC info, debug info, code buffer).

jkotas · 2022-03-15

We have a similar method in crossgen2 as well (CompileMethodCleanup). The problem is that neither BackoutJitData nor CompileMethodCleanup gets called when the JIT decides to retry with optimizations off.

BruceForstall · 2022-03-15

Ah, I see. There's a "small" window, after the JIT has already started allocating EH/GC/code space from the VM, where it can hit a NO_WAY assert and decide to retry. It would make sense for the JIT to call back through the JIT/EE interface before retrying, saying "I'm going to retry".

jakobbotsch · 2022-03-15

Personally, I think it makes sense to leave it up to the consumer of the JIT to retry if that is the desired behavior. It seems a little strange to me that the JIT does this implicitly even though optimizations were requested. It was also the reason behind the unexpected debug codegen in #63708.

jkotas · 2022-03-15

reportFatalError signals that the JIT is going to retry, so we sort of have that callback already.

There is retry logic in the VM that deals with relocs overflow and does proper cleanup before retrying. In other words, we have one retry mechanism in the JIT itself and a second one in the VM. My point was whether it would be better to have just one retry mechanism, in the VM, that handles both cases.
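One possible shape for a single host-side retry is sketched below. It is not the VM or crossgen2 code: `HostState`, `CompileRequest`, `CompileFn`, and `compileMethod` are hypothetical, and the policy (clean up after any failed attempt, retry unoptimized only when optimizations were requested and the code is not Tier1) is an assumption drawn from the points raised in this thread.

```cpp
#include <functional>

enum class CompileResult { Ok, InternalError };

// Hypothetical description of what the host asked for.
struct CompileRequest
{
    bool optimize;
    bool isTier1;
};

// Hypothetical per-method state the host accumulates through JIT callbacks.
struct HostState
{
    unsigned numFrameInfos = 0;

    // Discard everything recorded for the current method.
    void cleanup() { *this = HostState{}; }
};

// Stand-in for invoking the JIT once with optimizations on or off.
using CompileFn = std::function<CompileResult(HostState&, bool optimize)>;

// Host-driven retry: the host owns both the cleanup and the retry decision.
CompileResult compileMethod(HostState& state, const CompileRequest& request,
                            const CompileFn& jit, int& attempts)
{
    attempts = 1;
    CompileResult result = jit(state, request.optimize);
    if (result == CompileResult::Ok)
    {
        return result;
    }

    // Partial output from the failed attempt must not leak into a retry.
    state.cleanup();

    // Falling back to unoptimized code is skipped for Tier1 requests.
    const bool retryMakesSense = request.optimize && !request.isTier1;
    if (!retryMakesSense)
    {
        return result;
    }

    ++attempts;
    result = jit(state, /* optimize */ false);
    if (result != CompileResult::Ok)
    {
        state.cleanup();
    }
    return result;
}
```

Because the host performs the cleanup itself in this arrangement, it does not depend on the JIT reporting the failure before retrying.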

BruceForstall · 2022-03-15T04:19:45Z

> igNum here is unsigned on all platforms, i.e. 32 bits. I guess this is currently a limitation of the JIT.

32 bits would be fine. The issue was with the prolog IG only, and should have been fixed by #65153.

mangod9 · 2022-04-11T14:48:01Z

Hi @gbalykov, checking in on whether you will be working on updating the PR.

trylek · 2022-06-20T19:24:14Z

There has been no traffic on this PR for more than two months. I'm closing this now; please feel free to reopen as you see fit.

gbalykov requested a review from MichalStrehovsky as a code owner January 31, 2022 18:04

dotnet-issue-labeler bot added the area-crossgen2-coreclr label Jan 31, 2022

ghost added the community-contribution label (indicates that the PR has been added by a community member) Jan 31, 2022

jkotas reviewed Feb 22, 2022

Fix for number of used frame infos during crossgen2 compilation

2f6259c

jitNativeCode may retry compilation when an exception is thrown inside the JIT, but reserveUnwindInfo may already have been called by then, so _numFrameInfos has to be reset.

gbalykov force-pushed the fix-crossgen2-unwindinfo branch from 6ba4f1e to 2f6259c Compare February 22, 2022 19:18

jkotas reviewed Mar 14, 2022

trylek closed this Jun 20, 2022

ghost locked as resolved and limited conversation to collaborators Jul 21, 2022
