Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix for number of used frame infos during crossgen2 compilation #64549

Closed
wants to merge 1 commit into from

Conversation

gbalykov
Copy link
Member

@gbalykov gbalykov commented Jan 31, 2022

Fixes #59040

Problem with JIT/Regression/JitBlue/GitHub_17777/GitHub_17777 happens because 1st compCompile invocation in jitNativeCode throws exception CORJIT_INTERNALERROR, but reserved unwind info (via reserveUnwindInfo) is never freed. Second attempt without opts succeeds, but reserveUnwindInfo is called again, so there's actually twice as much unwind info reserved as needed.

It might be worth to investigate the reason of CORJIT_INTERNALERROR on arm64 for GitHub_17777 separately. This change will still be needed, because compilation retry might happen on some other tests too for one reason or another.

Update: I've found the reason of CORJIT_INTERNALERROR. NO_WAY assert happens inside jit during compilation in emit.cpp:

        /* Are we overflowing? */
        if (ig->igNext && (ig->igNum + 1 != ig->igNext->igNum))
        {
            NO_WAY("Too many instruction groups");
        }

igNum here is unsigned on all platforms, i.e. 32 bit. I guess this is currently a limitation of jit.

cc @alpencolt

@ghost ghost added the community-contribution Indicates that the PR has been added by a community member label Jan 31, 2022
@gbalykov
Copy link
Member Author

@MichalStrehovsky could you, please, take a look?

@MichalStrehovsky
Copy link
Member

@dotnet/crossgen-contrib could someone have a look. I'm not familiar with this.

@@ -422,6 +422,19 @@ private void PublishCode()
#endif
);

if (_usedFrameInfos != _numFrameInfos)
{
// jitNativeCode might retry itself when exception happens inside jit.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be better to clear the already allocated frameinfos when jitNativeCode retries?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, you are right, it would be a better solution. At first I did not see existing jit interface to do this, but for some reason missed reportFatalError. I've checked that cleanup in reportFatalError works, so I'll update this PR.

jitNativeCode might retry itself when exception happens inside jit,
but reserveUnwindInfo might be already called at this point,
so _numFrameInfos should be cleaned up.
@gbalykov
Copy link
Member Author

@jkotas could you, please, take a look?

@@ -3430,6 +3430,7 @@ private void reportFatalError(CorJitResult result)
{
// We could add some logging here, but for now it's unnecessary.
// CompileMethod is going to fail with this CorJitResult anyway.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment is not correct. CompileMethod is not always going to fail.

@@ -3430,6 +3430,7 @@ private void reportFatalError(CorJitResult result)
{
// We could add some logging here, but for now it's unnecessary.
// CompileMethod is going to fail with this CorJitResult anyway.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment is not correct. CompileMethod is not always going to fail.

@@ -3430,6 +3430,7 @@ private void reportFatalError(CorJitResult result)
{
// We could add some logging here, but for now it's unnecessary.
// CompileMethod is going to fail with this CorJitResult anyway.
_numFrameInfos = 0;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this all state that needs to be cleaned up? From cursory look, there seems to be more state that may cause problems.

I am wondering whether the retry logic should be rather moved from the JIT to the EE side. It would allow the state cleanup to be more robust. Also, it would allow us to skip the retry in cases where it does not make sense. For example, it does not make sense to retry with no optimizations when generating Tier1 code.

@dotnet/jit-contrib Thoughts?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am wondering whether the retry logic should be rather moved from the JIT to the EE side.

Maybe I'm missing something: the proposed change is in crossgen2, not the JIT (or EE).

In the JIT'ing case, we already have a backout function: CEEJitInfo::BackoutJitData() => EEJitManager::RemoveJitData(). It removes EH info, GC info, and maybe more. (Maybe not the debug info?) Seems like crossgen2 would need to do the same (i.e., remove GC info, debug info, code buffer).

Copy link
Member

@jkotas jkotas Mar 15, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have similar method in crossgen2 as well (CompileMethodCleanup). The problem is that neither BackoutJitData nor CompileMethodCleanup get called when the JIT decides to retry with optimizations off.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I see. There's a "small" window where, after the JIT has already started allocating EH/GC/code space from the VM, where it can hit a NO_WAY assert and decide to retry. It would make sense for the JIT to call back through the JIT/EE interface before retrying saying "I'm going to retry".

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personally I think it makes sense to leave it up to the consumer of the JIT to retry if that is the desired behavior -- it seems a little strange to me that the JIT does this implicitly even though optimizations were requested. It was also the reason behind the unexpected debug codegen in #63708.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reportFatalError signals that the JIT is going to retry, so we sort of have that callback already.

There is retry logic in the VM to deal with relocs overflow that does proper cleanup before retrying. In other words, we have one retry logic in the JIT itself and second retry logic in the VM. My point was whether it would be better to have just one retry logic in the VM that handles both cases.

@@ -3430,6 +3430,7 @@ private void reportFatalError(CorJitResult result)
{
// We could add some logging here, but for now it's unnecessary.
// CompileMethod is going to fail with this CorJitResult anyway.
_numFrameInfos = 0;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this all state that needs to be cleaned up? From cursory look, there seems to be more state that may cause problems.

I am wondering whether the retry logic should be rather moved from the JIT to the EE side. It would allow the state cleanup to be more robust. Also, it would allow us to skip the retry in cases where it does not make sense. For example, it does not make sense to retry with no optimizations when generating Tier1 code.

@dotnet/jit-contrib Thoughts?

@BruceForstall
Copy link
Member

igNum here is unsigned on all platforms, i.e. 32 bit. I guess this is currently a limitation of jit.

32 bits would be fine. The issue was with the prolog IG only, and should have been fixed by #65153

@mangod9
Copy link
Member

mangod9 commented Apr 11, 2022

Hi @gbalykov, checking in whether you would be working on updating the PR?

@trylek
Copy link
Member

trylek commented Jun 20, 2022

There has been no traffic on this PR for more than two months. I'm closing this now, please feel free to reopen as you see fit.

@trylek trylek closed this Jun 20, 2022
@ghost ghost locked as resolved and limited conversation to collaborators Jul 21, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-crossgen2-coreclr community-contribution Indicates that the PR has been added by a community member
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Crossgen2 error on arm64 for GitHub_17777
7 participants