-
Notifications
You must be signed in to change notification settings - Fork 4.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix for number of used frame infos during crossgen2 compilation #64549
Conversation
@MichalStrehovsky could you, please, take a look? |
@dotnet/crossgen-contrib could someone have a look. I'm not familiar with this. |
@@ -422,6 +422,19 @@ private void PublishCode() | |||
#endif | |||
); | |||
|
|||
if (_usedFrameInfos != _numFrameInfos) | |||
{ | |||
// jitNativeCode might retry itself when exception happens inside jit. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be better to clear the already allocated frameinfos when jitNativeCode retries?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, you are right, it would be a better solution. At first I did not see existing jit interface to do this, but for some reason missed reportFatalError
. I've checked that cleanup in reportFatalError
works, so I'll update this PR.
jitNativeCode might retry itself when exception happens inside jit, but reserveUnwindInfo might be already called at this point, so _numFrameInfos should be cleaned up.
6ba4f1e
to
2f6259c
Compare
@jkotas could you, please, take a look? |
@@ -3430,6 +3430,7 @@ private void reportFatalError(CorJitResult result) | |||
{ | |||
// We could add some logging here, but for now it's unnecessary. | |||
// CompileMethod is going to fail with this CorJitResult anyway. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment is not correct. CompileMethod
is not always going to fail.
@@ -3430,6 +3430,7 @@ private void reportFatalError(CorJitResult result) | |||
{ | |||
// We could add some logging here, but for now it's unnecessary. | |||
// CompileMethod is going to fail with this CorJitResult anyway. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment is not correct. CompileMethod
is not always going to fail.
@@ -3430,6 +3430,7 @@ private void reportFatalError(CorJitResult result) | |||
{ | |||
// We could add some logging here, but for now it's unnecessary. | |||
// CompileMethod is going to fail with this CorJitResult anyway. | |||
_numFrameInfos = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this all state that needs to be cleaned up? From cursory look, there seems to be more state that may cause problems.
I am wondering whether the retry logic should be rather moved from the JIT to the EE side. It would allow the state cleanup to be more robust. Also, it would allow us to skip the retry in cases where it does not make sense. For example, it does not make sense to retry with no optimizations when generating Tier1 code.
@dotnet/jit-contrib Thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am wondering whether the retry logic should be rather moved from the JIT to the EE side.
Maybe I'm missing something: the proposed change is in crossgen2, not the JIT (or EE).
In the JIT'ing case, we already have a backout function: CEEJitInfo::BackoutJitData()
=> EEJitManager::RemoveJitData()
. It removes EH info, GC info, and maybe more. (Maybe not the debug info?) Seems like crossgen2 would need to do the same (i.e., remove GC info, debug info, code buffer).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have similar method in crossgen2 as well (CompileMethodCleanup
). The problem is that neither BackoutJitData
nor CompileMethodCleanup
get called when the JIT decides to retry with optimizations off.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I see. There's a "small" window where, after the JIT has already started allocating EH/GC/code space from the VM, where it can hit a NO_WAY assert and decide to retry. It would make sense for the JIT to call back through the JIT/EE interface before retrying saying "I'm going to retry".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Personally I think it makes sense to leave it up to the consumer of the JIT to retry if that is the desired behavior -- it seems a little strange to me that the JIT does this implicitly even though optimizations were requested. It was also the reason behind the unexpected debug codegen in #63708.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
reportFatalError
signals that the JIT is going to retry, so we sort of have that callback already.
There is retry logic in the VM to deal with relocs overflow that does proper cleanup before retrying. In other words, we have one retry logic in the JIT itself and second retry logic in the VM. My point was whether it would be better to have just one retry logic in the VM that handles both cases.
@@ -3430,6 +3430,7 @@ private void reportFatalError(CorJitResult result) | |||
{ | |||
// We could add some logging here, but for now it's unnecessary. | |||
// CompileMethod is going to fail with this CorJitResult anyway. | |||
_numFrameInfos = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this all state that needs to be cleaned up? From cursory look, there seems to be more state that may cause problems.
I am wondering whether the retry logic should be rather moved from the JIT to the EE side. It would allow the state cleanup to be more robust. Also, it would allow us to skip the retry in cases where it does not make sense. For example, it does not make sense to retry with no optimizations when generating Tier1 code.
@dotnet/jit-contrib Thoughts?
32 bits would be fine. The issue was with the prolog IG only, and should have been fixed by #65153 |
Hi @gbalykov, checking in whether you would be working on updating the PR? |
There has been no traffic on this PR for more than two months. I'm closing this now, please feel free to reopen as you see fit. |
Fixes #59040
Problem with
JIT/Regression/JitBlue/GitHub_17777/GitHub_17777
happens because 1stcompCompile
invocation injitNativeCode
throws exceptionCORJIT_INTERNALERROR
, but reserved unwind info (viareserveUnwindInfo
) is never freed. Second attempt without opts succeeds, butreserveUnwindInfo
is called again, so there's actually twice as much unwind info reserved as needed.It might be worth to investigate the reason of
CORJIT_INTERNALERROR
on arm64 for GitHub_17777 separately. This change will still be needed, because compilation retry might happen on some other tests too for one reason or another.Update: I've found the reason of
CORJIT_INTERNALERROR
.NO_WAY
assert happens inside jit during compilation in emit.cpp:igNum
here isunsigned
on all platforms, i.e. 32 bit. I guess this is currently a limitation of jit.cc @alpencolt