Prevent hangs when preparing the JIT for checkpoint #15202
When the compilation queue is empty, active compilation threads wait on the comp monitor for work. The thread that hooks into the JIT to prepare for checkpoint signals that comp threads should be suspended, but it does not notify threads waiting on the comp monitor. This can lead to a hang where the hook thread waits to be notified by the "active" comp thread that it is en route to suspend itself, while the "active" comp thread waits to be notified that there is work to be done.
Signed-off-by: Irwin D'Souza <[email protected]>
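The missed-notification hang described above can be sketched with standard C++ primitives. This is only an illustration, not the OpenJ9 code: `CompMonitor`, `compilationThread`, `prepareForCheckpoint`, and the flag names are hypothetical stand-ins, and a `std::condition_variable` plays the role of the comp monitor.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <queue>
#include <thread>

// Hypothetical stand-in for the comp monitor and the state it guards.
struct CompMonitor {
    std::mutex mutex;
    std::condition_variable cv;
    std::queue<int> workQueue;      // pending compilation requests
    bool suspendRequested = false;  // set by the checkpoint hook thread
    bool suspended = false;         // set by the comp thread en route to suspending
};

// An "active" compilation thread: sleeps on the monitor until there is
// work to do or a suspend has been requested.
void compilationThread(CompMonitor &m) {
    std::unique_lock<std::mutex> lock(m.mutex);
    while (!m.suspendRequested) {
        if (!m.workQueue.empty()) {
            m.workQueue.pop();  // "compile" the request
            continue;
        }
        m.cv.wait(lock);  // idle: only a notify wakes this thread
    }
    m.suspended = true;   // tell the hook thread we are suspending
    m.cv.notify_all();
}

// The hook thread: requests suspension, then waits for the comp thread.
bool prepareForCheckpoint(CompMonitor &m) {
    std::unique_lock<std::mutex> lock(m.mutex);
    m.suspendRequested = true;
    // The notify is the important part. If it is omitted, an idle comp
    // thread keeps sleeping in cv.wait() and the wait below never returns.
    m.cv.notify_all();
    m.cv.wait(lock, [&m] { return m.suspended; });
    return m.suspended;
}

// Runs one idle comp thread against the hook and reports whether the
// handshake completed.
bool demoSuspendIdleCompThread() {
    CompMonitor m;
    std::thread comp([&m] { compilationThread(m); });
    bool ok = prepareForCheckpoint(m);
    comp.join();
    return ok;
}
```

Removing the first `notify_all()` in `prepareForCheckpoint` reproduces the shape of the hang: each thread waits for a notification that only the other could send.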
@mpirvu could you please review?
@mpirvu this PR is now ready for review.
I have a small comment, otherwise it looks good
The thread that hooks into the JIT to start preparing for checkpoint has VMAccess. However, any compilation thread that is currently compiling will request Exclusive VMAccess at some point during the compilation to add the code cache to the artifact manager. Because the hook thread waits for all compilations in flight to complete, a deadlock arises: the compilation thread waits for Exclusive VMAccess, which it cannot acquire because the hook thread has VMAccess, while the hook thread waits for the compilation thread to notify it that it will suspend itself. This commit fixes this by releasing VMAccess before preparing to checkpoint and reacquiring it before returning to the VM. This is safe because all other Java threads have been halted at this point.
Signed-off-by: Irwin D'Souza <[email protected]>
@mpirvu I created a new RAII class to release VM Access and acquire the monitor. Given that I missed the return, the same mistake could easily happen again in the future, so it felt better to make the acquiring/releasing automatic. Good for another review now.
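For readers unfamiliar with the pattern, here is a minimal sketch of such an RAII guard. It is not the class added in this PR: the name `ReleaseVMAccessAndAcquireMonitor`, the `VM` struct, and the use of `std::shared_mutex` (shared ownership modelling VMAccess, exclusive ownership modelling Exclusive VMAccess) are all assumptions made for illustration.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <shared_mutex>
#include <thread>

// Hypothetical model: a shared lock on vmAccess stands for VMAccess,
// an exclusive lock on it stands for Exclusive VMAccess.
struct VM {
    std::shared_mutex vmAccess;
    std::mutex compMonitor;
    std::condition_variable_any cv;  // can wait directly on a std::mutex
    bool codeCacheRegistered = false;
    bool compilationDone = false;
};

// RAII guard: releases VM access and takes the comp monitor on entry,
// and undoes both on every exit path, including early returns.
class ReleaseVMAccessAndAcquireMonitor {
public:
    explicit ReleaseVMAccessAndAcquireMonitor(VM &vm) : _vm(vm) {
        _vm.vmAccess.unlock_shared();
        _vm.compMonitor.lock();
    }
    ~ReleaseVMAccessAndAcquireMonitor() {
        _vm.compMonitor.unlock();
        _vm.vmAccess.lock_shared();
    }
    ReleaseVMAccessAndAcquireMonitor(const ReleaseVMAccessAndAcquireMonitor &) = delete;
    ReleaseVMAccessAndAcquireMonitor &operator=(const ReleaseVMAccessAndAcquireMonitor &) = delete;
private:
    VM &_vm;
};

// Hook thread body. The caller must hold VM access (a shared lock).
bool prepareForCheckpoint(VM &vm, bool nothingToDo) {
    ReleaseVMAccessAndAcquireMonitor guard(vm);
    if (nothingToDo)
        return false;  // early return: the guard still restores VM access
    // Wait for the in-flight compilation; the monitor is released while waiting.
    vm.cv.wait(vm.compMonitor, [&vm] { return vm.compilationDone; });
    return true;
}

bool demoCheckpointWithInFlightCompilation() {
    VM vm;
    vm.vmAccess.lock_shared();  // the hook thread starts out with VM access
    std::thread comp([&vm] {
        {
            // Needs exclusive access; blocks while the hook holds VM access.
            std::unique_lock<std::shared_mutex> exclusive(vm.vmAccess);
            vm.codeCacheRegistered = true;
        }
        std::lock_guard<std::mutex> lock(vm.compMonitor);
        vm.compilationDone = true;
        vm.cv.notify_all();
    });
    bool ok = prepareForCheckpoint(vm, false);
    comp.join();
    vm.vmAccess.unlock_shared();
    return ok && vm.codeCacheRegistered;
}
```

Because the destructor runs on every path out of `prepareForCheckpoint`, an early `return` cannot leave the hook thread without VM access or still holding the monitor, which is the kind of slip the comment above mentions.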
LGTM
jenkins compile all jdk17
jenkins test sanity xlinux jdk17
Jenkins test sanity xlinuxcriu jdk11
The mac failure may be infra related (similar to #14994 (comment)).
The criu build failed in
caused by #14974
I agree that the failures seen in testing are not related to this PR, hence merging.
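Prevent hangs when preparing the JIT for checkpoint. Two different circumstances can lead to a hang: the hook thread signals that comp threads should be suspended without notifying threads waiting on the comp monitor, and the hook thread holds VMAccess while an in-flight compilation waits for Exclusive VMAccess.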
Fixes #15191