Ensure JIT/AOT code is not invalidated post-restore under -XX:+DebugOnRestore #20047

Merged
merged 3 commits into from
Sep 24, 2024


@dsouzai dsouzai commented Aug 22, 2024

#19754 adds VM support for Debug On Restore. As the code stands, this will result in the JIT invalidating all code post-restore, because it cannot tell whether FSD mode was caused by the VM or by the user.

This PR fixes this by adding the following changes:

  • Cache the status of certain runtime events
  • Return early from isFSDNeeded under -XX:+DebugOnRestore

Depends on #19754 (specifically the update to the isDebugOnRestoreEnabled API).
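
The early return described above can be pictured with a minimal sketch. This is not the OpenJ9 implementation: `VmState`, its fields, and the free-function form of `isFSDNeeded` are hypothetical stand-ins, and the value returned on the early path is an assumption. Only the names `isFSDNeeded` and `isDebugOnRestoreEnabled` come from this PR.

```cpp
// Hypothetical stand-in for the VM state the JIT consults; not an OpenJ9 type.
struct VmState {
    bool debugOnRestoreEnabled; // models the result of isDebugOnRestoreEnabled
    bool debugCapabilitiesSet;  // capabilities that would otherwise imply FSD
};

// Sketch of the early return. Under -XX:+DebugOnRestore the capabilities may
// have been set by the VM itself, so they are not inspected at all.
// ASSUMPTION: the early path reports "no FSD transition needed" (false).
bool isFSDNeeded(const VmState &vm) {
    if (vm.debugOnRestoreEnabled) {
        return false;
    }
    // Otherwise fall through to the usual capability check.
    return vm.debugCapabilitiesSet;
}
```

With this shape, a state where the VM set the capabilities under -XX:+DebugOnRestore no longer looks like a user request for FSD, while the ordinary path still reacts to capabilities when the option is off.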

@JasonFengJ9

@dsouzai would you mind re-enabling the following failure conditions that were disabled in #19754? I manually verified that this PR passes the test with the failure conditions re-enabled.

<!-- Following two failure conditions to be re-enabled with https://github.com/eclipse-openj9/openj9/pull/20047 -->
<!-- output type="failure" caseSensitive="yes" regex="no">Some or all compiled code in the code cache invalidated post restore.</output>
<output type="failure" caseSensitive="yes" regex="no">JIT compilation disabled post restore.</output -->

<!-- Following two failure conditions to be re-enabled with https://github.com/eclipse-openj9/openj9/pull/20047 -->
<!-- output type="failure" caseSensitive="yes" regex="no">JIT compilation disabled post restore.</output>
<output type="failure" caseSensitive="yes" regex="no">AOT load and compilation disabled post restore.</output -->
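
For reference, a failure condition with regex="no" and caseSensitive="yes" can be read as a case-sensitive substring check against the test output. The sketch below models that reading only; `FailureCondition` and `matchedFailures` are hypothetical names, and the real test framework may evaluate conditions differently.

```cpp
#include <string>
#include <vector>

// Hypothetical model of one <output type="failure" regex="no" caseSensitive="yes"> entry.
struct FailureCondition {
    std::string text;
};

// Returns the text of every failure condition that occurs verbatim in the output.
// ASSUMPTION: regex="no" plus caseSensitive="yes" means an exact substring match.
std::vector<std::string> matchedFailures(const std::string &output,
                                         const std::vector<FailureCondition> &conditions) {
    std::vector<std::string> hits;
    for (const FailureCondition &c : conditions) {
        if (output.find(c.text) != std::string::npos) {
            hits.push_back(c.text);
        }
    }
    return hits;
}
```

A test would be considered failed under this model as soon as the returned list is non-empty.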

dsouzai commented Sep 19, 2024

@ymanton could you please review/merge?

dsouzai commented Sep 20, 2024

@TobiAjila fyi this also needs to get into 0.48

@ymanton ymanton left a comment

LGTM.

ymanton commented Sep 20, 2024

Jenkins test sanity.functional xlinux,plinux,zlinux jdk11,jdk17

* as well as whether method trace and FSD were enabled
* pre-checkpoint.
*/
cacheEventsStatus();
Contributor

This should have an explicit return type (i.e. void).

Contributor Author

Ah damn, thanks for pointing out. Surprising this ever built locally.

Cache the status of JVMTI events such as exception throw/catch as well
as whether method trace and FSD were enabled pre-checkpoint.

Signed-off-by: Irwin D'Souza <[email protected]>
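
A minimal sketch of the caching idea in the commit message above, under stated assumptions: `EventsStatus`, `CheckpointState`, and `newlyEnabled` are hypothetical names, and comparing the cached state against the post-restore state is one plausible use of the cache rather than something this PR confirms. Only `cacheEventsStatus` is a name taken from the PR, shown here with an explicit `void` return type.

```cpp
// Hypothetical snapshot of runtime state; field names are illustrative only.
struct EventsStatus {
    bool exceptionThrowHooked = false;
    bool exceptionCatchHooked = false;
    bool methodTraceEnabled = false;
    bool fsdEnabled = false;
};

class CheckpointState {
public:
    // Record the pre-checkpoint state once so it can be consulted after restore.
    void cacheEventsStatus(const EventsStatus &current) { _cached = current; }

    // ASSUMPTION: post-restore, only state that was not already on
    // pre-checkpoint is treated as newly enabled.
    bool newlyEnabled(const EventsStatus &now) const {
        return (now.exceptionThrowHooked && !_cached.exceptionThrowHooked)
            || (now.exceptionCatchHooked && !_cached.exceptionCatchHooked)
            || (now.methodTraceEnabled && !_cached.methodTraceEnabled)
            || (now.fsdEnabled && !_cached.fsdEnabled);
    }

private:
    EventsStatus _cached;
};
```
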
Under -XX:+DebugOnRestore, the VM will enable some capabilities that
will be reset on restore (if the user does not want debug on restore).
However, if the JIT detects these capabilities via the J9HookDisable
call, there is no way for the JIT to know whether the capability was set
by the VM or a user pre-checkpoint.

With the recent change to isDebugOnRestoreEnabled to check whether a user
specified a JDWP agent pre-checkpoint, the JIT can simply return early
from isFSDNeeded. Under -XX:+DebugOnRestore, the JIT already generates
FSD code, so there is no need to check the caps that were set by the VM
pre-checkpoint.

isFSDNeeded is called on restore, at which point it is appropriate for
the JIT to execute the rest of the method.

Signed-off-by: Irwin D'Souza <[email protected]>

dsouzai commented Sep 20, 2024

Jenkins test sanity.functional xlinux,plinux,zlinux jdk11,jdk17

dsouzai commented Sep 23, 2024

The ppc64le test failures are all in "Test -Xjit:exclude={*} without -XX:+DebugOnRestore":

[2024-09-21T00:23:24.079Z] Output from test:
[2024-09-21T00:23:24.079Z]  [OUT] start running script
[2024-09-21T00:23:24.079Z]  [OUT] export GLIBC_TUNABLES=glibc.cpu.hwcaps=-XSAVEC,-XSAVE,-AVX2,-ERMS,-AVX,-AVX_Fast_Unaligned_Load
[2024-09-21T00:23:24.079Z]  [OUT] export LD_BIND_NOT=on
[2024-09-21T00:23:24.079Z]  [OUT] /home/jenkins/workspace/Test_openjdk11_j9_sanity.functional_ppc64le_linux_Personal_testList_0/jdkbinary/j2sdk-image/bin/java -XX:+EnableCRIUSupport  -Xjit  -cp /home/jenkins/workspace/Test_openjdk11_j9_sanity.functional_ppc64le_linux_Personal_testList_0/aqa-tests/TKG/../../jvmtest/functional/cmdLineTests/criu/criu.jar org.openj9.criu.OptionsFileTest JitOptionsTest -Xjit:exclude={*} 1
[2024-09-21T00:23:24.079Z]  [OUT] Pre-checkpoint
[2024-09-21T00:23:24.079Z]  [OUT] main: Sat Sep 21 00:23:18 UTC 2024, Performing CRIUSupport.checkpointJVM(), System.currentTimeMillis(): 1726878198957, System.nanoTime(): 1363144591107177
[2024-09-21T00:23:24.079Z]  [OUT] JVMJITM043W AOT load and compilation disabled post restore.
[2024-09-21T00:23:24.079Z]  [OUT] JVMJITM044W Some or all compiled code in the code cache invalidated post restore.
[2024-09-21T00:23:24.079Z]  [OUT] Post-checkpoint
[2024-09-21T00:23:24.079Z]  [OUT] initiate restore
[2024-09-21T00:23:24.079Z]  [OUT] Removed test output files
[2024-09-21T00:23:24.079Z]  [OUT] finished script
[2024-09-21T00:23:24.079Z]  [ERR] /home/jenkins/workspace/Test_openjdk11_j9_sanity.functional_ppc64le_linux_Personal_testList_0/aqa-tests/TKG/../../jvmtest/functional/cmdLineTests/criu/criuScript.sh: line 41: 2543286 Killed                  $2 -XX:+EnableCRIUSupport $3 -cp "$1/criu.jar" $4 $5 $6 > testOutput 2>&1
[2024-09-21T00:23:24.079Z] >> Success condition was found: [Output match: Killed]
[2024-09-21T00:23:24.079Z] >> Required condition was found: [Output match: Pre-checkpoint]
[2024-09-21T00:23:24.079Z] >> Success condition was found: [Output match: Post-checkpoint]
[2024-09-21T00:23:24.079Z] >> Failure condition was not found: [Output match: CRIU is not enabled]
[2024-09-21T00:23:24.080Z] >> Failure condition was not found: [Output match: Operation not permitted]
[2024-09-21T00:23:24.080Z] >> Success condition was found: [Output match: Some or all compiled code in the code cache invalidated post restore.]
[2024-09-21T00:23:24.080Z] >> Failure condition was not found: [Output match: JIT compilation disabled post restore.]
[2024-09-21T00:23:24.080Z] >> Failure condition was found: [Output match: AOT load and compilation disabled post restore.]
[2024-09-21T00:23:24.080Z] >> Success condition was not found: [Output match: Thread pid mismatch]
[2024-09-21T00:23:24.080Z] >> Success condition was not found: [Output match: do not match expected]
[2024-09-21T00:23:24.080Z] >> Success condition was not found: [Output match: Unable to create a thread:]
[2024-09-21T00:23:24.080Z] >> Failure condition was not found: [Output match: User requested Java dump using]

Because "AOT load and compilation disabled post restore." gets printed out before "Some or all compiled code in the code cache invalidated post restore.", I believe this is due to AOT being disabled because of the SCC or similar, rather than because of something like FSD. Essentially, I believe it gets printed out here


rather than here
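
The ordering argument above depends on which warning appears first in the test output. A minimal sketch of that check follows; `appearsBefore` is a hypothetical helper, not part of the test framework or the JIT.

```cpp
#include <string>

// True only if both messages occur in the log and the first occurrence of
// `first` precedes the first occurrence of `second`.
bool appearsBefore(const std::string &log,
                   const std::string &first,
                   const std::string &second) {
    const std::string::size_type a = log.find(first);
    const std::string::size_type b = log.find(second);
    return a != std::string::npos && b != std::string::npos && a < b;
}
```

Applied to the output quoted above, the AOT warning (JVMJITM043W) precedes the invalidation warning (JVMJITM044W), which is the observation the reasoning rests on.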

@ymanton I believe you've also seen this issue before right?

ymanton commented Sep 23, 2024

@ymanton I believe you've also seen this issue before right?

Yeah that's a pre-existing issue.

I'll merge shortly if there are no further comments and requested changes are confirmed.

dsouzai commented Sep 24, 2024

@ymanton could this be merged?

Labels
comp:jit criu Used to track CRIU snapshot related work