Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reset JNI Addresses if FSD on restore #20108

Merged
merged 1 commit into from
Sep 5, 2024
Merged

Conversation

dsouzai
Copy link
Contributor

@dsouzai dsouzai commented Sep 3, 2024

If debug is specified on restore, ensure that any proactively compiled
JNI Methods are reset to their original address.

JNI methods are not compiled under FSD, but they are compiled
proactively in the checkpoint hook. However, if debug is specified on
restore, all proactively compiled methods, except for JNI thunks, get
invalidated. Thus, the extra field of the J9Method of a JNI method
needs to get reset to whatever the VM initialized them to originally.

This is necessary to ensure an environment that is consistent with FSD
mode. The invalidation of proactively compiled non-JNI methods (which
are non-FSD bodies) ensure that non-FSD code does not execute
post-restore. At the checkpoint hook, only FSD bodies are on the stacks
of the threads; although the compiler currently does not reuse these
FSD bodies, they will continue to execute on restore until an OSR
transition, but this is OK because they are already set up for
involuntary OSR. The missing piece was the JNI methods, which this
commit addresses.

@dsouzai dsouzai added comp:jit criu Used to track CRIU snapshot related work labels Sep 3, 2024
@dsouzai
Copy link
Contributor Author

dsouzai commented Sep 3, 2024

@mpirvu could you please review?

Copy link
Contributor

@mpirvu mpirvu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. However, I don't have enough knowledge to assess whether "resetting" a JNI method is as simple as overwriting j9method-extra.

@mpirvu
Copy link
Contributor

mpirvu commented Sep 4, 2024

@dsouzai Could you also please add more info to the message of this commit. Why do we need to reset the JNI methods post restore? Are JNI methods compiled differently when debug is enable? And why only the proactive ones need to be reset?

@dsouzai
Copy link
Contributor Author

dsouzai commented Sep 4, 2024

However, I don't have enough knowledge to assess whether "resetting" a JNI method is as simple as overwriting j9method-extra.

When I had first looked into JNI methods wrt -XX:+DebugOnRestore, I recall trying to set the extra to a count value to retry compilation on restore, but this caused crashes with JNI methods. It turns out that the extra field of the JNI methods is set to a native address the first time it's initialized (and the VM also queues a compilation at that time). If the compilation fails (eg because of FSD) the JIT doesn't touch the extra field like it would with other methods (ie setting the extra to J9_JIT_NEVER_TRANSLATE).

So, "resetting" here is just reverting the extra to the original address the VM initialized the JNI method with.

@dsouzai
Copy link
Contributor Author

dsouzai commented Sep 4, 2024

@dsouzai Could you also please add more info to the message of this commit.

Sure will do, however to answer the questions:

Why do we need to reset the JNI methods post restore? Are JNI methods compiled differently when debug is enable? And why only the proactive ones need to be reset?

Under FSD we don't compile JNI methods. However, we do compile the JNI methods proactively in the checkpoint hook as we optimistically assume debug will not be specified on restore. If debug is specified on restore, we need to invalidate all the proactively compiled methods. Now the code currently invalidates all compiled code (including the FSD bodies) which is not very efficient, but it's a good enough solution for now. However, we don't invalidate JNI thunks because they're different. Thus, we need to reset them so that they're back to whatever the interpreter initialized them originally.

If debug is specified on restore, ensure that any proactively compiled
JNI Methods are reset to their original address.

JNI methods are not compiled under FSD, but they are compiled
proactively in the checkpoint hook. However, if debug is specified on
restore, all proactively compiled methods, except for JNI thunks, get
invalidated. Thus, the extra field of the J9Method of a JNI method
needs to get reset to whatever the VM initialized them to originally.

This is necessary to ensure an environment that is consistent with FSD
mode. The invalidation of proactively compiled non-JNI methods (which
are non-FSD bodies) ensure that non-FSD code does not execute
post-restore. At the checkpoint hook, only FSD bodies are on the stacks
of the threads; although the compiler currently does not reuse these
FSD bodies, they will continue to execute on restore until an OSR
transition, but this is OK because they are already set up for
involuntary OSR. The missing piece was the JNI methods, which this
commit addresses.

Signed-off-by: Irwin D'Souza <[email protected]>
@dsouzai
Copy link
Contributor Author

dsouzai commented Sep 4, 2024

@mpirvu made requested changes.

@mpirvu
Copy link
Contributor

mpirvu commented Sep 4, 2024

jenkins compile all jdk17

@dsouzai
Copy link
Contributor Author

dsouzai commented Sep 5, 2024

linux build failed because of

[2024-09-04T19:57:54.688Z] /usr/local/gcc11/lib/gcc/x86_64-pc-linux-gnu/11.2.0/../../../../x86_64-pc-linux-gnu/bin/ld: final link failed: No space left on device

@mpirvu
Copy link
Contributor

mpirvu commented Sep 5, 2024

jenkins test sanity zlinux,xlinux jdk17

@dsouzai
Copy link
Contributor Author

dsouzai commented Sep 5, 2024

zlinux failure

[2024-09-05T15:07:46.658Z] Testing: Ensure no core files have been produced by the preceding tests
[2024-09-05T15:07:46.658Z] Test start time: 2024/09/05 15:07:33 Coordinated Universal Time
[2024-09-05T15:07:46.658Z] Running command: sh -c ls
[2024-09-05T15:07:46.659Z] Time spent starting: 7 milliseconds
[2024-09-05T15:07:46.659Z] Time spent executing: 12 milliseconds
[2024-09-05T15:07:46.659Z] Test result: FAILED
[2024-09-05T15:07:46.659Z] Output from test:
[2024-09-05T15:07:46.659Z]  [OUT] core.20240905.150638.2351912.0001.dmp
[2024-09-05T15:07:46.659Z]  [OUT] core.20240905.150638.2351912.0002.dmp
[2024-09-05T15:07:46.659Z]  [OUT] foo.%001.log
[2024-09-05T15:07:46.659Z]  [OUT] foo.001.log
[2024-09-05T15:07:46.659Z]  [OUT] foo.002.log
[2024-09-05T15:07:46.659Z]  [OUT] foo.#.log
[2024-09-05T15:07:46.659Z]  [OUT] foo.log
[2024-09-05T15:07:46.659Z]  [OUT] foo.log.001
[2024-09-05T15:07:46.659Z]  [OUT] foo.log.002
[2024-09-05T15:07:46.659Z]  [OUT] foo.log.003
[2024-09-05T15:07:46.659Z]  [OUT] heapdump.20240905.150638.2351912.0003.phd
[2024-09-05T15:07:46.659Z]  [OUT] javacore.20240905.150638.2351912.0004.txt
[2024-09-05T15:07:46.659Z]  [OUT] Snap.20240905.150638.2351912.0005.trc
[2024-09-05T15:07:46.659Z]  [OUT] static_temp.txt
[2024-09-05T15:07:46.659Z]  [OUT] verbosegc.20240905.150242.2351494.txt
[2024-09-05T15:07:46.659Z]  [OUT] verbosegc.20240905.150257.2351721.txt
[2024-09-05T15:07:46.659Z] >> Success condition was found: [Output match: ]
[2024-09-05T15:07:46.659Z] >> Failure condition was found: [Output match: core.*]

due to #18048

@mpirvu
Copy link
Contributor

mpirvu commented Sep 5, 2024

Tests on xlinux have passed and the only failure on zlinux is due to other causes, hence merging.

@mpirvu mpirvu self-assigned this Sep 5, 2024
@mpirvu mpirvu merged commit 92fefdf into eclipse-openj9:master Sep 5, 2024
9 of 12 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
comp:jit criu Used to track CRIU snapshot related work
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants