Update JNI Target relocation record to have offset to the reloLocation. #14421

Merged (2 commits) on Feb 15, 2022

Contributor

@dsouzai dsouzai commented Feb 2, 2022

On x86, the JNI Target relocation puts the incorrect address when
generating a Runtime Assumption. However, the address that the
relocation infra itself needs to patch is correct. Thus, the relo record
has to hold not just the location to patch but also the location to
register the assumption against. This is the same on most platforms, but
different on x86.
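
To make the idea above concrete, here is a minimal sketch of a record that carries both locations. The struct, its field names, and the helper functions are hypothetical and much simpler than the real OpenJ9 relocation record layout; the delta value used for the x86-like case is arbitrary and only illustrates that the two locations may differ.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, simplified record; not the real OpenJ9 binary layout.
struct JniTargetReloRecord
   {
   std::uint32_t patchOffset;     // offset from method start of the bytes the relocation infra patches
   std::int32_t  assumptionDelta; // signed distance from the patch location to the assumption location
   };

// Address that the relocation infrastructure patches.
inline std::uint8_t *patchLocation(std::uint8_t *methodStart, const JniTargetReloRecord &rec)
   {
   assert(methodStart != nullptr);
   return methodStart + rec.patchOffset;
   }

// Address that the runtime assumption is registered against.
// With a delta of 0 this is the same as the patch location.
inline std::uint8_t *assumptionLocation(std::uint8_t *methodStart, const JniTargetReloRecord &rec)
   {
   return patchLocation(methodStart, rec) + rec.assumptionDelta;
   }
```

With a delta of zero both helpers return the same address, which models the platforms where the two locations coincide; a nonzero delta models the x86 case described above.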

Depends on eclipse/omr#6326
Depends on eclipse/omr#6331
Depends on eclipse/omr#6332

Fixes #14300

Normally the relocation records are generated at binary encoding, when
the offset to patch relative to the start of the method is known.
However, sometimes a relocation record has to be generated before.

This commit fixes some such locations that did not use the
TR::BeforeBinaryEncodingExternalRelocation.
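
As a generic illustration of why a record created before binary encoding needs special handling, the sketch below defers offset resolution until encoding has assigned offsets. All types and names here are hypothetical; this is not how TR::BeforeBinaryEncodingExternalRelocation is implemented, only the general pattern of "remember the instruction now, compute the offset later".

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical instruction whose offset is unknown until binary encoding runs.
struct Instruction
   {
   std::uint32_t encodedOffset;
   Instruction() : encodedOffset(UINT32_MAX) {}
   };

// Hypothetical relocation created early: it stores the instruction, not an offset.
struct PendingRelocation
   {
   const Instruction *target;
   std::uint32_t offset;

   explicit PendingRelocation(const Instruction *t) : target(t), offset(0) {}

   // Called after encoding, once the target has a real offset.
   void resolve()
      {
      assert(target->encodedOffset != UINT32_MAX);
      offset = target->encodedOffset;
      }
   };

// Simulated binary encoding: assign fixed-size offsets, then resolve pending records.
inline void binaryEncode(std::vector<Instruction> &instrs,
                         std::vector<PendingRelocation> &relos,
                         std::uint32_t instrSize)
   {
   std::uint32_t cursor = 0;
   for (Instruction &i : instrs)
      {
      i.encodedOffset = cursor;
      cursor += instrSize;
      }
   for (PendingRelocation &r : relos)
      r.resolve();
   }
```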

Signed-off-by: Irwin D'Souza <[email protected]>
@dsouzai
Contributor Author

dsouzai commented Feb 2, 2022

@jdmpapin do you mind reviewing this and the associated OMR PR?

Contributor

@mstoodle mstoodle left a comment

couple of minor comments, but looks ok to me if the 8bit mask is fixed/explained :)

runtime/compiler/codegen/J9AheadOfTimeCompile.cpp (outdated, resolved)
runtime/compiler/runtime/RelocationRecord.cpp (outdated, resolved)
runtime/compiler/codegen/J9AheadOfTimeCompile.cpp (outdated, resolved)
runtime/compiler/runtime/RelocationRecord.cpp (outdated, resolved)
runtime/compiler/runtime/RelocationRecord.cpp (outdated, resolved)
runtime/compiler/p/codegen/J9TreeEvaluator.cpp (outdated, resolved)
runtime/compiler/runtime/RelocationRecord.cpp (outdated, resolved)
@dsouzai
Contributor Author

dsouzai commented Feb 4, 2022

Gonna have to open yet another PR; because I force pushed, it's causing OpenJ9 OMR mirroring to fail.

Mixed it up with the OMR PR...

@dsouzai dsouzai closed this Feb 4, 2022
@dsouzai dsouzai reopened this Feb 4, 2022
@jdmpapin
Contributor

jdmpapin commented Feb 4, 2022

Jenkins test sanity+aot all jdk17 depends eclipse/omr#6332

@dsouzai
Contributor Author

dsouzai commented Feb 7, 2022

Worth noting that all tests passed except for aarch64, which failed because of

16:13:25  /home/jenkins/workspace/Build_JDK17_aarch64_linux_Personal/openj9/runtime/compiler/aarch64/codegen/ARM64JNILinkage.cpp: In member function 'TR::Instruction* J9::ARM64::JNILinkage::generateMethodDispatch(TR::Node*, bool, TR::RegisterDependencyConditions*, uintptr_t, TR::Register*)':
16:13:25  /home/jenkins/workspace/Build_JDK17_aarch64_linux_Personal/openj9/runtime/compiler/aarch64/codegen/ARM64JNILinkage.cpp:924:51: error: invalid use of member function 'TR::Compilation* OMR::Linkage::comp()' (did you forget the '()' ?)
16:13:25         TR_RelocationRecordInformation *info = new (comp->trHeapMemory()) TR_RelocationRecordInformation();
16:13:25                                                     ^~~~
16:13:25  /home/jenkins/workspace/Build_JDK17_aarch64_linux_Personal/openj9/runtime/compiler/aarch64/codegen/ARM64JNILinkage.cpp:924:55: error: base operand of '->' is not a pointer
16:13:25         TR_RelocationRecordInformation *info = new (comp->trHeapMemory()) TR_RelocationRecordInformation();
16:13:25                                                         ^~

Will force push the build break fix. Commenting this so that we don't need to run tests on all platforms again.
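
For readers unfamiliar with this class of error, the following minimal sketch reproduces the pattern behind the diagnostic: `comp` is an accessor member function, so it has to be called before the result can be dereferenced. The `Compilation` and `Linkage` classes below are hypothetical stand-ins, not the OMR types, and the fix shown is presumably equivalent to what the force push does.

```cpp
#include <cassert>

// Hypothetical stand-in for a compilation object.
struct Compilation
   {
   int heapId() const { return 42; }
   };

// Hypothetical stand-in for a linkage class exposing comp() as an accessor.
class Linkage
   {
   public:
   explicit Linkage(Compilation *c) : _comp(c) { assert(_comp != nullptr); }

   Compilation *comp() { return _comp; }

   int usesAccessor()
      {
      // return comp->heapId();  // error: invalid use of member function (missing "()")
      return comp()->heapId();   // call the accessor, then dereference the returned pointer
      }

   private:
   Compilation *_comp;
   };
```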

As per https://openj9-jenkins.osuosl.org/job/PullRequest-OpenJ9/1574/, only the builds passed; I guess the CI was too unstable and didn't actually test anything...

@dsouzai
Contributor Author

dsouzai commented Feb 7, 2022

Jenkins test sanity+aot all jdk17 depends eclipse/omr#6332

@dsouzai
Contributor Author

dsouzai commented Feb 7, 2022

Ok, well, Jenkins is too unstable to launch the PR tests, so I guess we're on standby till then.

@pshipton
Member

pshipton commented Feb 7, 2022

Jenkins test sanity.functional+aot all jdk17 depends eclipse/omr#6332

@AdamBrousseau
Contributor

Jenkins test sanity.functional+aot all jdk17 depends eclipse/omr#6332

@dsouzai
Contributor Author

dsouzai commented Feb 9, 2022

The failing tests are due to

  1. Cache name should not be longer than 64 chars #14461
  2. Crash in jit_compareAndBranch_0 under forceAOT #14459

1 should be fixed now and the test in 2 has been disabled.

@dsouzai
Contributor Author

dsouzai commented Feb 9, 2022

Jenkins test sanity.functional+aot all jdk17 depends eclipse/omr#6332

launching again just for sanity purposes.

@dsouzai
Contributor Author

dsouzai commented Feb 10, 2022

The other AIX failure is in cmdLineTester_libpathTestRtf_0, caused by

[2022-02-10T01:50:23.505Z]  [ERR] Exception in thread "main" java.awt.AWTError: Can't connect to X11 window server using 'unix:0' as the value of the DISPLAY variable.
[2022-02-10T01:50:23.505Z]  [ERR] 	at java.desktop/sun.awt.X11GraphicsEnvironment.initDisplay(Native Method)
[2022-02-10T01:50:23.505Z]  [ERR] 	at java.desktop/sun.awt.X11GraphicsEnvironment$1.run(X11GraphicsEnvironment.java:101)
[2022-02-10T01:50:23.505Z]  [ERR] 	at java.base/java.security.AccessController.doPrivileged(AccessController.java:683)
[2022-02-10T01:50:23.505Z]  [ERR] 	at java.desktop/sun.awt.X11GraphicsEnvironment.<clinit>(X11GraphicsEnvironment.java:60)
[2022-02-10T01:50:23.505Z]  [ERR] 	at java.desktop/sun.awt.PlatformGraphicsInfo.createGE(PlatformGraphicsInfo.java:36)
[2022-02-10T01:50:23.505Z]  [ERR] 	at java.desktop/java.awt.GraphicsEnvironment$LocalGE.createGE(GraphicsEnvironment.java:93)
[2022-02-10T01:50:23.505Z]  [ERR] 	at java.desktop/java.awt.GraphicsEnvironment$LocalGE.<clinit>(GraphicsEnvironment.java:84)
[2022-02-10T01:50:23.505Z]  [ERR] 	at java.desktop/java.awt.GraphicsEnvironment.getLocalGraphicsEnvironment(GraphicsEnvironment.java:106)
[2022-02-10T01:50:23.505Z]  [ERR] 	at java.desktop/sun.awt.X11.XToolkit.<clinit>(XToolkit.java:224)
[2022-02-10T01:50:23.505Z]  [ERR] 	at java.desktop/sun.awt.PlatformGraphicsInfo.createToolkit(PlatformGraphicsInfo.java:40)
[2022-02-10T01:50:23.505Z]  [ERR] 	at java.desktop/java.awt.Toolkit.getDefaultToolkit(Toolkit.java:599)
[2022-02-10T01:50:23.505Z]  [ERR] 	at java.desktop/java.awt.Toolkit.getEventQueue(Toolkit.java:1493)
[2022-02-10T01:50:23.505Z]  [ERR] 	at java.desktop/java.awt.EventQueue.isDispatchThread(EventQueue.java:1087)
[2022-02-10T01:50:23.505Z]  [ERR] 	at java.desktop/javax.swing.SwingUtilities.isEventDispatchThread(SwingUtilities.java:1493)
[2022-02-10T01:50:23.505Z]  [ERR] 	at java.desktop/javax.swing.text.StyleContext.reclaim(StyleContext.java:473)
[2022-02-10T01:50:23.505Z]  [ERR] 	at java.desktop/javax.swing.text.StyleContext.addAttribute(StyleContext.java:330)
[2022-02-10T01:50:23.505Z]  [ERR] 	at java.desktop/javax.swing.text.StyleContext$NamedStyle.addAttribute(StyleContext.java:1558)
[2022-02-10T01:50:23.505Z]  [ERR] 	at java.desktop/javax.swing.text.StyleContext$NamedStyle.setName(StyleContext.java:1368)
[2022-02-10T01:50:23.505Z]  [ERR] 	at java.desktop/javax.swing.text.StyleContext$NamedStyle.<init>(StyleContext.java:1315)
[2022-02-10T01:50:23.505Z]  [ERR] 	at java.desktop/javax.swing.text.StyleContext.addStyle(StyleContext.java:125)
[2022-02-10T01:50:23.505Z]  [ERR] 	at java.desktop/javax.swing.text.StyleContext.<init>(StyleContext.java:105)
[2022-02-10T01:50:23.505Z]  [ERR] 	at java.desktop/javax.swing.text.DefaultStyledDocument.<init>(DefaultStyledDocument.java:109)
[2022-02-10T01:50:23.505Z]  [ERR] 	at org.openj9.test.libpath.Rtf.convert(Rtf.java:42)
[2022-02-10T01:50:23.505Z]  [ERR] 	at org.openj9.test.libpath.Rtf.main(Rtf.java:60)

which happens on the second run. Considering that the method at the top of the stack is a native method, maybe there's something wrong with the relocation on Power.

@dsouzai
Contributor Author

dsouzai commented Feb 10, 2022

DirectToJNI is disabled on P as the query [1] is not overridden. All of my changes on P are related to the following methods:

loadAddressRAM32
loadAddressRAM
loadAddressJNI32
loadAddressJNI

These only run when generating directToJNI code.

I verified that the relocation isn't generated by running a unit test @jdmpapin has written. As such, the failing test can't be related to my changes.

FWIW:

; Live regs: GPR=0 FPR=0 CCR=0 VRF=0 VSX_SCALAR=0 VSX_VECTOR=0 {}
------------------------------
 n4n      (  0)  treetop                                                                              [0xa000000400044d0] bci=[-1,0,-] rc=0 vc=106 vn=- li=2 udi=- nc=1
 n3n      (  1)    call  Repro.nativeMethod()V[#381  native static Method] [flags 0x500 0x0 ]         [0xa00000040004480] bci=[-1,0,-] rc=1 vc=106 vn=- li=2 udi=- nc=0
------------------------------
------------------------------
 n4n      (  0)  treetop                                                                              [0xa000000400044d0] bci=[-1,0,-] rc=0 vc=106 vn=- li=2 udi=- nc=1
 n3n      (  0)    call  Repro.nativeMethod()V[#381  native static Method] [flags 0x500 0x0 ]         [0xa00000040004480] bci=[-1,0,-] rc=0 vc=106 vn=- li=2 udi=- nc=0
------------------------------

 [0xa0000004014b9f0]    0       bl      0x0000000000000000              ; Direct Call "Snippet Label L0049"
 PRE: [D_GPR_0032 : gr2] [D_GPR_0033 : gr11] [D_GPR_0034 : gr12] [D_GPR_0035 : gr0] [D_GPR_0036 : gr3] [D_GPR_0037 : gr4] [D_GPR_0038 : gr5] [D_GPR_0039 : gr6] [D_GPR_0040 : gr7] [D_GPR_0041 : gr8] [D_GPR_0042 : gr9] [D_GPR_0043 : gr10] [D_CCR_0044 : cr0]
POST: [D_GPR_0032 : gr2] [D_GPR_0033 : gr11] [D_GPR_0034 : gr12] [D_GPR_0035 : gr0] [D_GPR_0036 : gr3] [D_GPR_0037 : gr4] [D_GPR_0038 : gr5] [D_GPR_0039 : gr6] [D_GPR_0040 : gr7] [D_GPR_0041 : gr8] [D_GPR_0042 : gr9] [D_GPR_0043 : gr10] [D_CCR_0044 : cr0]

...

0x00000100128832A8 000000e4                                        Snippet Label L0049:         ; Unresolved Direct Call Snippet
0x100128832a8 000000e4                      481ff549               bl   0x0000010012A827F0              ; Through trampoline
0x100128832ac 000000e8                      00000100 12883200      .long        0x0000010012883200              ; Call Site RA
0x100128832b4 000000f0                      00000000 00000000      .long        0x0000000000000000              ; Method Pointer
0x100128832bc 000000f8                      00000000               .long        0x00000000              ; Lock Word For Compilation
0x100128832c0 000000fc                      00000006               .long        0x00000006              ; Offset | Flag | CP Index
0x100128832c4 00000100                      0a000100 08aba568      .long        0x0A00010008ABA568              ; Pointer To Constant Pool
0x100128832cc 00000108                      00000000               .long        0x00000000              ; Lock Word For Resolution

[1] https://github.com/eclipse/omr/blob/e6d7abfdac20e3c3e3325afbc855fbe7342ea5c2/compiler/codegen/OMRCodeGenerator.hpp#L1514

@dsouzai
Contributor Author

dsouzai commented Feb 14, 2022

@jdmpapin do you mind merging this if everything looks good to you?

@jdmpapin
Contributor

All observed test failures are unrelated, just as @dsouzai described above. Re-launching tests for a coordinated merge (both to include jdk8 and jdk11 and to ensure that the test results will be recent).

Jenkins test sanity all jdk8,jdk11,jdk17 depends eclipse/omr#6332

@jdmpapin jdmpapin merged commit d212ef8 into eclipse-openj9:master Feb 15, 2022
@pshipton
Member

Should we be adding this fix to the Java 18 release?

@dsouzai
Contributor Author

dsouzai commented Feb 17, 2022

We could. It's a very low likelihood issue in practice (which explains why it's only been reported now) but at the same time, it's 100% reproducible in Devin's unit test. What kind of changes are generally expected to be cherry-picked into a release? If it's any kind of bug fix then I guess this would fall into that category.

@pshipton
Member

We want to deliver fixes that may impact users and are of low risk for breaking something else. If it's low risk we could just put it in, even if not sure of the impact on users.

@pshipton
Member

Sometimes we want changes to stew for a while in the head stream before backporting, to ensure they aren't going to cause additional issues. If this is one of those things then given "very low likelihood in practice" we shouldn't backport if it needs to stew longer than it already has.

@dsouzai
Contributor Author

dsouzai commented Feb 17, 2022

Ok I'll open a set of PRs for openj9-omr/openj9 0.31 shortly.

EDIT: This was in response to #14421 (comment)

@dsouzai
Contributor Author

dsouzai commented Feb 17, 2022

> Sometimes we want changes to stew for a while in the head stream before backporting, to ensure they aren't going to cause additional issues. If this is one of those things then given "very low likelihood in practice" we shouldn't backport if it needs to stew longer than it already has.

Sorry, I didn't see this when I posted my earlier comment (and I just got the email notification for it for some reason).

I mean the PR involves changing the raw relocation data stored to the SCC for these directToJNI relocations, which had to be done across all platforms, so while the change is relatively straightforward, it does have a big scope. Additionally there is a workaround for the problem, namely -Xaot:disableDirectToJNIInline.

The combination of these two facts leads me to personally believe we don't need to double deliver. However, if others feel differently I have no problem opening the necessary PRs. Thoughts @mstoodle @DanHeidinga ?

Also, I just realized I should have also changed the JITServer minor version -_-

dsouzai added a commit to dsouzai/openj9 that referenced this pull request Feb 17, 2022
eclipse-openj9#14421 changed the layout
of the relocation data. As such, the JITServer minor version has to be
updated.

Signed-off-by: Irwin D'Souza <[email protected]>
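
The reasoning in the commit message above (a changed data layout requires a version bump so that mismatched peers refuse to talk) can be sketched as follows. The struct, the helper names, and the exact-match policy are assumptions made for illustration; they are not taken from the JITServer sources.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical version pair; field names are illustrative only.
struct ProtocolVersion
   {
   std::uint16_t majorVersion;
   std::uint16_t minorVersion;
   };

// Return a copy with the minor version incremented, as one would after a data layout change.
inline ProtocolVersion bumpMinor(ProtocolVersion v)
   {
   assert(v.minorVersion < UINT16_MAX);
   v.minorVersion = static_cast<std::uint16_t>(v.minorVersion + 1);
   return v;
   }

// Assumed policy: two peers are compatible only if both version fields match exactly.
inline bool compatible(const ProtocolVersion &a, const ProtocolVersion &b)
   {
   return a.majorVersion == b.majorVersion && a.minorVersion == b.minorVersion;
   }
```

Under this assumed policy, a peer built before the layout change and a peer built after it report different minor versions and are rejected up front, rather than misreading each other's relocation data.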
@mstoodle
Contributor

Agree, we haven't seen it in practice until now, so I think it can wait for 0.32. Can/has Devin's test case be(en) added to our tests?

@dsouzai
Contributor Author

dsouzai commented Feb 23, 2022

> Can/has Devin's test case be(en) added to our tests?

In theory I suppose it can, but because it involves compiling the native code as a library (using a platform-specific native compiler), it isn't as straightforward as a test that's purely Java code. We'd need the help of the test team to determine how best to get this test added.
