Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

AIX: Grinder target cmdLineTester_jvmtitests_5 fails because dependent shared libraries are not found #2264

Open
aixtools opened this issue Feb 11, 2021 · 18 comments
Assignees
Labels

Comments

@aixtools
Copy link
Contributor

Describe the bug
This test fails to load - and grinder reports unstable as result.

The java executable (or .jar code) defeat the normal AIX rtld behavior.

JVMJ9TI001E Agent library jvmtitest could not be opened (Could not load module .
System error: No such file or directory)
JVMJ9VM015W Initialization error for library j9jvmti29(-3): JVMJ9VM009E J9VMDllMain failed

Note: JAVA messages - with an AIX rtld message embedded - see below for a pure AIX example of rtld messaging

To Reproduce
Steps to reproduce the behavior (or Grinder rerun link):
The key command - from a Grinder job - with workspace saved:
"/home/jenkins/workspace/Grinder/openjdkbinary/j2sdk-image/bin/java" -Xgcpolicy:metronome -Xcompressedrefs -Xdump -agentlib:jvmtitest=test:rnwr001 -cp "/home/jenkins/workspace/Grinder/openjdk-tests/TKG/../../jvmtest/functional/cmdLineTests/jvmtitests/jvmtitest.jar" com.ibm.jvmti.tests.util.TestRunner

Expected behavior
That the test runs and finishes with status 0
Screenshots
If applicable, add screenshots to help explain your problem.

Additional context
To assist with understanding the error message above:

  1. JVMJ9TI001E Agent library jvmtitest could not be opened: This bit is from java - what follows (need slave machines to set up build #2 and 3) is the AIX rtld messages - note: jvmitest is a .jar file
  2. (Could not load module .`: specifically - a module named '.' is not loading - this is probably the error - wrong argument in something.
  3. System error: No such file or directory): the error at OS level
  4. JVMJ9VM015W Initialization error for library j9jvmti29(-3): (java message)
  5. JVMJ9VM009E J9VMDllMain failed` - multiple messages: JVM messages - related to the shared library j9jvmti29 (libj9jvmti29.so). I am guessing the java routine J9VMDIMain (in C main() ?) does not load - because it could not load the Agent (.jar)

To further help reading: an example using a simple application (/usr/bin/python)

PYTHON has the following library dependency:

/usr/bin/python needs:
         /opt/freeware/lib/libpython2.7.so
         /opt/freeware/lib/libexpat.a(libexpat.so.1)
         /opt/freeware/lib/libffi.a(libffi.so.6)
         /opt/freeware/lib/libncurses.so
         /opt/freeware/lib/libsqlite3.so
         /opt/freeware/lib/libstdc++.a(libstdc++.so.6)
         /opt/freeware/lib/libgcc_s.a(shr.o)
         /usr/lib/libc.a(shr.o)
         /usr/lib/librtl.a(shr.o)
         /usr/lib/libdl.a(shr.o)
         /usr/lib/libpthreads.a(shr_comm.o)
         /usr/lib/libpthreads.a(shr_xpg5.o)
         /usr/lib/libpthread.a(shr_xpg5.o)
         /unix
         /usr/lib/libcrypt.a(shr.o)

The first library it needs is /opt/freeware/lib/libpython2.7.so. It's dependencies are:

/opt/freeware/lib/libpython2.7.so needs:
         /usr/lib/libdl.a(shr.o)
         /opt/freeware/lib/libexpat.a(libexpat.so.1)
         /opt/freeware/lib/libffi.a(libffi.so.6)
         /opt/freeware/lib/libncurses.so
         /opt/freeware/lib/libsqlite3.so
         /usr/lib/libpthreads.a(shr_comm.o)
         /usr/lib/libpthreads.a(shr_xpg5.o)
         /usr/lib/libc.a(shr.o)
         /usr/lib/librtl.a(shr.o)
         /opt/freeware/lib/libgcc_s.a(shr.o)
         /usr/lib/libpthread.a(shr_xpg5.o)
         /unix
         /usr/lib/libcrypt.a(shr.o)

To get a similar message as this test - rather than rename libpython2.7.so I rename one of it's dependencies - and then try to start python.

# mv          /opt/freeware/lib/libgcc_s.a          /opt/freeware/lib/libgcc_s.a.save
# python
Could not load program python:
        Dependent module libgcc_s.a(shr.o) could not be loaded.
Could not load module libgcc_s.a(shr.o).
System error: No such file or directory

Note: this message is known to IBM - but for a differnt case:

JVMJ9TI001E Agent library am_ibm_16 could not be opened (Could not load module .
System error: No such file or directory)
JVMJ9VM015W Initialization error for library j9jvmti29(-3): JVMJ9VM009E J9VMDllMain failed

My first thoughts is that it can be resolved by xml files - currently not available in the test directories:

3) The last place to check is the server.xml file itself.   Look at the genericJvmarguments line in the file and check the -Xbootclasspath/p:${ITCAMDCHOME}/toolkit/lib/bcm-bootstrap.jar line.
4) Lastly, check the variables.xml file.  In this file, specifically check the  AM_CONFIG_JVM_ARGS variable.  In the value section, check the -Xbootclasspath and the agentlib parms specified here.

Or, perhaps an additional argument - something like -Xbootclasspath/p:/home/jenkins/workspace/Grinder/openjdkbinary/j2sdk-image/jre/lib/ppc64/compressedrefs/ (I do not know java - so this may include a syntax error)

@karianna karianna added the bug label Feb 12, 2021
@smlambert
Copy link
Contributor

Thanks for the report @aixtools, I will look at how the cmdLineTester_jvmtitests_5 test is configured to see what our options are for changing test config.

@smlambert
Copy link
Contributor

smlambert commented Feb 24, 2021

Rerunning cmdLineTester_jvmtitests_5 in a Grinder to get a fresh set of results:
jdk8 in Grinder/7235 - fails as described above
jdk11 in Grinder/7236 - passes

@smlambert
Copy link
Contributor

Interesting, based on Grinders above, this test passed in the jdk11 run on test-osuosl-aix72-ppc64-1, but failed in the jdk8 run on same machine.

@aixtools
Copy link
Contributor Author

Yes - very interesting. I'll try as well - with save workspace - and run my traces/trcrpt to see what OS behavioral changes I can find.

@aixtools
Copy link
Contributor Author

aixtools commented Feb 26, 2021

Placeholders: https://ci.adoptopenjdk.net/job/Grinder/7290/ (jdk8) and https://ci.adoptopenjdk.net/job/Grinder/7291/ (jdk11)

Marked with 'Keep Build' so the logs don't disappear so quickly.

Note: remember to release when finished. Forever, is a very long time :)

@aixtools
Copy link
Contributor Author

I looked at your console outputs - at the surface level the commands seem the same - so what does changing jdk version change in terms of what actually get executed? Helps to know what/where to look.

@smlambert
Copy link
Contributor

Okay it actually looks like there is no native testimage package for jdk8 being found and downloaded via API (and therefore the message, Agent library jvmtitest could not be opened (Could not load module . System error: No such file or directory)

jdk11 case:

18:16:30  + ./openjdk-tests/get.sh -s /home/jenkins/workspace/Grinder -t /home/jenkins/workspace/Grinder/openjdk-tests -p ppc64_aix -r nightly -j 11 -i openj9 --openj9_repo https://github.com/eclipse/openj9.git --openj9_branch master --tkg_repo https://github.com/AdoptOpenJDK/TKG.git --tkg_branch master
18:16:30  TESTDIR: /home/jenkins/workspace/Grinder/openjdk-tests
18:16:30  get jdk binary...
18:16:30  _ENCODE_FILE_NEW=UNTAGGED curl -OLJSks  https://api.adoptopenjdk.net/v3/binary/latest/11/ea/aix/ppc64/jdk/openj9/normal/adoptopenjdk
18:16:40  _ENCODE_FILE_NEW=UNTAGGED curl -OLJSks  https://api.adoptopenjdk.net/v3/binary/latest/11/ea/aix/ppc64/testimage/openj9/normal/adoptopenjdk
18:16:41  Uncompressing file: OpenJDK11U-jdk_ppc64_aix_openj9_2021-02-17-04-54.tar.gz ...
18:16:53  Uncompressing file: OpenJDK11U-testimage_ppc64_aix_openj9_2021-02-16-18-06.tar.gz ...
18:16:55  Run /home/jenkins/workspace/Grinder/openjdkbinary/j2sdk-image/bin/java -version
...

jdk8 case:

17:55:59  + ./openjdk-tests/get.sh -s /home/jenkins/workspace/Grinder -t /home/jenkins/workspace/Grinder/openjdk-tests -p ppc64_aix -r nightly -j 8 -i openj9 --openj9_repo https://github.com/eclipse/openj9.git --openj9_branch master --tkg_repo https://github.com/AdoptOpenJDK/TKG.git --tkg_branch master
17:55:59  TESTDIR: /home/jenkins/workspace/Grinder/openjdk-tests
17:55:59  get jdk binary...
17:55:59  _ENCODE_FILE_NEW=UNTAGGED curl -OLJSks  https://api.adoptopenjdk.net/v3/binary/latest/8/ea/aix/ppc64/jdk/openj9/normal/adoptopenjdk
17:56:04  Uncompressing file: OpenJDK8U-jdk_ppc64_aix_openj9_2021-02-16-06-25.tar.gz ...
17:56:12  Run /home/jenkins/workspace/Grinder/openjdkbinary/j2sdk-image/bin/java -version
...

@smlambert
Copy link
Contributor

We need to look and see why the native test image for jdk8 AIX is not found, is it being built?

From the build job (example: Build 980), looks like they are built.

Screen Shot 2021-02-26 at 11 38 43 AM

Next thing to check, are they being moved correctly to the releases repo?

@smlambert
Copy link
Contributor

testimages for jdk8 j9 aix are on the website and so are being moved to releases repo presumably
Screen Shot 2021-02-26 at 12 48 01 PM

Our test script will attempt to download the jdk binary, and the testimage and debug packages by default, next check, is this broken for this particular case? Or does the API not serve up the testimage package for some reason?

@aixtools
Copy link
Contributor Author

aixtools commented Mar 1, 2021

Well, this seems to explain why the test fails. A giant step forward.

@sophia-guo
Copy link
Contributor

Looks like originally testimage is not available for jdk8 https://github.com/AdoptOpenJDK/openjdk-tests/blob/master/get.sh#L225 #1661. Need to update to support testimage for jdk8.

@smlambert
Copy link
Contributor

#2320 is now merged (and fixes the originally reported issue). Verifying via this Grinder: https://ci.adoptopenjdk.net/job/Grinder/7422

It may be that there is a new different issue for this test target, but that should get raised separately.

@smlambert
Copy link
Contributor

We now have the the native test images, but we see that the test causes a crash based on output from Grinder/7422, not investigated why or if it has been seen at ci.eclipse.org/openj9 (@pshipton have you seen this before?):

----------- Stack Backtrace -----------
(0x09000000105B247C [libj9vm29.so+0x7d47c])
runCallInMethod+0x224 (0x09000000105A7608 [libj9vm29.so+0x72608])
gpProtectedRunCallInMethod__FPv+0x48 (0x0900000010596E4C [libj9vm29.so+0x61e4c])
signalProtectAndRunGlue+0x28 (0x090000001058E5AC [libj9vm29.so+0x595ac])
omrsig_protect+0x488 (0x090000000743DE4C [libj9prt29.so+0x59e4c])
gpProtectAndRun+0xf8 (0x090000001058E6FC [libj9vm29.so+0x596fc])
gpCheckCallin+0x114 (0x0900000010596DB8 [libj9vm29.so+0x61db8])
callStaticVoidMethod+0x48 (0x09000000105F708C [libj9vm29.so+0xc208c])
(0x0000010000003FD0 [java+0x3fd0])
_pthread_body+0xe8 (0x090000000059DFEC [libpthread.a+0x3fec])
---------------------------------------

@pshipton
Copy link
Contributor

pshipton commented Mar 4, 2021

I don't see any matches.

@JasonFengJ9
Copy link
Contributor

There is an internal workitem (RTC 145045: [aix_ppc-64_cmprssptrs] SE80_GIT J9vmTest_2 crash -Xlp16G Requested page size in GC logs matches with page size in -verbose:sizes output) , which is a regression being investigated but probably introduced a while ago just occurred more recently.

----------- Stack Backtrace -----------
(0x0900000079E2B07C [libj9vm29.so+0x6b07c])
runCallInMethod+0x238 (0x0900000079E21B3C [libj9vm29.so+0x61b3c])
gpProtectedRunCallInMethod__FPv+0x48 (0x0900000079E11F6C [libj9vm29.so+0x51f6c])
signalProtectAndRunGlue+0x28 (0x0900000079E0B9AC [libj9vm29.so+0x4b9ac])
(0x090000007A0A38A8 [libj9prt29.so+0x528a8])
gpProtectAndRun+0xf8 (0x0900000079E0BAFC [libj9vm29.so+0x4bafc])
gpCheckCallin+0x114 (0x0900000079E11ED8 [libj9vm29.so+0x51ed8])
callStaticObjectMethod+0x54 (0x0900000079E62FB8 [libj9vm29.so+0xa2fb8])
LoadMainClass+0x10c (0x0000010000005CB0 [java+0x5cb0])
JavaMain+0x454 (0x0000010000004718 [java+0x4718])
_pthread_body+0xf0 (0x0900000000515D34 [libpthreads.a+0x3d34])

Note: the internal test failure is at J9vmTest_2 while the failure in this issue is cmdLineTester_jvmtitests_5.

@JasonFengJ9
Copy link
Contributor

Actually I think they might be related

===============================================
Running test cmdLineTester_jvmtitests_5 ...
===============================================
cmdLineTester_jvmtitests_5 Start Time: Wed Mar  3 19:00:44 2021 Epoch Time (ms): 1614819644119
variation: Mode351
JVM_OPTIONS:  -Xgcpolicy:metronome -Xcompressedrefs 

This failure is -Xgcpolicy:metronome while the internal RTC 145045 investigation indicated that it only occurs in this GC policy (along with another corner condition) as well.

@karianna karianna added this to the March 2021 milestone Mar 4, 2021
@pshipton
Copy link
Contributor

pshipton commented Mar 4, 2021

The problem Jason refers to (145045) was introduced Feb 17th by eclipse-openj9/openj9#11305 and/or eclipse/omr#5652. It would be interesting to see the result of java -verbose:sizes on the machine where it occurs, as we've only seen the problem on a machine where 16G pages are listed as an available size, like follows:

  -Xlp:objectheap:pagesize=64K   large page size
                  available large page sizes:
                  4K
                  64K
                  16M
                  16G

@pshipton
Copy link
Contributor

pshipton commented Mar 4, 2021

145045 also only occurs when -Xlp16G is used, but perhaps this is a different manifestation of the same problem.

@zl-wang fyi

@smlambert smlambert modified the milestones: March 2021, April 2021 Apr 21, 2021
@smlambert smlambert removed this from the April 2021 milestone May 3, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

6 participants