Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

core files found from cmdLineTester_GCRegressionTests #18048

Open
pshipton opened this issue Aug 30, 2023 · 7 comments
Open

core files found from cmdLineTester_GCRegressionTests #18048

pshipton opened this issue Aug 30, 2023 · 7 comments

Comments

@pshipton
Copy link
Member

https://openj9-jenkins.osuosl.org/job/Test_openjdk17_j9_sanity.functional_s390x_linux_OMR_testList_1/372/ - ub20-390-2
cmdLineTester_GCRegressionTests_1

https://openj9-artifactory.osuosl.org/artifactory/ci-openj9/Test/Test_openjdk17_j9_sanity.functional_s390x_linux_OMR_testList_1/372/functional_test_output.tar.gz

Testing: Ensure no core files have been produced by the preceding tests
Test start time: 2023/08/30 15:01:10 Coordinated Universal Time
Running command: sh -c ls
Time spent starting: 18 milliseconds
Time spent executing: 41 milliseconds
Test result: FAILED
Output from test:
 [OUT] core.20230830.145839.1792369.0001.dmp
 [OUT] core.20230830.145839.1792369.0002.dmp
 [OUT] foo.%001.log
 [OUT] foo.001.log
 [OUT] foo.002.log
 [OUT] foo.#.log
 [OUT] foo.log
 [OUT] foo.log.001
 [OUT] foo.log.002
 [OUT] foo.log.003
 [OUT] heapdump.20230830.145839.1792369.0003.phd
 [OUT] javacore.20230830.145839.1792369.0004.txt
 [OUT] Snap.20230830.145839.1792369.0005.trc
 [OUT] static_temp.txt
 [OUT] verbosegc.20230830.145454.1791981.txt
 [OUT] verbosegc.20230830.145532.1792214.txt
>> Success condition was found: [Output match: ]
>> Failure condition was found: [Output match: core.*]
@dmitripivkine
Copy link
Contributor

The reason cores were created is OOM:

1TISIGINFO     Dump Event "systhrow" (00040000) Detail "java/lang/OutOfMemoryError" "Java heap space" received

from test:

Testing: Unload lots of classes using normal behaviour (with JIT if JIT is Enabled)
Test start time: 2023/08/30 14:58:23 Coordinated Universal Time
Running command: "/home/jenkins/workspace/Test_openjdk17_j9_sanity.functional_s390x_linux_OMR_testList_1/openjdkbinary/j2sdk-image/bin/java"  -Xjit -Xgcpolicy:balanced -Xnocompressedrefs   -Dcom.ibm.tools.attach.enable=no -Xdump:system:events=systhrow,filter=java/lang/OutOfMemoryError -Xmx8m -Xms8m -Xalwaysclassgc -Xdisableexcessivegc   -cp /home/jenkins/workspace/Test_openjdk17_j9_sanity.functional_s390x_linux_OMR_testList_1/aqa-tests/TKG/../../jvmtest/functional/cmdLineTests/gcRegressionTests/gcRegressionTests.jar com.ibm.tests.garbagecollector.TestClassLoaderMain - - -
Time spent starting: 94 milliseconds
Time spent executing: 98229 milliseconds
Test result: PASSED

@pshipton
Copy link
Member Author

pshipton commented Jul 11, 2024

https://openj9-jenkins.osuosl.org/job/Test_openjdk21_j9_sanity.functional_s390x_linux_Nightly_testList_0/264 - rh8-390-1
cmdLineTester_GCRegressionTests_1

https://openj9-artifactory.osuosl.org/artifactory/ci-openj9/Test/Test_openjdk21_j9_sanity.functional_s390x_linux_Nightly_testList_0/264/functional_test_output.tar.gz

20:31:35  Testing: Unload lots of classes using FVT stress argument to force finalization (with JIT if JIT is Enabled)
20:31:35  Test start time: 2024/07/10 20:31:32 Eastern Standard Time
20:31:35  Running command: "/home/jenkins/workspace/Test_openjdk21_j9_sanity.functional_s390x_linux_Nightly_testList_0/jdkbinary/j2sdk-image/bin/java"  -Xjit -Xgcpolicy:balanced -Xnocompressedrefs   -Dcom.ibm.tools.attach.enable=no -Xdump:system:events=systhrow,filter=java/lang/OutOfMemoryError -Xmx8m -Xms8m -Xalwaysclassgc -Xdisableexcessivegc -Xgc:fvtest=forceFinalizeClassLoaders   -cp /home/jenkins/workspace/Test_openjdk21_j9_sanity.functional_s390x_linux_Nightly_testList_0/aqa-tests/TKG/../../jvmtest/functional/cmdLineTests/gcRegressionTests/gcRegressionTests.jar com.ibm.tests.garbagecollector.TestClassLoaderMain - - -
20:31:35  Time spent starting: 4 milliseconds
20:32:50  Time spent executing: 66125 milliseconds
20:32:50  Test result: PASSED
20:32:50  
20:32:50  Testing: Ensure no core files have been produced by the preceding tests
20:32:50  Test start time: 2024/07/10 20:32:38 Eastern Standard Time
20:32:50  Running command: sh -c ls
20:32:50  Time spent starting: 6 milliseconds
20:32:50  Time spent executing: 15 milliseconds
20:32:50  Test result: FAILED
20:32:50  Output from test:
20:32:50   [OUT] core.20240710.202836.1630732.0001.dmp
20:32:50   [OUT] core.20240710.202836.1630732.0002.dmp
20:32:50   [OUT] foo.%001.log
20:32:50   [OUT] foo.001.log
20:32:50   [OUT] foo.002.log
20:32:50   [OUT] foo.#.log
20:32:50   [OUT] foo.log
20:32:50   [OUT] foo.log.001
20:32:50   [OUT] foo.log.002
20:32:50   [OUT] foo.log.003
20:32:50   [OUT] heapdump.20240710.202836.1630732.0003.phd
20:32:50   [OUT] javacore.20240710.202836.1630732.0004.txt
20:32:50   [OUT] Snap.20240710.202836.1630732.0005.trc
20:32:50   [OUT] static_temp.txt
20:32:50   [OUT] verbosegc.20240710.202725.1630495.txt
20:32:50   [OUT] verbosegc.20240710.202737.1630713.txt
20:32:50  >> Success condition was found: [Output match: ]
20:32:50  >> Failure condition was found: [Output match: core.*]
1TISIGINFO     Dump Event "systhrow" (00040000) Detail "java/lang/OutOfMemoryError" "Java heap space" received

@amicic
Copy link
Contributor

amicic commented Jul 11, 2024

At the first glance, just genuinely OOM. Last Global GC expectedly did a compact, but still no help.

00:28:33.689208946  0x0 j9mm.90              Event       GlobalGC collect complete
00:28:33.689618464  0x0 j9mm.474             Event       GlobalGC start: globalcount=479
00:28:33.722677114  0x0 j9mm.90              Event       GlobalGC collect complete
00:28:33.722693688  0x0 j9mm.475             Event       GlobalGC end: workstackoverflow=0 overflowcount=0 memory=0/8388608
00:28:33.723049303  0x0 j9mm.474             Event       GlobalGC start: globalcount=480
00:28:33.801943391  0x0 j9mm.90              Event       GlobalGC collect complete
00:28:33.801966460  0x0 j9mm.475             Event       GlobalGC end: workstackoverflow=0 overflowcount=0 memory=0/8388608
00:28:35.116965255  0x0 j9mm.474             Event       GlobalGC start: globalcount=482
00:28:35.802519107  0x0 j9mm.90              Event       GlobalGC collect complete
00:28:35.802544021  0x0 j9mm.475             Event       GlobalGC end: workstackoverflow=0 overflowcount=0 memory=0/8388608
00:28:35.803401100  0x0 j9mm.474             Event       GlobalGC start: globalcount=483
00:28:35.886767952  0x0 j9mm.90              Event       GlobalGC collect complete
00:28:35.886797807  0x0 j9mm.475             Event       GlobalGC end: workstackoverflow=0 overflowcount=0 memory=0/8388608
00:28:35.986958403  0x0 j9mm.474             Event       GlobalGC start: globalcount=485
00:28:36.068119797  0x0 j9mm.90              Event       GlobalGC collect complete
00:28:36.068143276  0x0 j9mm.475             Event       GlobalGC end: workstackoverflow=0 overflowcount=0 memory=0/8388608
00:28:36.073785786  0x0 j9mm.474             Event       GlobalGC start: globalcount=486
00:28:36.167755330 *0x3FF88002400 j9mm.90              Event       GlobalGC collect complete
00:28:36.167755385 *0x0 j9mm.90              Event       GlobalGC collect complete
00:28:36.167790932  0x3FF88002400 j9mm.475             Event       GlobalGC end: workstackoverflow=0 overflowcount=0 memory=0/8388608

But then, for Balanced GC, per design, we have to free up a whole region, to be able to proceed with allocation.

This (also from traces) indicates we might have some free memory

Tenure: 7707034 bytes, Free Tenure 0 bytes ,

(gdb) p 8388608 - 7707034
$1 = 681574

This is enough for one region (524288), but perhaps somewhat scattered/fragmented so actually not enough to free one. Parallel Compact is indeed non-perfect, and this can happened to some degree (can't tell exactly what the imperfection factor is).

This is another indication of large live set size (this might exclude leafs/primitive arrays, hence a bit smaller than the real one):

4 threads scanned  7506528 bytes.

Long story short, it could be that just the heap of 8M is occasionally insufficient.

Explicitely specified (both from console logs and from traces): -Xmx8m

@pshipton
Copy link
Member Author

pshipton commented Sep 3, 2024

https://openj9-jenkins.osuosl.org/job/Test_openjdk23_j9_sanity.functional_s390x_linux_Nightly_testList_1/60 - ub20-390-6

cmdLineTester_GCRegressionTests_1

23:54:03  Testing: Ensure no core files have been produced by the preceding tests
23:54:03  Test start time: 2024/09/03 03:53:56 Coordinated Universal Time
23:54:03  Running command: sh -c ls
23:54:03  Time spent starting: 35 milliseconds
23:54:03  Time spent executing: 54 milliseconds
23:54:03  Test result: FAILED
23:54:03  Output from test:
23:54:03   [OUT] core.20240903.035256.4069621.0001.dmp
23:54:03   [OUT] core.20240903.035256.4069621.0002.dmp
23:54:03   [OUT] foo.%001.log
23:54:03   [OUT] foo.001.log
23:54:03   [OUT] foo.002.log
23:54:03   [OUT] foo.#.log
23:54:03   [OUT] foo.log
23:54:03   [OUT] foo.log.001
23:54:03   [OUT] foo.log.002
23:54:03   [OUT] foo.log.003
23:54:03   [OUT] heapdump.20240903.035256.4069621.0003.phd
23:54:03   [OUT] javacore.20240903.035256.4069621.0004.txt
23:54:03   [OUT] Snap.20240903.035256.4069621.0005.trc
23:54:03   [OUT] static_temp.txt
23:54:03   [OUT] verbosegc.20240903.034815.4069165.txt
23:54:03   [OUT] verbosegc.20240903.034914.4069415.txt

https://openj9-artifactory.osuosl.org/artifactory/ci-openj9/Test/Test_openjdk23_j9_sanity.functional_s390x_linux_Nightly_testList_1/60/functional_test_output.tar.gz

@dmitripivkine
Copy link
Contributor

I am confused with this one. Cores are thrown at OOM event:

1TISIGINFO     Dump Event "systhrow" (00040000) Detail "java/lang/OutOfMemoryError" "Java heap space" received

by test with this command line:

1CICMDLINE     /home/jenkins/workspace/Test_openjdk23_j9_sanity.functional_s390x_linux_Nightly_testList_1/jdkbinary/j2sdk-image/bin/java -Xjit -Xgcpolicy:balanced -Xnocompressedrefs -Dcom.ibm.tools.attach.enable=no -Xdump:system:events=systhrow,filter=java/lang/OutOfMemoryError -Xmx8m -Xms8m -Xalwaysclassgc -Xdisableexcessivegc -Xgc:fvtest=forceFinalizeClassLoaders -cp /home/jenkins/workspace/Test_openjdk23_j9_sanity.functional_s390x_linux_Nightly_testList_1/aqa-tests/TKG/../../jvmtest/functional/cmdLineTests/gcRegressionTests/gcRegressionTests.jar com.ibm.tests.garbagecollector.TestClassLoaderMain - - -

The console output of this test:

23:52:47  Testing: Unload lots of classes using FVT stress argument to force finalization (with JIT if JIT is Enabled)
23:52:47  Test start time: 2024/09/03 03:52:42 Coordinated Universal Time
23:52:47  Running command: "/home/jenkins/workspace/Test_openjdk23_j9_sanity.functional_s390x_linux_Nightly_testList_1/jdkbinary/j2sdk-image/bin/java"  -Xjit -Xgcpolicy:balanced -Xnocompressedrefs   -Dcom.ibm.tools.attach.enable=no -Xdump:system:events=systhrow,filter=java/lang/OutOfMemoryError -Xmx8m -Xms8m -Xalwaysclassgc -Xdisableexcessivegc -Xgc:fvtest=forceFinalizeClassLoaders   -cp /home/jenkins/workspace/Test_openjdk23_j9_sanity.functional_s390x_linux_Nightly_testList_1/aqa-tests/TKG/../../jvmtest/functional/cmdLineTests/gcRegressionTests/gcRegressionTests.jar com.ibm.tests.garbagecollector.TestClassLoaderMain - - -
23:52:47  Time spent starting: 16 milliseconds
23:54:03  Time spent executing: 74461 milliseconds
23:54:03  Test result: PASSED

Test filters out filter=java/lang/OutOfMemoryError but files are generated regardless.
The thread produced OOM is "main":

03:52:56.379613292  0x3FF9816C200 j9mm.100             Event       J9AllocateObject() returning NULL! 64 bytes requested for object of class 0x3ff98241400 from memory space 'Flat' id=0x3ff980ecd00

!stack 0x3ff9816c200	!j9vmthread 0x3ff9816c200	!j9thread 0x3ff9801e160	tid 0x3e18f6 (4069622) // (main)

@pshipton How do you think it has failed?

@pshipton
Copy link
Member Author

pshipton commented Sep 4, 2024

I think the filter means you only get java/lang/OutOfMemoryError.
The java callstack

3XMTHREADINFO3           Java callstack:
4XESTACKTRACE                at java/lang/ClassLoader.defineClassImpl(Native Method)
4XESTACKTRACE                at java/lang/ClassLoader.defineClassInternal(ClassLoader.java:470(Compiled Code))
4XESTACKTRACE                at java/lang/ClassLoader.defineClass(ClassLoader.java:431(Compiled Code))
4XESTACKTRACE                at java/lang/ClassLoader.defineClass(ClassLoader.java:383(Compiled Code))
4XESTACKTRACE                at com/ibm/tests/garbagecollector/TestClassLoader.newInstance(TestClassLoader.java:37(Compiled Code))
4XESTACKTRACE                at com/ibm/tests/garbagecollector/TestClassLoaderMain.classLoaderAllocate(TestClassLoaderMain.java:89(Compiled Code))
4XESTACKTRACE                at com/ibm/tests/garbagecollector/TestClassLoaderMain.test(TestClassLoaderMain.java:74(Compiled Code))
4XESTACKTRACE                at com/ibm/tests/garbagecollector/TestClassLoaderMain.main(TestClassLoaderMain.java:139)

The success criteria for the sub-test is Successful test run! or Cannot load library required by: -Xjit so presumably one of these was already printed out. I guess it must be the second one because I don't see how the first could be printed yet.

@pshipton
Copy link
Member Author

pshipton commented Sep 4, 2024

There is a way to always show the output from the subtest by running the cmdlinetester with -verbose, if we want to modify the test to do that. Seems this is a rare occurrence, not sure it's worth it, but it could help clear up the mystery.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants