crash vmState=0x0005ffff building xmac jdk22 #18321

Closed
pshipton opened this issue Oct 20, 2023 · 16 comments · Fixed by #18346

Comments

@pshipton
Member

https://openj9-jenkins.osuosl.org/job/Build_JDKnext_x86-64_mac_OpenJDK/535

https://openj9-artifactory.osuosl.org/artifactory/ci-openj9/Build_JDKnext_x86-64_mac_OpenJDK/535/Build_JDKnext_x86-64_mac_OpenJDK-535-20231020-140300-diagnostics.tar.gz

14:00:24  Compiling up to 4 files for BUILD_JIGSAW_TOOLS
14:00:34  Optimizing the exploded image
14:00:43  Unhandled exception
14:00:43  Type=Segmentation error vmState=0x0005ffff
14:00:43  J9Generic_Signal_Number=00000018 Signal_Number=0000000b Error_Value=00000000 Signal_Code=00000001
14:00:43  Handler1=0000000009075FC0 Handler2=00000000093988B0 InaccessibleAddress=00007000ECE08511
14:00:43  RDI=0000700004E07D50 RSI=0000000000000006 RAX=00007FFF7289A66D RBX=0000000000000000
14:00:43  RCX=0000700004E07D48 RDX=00007000ECE08511 R8=0000700004E07D78 R9=0000700004E07DC0
14:00:43  R10=0000700004E07DB8 R11=0000700004E07DB0 R12=00007FFF7289A6F0 R13=0000000000000005
14:00:43  R14=0000700004E07DA8 R15=0000700004E07D50
14:00:43  RIP=00007FFF7289A66D GS=0000 FS=0000 RSP=0000700004E07840
14:00:43  RFlags=0000000000010203 CS=002B RBP=0000700004E078B0 ERR=ECE0851100000004
14:00:43  TRAPNO=000000040000000E CPU=8511000000040000 FAULTVADDR=00007000ECE08511
14:00:43  XMM0 0000000000000000 (f: 0.000000, d: 0.000000e+00)
14:00:43  XMM1 0000000000000000 (f: 0.000000, d: 0.000000e+00)
14:00:43  XMM2 0000000000000000 (f: 0.000000, d: 0.000000e+00)
14:00:43  XMM3 0000000000000000 (f: 0.000000, d: 0.000000e+00)
14:00:43  XMM4 0000000000000000 (f: 0.000000, d: 0.000000e+00)
14:00:43  XMM5 0000000000000000 (f: 0.000000, d: 0.000000e+00)
14:00:43  XMM6 0000000000000000 (f: 0.000000, d: 0.000000e+00)
14:00:43  XMM7 0000000000000000 (f: 0.000000, d: 0.000000e+00)
14:00:43  XMM8 0f0f0f0f0f0f0f0f (f: 252645136.000000, d: 3.815737e-236)
14:00:43  XMM9 0f0f0f0f0f0f0f0f (f: 252645136.000000, d: 3.815737e-236)
14:00:43  XMM10 0302020102010100 (f: 33620224.000000, d: 3.524484e-294)
14:00:43  XMM11 0000000000000000 (f: 0.000000, d: 0.000000e+00)
14:00:43  XMM12 0000000000000000 (f: 0.000000, d: 0.000000e+00)
14:00:43  XMM13 0000000000000000 (f: 0.000000, d: 0.000000e+00)
14:00:43  XMM14 0000000000000000 (f: 0.000000, d: 0.000000e+00)
14:00:43  XMM15 0000000000000000 (f: 0.000000, d: 0.000000e+00)
14:00:43  Module=/usr/lib/system/libunwind.dylib
14:00:43  Module_base_address=00007FFF72896000 Symbol=_ZN9libunwind22CompactUnwinder_x86_64INS_17LocalAddressSpaceEE32stepWithCompactEncodingFramelessEjyRS1_RNS_16Registers_x86_64Eb
14:00:43  Symbol_address=00007FFF7289A430
14:00:43  
14:00:43  Method_being_compiled=java/util/stream/ReferencePipeline$Head.forEach(Ljava/util/function/Consumer;)V
14:00:43  Target=2_90_20231020_535 (Mac OS X 10.15.7)
14:00:43  CPU=amd64 (12 logical CPUs) (0x400000000 RAM)
14:00:43  ----------- Stack Backtrace -----------
14:00:43  _ZN9libunwind22CompactUnwinder_x86_64INS_17LocalAddressSpaceEE32stepWithCompactEncodingFramelessEjyRS1_RNS_16Registers_x86_64Eb+0x23e (0x00007FFF7289A66E [libunwind.dylib+0x466e])
14:00:43  _ZN9libunwind12UnwindCursorINS_17LocalAddressSpaceENS_16Registers_x86_64EE4stepEv+0x68 (0x00007FFF728972D8 [libunwind.dylib+0x12d8])
14:00:43  _Unwind_RaiseException+0xa2 (0x00007FFF72896AAE [libunwind.dylib+0xaae])
14:00:43  __cxa_throw+0x69 (0x00007FFF6F992161 [libc++abi.dylib+0x12161])
14:00:43  _ZN3OMR11Compilation15failCompilationIN2J919AOTRelocationFailedEEEvPKcz+0xd8 (0x000000000B264178 [libj9jit29.dylib+0x64178])
14:00:43  _ZN2TR28CompilationInfoPerThreadBase14performAOTLoadEP10J9VMThreadPNS_11CompilationEP17TR_ResolvedMethodP11TR_J9VMBaseP8J9Method+0x314 (0x000000000B264044 [libj9jit29.dylib+0x64044])
14:00:43  ---------------------------------------
@pshipton pshipton added comp:jit test failure segfault Issues that describe segfaults / JVM crashes labels Oct 20, 2023
@pshipton
Member Author

@hzongaro fyi
I'm restarting the job.

@pshipton
Member Author

pshipton commented Oct 20, 2023

If it doesn't work on restart then it blocks accepting new jdk22 levels.

The restart also failed
https://openj9-jenkins.osuosl.org/job/Build_JDKnext_x86-64_mac_OpenJDK/537/

@pshipton pshipton added the jdk22 label Oct 20, 2023
@pshipton pshipton added this to the Java 22 milestone Oct 20, 2023
@pshipton
Member Author

Failed in the next build as well so this seems a blocker.

https://openj9-jenkins.osuosl.org/job/Build_JDKnext_x86-64_mac_OpenJDK/536/

@hzongaro
Member

I took a quick look. Irwin @dsouzai, I'm guessing that the JIT is expected to recover from the compilation failure for AOTRelocationFailed, right? Babneet @babsingh, is it possible this is some sort of port library issue?

@babsingh
Contributor

Babneet @babsingh, is it possible this is some sort of port library issue?

I don't see a port library function in the native stack. A C++ exception is raised from failCompilation.

14:00:43  _ZN9libunwind22CompactUnwinder_x86_64INS_17LocalAddressSpaceEE32stepWithCompactEncodingFramelessEjyRS1_RNS_16Registers_x86_64Eb+0x23e (0x00007FFF7289A66E [libunwind.dylib+0x466e])
14:00:43  _ZN9libunwind12UnwindCursorINS_17LocalAddressSpaceENS_16Registers_x86_64EE4stepEv+0x68 (0x00007FFF728972D8 [libunwind.dylib+0x12d8])
14:00:43  _Unwind_RaiseException+0xa2 (0x00007FFF72896AAE [libunwind.dylib+0xaae])
14:00:43  __cxa_throw+0x69 (0x00007FFF6F992161 [libc++abi.dylib+0x12161])
14:00:43  _ZN3OMR11Compilation15failCompilationIN2J919AOTRelocationFailedEEEvPKcz+0xd8 (0x000000000B264178 [libj9jit29.dylib+0x64178])
14:00:43  _ZN2TR28CompilationInfoPerThreadBase14performAOTLoadEP10J9VMThreadPNS_11CompilationEP17TR_ResolvedMethodP11TR_J9VMBaseP8J9Method+0x314 (0x000000000B264044 [libj9jit29.dylib+0x64044])

@hzongaro
Member

I don't see a port library function in the native stack. A C++ exception is raised from failCompilation.

Sorry about that - I'm outside of my area of understanding here. For some reason I thought the port library was somehow involved in dealing with exceptions.

@dsouzai
Contributor

dsouzai commented Oct 23, 2023

Yeah, the JIT is expected to recover from any exception that's thrown using failCompilation. If an exception was not caught, we should see std::terminate on the stack; however, this seems like a crash during the actual unwinding, which doesn't look like it is in our control.

Has the version of libunwind changed on mac?

@hzongaro
Copy link
Member

I had been wondering whether the problem could have somehow been related to commit 457d8a3a5, but I can reproduce the crash without that change. I can also reproduce the successful build with the SHAs from https://openj9-jenkins.osuosl.org/job/Pipeline_Build_Test_JDKnext_x86-64_mac/605/, so I'm trying to narrow down what change could have triggered this, in case that provides some clue.

@dsouzai
Contributor

dsouzai commented Oct 23, 2023

I've been trying to figure out if https://bugs.llvm.org/show_bug.cgi?id=20800 (llvm/llvm-project@c578567) applies to us (found this issue from this mailing list thread). The xcode version is 12.4 which based on this means clang-1200.0.32.29, but I can't figure out how to map a clang version to an llvm version.

@hzongaro
Member

I've been trying to figure out if https://bugs.llvm.org/show_bug.cgi?id=20800 (llvm/llvm-project@c578567) applies to us (found this issue from this mailing list thread).

Thanks, Irwin @dsouzai. I attempted two internal x86-64_mac builds building the compiler:

Assuming that I'm setting the option in the right place (our builds are a real black box to me), using -fno-omit-frame-pointer seemed to work in at least one build attempt.

Peter @pshipton, I'm not sure how to proceed in verifying whether this really is an instance of the bug that Irwin identified, and whether we should go ahead with this as a work-around.

@pshipton
Member Author

@keithc-ca any opinions? @hzongaro if you create a PR with hzongaro@0210b618899 it can be reviewed, PR tested, and merged.

@pshipton
Member Author

Do you want to do any perf testing in advance, or release it and wait to see if regular perf testing picks up on the change? I expect it could be restricted to jdk22+ since we don't (yet) see the problem in earlier versions.

@keithc-ca
Contributor

it could be restricted to jdk22+

I think that's a good suggestion.

@keithc-ca
Contributor

it could be restricted to jdk22+

I think that's a good suggestion.

After thinking about it some more, I no longer think it should be version-specific. If we need -fno-omit-frame-pointer (in the JIT) to make things work properly for jdk22, I think the same will be true of other versions.

@hzongaro
Member

If we need -fno-omit-frame-pointer (in the JIT) to make things work properly for jdk22, I think the same will be true of other versions.

That's quite true. How would you feel if I made the version-specific change, just to get the JDKNext acceptance build working, while we evaluate the performance impact?

@keithc-ca
Contributor

Sure, that sounds like a reasonable plan.
