Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[mono] AOT-llvm arm64 Performance runs crashing since 2024/09/30 #109220

Open
matouskozak opened this issue Oct 25, 2024 · 7 comments
Open

[mono] AOT-llvm arm64 Performance runs crashing since 2024/09/30 #109220

matouskozak opened this issue Oct 25, 2024 · 7 comments
Labels
arch-arm64 area-Codegen-LLVM-mono Known Build Error Use this to report build issues in the .NET Helix tab os-linux Linux OS (any supported distro) tenet-performance Performance related issue untriaged New issue has not been triaged by the area owner

Comments

@matouskozak
Copy link
Member

matouskozak commented Oct 25, 2024

Build Information

Build: https://dev.azure.com/dnceng/internal/_build/results?buildId=2550314
Build error leg or test failing: Performance linux arm64 release mono AOT micro_mono perfampere
Runtime range: e690777...d024a5b
Performance range: dotnet/performance@b0debbd...7c3ed5d

Stack trace:

[2024/10/02 23:23:05][INFO] /__w/1/s/src/mono/mono/metadata/icall.c:1639:
[2024/10/02 23:23:05][INFO] /__w/1/s/src/mono/mono/metadata/icall.c:1639:
[2024/10/02 23:23:05][INFO] =================================================================
[2024/10/02 23:23:05][INFO] 	Native Crash Reporting
[2024/10/02 23:23:05][INFO] =================================================================
[2024/10/02 23:23:05][INFO] Got a SIGSEGV while executing native code. This usually indicates
[2024/10/02 23:23:05][INFO] a fatal error in the mono runtime or one of the native libraries
[2024/10/02 23:23:05][INFO] used by your application.
[2024/10/02 23:23:05][INFO] =================================================================
[2024/10/02 23:23:05][INFO] =================================================================
[2024/10/02 23:23:05][INFO] 	Native stacktrace:
[2024/10/02 23:23:05][INFO] =================================================================
[2024/10/02 23:23:05][INFO] 	0xf2d68307a290 - Unknown
[2024/10/02 23:23:05][INFO] 	0xf2d683023870 - Unknown
[2024/10/02 23:23:05][INFO] 	0xf2d682f8e084 - Unknown
[2024/10/02 23:23:05][INFO] 	0xf2d6841c48f8 - Unknown
[2024/10/02 23:23:05][INFO] 	0xf2d6831edc20 - Unknown
[2024/10/02 23:23:05][INFO] 	0xf2d6831825a0 - Unknown
[2024/10/02 23:23:05][INFO] 	0xf2d683196708 - Unknown
[2024/10/02 23:23:05][INFO] 	0xf2d674006ee0 - Unknown
[2024/10/02 23:23:05][INFO] 
[2024/10/02 23:23:05][INFO] =================================================================
[2024/10/02 23:23:05][INFO] 	External Debugger Dump:
[2024/10/02 23:23:05][INFO] =================================================================
[2024/10/02 23:23:05][INFO] [New LWP 59526]
[2024/10/02 23:23:05][INFO] [New LWP 59527]
[2024/10/02 23:23:05][INFO] [New LWP 59528]
[2024/10/02 23:23:09][INFO] [Thread debugging using libthread_db enabled]
[2024/10/02 23:23:09][INFO] Using host libthread_db library "/lib/aarch64-linux-gnu/libthread_db.so.1".
[2024/10/02 23:23:09][INFO] 0x0000f2d683d06800 in __GI___wait4 (pid=pid@entry=59529, stat_loc=stat_loc@entry=0xffffc50996f0, options=0, options@entry=-989227536, usage=usage@entry=0x0) at ../sysdeps/unix/sysv/linux/wait4.c:30
[2024/10/02 23:23:09][INFO] 30	../sysdeps/unix/sysv/linux/wait4.c: No such file or directory.
[2024/10/02 23:23:09][INFO]   Id   Target Id                                          Frame
[2024/10/02 23:23:09][INFO] * 1    Thread 0xf2d6841bc020 (LWP 59524) "Job-PLLYFY"     0x0000f2d683d06800 in __GI___wait4 (pid=pid@entry=59529, stat_loc=stat_loc@entry=0xffffc50996f0, options=0, options@entry=-989227536, usage=usage@entry=0x0) at ../sysdeps/unix/sysv/linux/wait4.c:30
[2024/10/02 23:23:09][INFO]   2    Thread 0xf2d681a0f0e0 (LWP 59526) "SGen worker"    __futex_abstimed_wait_common64 (private=0, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0xf2d683b0ff38 <work_cond+40>) at ./nptl/futex-internal.c:57
[2024/10/02 23:23:09][INFO]   3    Thread 0xf2d67f00f0e0 (LWP 59527) ".NET EventPipe" 0x0000f2d683d2be38 in __GI___poll (fds=fds@entry=0xf2d678003710, nfds=nfds@entry=1, timeout=timeout@entry=-1) at ../sysdeps/unix/sysv/linux/poll.c:41
[2024/10/02 23:23:09][INFO]   4    Thread 0xf2d67ec0f0e0 (LWP 59528) "Finalizer"      __futex_abstimed_wait_common64 (private=<optimized out>, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0xf2d683b01598 <finalizer_sem>) at ./nptl/futex-internal.c:57
[2024/10/02 23:23:09][INFO] 
[2024/10/02 23:23:09][INFO] Thread 4 (Thread 0xf2d67ec0f0e0 (LWP 59528) "Finalizer"):
[2024/10/02 23:23:09][INFO] #0  __futex_abstimed_wait_common64 (private=<optimized out>, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0xf2d683b01598 <finalizer_sem>) at ./nptl/futex-internal.c:57
[2024/10/02 23:23:09][INFO] #1  __futex_abstimed_wait_common (cancel=true, private=<optimized out>, abstime=0x0, clockid=0, expected=0, futex_word=0xf2d683b01598 <finalizer_sem>) at ./nptl/futex-internal.c:87
[2024/10/02 23:23:09][INFO] #2  __GI___futex_abstimed_wait_cancelable64 (futex_word=futex_word@entry=0xf2d683b01598 <finalizer_sem>, expected=expected@entry=0, clockid=clockid@entry=0, abstime=abstime@entry=0x0, private=<optimized out>) at ./nptl/futex-internal.c:139
[2024/10/02 23:23:09][INFO] #3  0x0000f2d683cd59a4 in do_futex_wait (sem=sem@entry=0xf2d683b01598 <finalizer_sem>, abstime=0x0, clockid=0) at ./nptl/sem_waitcommon.c:111
[2024/10/02 23:23:09][INFO] #4  0x0000f2d683cd5a5c in __new_sem_wait_slow64 (sem=0xf2d683b01598 <finalizer_sem>, abstime=0x0, clockid=0) at ./nptl/sem_waitcommon.c:183
[2024/10/02 23:23:09][INFO] #5  0x0000f2d683cd5b04 in __new_sem_wait (sem=<optimized out>) at ./nptl/sem_wait.c:42
[2024/10/02 23:23:09][INFO] #6  0x0000f2d68320b0c8 in mono_os_sem_wait (sem=0xf2d683b01598 <finalizer_sem>, flags=MONO_SEM_FLAGS_ALERTABLE) at /__w/1/s/src/mono/mono/metadata/../../mono/utils/mono-os-semaphore.h:204
[2024/10/02 23:23:09][INFO] #7  mono_coop_sem_wait (sem=0xf2d683b01598 <finalizer_sem>, flags=MONO_SEM_FLAGS_ALERTABLE) at /__w/1/s/src/mono/mono/metadata/../../mono/utils/mono-coop-semaphore.h:41
[2024/10/02 23:23:09][INFO] #8  finalizer_thread (unused=<optimized out>) at /__w/1/s/src/mono/mono/metadata/gc.c:868
[2024/10/02 23:23:09][INFO] #9  0x0000f2d6831e4388 in start_wrapper_internal (start_info=0x0, stack_ptr=<optimized out>) at /__w/1/s/src/mono/mono/metadata/threads.c:1208
[2024/10/02 23:23:09][INFO] #10 start_wrapper (data=<optimized out>) at /__w/1/s/src/mono/mono/metadata/threads.c:1276
[2024/10/02 23:23:09][INFO] #11 0x0000f2d683ccd5c8 in start_thread (arg=0x0) at ./nptl/pthread_create.c:442
[2024/10/02 23:23:09][INFO] #12 0x0000f2d683d35edc in thread_start () at ../sysdeps/unix/sysv/linux/aarch64/clone.S:79
[2024/10/02 23:23:09][INFO] 
[2024/10/02 23:23:09][INFO] Thread 3 (Thread 0xf2d67f00f0e0 (LWP 59527) ".NET EventPipe"):
[2024/10/02 23:23:09][INFO] #0  0x0000f2d683d2be38 in __GI___poll (fds=fds@entry=0xf2d678003710, nfds=nfds@entry=1, timeout=timeout@entry=-1) at ../sysdeps/unix/sysv/linux/poll.c:41
[2024/10/02 23:23:09][INFO] #1  0x0000f2d68311bcc4 in ipc_poll_fds (fds=0xf2d678003710, nfds=1, timeout=4294967295) at /__w/1/s/src/native/eventpipe/ds-ipc-pal-socket.c:470
[2024/10/02 23:23:09][INFO] #2  ds_ipc_poll (poll_handles_data=0xf2d678003500, poll_handles_data_len=1, timeout_ms=<optimized out>, timeout_ms@entry=4294967295, callback=callback@entry=0xf2d683121f34 <server_warning_callback>) at /__w/1/s/src/native/eventpipe/ds-ipc-pal-socket.c:1105
[2024/10/02 23:23:09][INFO] #3  0x0000f2d68311b8f4 in ds_ipc_stream_factory_get_next_available_stream (callback=callback@entry=0xf2d683121f34 <server_warning_callback>) at /__w/1/s/src/native/eventpipe/ds-ipc.c:393
[2024/10/02 23:23:09][INFO] #4  0x0000f2d68311f760 in server_thread (data=<optimized out>) at /__w/1/s/src/native/eventpipe/ds-server.c:129
[2024/10/02 23:23:09][INFO] #5  0x0000f2d683121f14 in ep_rt_thread_mono_start_func (data=0xca5c79942c80) at /__w/1/s/src/mono/mono/mini/../../mono/eventpipe/ep-rt-mono.h:878
[2024/10/02 23:23:09][INFO] #6  0x0000f2d683ccd5c8 in start_thread (arg=0x0) at ./nptl/pthread_create.c:442
[2024/10/02 23:23:09][INFO] #7  0x0000f2d683d35edc in thread_start () at ../sysdeps/unix/sysv/linux/aarch64/clone.S:79
[2024/10/02 23:23:09][INFO] 
[2024/10/02 23:23:09][INFO] Thread 2 (Thread 0xf2d681a0f0e0 (LWP 59526) "SGen worker"):
[2024/10/02 23:23:09][INFO] #0  __futex_abstimed_wait_common64 (private=0, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0xf2d683b0ff38 <work_cond+40>) at ./nptl/futex-internal.c:57
[2024/10/02 23:23:09][INFO] #1  __futex_abstimed_wait_common (cancel=true, private=0, abstime=0x0, clockid=0, expected=0, futex_word=0xf2d683b0ff38 <work_cond+40>) at ./nptl/futex-internal.c:87
[2024/10/02 23:23:09][INFO] #2  __GI___futex_abstimed_wait_cancelable64 (futex_word=futex_word@entry=0xf2d683b0ff38 <work_cond+40>, expected=expected@entry=0, clockid=clockid@entry=0, abstime=abstime@entry=0x0, private=private@entry=0) at ./nptl/futex-internal.c:139
[2024/10/02 23:23:09][INFO] #3  0x0000f2d683ccc8fc in __pthread_cond_wait_common (abstime=0x0, clockid=0, mutex=0xf2d683b0fee0 <lock>, cond=0xf2d683b0ff10 <work_cond>) at ./nptl/pthread_cond_wait.c:503
[2024/10/02 23:23:09][INFO] #4  ___pthread_cond_wait (cond=cond@entry=0xf2d683b0ff10 <work_cond>, mutex=mutex@entry=0xf2d683b0fee0 <lock>) at ./nptl/pthread_cond_wait.c:627
[2024/10/02 23:23:09][INFO] #5  0x0000f2d683271d04 in mono_os_cond_wait (mutex=0xf2d683b0fee0 <lock>, cond=<optimized out>) at /__w/1/s/src/mono/mono/sgen/../../mono/utils/mono-os-mutex.h:219
[2024/10/02 23:23:09][INFO] #6  get_work (worker_index=0, work_context=<optimized out>, do_idle=<optimized out>, job=<optimized out>) at /__w/1/s/src/mono/mono/sgen/sgen-thread-pool.c:164
[2024/10/02 23:23:09][INFO] #7  thread_func (data=0x0) at /__w/1/s/src/mono/mono/sgen/sgen-thread-pool.c:195
[2024/10/02 23:23:09][INFO] #8  0x0000f2d683ccd5c8 in start_thread (arg=0x0) at ./nptl/pthread_create.c:442
[2024/10/02 23:23:09][INFO] #9  0x0000f2d683d35edc in thread_start () at ../sysdeps/unix/sysv/linux/aarch64/clone.S:79
[2024/10/02 23:23:09][INFO] 
[2024/10/02 23:23:09][INFO] Thread 1 (Thread 0xf2d6841bc020 (LWP 59524) "Job-PLLYFY"):
[2024/10/02 23:23:09][INFO] #0  0x0000f2d683d06800 in __GI___wait4 (pid=pid@entry=59529, stat_loc=stat_loc@entry=0xffffc50996f0, options=0, options@entry=-989227536, usage=usage@entry=0x0) at ../sysdeps/unix/sysv/linux/wait4.c:30
[2024/10/02 23:23:09][INFO] #1  0x0000f2d683d06758 in __GI___waitpid (pid=pid@entry=59529, stat_loc=stat_loc@entry=0xffffc50996f0, options=options@entry=-989227536) at ./posix/waitpid.c:38
[2024/10/02 23:23:09][INFO] #2  0x0000f2d68307a3b8 in dump_native_stacktrace (signal=<optimized out>, mctx=<optimized out>) at /__w/1/s/src/mono/mono/mini/mini-posix.c:846
[2024/10/02 23:23:09][INFO] #3  mono_dump_native_crash_info (signal=<optimized out>, mctx=mctx@entry=0xffffc509a150, info=<optimized out>) at /__w/1/s/src/mono/mono/mini/mini-posix.c:868
[2024/10/02 23:23:09][INFO] #4  0x0000f2d683023870 in mono_handle_native_crash (signal=0xf2d682ce1c9f "SIGSEGV", mctx=0xffffc509a150, info=0xffffc509a4b0) at /__w/1/s/src/mono/mono/mini/mini-exceptions.c:2967
[2024/10/02 23:23:09][INFO] #5  0x0000f2d682f8e084 in mono_sigsegv_signal_handler_debug (_dummy=11, _info=0xffffc509a4b0, context=0xffffc509a530, debug_fault_addr=0x838) at /__w/1/s/src/mono/mono/mini/mini-runtime.c:3908
[2024/10/02 23:23:09][INFO] #6  <signal handler called>
[2024/10/02 23:23:09][INFO] #7  mono_reflection_get_type_with_rootimage (alc=0xca5c798b0e10, rootimage=0x0, image=0x0, info=0xffffc509b780, ignorecase=0, search_mscorlib=<optimized out>, type_resolve=<optimized out>, error=0xffffc509b8a0) at /__w/1/s/src/mono/mono/metadata/reflection.c:2348
[2024/10/02 23:23:09][INFO] #8  0x0000f2d6831825a0 in type_from_parsed_name (info=0xffffc509b780, ignoreCase=0 '\000', error=0xffffc509b8a0, stack_mark=<optimized out>, caller_assembly=<optimized out>) at /__w/1/s/src/mono/mono/metadata/icall.c:1668
[2024/10/02 23:23:09][INFO] #9  ves_icall_System_RuntimeTypeHandle_internal_from_name (name=name@entry=0xca5c7aee7f40 "Mono", stack_mark=<optimized out>, res=res@entry=0xffffc509ba00, throwOnError=throwOnError@entry=0 '\000', ignoreCase=ignoreCase@entry=0 '\000', error=error@entry=0xffffc509b8a0) at /__w/1/s/src/mono/mono/metadata/icall.c:1703
[2024/10/02 23:23:09][INFO] #10 0x0000f2d683196708 in ves_icall_System_RuntimeTypeHandle_internal_from_name_raw (a0=0xca5c7aee7f40 "Mono", a1=<optimized out>, a2=0xffffc509ba00, a3=0 '\000', a4=0 '\000') at /__w/1/s/src/mono/mono/metadata/icall-def.h:536
[2024/10/02 23:23:09][INFO] #11 0x0000f2d674006ee0 in ?? ()
[2024/10/02 23:23:09][INFO] #12 0x0000ffffc509baac in ?? ()
[2024/10/02 23:23:09][INFO] Backtrace stopped: previous frame inner to this frame (corrupt stack?)
[2024/10/02 23:23:09][INFO] [Inferior 1 (process 59524) detached]
[2024/10/02 23:23:09][INFO] =================================================================
[2024/10/02 23:23:09][INFO] 	Basic Fault Address Reporting
[2024/10/02 23:23:09][INFO] =================================================================
[2024/10/02 23:23:09][INFO] Memory around native instruction pointer (0xf2d6831edc20):0xf2d6831edc10  fd e7 ff 97 88 02 40 79 e8 00 00 35 e1 03 00 aa  [email protected]....
[2024/10/02 23:23:09][INFO] 0xf2d6831edc20  40 1f 44 f9 e2 03 14 aa e6 ab fd 97 88 02 40 79  @.D...........@y
[2024/10/02 23:23:09][INFO] 0xf2d6831edc30  e8 00 00 34 f5 03 1f aa e0 03 1b aa 21 00 80 52  ...4........!..R
[2024/10/02 23:23:09][INFO] 0xf2d6831edc40  79 28 fd 97 93 f8 ff b5 c1 ff ff 17 20 03 00 b4  y(.......... ...
[2024/10/02 23:23:09][INFO] =================================================================
[2024/10/02 23:23:09][INFO] 	Managed Stacktrace:
[2024/10/02 23:23:09][INFO] =================================================================
[2024/10/02 23:23:09][INFO] 	  at <unknown> <0xffffffff>
[2024/10/02 23:23:09][INFO] 	  at System.RuntimeTypeHandle:internal_from_name <0x00007>
[2024/10/02 23:23:09][INFO] 	  at System.RuntimeTypeHandle:GetTypeByName <0x000bb>
[2024/10/02 23:23:09][INFO] 	  at <unknown> <0xffffffff>
[2024/10/02 23:23:09][INFO] 	  at System.Reflection.RuntimeMethodInfo:InternalInvoke <0x00007>
[2024/10/02 23:23:09][INFO] 	  at System.Reflection.MethodBaseInvoker:InterpretedInvoke_Method <0x00053>
[2024/10/02 23:23:09][INFO] =================================================================

Error Message

Fill the error message using step by step known issues guidance.

{
  "ErrorMessage": "at System.RuntimeTypeHandle:internal_from_name",
  "BuildRetry": false,
  "ExcludeConsoleLog": false
}

Known issue validation

Build: 🔎 https://dev.azure.com/dnceng/internal/_build/results?buildId=2550314
Error message validated: [at System.RuntimeTypeHandle:internal_from_name]
Result validation: ❌ Known issue did not match with the provided build.
Validation performed at: 10/25/2024 12:22:41 PM UTC

Report

Build Definition Test Pull Request
2572662 dotnet-runtime arm64.micro_mono.net10.0.Partition14.WorkItemExecution
2572384 dotnet-runtime arm64.micro_mono.net10.0.Partition9.WorkItemExecution
2572103 dotnet-runtime arm64.micro_mono.net10.0.Partition9.WorkItemExecution
2571753 dotnet-runtime arm64.micro_mono.net10.0.Partition9.WorkItemExecution
2571528 dotnet-runtime arm64.micro_mono.net10.0.Partition9.WorkItemExecution
2571015 dotnet-runtime arm64.micro_mono.net10.0.Partition9.WorkItemExecution
2570647 dotnet-runtime arm64.micro_mono.net10.0.Partition9.WorkItemExecution
2570370 dotnet-runtime arm64.micro_mono.net10.0.Partition9.WorkItemExecution
2570098 dotnet-runtime arm64.micro_mono.net10.0.Partition14.WorkItemExecution
2569917 dotnet-runtime arm64.micro_mono.net10.0.Partition9.WorkItemExecution
2569567 dotnet-runtime arm64.micro_mono.net10.0.Partition14.WorkItemExecution
2569378 dotnet-runtime arm64.micro_mono.net10.0.Partition9.WorkItemExecution
2569152 dotnet-runtime arm64.micro_mono.net10.0.Partition14.WorkItemExecution
2568776 dotnet-runtime arm64.micro_mono.net10.0.Partition14.WorkItemExecution

Summary

24-Hour Hit Count 7-Day Hit Count 1-Month Count
0 0 14
@matouskozak matouskozak added arch-arm64 Known Build Error Use this to report build issues in the .NET Helix tab os-linux Linux OS (any supported distro) untriaged New issue has not been triaged by the area owner labels Oct 25, 2024
@dotnet-issue-labeler dotnet-issue-labeler bot added the needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners label Oct 25, 2024
@matouskozak
Copy link
Member Author

@radekdoulik Do you think #107462 could have caused these crashes?

@matouskozak matouskozak added the tenet-performance Performance related issue label Oct 25, 2024
Copy link
Contributor

Tagging subscribers to this area: @lambdageek, @steveisok
See info in area-owners.md if you want to be subscribed.

@vcsjones vcsjones removed the needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners label Oct 28, 2024
@lambdageek
Copy link
Member

Apparently this warning emits something:

g_warning (G_STRLOC);

So we're failing to get the calling assembly name from the stack crawl mark or from the LMF. As a result we pass null for both rootimage and image to mono_reflection_get_type_checked which ends up crashing.

Is there something different about stack walks on this lane?

@lambdageek
Copy link
Member

@radekdoulik might be related to the LLVM bump?

@lambdageek
Copy link
Member

Also weird we're looking for a type called ""Mono"

@radekdoulik
Copy link
Member

@radekdoulik might be related to the LLVM bump?

if the range is correct, then it shouldn't be the bump.

let see if using sync unwind tables on all arm64 targets will help #109406. I will take closer look if it doesn't help.

@matouskozak IIRC you mentioned that you will be modifying CI to exercise this path. do you have PR around for that?

@matouskozak
Copy link
Member Author

@radekdoulik might be related to the LLVM bump?

if the range is correct, then it shouldn't be the bump.

let see if using sync unwind tables on all arm64 targets will help #109406. I will take closer look if it doesn't help.

@matouskozak IIRC you mentioned that you will be modifying CI to exercise this path. do you have PR around for that?

The arm64 AOT/fullAOT CI is broken currently, we're trying to get it running but I don't have ETA yet.

radekdoulik added a commit that referenced this issue Oct 31, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
arch-arm64 area-Codegen-LLVM-mono Known Build Error Use this to report build issues in the .NET Helix tab os-linux Linux OS (any supported distro) tenet-performance Performance related issue untriaged New issue has not been triaged by the area owner
Projects
None yet
Development

No branches or pull requests

5 participants