Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[NativeAOT] Linux/ARM bring-up (4/n) #97269

Merged
merged 16 commits into from
Jan 25, 2024
Merged

Conversation

filipnavara
Copy link
Member

@filipnavara filipnavara commented Jan 21, 2024

Contributes to dotnet/runtimelab#833

Implements thread hijacking, fixes tests with recursive generics.

@ghost ghost added the community-contribution Indicates that the PR has been added by a community member label Jan 21, 2024
@ghost
Copy link

ghost commented Jan 21, 2024

Tagging subscribers to this area: @agocke, @MichalStrehovsky, @jkotas
See info in area-owners.md if you want to be subscribed.

Issue Details

Implement large part of the thread hijacking, fixes tests with recursive generics.

Author: filipnavara
Assignees: -
Labels:

area-NativeAOT-coreclr

Milestone: -

@filipnavara filipnavara marked this pull request as ready for review January 22, 2024 20:58
@filipnavara
Copy link
Member Author

filipnavara commented Jan 22, 2024

I managed to get a stack trace from the last failing smoke test:

Process terminated. Assertion failed.
   at System.Diagnostics.DebugProvider.Fail(String, String) + 0x38
   at System.Diagnostics.Debug.Fail(String, String) + 0x56
   at System.Diagnostics.Debug.Assert(Boolean, String, String) + 0x32
   at System.Diagnostics.Debug.Assert(Boolean) + 0x26
   at System.Runtime.TypeCast.IsInstanceOfClass(MethodTable*, Object) + 0x118
   at System.Runtime.TypeCast.IsInstanceOfAny_NoCacheLookup(MethodTable*, Object) + 0xf0
   at System.Runtime.TypeCast.StelemRef_Helper_NoCacheLookup(Object&, MethodTable*, Object) + 0x22
   at System.Runtime.TypeCast.StelemRef_Helper(Object&, MethodTable*, Object) + 0x68
   at System.Runtime.TypeCast.StelemRef(Array, IntPtr, Object) + 0x100
   at System.Collections.Generic.LowLevelList`1.Add(T) + 0x5c
   at System.Reflection.Runtime.General.NamespaceChain..ctor(MetadataReader, NamespaceDefinitionHandle) + 0xd8
   at System.Reflection.Runtime.TypeInfos.NativeFormat.NativeFormatRuntimeNamedTypeInfo.get_NamespaceChain() + 0x130
   at System.Reflection.Runtime.TypeInfos.NativeFormat.NativeFormatRuntimeNamedTypeInfo.get_Namespace() + 0x1e
   at System.Reflection.Runtime.TypeInfos.RuntimeNamedTypeInfo.get_FullName() + 0x12c
   at System.Reflection.Runtime.TypeInfos.NativeFormat.NativeFormatRuntimeNamedTypeInfo.ToString() + 0x25c
   at System.Reflection.Runtime.TypeInfos.RuntimeTypeInfo.EstablishDebugName() + 0x5a
   at System.Reflection.Runtime.TypeInfos.NativeFormat.NativeFormatRuntimeNamedTypeInfo.GetRuntimeNamedTypeInfo(MetadataReader, TypeDefinitionHandle, RuntimeTypeHandle) + 0xe2
   at System.Reflection.Runtime.General.TypeUnifier.GetNamedType(TypeDefinitionHandle, MetadataReader, RuntimeTypeHandle) + 0x20
   at System.Reflection.Runtime.General.TypeUnifier.GetNamedType(TypeDefinitionHandle, MetadataReader) + 0x24
   at System.Reflection.Runtime.General.TypeResolver.ResolveTypeDefinition(TypeDefinitionHandle, MetadataReader) + 0x1e
   at System.Reflection.Runtime.General.TypeResolver.TryResolve(Handle, MetadataReader, TypeContext, Exception&) + 0x60
   at System.Reflection.Runtime.General.QSignatureTypeHandle.TryResolve(TypeContext, Exception&) + 0x7e
   at System.Reflection.Runtime.General.QSignatureTypeHandle.Resolve(TypeContext) + 0x2a
   at System.Reflection.Runtime.ParameterInfos.RuntimeMethodParameterInfo.get_ParameterType() + 0x52
   at System.Reflection.Runtime.MethodInfos.RuntimeMethodInfo.get_ReturnType() + 0x40
   at System.Reflection.DynamicInvokeInfo..ctor(MethodBase, IntPtr) + 0x33a
   at Internal.Reflection.Execution.MethodInvokeInfo..ctor(MethodBase, IntPtr) + 0x1a
   at Internal.Reflection.Execution.ExecutionEnvironmentImplementation.TryGetMethodInvokeInfo(RuntimeTypeHandle, QMethodDefinition, RuntimeTypeHandle[], MethodBase, MethodSignatureComparer&, CanonicalFormKind) + 0x12c
   at Internal.Reflection.Execution.ExecutionEnvironmentImplementation.TryGetMethodInvoker(RuntimeTypeHandle, QMethodDefinition, RuntimeTypeHandle[]) + 0x96
   at Internal.Reflection.Core.Execution.ExecutionEnvironment.GetMethodInvoker(RuntimeTypeInfo, QMethodDefinition, RuntimeTypeInfo[], MemberInfo, Exception&) + 0x19c
   at System.Reflection.Runtime.MethodInfos.NativeFormat.NativeFormatMethodCommon.GetUncachedMethodInvoker(RuntimeTypeInfo[], MemberInfo, Exception&) + 0x6e
   at System.Reflection.Runtime.MethodInfos.RuntimeNamedMethodInfo`1.GetUncachedMethodInvoker(RuntimeTypeInfo[], MemberInfo) + 0x30
   at System.Reflection.Runtime.MethodInfos.RuntimeNamedMethodInfo`1.get_UncachedMethodInvoker() + 0x74
   at System.Reflection.Runtime.MethodInfos.RuntimeMethodInfo.get_MethodInvoker() + 0x4a
   at System.Reflection.Runtime.MethodInfos.RuntimeMethodInfo.Invoke(Object, BindingFlags, Binder, Object[], CultureInfo) + 0x28
   at System.Reflection.MethodBase.Invoke(Object, Object[]) + 0x3a
   at ThreadLocalStatics.TLSTesting.MakeType2(Type, Type, Boolean) + 0xa36
   at ThreadLocalStatics.TLSTesting.<>c__DisplayClass3_0.<MultiThreaded_Test>b__0() + 0x50
   at System.Threading.Tasks.Task.InnerInvoke() + 0x66
   at System.Threading.Tasks.Task.<>c.<.cctor>b__281_0(Object obj) + 0x4c
   at System.Threading.ExecutionContext.RunFromThreadPoolDispatchLoop(Thread, ExecutionContext, ContextCallback, Object) + 0x98
   at System.Threading.Tasks.Task.ExecuteWithThreadLocal(Task&, Thread) + 0x226
   at System.Threading.Tasks.Task.ExecuteEntryUnsafe(Thread) + 0x72
   at System.Threading.Tasks.Task.ExecuteFromThreadPool(Thread) + 0x14
   at System.Threading.ThreadPoolWorkQueue.DispatchWorkItem(Object, Thread) + 0x54
   at System.Threading.ThreadPoolWorkQueue.Dispatch() + 0x3c8
   at System.Threading.PortableThreadPool.WorkerThread.WorkerDoWork(PortableThreadPool, Boolean&) + 0x3c
   at System.Threading.PortableThreadPool.WorkerThread.WorkerThreadStart() + 0x23c
   at DynamicGenerics!<BaseAddress>+0x80aa6d
   at System.Threading.Thread.StartHelper.RunWorker() + 0x5e
   at System.Threading.Thread.StartHelper.Run() + 0x90
   at System.Threading.Thread.StartThread(IntPtr) + 0xc8
   at System.Threading.Thread.ThreadEntryPoint(IntPtr) + 0x1a

It is flaky though, and it may be failing with other errors at different times. I saw this assert multiple times though, with the IsInstanceOfClass called with parameterized MethodTable message (IIRC).

@filipnavara
Copy link
Member Author

filipnavara commented Jan 22, 2024

Just had another failed test run, this time two tests failed:

--------------------------------
Running Test: ThreadLocalStatics.TLSTesting.ThreadLocalStatics_Test
--------------------------------------------------
Debug Assertion Violation

Expression: 'pvRetAddr != NULL'

File: /home/navara/runtime/src/coreclr/nativeaot/Runtime/thread.cpp, Line: 813
--------------------------------------------------
Expected: 100
Actual: 134
END EXECUTION - FAILED
Test Harness Exitcode is : 1
To run the test:
> set CORE_ROOT=/home/navara/runtime/artifacts/tests/coreclr/linux.arm.Debug/Tests/Core_Root
> /home/navara/runtime/artifacts/tests/coreclr/linux.arm.Debug/nativeaot/SmokeTests/DynamicGenerics/DynamicGenerics/DynamicGenerics.sh


Failed test: nativeaot_SmokeTests::_UnitTests_UnitTests_UnitTests_sh (nativeaot/SmokeTests/UnitTests/UnitTests/UnitTests.sh) (nativeaot.SmokeTests.XUnitWrapper.dll)
Process terminated. Assertion failed.
   at System.Diagnostics.DebugProvider.Fail(String, String) + 0x38
   at System.Diagnostics.Debug.Fail(String, String) + 0x56
   at System.Diagnostics.Debug.Assert(Boolean, String, String) + 0x32
   at System.Diagnostics.Debug.Assert(Boolean) + 0x26
   at System.Threading.ObjectHeader.SetSyncEntryIndex(Int32*, Int32) + 0x2c
   at System.Threading.SyncTable.AssignEntry(Object, Int32*) + 0x250
   at System.Threading.ObjectHeader.GetSyncIndex(Object) + 0x68
   at System.Threading.ObjectHeader.GetLockObject(Object) + 0x1a
   at System.Threading.Monitor.<>c.<.cctor>b__22_0(Object o) + 0x1a
   at System.Runtime.CompilerServices.ConditionalWeakTable`2.GetValueLocked(TKey, ConditionalWeakTable`2.CreateValueCallback) + 0x36
   at System.Runtime.CompilerServices.ConditionalWeakTable`2.GetValue(TKey, ConditionalWeakTable`2.CreateValueCallback) + 0x4a
   at System.Threading.Monitor.GetCondition(Object) + 0xb8
   at System.Threading.Monitor.Wait(Object, Int32) + 0x1a
   at System.Threading.ManualResetEventSlim.Wait(Int32, CancellationToken) + 0x282
   at System.Threading.ManualResetEventSlim.Wait() + 0x2e
   at ThreadTest.<>c__DisplayClass10_0.<TestConcurrentIsBackgroundProperty>b__0() + 0x4a
   at System.Threading.Thread.StartHelper.RunWorker() + 0x5e
   at System.Threading.Thread.StartHelper.Run() + 0x90
   at System.Threading.Thread.StartThread(IntPtr) + 0xc8
   at System.Threading.Thread.ThreadEntryPoint(IntPtr) + 0x1a

I have seen this before. The pvRetAddr != NULL error happens if the unwinding in the hijack callback doesn't detect the epilog or prolog. I thought I got rid of it by clearing the thumb bit in the IP which mistakenly made the prolog/epilog detection check unaligned instruction stream. In the last commit I replaced it with ASSERT because I was convinced that I fixed the root cause (PUSH_PROBE_FRAME pushes the lr value into the frame which the StackFrameIterator then treats as IP, so the bit has to be cleared there).

@VSadov
Copy link
Member

VSadov commented Jan 22, 2024

The failures look like GC holes. It can be because of hijacking/stackwalking, or because of something else. I am not sure the write barriers are up to date, for example.
It could be useful to try running this in conservative mode. If this is caused by write barriers (I think it is likely), the failures would still occur.

@VSadov
Copy link
Member

VSadov commented Jan 22, 2024

For the write barriers, you may want to try disabling the following - at least to see if that accounts for all the remaining issues.

  add_definitions(-DFEATURE_USE_SOFTWARE_WRITE_WATCH_FOR_GC_HEAP)
  add_definitions(-DFEATURE_MANUALLY_MANAGED_CARD_BUNDLES)

It looks like the ARM32 write barriers do not have support for that. I am not sure if that works on 32bit at all.

@filipnavara
Copy link
Member Author

Thanks for the debugging tips. I didn't do enough runs to verify or confirm it, but returning false from GetReturnAddressHijackInfo made the tests pass several times. That said, I think this PR is improvement over the status quo and I would continue the work/investigation in a next one.

(I intend to fix the DWARF info not to contain the Thumb bit in the addresses and align correctly with the instruction boundaries, but it requires fixing up the managed stack trace logic to match it.)

@filipnavara
Copy link
Member Author

filipnavara commented Jan 23, 2024

I finally managed to catch the DynamicGenerics crash in the debugger and it's actually quite fun one.

Let's start with the back trace:

* thread #11, name = 'DynamicGenerics', stop reason = signal SIGSEGV: invalid address (fault address: 0xc100ae9c)
  * frame #0: 0xc100ae9c
    frame #1: 0x00ad0c1e DynamicGenerics`S_P_TypeLoader_Internal_TypeSystem_TypeDescExtensions__IsWellKnownType(type=0xb3e641a0, wellKnownType=String) at TypeSystemExtensions.cs:27
    frame #2: 0x00aca65a DynamicGenerics`S_P_TypeLoader_Internal_TypeSystem_TypeDesc__get_IsString(this=0xb3e641a0) at TypeDesc.cs:273

The instruction on frame 1 is blx r3. So, let's have a look at the registers:

General Purpose Registers:
        r0 = 0xb3e9fe5c
        r1 = 0x00000014
        r2 = 0x00000000
        r3 = 0x0082d1a1  DynamicGenerics`__VirtualCall_S_P_TypeLoader_Internal_TypeSystem_TypeSystemContext__GetWellKnownType + 1
        r4 = 0x00000000
        r5 = 0x00000000
        r6 = 0xad0fbe08
        r7 = 0x000000af
        r8 = 0xac8fd000
        r9 = 0xb0ade78f
       r10 = 0x00000000
       r11 = 0xad0fbd08
       r12 = 0xc100ae9c
        sp = 0xad0fbcd8
        lr = 0x00ad0c1f  DynamicGenerics`S_P_TypeLoader_Internal_TypeSystem_TypeDescExtensions__IsWellKnownType + 78 at TypeSystemExtensions.cs:27
        pc = 0xc100ae9c
      cpsr = 0x60000010

$r3 is still valid and pointing to __VirtualCall_S_P_TypeLoader_Internal_TypeSystem_TypeSystemContext__GetWellKnownType, so presumably something happened inside the __VirtualCall method. That's quite a big clue since that method has no prolog/epilog.

The faulting address is 0xc100ae9c and it's in $r12. That's not surprising either, since the VirtualCall helper uses $r12 as the scratch register to compute the jump. Curiously, the address looks like shifted by one byte from something that looks like a valid destination of the virtual call:

(lldb) image lookup -va 0x00ae9c32
      Address: DynamicGenerics[0x006e9c32] (DynamicGenerics.PT_LOAD[1].__managedcode + 2232674)
      Summary: DynamicGenerics`S_P_TypeLoader_Internal_Runtime_TypeLoader_TypeLoaderTypeSystemContext__GetWellKnownType + 1 at TypeLoaderTypeSystemContext.cs:52

There's one more weird thing. The CPU seems not to be in the Thumb mode (could be a debugger quirk, though):

cpsr = 0x60000010

...and, for posterity, the thread did receive the activation signal for hijack just before the crash.

@filipnavara
Copy link
Member Author

I captured few more of the crashes. Seems like they don't happen with DOTNET_gcConservative=1. There are already trash values on a stack if I follow the code back a bit... so that goes along with the GC hole theory.

@filipnavara
Copy link
Member Author

filipnavara commented Jan 23, 2024

Multiple runs on qemu-arm-user and Raspberry Pi with DOTNET_gcConservative=1:

QEMU:

Time [secs] | Total | Passed | Failed | Skipped | Assembly Execution Summary
============================================================================
      0.197 |     1 |      1 |      0 |       0 | nativeaot.CustomMain.XUnitWrapper.dll
      0.228 |     1 |      1 |      0 |       0 | nativeaot.GenerateUnmanagedEntryPoints.XUnitWrapper.dll
     43.782 |    14 |     14 |      0 |       0 | nativeaot.SmokeTests.XUnitWrapper.dll
      0.200 |     1 |      1 |      0 |       0 | nativeaot.StartupHook.XUnitWrapper.dll
----------------------------------------------------------------------------
     44.407 |    17 |     17 |      0 |       0 | (total)

Raspberry Pi 4:

Time [secs] | Total | Passed | Failed | Skipped | Assembly Execution Summary
============================================================================
      0.535 |     1 |      1 |      0 |       0 | nativeaot.CustomMain.XUnitWrapper.dll
      0.191 |     1 |      1 |      0 |       0 | nativeaot.GenerateUnmanagedEntryPoints.XUnitWrapper.dll
     56.253 |    14 |     14 |      0 |       0 | nativeaot.SmokeTests.XUnitWrapper.dll
      0.193 |     1 |      1 |      0 |       0 | nativeaot.StartupHook.XUnitWrapper.dll
----------------------------------------------------------------------------
     57.172 |    17 |     17 |      0 |       0 | (total)

More re-runs still showed some crashes, so this is not conclusive.

@filipnavara
Copy link
Member Author

I managed to find the GC hole. Fix is committed, rebuilding and rerunning tests now.

@filipnavara
Copy link
Member Author

filipnavara commented Jan 24, 2024

The tests pass on Raspberry Pi now:

Time [secs] | Total | Passed | Failed | Skipped | Assembly Execution Summary
============================================================================
      0.522 |     1 |      1 |      0 |       0 | nativeaot.CustomMain.XUnitWrapper.dll
      0.211 |     1 |      1 |      0 |       0 | nativeaot.GenerateUnmanagedEntryPoints.XUnitWrapper.dll
     44.544 |    14 |     14 |      0 |       0 | nativeaot.SmokeTests.XUnitWrapper.dll
      0.194 |     1 |      1 |      0 |       0 | nativeaot.StartupHook.XUnitWrapper.dll
----------------------------------------------------------------------------
     45.471 |    17 |     17 |      0 |       0 | (total)

Postmortem:

The main bug was that thread hijacking failed to report the method return value in R0 register as GC reference. It was reproducible with the TLS test in DynamicGenerics. I enabled LF_GC | LF_GCROOTS logging and captured a crash with a log and a crash dump. Then I found the last stack walk on the thread with trashed values and after some poking it became obvious what went wrong.

Second bug is that the Thumb bit was often not masked out from IP during stack walks. The LR register (or custom frames) saved on the stack have the Thumb bit set. It needs to be cleared before doing the GC enumeration to ensure that subtracting 1 here actually gets to previous instruction.

Copy link
Member

@jkotas jkotas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great to me. Thank you!

@VSadov Could you please sign-off on this change as well?

Copy link
Member

@VSadov VSadov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice! Thank you!!

@VSadov
Copy link
Member

VSadov commented Jan 25, 2024

Couple things that do not need to go in this change (perhaps better be logged as issues to follow up):

  • I am not sure if background GC can be supported on 32bit, but the current state of GC barriers on ARM32 does not support it, so it should be disabled. Disabling write watch will do that.
    We need to disable it here, similarly to card bundles.
    add_definitions(-DFEATURE_USE_SOFTWARE_WRITE_WATCH_FOR_GC_HEAP)

Maybe it gets disabled somehow anyways, but it seems better to be explicit about it.

I think there could be some subtle NYIs around that, for example:

// N.B. for ARM32, we would need to deal with > PointerSize alignments.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
arch-arm32 area-NativeAOT-coreclr community-contribution Indicates that the PR has been added by a community member
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants