
[NativeAOT-LLVM] Add a portable implementation of RhGetCodeTarget #2333

Merged · 7 commits into dotnet:feature/NativeAOT-LLVM · Jul 17, 2023

Conversation

@SingleAccretion (Author) commented Jul 6, 2023

In this implementation:

  1. The compiler generates an array of tuples { Stub, Target }. This is done for both simple and instantiating stubs.
  2. At runtime, this information is sorted, with lookups done via binary search (a sketch follows this list).
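
To make the mechanism concrete, here is a minimal sketch of the lookup half of the scheme; the names (`StubTargetPair`, `StubTargetMap`, `GetTargetOfStub`) are illustrative rather than taken from the PR:

```csharp
using System;

// Illustrative layout of one compiler-generated { Stub, Target } tuple.
struct StubTargetPair
{
    public nuint Stub;   // address of the unboxing (or instantiating) stub
    public nuint Target; // address of the method body the stub forwards to
}

static class StubTargetMap
{
    // In the PR the pairs come from a compiler-generated blob and are sorted
    // once at runtime; here they are simply assumed sorted by stub address.
    static StubTargetPair[] s_pairs = Array.Empty<StubTargetPair>();

    public static nuint GetTargetOfStub(nuint address)
    {
        StubTargetPair[] pairs = s_pairs;
        int lo = 0, hi = pairs.Length - 1;
        while (lo <= hi)
        {
            int mid = lo + (hi - lo) / 2;
            nuint key = pairs[mid].Stub;
            if (key == address)
                return pairs[mid].Target; // found a stub: unwrap it
            if (key < address)
                lo = mid + 1;
            else
                hi = mid - 1;
        }
        return address; // not a stub: the address is already the code target
    }
}
```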

Costs:

  1. The size of the array is ~5KB for the HelloWasm test, which has ~400 simple unboxing stubs and ~100 instantiating ones. For a medium-sized application with ~50K stubs, that comes to 400K on 32 bit (two pointer-sized fields per entry, so 8 bytes each) and double that on 64 bit.
  2. Binary search is not "cheap" and, unfortunately, we will often hit it on the startup path because of ComputeLdftnReverseLookup. Fortunately, we should rarely hit it in steady state, as it will only be used for Delegate.MethodInfo and Delegate.CreateDelegate (the latter only in rare scenarios).
  3. The managed code that supports all of this clocks in at about 5K, which is rather large, but that is mostly a reflection of our codegen not being great.

Code structure choices:

  1. ILC - layered under MetadataManager like the other maps used by reflection.
  2. Runtime - layered in CoreLib for consistency with the existing (non-portable) implementation.

Contributes to #2307.

@@ -116,39 +122,24 @@ public void InitializeDebugInfo(MethodDebugInformation debugInfo)
public void InitializeLocalTypes(TypeDesc[] localTypes)
{
}
}

internal sealed class LlvmMethodBodyNode : LLVMMethodCodeNode, IMethodBodyNode
SingleAccretion (Author):

I deleted the LlvmMethodBodyNode/LLVMMethodCodeNode fork; it did not appear to serve any meaningful purpose. MethodCodeNode does not use such a pattern either.


public override IEnumerable<DependencyListEntry> GetStaticDependencies(NodeFactory factory)
public ISymbolNode GetUnboxingThunkTarget(NodeFactory factory)
SingleAccretion (Author):

The ISpecialUnboxThunkNode implementation is copied from MethodCodeNode.

@SingleAccretion SingleAccretion marked this pull request as ready for review July 6, 2023 22:56
@SingleAccretion (Author):

@dotnet/nativeaot-llvm

@jkotas (Member) commented Jul 7, 2023

This may be interesting to upstream to address dotnet/runtime#73906

// The mapping blob lays out all simple stub entries first, followed by the instantiating stub entries.
uint32_t instantiatingStubCount = allMappingsCount - simpleStubCount;

if (!initMap(m_unboxingStubTargetsMap, &allMappings->Data[0], simpleStubCount) ||
    !initMap(m_unboxingAndInstantiatingStubTargetsMap, &allMappings->Data[simpleStubCount], instantiatingStubCount))
jkotas (Member):

The core runtime should not need this mapping for anything. Is that correct?

Would it make more sense to initialize this map lazily and do the whole thing in C#, like we do that for other maps that are needed by reflection?

SingleAccretion (Author):

> The core runtime should not need this mapping for anything. Is that correct?

Yes. It's only consumed at the CoreLib layer and above.

> Would it make more sense to initialize this map lazily and do the whole thing in C#, like we do that for other maps that are needed by reflection?

I considered it. The main problem is that the functionality is used by delegate equality, which seems low-level enough to be non-trivial to initialize robustly in managed code.

Additionally, the SHash closed hashing table implementation is very nicely fitted to this use case; a managed dictionary would use 3x+ more memory (see the sketch below).
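
For context on the memory claim: a closed hashing (open addressing) table keeps its entries inline in flat arrays, with no per-entry object, stored hash code, or chain link, which is where most of the savings over a Dictionary<K, V> come from. A minimal sketch of the idea (not the actual SHash code; nuint stands in for pointer-sized keys and values):

```csharp
using System;

// Minimal open-addressing map with linear probing. Capacity management and
// removal are elided; 0 marks an empty slot, so 0 is not a valid key.
sealed class ClosedHashMap
{
    private readonly nuint[] _keys;
    private readonly nuint[] _values;

    public ClosedHashMap(int powerOfTwoCapacity)
    {
        _keys = new nuint[powerOfTwoCapacity];
        _values = new nuint[powerOfTwoCapacity];
    }

    public void Add(nuint key, nuint value)
    {
        nuint mask = (nuint)_keys.Length - 1;
        nuint i = key & mask;        // the pointer itself acts as the hash
        while (_keys[i] != 0)        // linear probing on collision
            i = (i + 1) & mask;
        _keys[i] = key;
        _values[i] = value;
    }

    public bool TryGetValue(nuint key, out nuint value)
    {
        nuint mask = (nuint)_keys.Length - 1;
        for (nuint i = key & mask; _keys[i] != 0; i = (i + 1) & mask)
        {
            if (_keys[i] == key)
            {
                value = _values[i];
                return true;
            }
        }
        value = 0;
        return false;
    }
}
```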

jkotas (Member):

Does it need to be used by delegate equality in the first place? It looks like it is a left-over from some flavor of multi-module mode that allowed multiple unboxing stubs to exist for a given method. That should not be a thing in current native AOT. You can try to upstream removing it from delegate equality.

Even if it is still needed for some reason, delegate equality should not be used by low-level runtime code where the lazy initialization would lead to problems.

A reverse map like this one is a common need for reflection-like functionality, and I think it would be fine to implement this reverse map in a similar way. We have a few of those; for example, look for FunctionPointersToOffsets. (That one is also more complicated than it needs to be, due to support for multi-module flavors that we do not care about anymore.)

In fact, it looks like this reverse map was implemented in managed code at some point. There are unused UnboxingAndInstantiatingStubMap and UnboxingAndInstantiatingStubMapEntry types that one would need in order to implement reverse maps like this.
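
For illustration, the shape such a managed, lazily initialized reverse map might take, modeled loosely on the FunctionPointersToOffsets pattern described above. All names here are hypothetical, reading the compiler-generated blob is elided, and .NET 5+ is assumed (where nuint is comparable):

```csharp
using System;
using System.Threading;

// Hypothetical lazily initialized reverse map: build a sorted snapshot on
// first use, then answer all queries with Array.BinarySearch.
static class UnboxingStubReverseMap
{
    private static nuint[] s_sortedStubs; // stub addresses, sorted ascending
    private static nuint[] s_targets;     // parallel to s_sortedStubs

    public static nuint GetTargetOfStub(nuint address)
    {
        if (Volatile.Read(ref s_sortedStubs) == null)
            Initialize();

        int index = Array.BinarySearch(s_sortedStubs, address);
        return index >= 0 ? s_targets[index] : address;
    }

    private static void Initialize()
    {
        // Reading the compiler-generated { Stub, Target } blob is elided.
        nuint[] stubs = Array.Empty<nuint>();
        nuint[] targets = Array.Empty<nuint>();

        // Sort the keys, permuting the targets array in lockstep.
        Array.Sort(stubs, targets);

        // Racing threads may build the snapshot twice, but the results are
        // identical, so last-writer-wins publication is benign.
        s_targets = targets;
        Volatile.Write(ref s_sortedStubs, stubs); // publish the keys last
    }
}
```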

SingleAccretion (Author):

> Does it need to be used by delegate equality in the first place? It looks like it is a left-over from some flavor of multi-module mode that allowed multiple unboxing stubs to exist for a given method. That should not be a thing in current native AOT. You can try to upstream removing it from delegate equality.

Having done some research, I agree it looks unnecessary.

I will pursue its removal and a managed implementation for these maps.

src/coreclr/nativeaot/Runtime/RuntimeInstance.cpp (review thread outdated and resolved)

@SingleAccretion SingleAccretion marked this pull request as draft July 10, 2023 17:03
@SingleAccretion force-pushed the RhGetCodeTarget branch 2 times, most recently from 5d120c4 to ec99e4e on July 13, 2023 16:22
@SingleAccretion SingleAccretion marked this pull request as ready for review July 13, 2023 17:49
@SingleAccretion (Author):

@dotnet/nativeaot-llvm PTAL

In this implementation:
 1) The compiler generates an array of tuples { Stub, Target }.
    This is done for both simple and instantiating stubs.
 2) At runtime startup, pointers into this array are stored in
    two hash tables, with the stubs as keys. This allows for more
    or less efficient target retrieval.

Costs:
  1) The size of the array is ~5KB for the reflection test, which
     has ~500 simple unboxing stubs and ~100 instantiating ones.
     Thus for a medium-sized program, overhead on the order of
     ~50KB is expected.
  2) The hash table construction startup penalty should be on the
     order of 10us for the reflection test and scaling linearly for
     larger programs, which seems acceptable.
dotnet/corert@08d78ae

The original motivation for this was handling import stubs:
```
Function pointer equality comparison was not handling cross-module pointers correctly when optimizations were enabled
(causes target pointers to be wrapped in jump stubs sometimes). The delegate equality comparison was hitting this bug.
```
We do not have import stubs anymore and unwrapping unboxing stubs serves no purpose here.

Microbenchmarks of delegate equality show a ~3x improvement with this change (the first block is before, the second after):
```
Bench_DelegateEquality_Positive_OpenStatic<10000000>() took: 355 ms
Bench_DelegateEquality_Positive_ClosedStatic<10000000>() took: 367 ms
Bench_DelegateEquality_Positive_ClosedInstance<10000000>() took: 371 ms

Bench_DelegateEquality_Positive_OpenStatic<10000000>() took: 121 ms
Bench_DelegateEquality_Positive_ClosedStatic<10000000>() took: 120 ms
Bench_DelegateEquality_Positive_ClosedInstance<10000000>() took: 122 ms
```
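
The harness behind these numbers is not part of the PR; a minimal reproduction of the measurement idea might look like the following, where the method name and iteration count come from the output above but the body is an assumption:

```csharp
using System;
using System.Diagnostics;

class DelegateEqualityBench
{
    // "Open static": delegates over a static method, with no bound target.
    static int Parse(string s) => int.Parse(s);

    static void Bench_DelegateEquality_Positive_OpenStatic(int iterations)
    {
        Func<string, int> a = Parse;
        Func<string, int> b = Parse;

        var sw = Stopwatch.StartNew();
        bool allEqual = true;
        for (int i = 0; i < iterations; i++)
            allEqual &= a == b; // "positive": the delegates compare equal
        sw.Stop();

        Console.WriteLine(
            $"Bench_DelegateEquality_Positive_OpenStatic<{iterations}>() " +
            $"took: {sw.ElapsedMilliseconds} ms (all equal: {allEqual})");
    }

    static void Main() => Bench_DelegateEquality_Positive_OpenStatic(10_000_000);
}
```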

Additionally, there is some desire to upstream changes for a portable RhGetCodeTarget implementation. Not having to
deal with it at this relatively low-level layer will make things more robust.
@jkotas jkotas added the area-NativeAOT-LLVM LLVM generation for Native AOT compilation (including Web Assembly) label Jul 16, 2023
@jkotas (Member) left a review:

Thank you!

@jkotas jkotas merged commit 8657c73 into dotnet:feature/NativeAOT-LLVM Jul 17, 2023