[NativeAOT] Inline access to thread statics #79521

VSadov · 2022-12-12T02:23:35Z

While implementing thin locks for NativeAOT, I noticed that calling into the runtime to find the location of ManagedThreadId is a considerable expense (relatively speaking, as all other parts of the lock implementation are fairly cheap).

Note that in NativeAOT the managed thread ID is dispensed and managed on the managed side. That is because the native thread and the managed thread object may outlive each other, and we need to keep the ID unique as long as either is alive.

The fact that the ID is dispensed and set from managed code should not prevent the optimization. We basically need to intrinsify/inline the computation of the managed thread ID location; the regular implementation that initializes the ID on demand should then work just fine.

In fact, only the read path is perf-critical. We will write to the location at most once per lifetime of the thread.
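The read-mostly, initialize-on-demand pattern described above can be sketched in C. This is only an illustration under stated assumptions, not the NativeAOT implementation: `t_managed_id`, `next_id`, `assign_id_slow` and `current_managed_id` are hypothetical names, and a real ID dispenser would also recycle IDs, which this sketch does not.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical process-wide dispenser; IDs start at 1, 0 means "unassigned". */
static atomic_int next_id = 1;

/* Hypothetical per-thread slot holding the small managed thread ID. */
static _Thread_local int t_managed_id;

/* Slow path: runs at most once per thread lifetime. */
static int assign_id_slow(void) {
    int id = atomic_fetch_add(&next_id, 1);
    assert(id > 0);          /* sketch only: no ID recycling or overflow handling */
    t_managed_id = id;
    return id;
}

/* Fast path: a single thread-local read plus a predictable branch. */
static inline int current_managed_id(void) {
    int id = t_managed_id;
    return id != 0 ? id : assign_id_slow();
}
```

Once the thread-local read itself is inlined, the fast path involves no call at all; only the first access on each thread reaches `assign_id_slow`.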

A similar optimization may be applicable to CoreCLR as well. CoreCLR also makes a method call when accessing the managed thread ID.

ghost · 2022-12-12T02:23:41Z

Tagging subscribers to this area: @agocke, @MichalStrehovsky, @jkotas
See info in area-owners.md if you want to be subscribed.

Issue Details

Re: #79519

Author:	VSadov
Assignees:	-
Labels:	`area-NativeAOT-coreclr`
Milestone:	-

MichalStrehovsky · 2022-12-12T02:53:24Z

In .NET Native, we intrinsified getting the native thread ID and switched things to use that instead of the managed one (see old change by @AntonLapounov: dotnet/corert@7087448).

Not saying that that's what we should do because I'm far from an expert in this area - I just happen to remember random stuff from 5 years ago so throwing this here.

VSadov · 2022-12-12T05:51:26Z

We specifically need the managed thread ID here: the small int32 number (1, 2, 3, 4...) that is the identity of a thread in the managed runtime. In theory, a managed thread does not need to map 1:1 to a native thread, and the same native thread may have different IDs in different runtimes if the process loads more than one runtime.
Historically, the native thread ID is a Windows thing, but there is typically something ID-like for threads on other platforms. It could be the stack base address, pthread_t, etc. Even on Windows, the TEB address could act as a faster substitute for a thread ID.

Anyway, the main issue here is that the native ID is a pointer-sized value, and there is no space for that in the object header. There is also no reason for it to be a pointer; even int32 is too generous. How many threads can you have at the same time?
CoreCLR thin locks support IDs in the [1..1024) range, which I think is on the low side nowadays. I made NativeAOT thin locks support IDs up to 65535 (there are a few spare bits if we need more, but not a lot). A thread with an ID above that can only own a fat lock. Native thread IDs, which are pointers, can easily be outside of these ranges.
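To make the range argument concrete, here is a minimal C sketch of packing an owner ID into a header word. The layout is an assumption for illustration only (a 32-bit header whose low 16 bits hold the owner ID, matching the 65535 limit mentioned above); the real NativeAOT header layout is not shown in this thread, and all names here are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed layout: low 16 bits of the header hold the owning thread's ID. */
#define THIN_ID_BITS 16u
#define THIN_ID_MASK ((1u << THIN_ID_BITS) - 1u)   /* 0xFFFF == 65535 */

/* An ID of 0 means "no owner"; IDs above the mask do not fit in the field. */
static bool thin_lock_can_hold(uint32_t managed_id) {
    return managed_id != 0 && managed_id <= THIN_ID_MASK;
}

/* Writes the new header to *out, or returns false when a fat lock is needed. */
static bool thin_lock_try_encode(uint32_t header, uint32_t managed_id,
                                 uint32_t *out) {
    if (!thin_lock_can_hold(managed_id))
        return false;
    *out = (header & ~THIN_ID_MASK) | managed_id;   /* keep the other header bits */
    return true;
}

/* Extracts the owner ID (0 if the thin lock is not held). */
static uint32_t thin_lock_owner(uint32_t header) {
    return header & THIN_ID_MASK;
}
```

A small, densely allocated managed ID always passes `thin_lock_can_hold` until more than 65535 IDs are live, whereas a pointer-sized native ID would routinely fail it.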

As a known quantity under the runtime's control, the managed thread ID is more predictable and more portable. I think we should use managed IDs in the managed runtime unless a scenario really calls for the native ID (it is hard to think of one outside of interop with native threading). And we should make accessing the managed ID faster.
It is actually quite fast already, but it could be faster if we eliminate the cost of a call.

jkotas · 2022-12-12T07:47:24Z

We should make all thread-statics fast by inlining them. I do not think it is worth building some special path for just ManagedThreadId.

VSadov · 2022-12-12T07:57:12Z

> We should make all thread-statics fast by inlining them

If that is not too hard, it would be great.

jkotas · 2022-12-12T08:13:54Z

It should not be significantly harder than inlining ManagedThreadId access only. The JIT needs to be able to emit the platform-specific native sequence to access thread statics in either case. The only difference is in where to get the symbols or offsets to use in the sequence.
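The difference between the two access styles can be sketched in C. This is an analogy rather than what RyuJIT emits: `get_thread_static_base` is a hypothetical stand-in for a runtime helper call, while the direct `_Thread_local` access lets the C compiler emit its own platform-specific TLS sequence (for example, a segment-register-relative load on x64).

```c
/* Keep the helper out of line so it behaves like a real runtime call. */
#if defined(__GNUC__)
#define NOINLINE __attribute__((noinline))
#elif defined(_MSC_VER)
#define NOINLINE __declspec(noinline)
#else
#define NOINLINE
#endif

/* Hypothetical thread-static field. */
static _Thread_local int t_counter;

/* Call-based access: every use pays for a call to locate the storage. */
NOINLINE static int *get_thread_static_base(void) {
    return &t_counter;
}

static int bump_via_call(void) {
    int *p = get_thread_static_base();
    return ++*p;
}

/* Inline access: the compiler emits the TLS address computation in place. */
static int bump_inline(void) {
    return ++t_counter;
}
```

Both functions touch the same per-thread storage; comparing their disassembly on a given platform shows the call that inlining removes.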

EgorBo · 2022-12-14T15:34:46Z

Related: #63619

JulieLeeMSFT · 2023-01-12T22:16:23Z

CC @kunalspathak.

VSadov · 2023-02-22T21:02:21Z

The RhpGetThreadStaticBaseForType call is the fourth most expensive call in BasicMinimalApi traces (in terms of exclusive cost). Other calls, such as Monitor.Enter and Monitor.Exit, may also get cheaper if thread-static access is inlined.

I think optimizing thread statics may result in material improvements.

kunalspathak · 2023-02-22T21:31:27Z

Thanks for the data, @VSadov. I am currently working on a prototype for this.

MichalStrehovsky · 2023-08-02T07:31:00Z

I'm moving this issue to 9.0 since we don't expect the RyuJIT TLS work to land in 8.0 per #89472 (comment).

MichalStrehovsky · 2024-01-30T06:44:22Z

Any codegen TLS work left or can we close?

VSadov · 2024-01-30T07:10:07Z

I think arm64 is still to be supported.

kunalspathak · 2024-02-15T21:08:58Z

windows/x64 support: #89472
linux/x64 support: #97413
linux/arm64 support: #97910

agocke · 2024-06-07T22:07:12Z

@kunalspathak any plans for Windows Arm64?

kunalspathak · 2024-06-07T23:18:55Z

> @kunalspathak any plans for Windows Arm64?

Hoping to do in .NET 9

kunalspathak · 2024-07-02T04:12:05Z

> > @kunalspathak any plans for Windows Arm64?
>
> Hoping to do in .NET 9

#104282

kunalspathak · 2024-07-03T05:03:56Z

#104282 is done, so this issue can be closed now.

MichalStrehovsky · 2024-07-09T08:36:58Z

> #104282 is done, so this issue can be closed now.

Sounds good to me!

dotnet-issue-labeler bot added the area-NativeAOT-coreclr label Dec 12, 2022

ghost added the `untriaged` label Dec 12, 2022

VSadov mentioned this issue Dec 12, 2022

[NativeAOT] Thin locks #79519

Merged

jkotas changed the title ~~[NativeAOT] Intrincify and inline the access to ManagedThreadId location~~ [NativeAOT] Inline access to thread statics Dec 12, 2022

jkotas mentioned this issue Dec 12, 2022

NativeAOT codegen optimization opportunities #64242

Closed

15 tasks

kunalspathak mentioned this issue Mar 4, 2023

[JIT] Add support to inline the field access of primitive types marked with TLS #82973

Merged

7 tasks

MichalStrehovsky added this to the 8.0.0 milestone Jul 11, 2023

ghost removed the `untriaged` label Jul 11, 2023

kunalspathak mentioned this issue Jul 29, 2023

[NativeAOT] Inline TLS access for windows/x64 #89472

Merged

MichalStrehovsky modified the milestones: 8.0.0, 9.0.0 Aug 2, 2023

kunalspathak mentioned this issue Jul 2, 2024

NativeAOT/Windows/Arm64: Add TLS inline support #104282

Merged

MichalStrehovsky closed this as completed Jul 9, 2024

github-actions bot locked and limited conversation to collaborators Aug 9, 2024
