[NativeAOT] Inline access to thread statics #79521
Comments
Tagging subscribers to this area: @agocke, @MichalStrehovsky, @jkotas
In .NET Native, we intrinsified getting the native thread ID and switched things to use that instead of the managed one (see old change by @AntonLapounov: dotnet/corert@7087448). I'm not saying that's what we should do, since I'm far from an expert in this area; I just happen to remember random stuff from 5 years ago, so I'm throwing it out here.
We specifically need the managed thread ID here: the one that is a small int32 number (1, 2, 3, 4...) and is the identity of a thread in the managed runtime. In theory a managed thread does not need to map 1-1 to a native thread, and the same native thread may have different IDs in different runtimes if the process loads more than one runtime. Anyway, the main issue here is that the native ID is a pointer-sized value and there is no space for that in the object header. There is also no reason for it to be a pointer; even int32 is too generous. How many threads can you have at the same time? As a known quantity under the runtime's control, the managed thread ID is more predictable and more portable. I think we should use managed IDs in the managed runtime unless the scenario really calls for the native ID (it is hard to think of a scenario outside of interop with native threading). And we should make accessing the managed ID faster.
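To make the object-header point concrete, here is a minimal C++ sketch of a thin lock that stores a small thread ID in a 32-bit header word. The layout (16 bits of owner ID, 6 bits of recursion count) and the names `ObjectHeader`, `TryAcquireThin`, and `ReleaseThin` are assumptions made up for illustration; this is not the actual NativeAOT header format or lock code.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical layout, NOT the real NativeAOT object header:
// bits 0-15 hold the owning managed thread ID (0 means unlocked),
// bits 16-21 hold a recursion count.
constexpr uint32_t kThreadIdBits  = 16;
constexpr uint32_t kThreadIdMask  = (1u << kThreadIdBits) - 1;
constexpr uint32_t kRecursionOne  = 1u << kThreadIdBits;
constexpr uint32_t kRecursionMask = 0x3Fu << kThreadIdBits;

struct ObjectHeader {
    std::atomic<uint32_t> bits{0};
};

// Returns false when the caller would have to fall back to a heavier lock.
inline bool TryAcquireThin(ObjectHeader& header, uint32_t managedThreadId) {
    // A pointer-sized native ID would not fit here; a small managed ID does.
    if (managedThreadId == 0 || managedThreadId > kThreadIdMask) return false;

    uint32_t old = header.bits.load(std::memory_order_relaxed);
    uint32_t owner = old & kThreadIdMask;
    if (owner == 0) {
        // Unlocked: claim ownership with a single compare-exchange.
        return header.bits.compare_exchange_strong(
            old, old | managedThreadId,
            std::memory_order_acquire, std::memory_order_relaxed);
    }
    if (owner == managedThreadId && (old & kRecursionMask) != kRecursionMask) {
        // Already owned by this thread: bump the recursion count.
        return header.bits.compare_exchange_strong(
            old, old + kRecursionOne,
            std::memory_order_relaxed, std::memory_order_relaxed);
    }
    return false;  // Contended, or the recursion count is saturated.
}

inline void ReleaseThin(ObjectHeader& header, uint32_t managedThreadId) {
    uint32_t old = header.bits.load(std::memory_order_relaxed);
    assert((old & kThreadIdMask) == managedThreadId);  // Only the owner may release.
    uint32_t next = (old & kRecursionMask) != 0 ? old - kRecursionOne
                                                : old & ~kThreadIdMask;
    header.bits.store(next, std::memory_order_release);
}
```

In a scheme like this every acquire needs the current thread's ID up front, which is why the cost of fetching that ID shows up so clearly next to the otherwise cheap lock path.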
We should make all thread-statics fast by inlining them. I do not think it is worth building some special path for just ManagedThreadId.
If that is not too hard, it would be great.
It should not be significantly harder than inlining ManagedThreadId access only. The JIT needs to be able to emit the platform-specific native sequence to access thread statics in either case. The only difference is in where to get the symbols or offsets to use in the sequence.
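A rough C++ analogy for the "same sequence, different offset" point, as a sketch only: assume all thread statics live in one per-thread block, so every access is "get the thread's base, add an offset". The `ThreadStatics` struct, its two fields, and the `ThreadStaticAt` helper are hypothetical; the real runtime lays out and locates thread-static storage differently.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical per-thread storage block; the fields are made up for illustration.
struct ThreadStatics {
    int32_t managedThreadId;
    int64_t allocationCount;
};

// Constant-initialized, so the compiler can reach it through the platform's
// native TLS access sequence without a dynamic-initialization guard.
thread_local ThreadStatics t_statics = {0, 0};

// Generic access: thread-local base plus offset. Only the offset differs
// between one thread static and another.
template <typename T>
inline T& ThreadStaticAt(std::size_t offset) {
    assert(offset + sizeof(T) <= sizeof(ThreadStatics));
    char* base = reinterpret_cast<char*>(&t_statics);
    return *reinterpret_cast<T*>(base + offset);
}

inline int32_t& ManagedThreadIdSlot() {
    return ThreadStaticAt<int32_t>(offsetof(ThreadStatics, managedThreadId));
}

inline int64_t& AllocationCountSlot() {
    return ThreadStaticAt<int64_t>(offsetof(ThreadStatics, allocationCount));
}
```

Under this assumption, a code generator that can inline one of these accessors can inline the other with the same sequence, which is the argument for not special-casing ManagedThreadId.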
Related: #63619
CC @kunalspathak.
Thanks for the data, @VSadov. I am currently working on the prototype for this.
I'm moving this issue to 9.0 since we don't expect the RyuJIT TLS work to land in 8.0 per #89472 (comment).
Is there any codegen TLS work left, or can we close?
I think arm64 is still to be supported.
@kunalspathak any plans for Windows Arm64?
Hoping to do it in .NET 9.
#104282 is done, so this issue can be closed now. |
Sounds good to me! |
Re: #79519
While implementing thin locks for NativeAOT, I noticed that making calls into the runtime to figure out the location of ManagedThreadId is a considerable expense (relatively speaking, as all other parts of the lock implementation are fairly cheap).
Note that in NativeAOT the managed thread ID is dispensed and managed on the managed side. That is because the native thread and the managed thread object may outlive each other, and we need to keep the ID unique as long as either is alive.
The fact that the ID is dispensed and set from managed code should not prevent the optimization. We basically need to somehow intrinsify/inline the computation of the managed thread ID location; then the regular implementation that initializes the ID on demand should work just fine.
In fact, only the read path is perf critical. We will write to the location no more than once per lifetime of the thread.
A similar optimization may be applicable to CoreCLR as well. CoreCLR also makes a method call when accessing the managed thread ID.
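The on-demand initialization with a cheap read path described above can be sketched in C++ roughly as follows. This is an analogy only, not the NativeAOT implementation: the names (`g_nextManagedThreadId`, `t_managedThreadId`, `CurrentManagedThreadId`, and so on) are hypothetical, and the dispenser here never recycles IDs, so it ignores the lifetime rules mentioned in the issue.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Process-wide dispenser. IDs start at 1 so that 0 can mean "not assigned yet".
// Assumption: this sketch never recycles IDs.
std::atomic<int> g_nextManagedThreadId{1};

// Constant-initialized thread-local: no dynamic initializer, so compilers can
// typically read it with a short inline TLS sequence instead of a helper call.
thread_local int t_managedThreadId = 0;

// Slow path: runs at most once per thread and is the only writer of the slot.
int InitializeManagedThreadId() {
    assert(t_managedThreadId == 0);
    int id = g_nextManagedThreadId.fetch_add(1, std::memory_order_relaxed);
    t_managedThreadId = id;
    return id;
}

// Fast path: one thread-local load plus a branch that is almost always taken
// the same way after the first call on a thread.
inline int CurrentManagedThreadId() {
    int id = t_managedThreadId;
    return id != 0 ? id : InitializeManagedThreadId();
}

// Demonstration helper: returns the ID observed by a freshly started thread.
int ManagedThreadIdOnNewThread() {
    int id = 0;
    std::thread worker([&id] { id = CurrentManagedThreadId(); });
    worker.join();
    return id;
}
```

Repeated calls to `CurrentManagedThreadId()` on one thread return the same non-zero value, and a new thread gets a different one. Only the first call per thread reaches the slow path, which matches the observation that the slot is written at most once per thread.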