gcenv.interlocked's Interlocked use full memory barriers even with 8.1 Atomics #67824
Comments
Tagging subscribers to this area: @dotnet/gc
Issue Details
All Interlocked functions from gcenv.interlocked.inl seem to insert full memory barriers, e.g. consider this usage (GC seems to heavily use them under BACKGROUND_GC):

    void GCHeap::SetSuspensionPending(bool fSuspensionPending)
    {
        if (fSuspensionPending)
        {
            Interlocked::Increment(&g_fSuspensionPending);
        }
        else
        {
            Interlocked::Decrement(&g_fSuspensionPending);
        }
    }

Interlocked is implemented like this:

    template <typename T>
    __forceinline T Interlocked::Increment(T volatile *addend)
    {
    #ifdef _MSC_VER
        static_assert(sizeof(long) == sizeof(T), "Size of long must be the same as size of T");
        return _InterlockedIncrement((long*)addend);
    #else
        T result = __sync_add_and_fetch(addend, 1);
        ArmInterlockedOperationBarrier();
        return result;
    #endif
    }

From my understanding we don't need full memory barriers in the case when we use 8.1 atomics. Same applies to all Interlocked functions in gcenv.interlocked.inl.
cc @VSadov
PS: Yes, when we compile CoreCLR for Apple M1 we unintentionally use -mcpu=apple-m1 and use all the new shiny instructions, e.g. Arm v8.3's ldapr, 8.1 atomics, etc.
I removed these barriers and lowered number of Will test it on our TE infra
Right. If Also - I think GC has its own copy of Interlocked, separate from what runtime uses. I wonder if Interlocked in the runtime could use the same change.
thank you very much for noticing this @EgorBo! how stressful are tests run with TE infra (I thought they were mostly small perf tests?)? we do have a stress infra for GC.
Sure, will try the stress infra, thanks!
Is this something that we still want to change for 7.0?
I know that @kunalspathak has some progress in a related issue and I personally probably will just remove that explicit memory barrier for OSX-arm64 at least.
That's right. I am currently working on a prototype to have this removed and hoping to make it to 7.0
thanks @kunalspathak. unless you believe the linux scenario is different (like the full barrier we are doing is somehow a lot more expensive on linux than on windows), I would say let's not do this for linux either.
Sounds good. I will close this issue then.
All Interlocked functions from gcenv.interlocked.inl seem to insert full memory barriers, e.g. consider this usage (GC seems to heavily use them under BACKGROUND_GC):
Disassembled on Apple M1:
Interlocked is implemented like this:
see godbolt: https://godbolt.org/z/3jPx3Mz14
From my understanding we don't need full memory barriers in the case when we use 8.1 atomics. Same applies to all Interlocked functions in gcenv.interlocked.inl.
These barriers are needed on 8.0, where C++ compilers lower builtin atomic intrinsics without them for some reason (see https://patchwork.kernel.org/project/linux-arm-kernel/patch/[email protected]/)
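A minimal sketch of the idea, assuming a GCC/Clang toolchain: keep the explicit barrier when targeting arm64 without 8.1 atomics and omit it otherwise. All names ending in `Sketch` are hypothetical and are not the runtime's actual code. The check relies on `__ARM_FEATURE_ATOMICS`, which, as far as I know, is the ACLE macro compilers define when LSE atomics are enabled at compile time. Whether the 8.1 instructions alone give GC the ordering it needs is the premise of this issue, not something the sketch proves.

```cpp
#include <cstdint>

// Hypothetical stand-in for ArmInterlockedOperationBarrier(): emits a full
// barrier only on arm64 builds that do NOT target ARMv8.1 (LSE) atomics.
inline void InterlockedBarrierSketch()
{
#if defined(__aarch64__) && !defined(__ARM_FEATURE_ATOMICS)
    __sync_synchronize(); // full barrier for the 8.0 exclusive-load/store loop
#endif
}

template <typename T>
inline T IncrementSketch(T volatile* addend)
{
    T result = __sync_add_and_fetch(addend, 1);
    InterlockedBarrierSketch();
    return result;
}

template <typename T>
inline T DecrementSketch(T volatile* addend)
{
    T result = __sync_sub_and_fetch(addend, 1);
    InterlockedBarrierSketch();
    return result;
}

// Mirrors the SetSuspensionPending pattern above, using a caller-owned
// counter instead of the GC global. Returns the updated counter value.
inline int32_t SetPendingSketch(int32_t volatile* counter, bool pending)
{
    return pending ? IncrementSketch(counter) : DecrementSketch(counter);
}
```

On other architectures the barrier helper compiles to nothing, so the sketch only changes code generation for arm64 targets.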
JIT does the same, e.g. for C#:
it produces on arm64-8.0:
and this on >=8.1:
cc @VSadov
PS: Yes, when we compile CoreCLR for Apple M1 we unintentionally use -mcpu=apple-m1 and use all the new shiny instructions, e.g. Arm v8.3's ldapr, 8.1 atomics, etc. (well, makes sense 🤷‍♂️)