Performance regression of atomic_ref due to extra local store #1008
Comments
Seems to be an issue related to volatile pointer dereference. I have traced the different code paths taken by atomic_ref and atomic here: ahendriksen@c9756c0
The fix is to add non-volatile overloads: 6892523
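For readers unfamiliar with why a missing non-volatile overload matters, here is a minimal host-side C++ sketch of the overload-resolution effect. It is not the libcu++ code: the `Path` enum, the `before_fix` and `after_fix` namespaces, and the `load` helpers are hypothetical names made up for illustration. It only shows that when just a volatile overload exists, a plain pointer is silently routed through the volatile path, and that adding a non-volatile overload lets plain pointers avoid it.

```cpp
// Hypothetical illustration only; none of these names exist in libcu++.
enum class Path { Volatile, NonVolatile };

namespace before_fix {
// Only a volatile overload exists, so an int* argument is converted to
// volatile int* and every caller ends up on the volatile path.
inline Path load(volatile int* p, int& out) {
    out = *p;  // volatile read: must be performed exactly as written
    return Path::Volatile;
}
}  // namespace before_fix

namespace after_fix {
// Same volatile overload as before, kept for genuinely volatile objects.
inline Path load(volatile int* p, int& out) {
    out = *p;
    return Path::Volatile;
}
// Added non-volatile overload: an int* argument is an exact match here,
// which outranks the qualification conversion to volatile int*.
inline Path load(int* p, int& out) {
    out = *p;  // ordinary read that the optimizer is free to simplify
    return Path::NonVolatile;
}
}  // namespace after_fix

// Small self-check that exercises both overload sets.
inline bool overloads_select_expected_paths() {
    int plain = 42;
    volatile int vol = 7;
    int out = 0;

    bool ok = before_fix::load(&plain, out) == Path::Volatile && out == 42;
    ok = ok && after_fix::load(&plain, out) == Path::NonVolatile && out == 42;
    ok = ok && after_fix::load(&vol, out) == Path::Volatile && out == 7;
    return ok;
}
```

Whether the volatile path actually costs an extra instruction depends on the compiler and target; this sketch only demonstrates which overload gets selected, not the generated SASS.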
This is the beginning of a history rewrite annotation. This is a rewrite of a commit initially made in the main repo. Original hash: da70d1f9ce542853172e1f3a00d475fd01173d62.
Thanks to @wmaxey's PR #1582, the code example does not produce local stores anymore. I confirmed this locally. In this CE link, I have added CCCL trunk as a library, but it does not yet seem to have incorporated the latest changes. I expect the fix will be reflected there shortly as well. @PointKernel, can you check if the latest PR fixes your issue?
Thanks for the updates. Our project, cuCollections, is using
No. We don't backport features to older releases. It will require updating to a newer version of CCCL.
Got it, thanks! Will keep you posted on the verification. Closing this issue for now.
When benchmarking atomic_ref::compare_exchange_strong against atomic::compare_exchange_strong, we noticed that the former is always slower.

Here is the isolated repro: https://godbolt.org/z/xzcjhY84W

Compared to atomic, atomic_ref generates an extra STL SASS instruction. We confirmed that replacing atomic_ref::compare_exchange_strong with atomicCAS gives about the same performance as atomic::compare_exchange_strong, so the extra local store is indeed the culprit.

NCU shows the STL seems to come from member init in the __atomic_base_storage ctor.