Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performance regression of atomic_ref due to extra local store #1008

Closed
PointKernel opened this issue Feb 15, 2023 · 6 comments
Closed

Performance regression of atomic_ref due to extra local store #1008

PointKernel opened this issue Feb 15, 2023 · 6 comments
Labels
libcu++ For all items related to libcu++

Comments

@PointKernel
Copy link
Member

PointKernel commented Feb 15, 2023

When benchmarking atomic_ref::compare_exchange_strong against atomic::compare_exchange_strong, we noticed that the former is always slower.

Here is the isolated repro: https://godbolt.org/z/xzcjhY84W

Compared to atomic, atomic_ref generates an extra STL SASS instruction. It's been confirmed that by replacing atomic_ref::compare_exchange_strong with atomicCAS, we can get about the same performance as atomic::compare_exchange_strong so the extra local store is indeed the culprit.

image
NCU shows the STL seems to come from member init in __atomic_base_storage ctor.

Tasks

No tasks being tracked yet.
@jrhemstad jrhemstad added thrust For all items related to Thrust. libcu++ For all items related to libcu++ and removed thrust For all items related to Thrust. labels Feb 22, 2023
@jrhemstad jrhemstad removed their assignment Mar 7, 2023
@jarmak-nv jarmak-nv transferred this issue from NVIDIA/libcudacxx Nov 8, 2023
@github-project-automation github-project-automation bot moved this to Todo in CCCL Nov 8, 2023
@jrhemstad jrhemstad added the blocked This PR cannot be merged due to various reasons label Feb 13, 2024
@ahendriksen
Copy link
Contributor

Seems to be an issue related to volatile pointer dereference. I have traced the different code paths taken by atomic ref and atomic here: ahendriksen@c9756c0

@ahendriksen
Copy link
Contributor

The fix is to add non-volatile overloads: 6892523

@ahendriksen ahendriksen removed the blocked This PR cannot be merged due to various reasons label Feb 14, 2024
ahendriksen referenced this issue Feb 16, 2024
This is the beginning of a history rewrite annotation.
This is a rewrite of a commit initially made in the main repo.
Original hash: da70d1f9ce542853172e1f3a00d475fd01173d62.
@ahendriksen
Copy link
Contributor

Thanks to @wmaxey PR #1582, the code example does not produce local stores anymore. I confirmed this locally.

In this CE link, I have added CCCL trunk as library, but it does not yet seem to have incorporated the latest changes.I expect the fix will reflect there shortly as well.

@PointKernel, can you check if the latest PR fixes your issue?

@PointKernel
Copy link
Member Author

Thanks for the updates.

Our project, cuCollections is using rapids-cmake for dependency control and bumping the CCCL version to the latest in rapids-cmake requires non-trivial effort. Would this change be backported to CCCL 2.2 or 2.3?

@jrhemstad
Copy link
Collaborator

Would this change be backported to CCCL 2.2 or 2.3?

No. We don't backport features to older releases. It will require updating to a newer version of CCCL.

@PointKernel
Copy link
Member Author

Would this change be backported to CCCL 2.2 or 2.3?

No. We don't backport features to older releases. It will require updating to a newer version of CCCL.

Got it, thanks!

Will keep you posted on the verification. Closing this issue for now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
libcu++ For all items related to libcu++
Projects
Archived in project
Development

No branches or pull requests

3 participants