Is this a duplicate?
Type of Bug
Performance
Component
libcu++
Describe the bug
Atomics generate poor SASS because the PTX and CUDA atomic layers have no viable non-volatile overloads.
For example, in atomic_cuda.h the store dispatch adds volatile to non-volatile pointers, which makes it impossible to pass a non-volatile pointer to a lower dispatch layer.
cccl/libcudacxx/include/cuda/std/detail/libcxx/include/support/atomic/atomic_cuda.h
Lines 190 to 202 in 068ee47
We need to duplicate the above code with a non-volatile path so that the generated PTX atomics header can correctly process non-volatile atomic types once they are added to codegen.cpp.
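To illustrate the overload problem in isolation, here is a minimal host-only C++ sketch. It is not the libcu++ code: `path`, `lower_store`, `store_volatile_only`, and `store_dispatch` are hypothetical names invented for this example, and `int` stands in for the atomic storage type. It only shows that a dispatch whose parameter is a volatile pointer can never reach a non-volatile lower overload, while a duplicated non-volatile dispatch can.

```cpp
// Hypothetical stand-ins, NOT the real libcu++ functions. They only
// demonstrate which overload the compiler selects.
enum class path { volatile_path, non_volatile_path };

// Lower layer, volatile overload: the store must be emitted as written.
inline path lower_store(volatile int* ptr, int value) {
    *ptr = value;
    return path::volatile_path;
}

// Lower layer, non-volatile overload: an ordinary store that the
// optimizer is free to combine or remove.
inline path lower_store(int* ptr, int value) {
    *ptr = value;
    return path::non_volatile_path;
}

// Shape of the problem: the parameter type adds volatile on the way in,
// so even a plain int* argument always reaches the volatile overload.
inline path store_volatile_only(volatile int* ptr, int value) {
    return lower_store(ptr, value);
}

// Sketch of the fix: the dispatch is duplicated for non-volatile
// pointers, which preserves the qualifier that the caller passed.
inline path store_dispatch(volatile int* ptr, int value) {
    return lower_store(ptr, value);
}
inline path store_dispatch(int* ptr, int value) {
    return lower_store(ptr, value);
}
```

With a plain `int x`, calling `store_volatile_only(&x, 1)` selects the volatile lower overload, whereas `store_dispatch(&x, 1)` selects the non-volatile one. A `volatile int` argument still takes the volatile path in both cases.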
Lastly, we need to use LLVM FileCheck to ensure that the generated PTX/SASS matches our expectations. This will give us lasting proof that our atomic_ref implementation is as efficient as possible despite the lack of compiler intrinsics for atomic operations.
How to Reproduce
#1008
Expected behavior
The extraneous store should be removed.
Reproduction link
https://godbolt.org/z/xzcjhY84W
Operating System
No response
nvidia-smi output
No response
NVCC version
No response