[BUG]: Fix `atomic/atomic_ref` volatile overloads. #1424

wmaxey · 2024-02-21T16:08:44Z

Is this a duplicate?

I confirmed there appear to be no duplicate issues for this bug and that I agree to the Code of Conduct

Type of Bug

Performance

Component

libcu++

Describe the bug

Atomics generate poor SASS because the PTX and CUDA atomic layers have no viable non-volatile overloads.

For example, in atomic_cuda.h the store dispatch takes its argument as a volatile pointer. A non-volatile pointer therefore gains volatile on the way in, which makes it impossible to pass a non-volatile pointer down to a lower dispatch.

cccl/libcudacxx/include/cuda/std/detail/libcxx/include/support/atomic/atomic_cuda.h

Lines 190 to 202 in 068ee47

    
    template <typename _Tp, int _Sco, bool _Ref>
    _LIBCUDACXX_HOST_DEVICE
     void __cxx_atomic_store(__cxx_atomic_base_heterogeneous_impl<_Tp, _Sco, _Ref> volatile* __a, _Tp __val, memory_order __order) {
        alignas(_Tp) auto __tmp = __val;
        NV_DISPATCH_TARGET(
            NV_IS_DEVICE, (
                __atomic_store_n_cuda(__cxx_get_underlying_device_atomic(__a), __tmp, static_cast<__memory_order_underlying_t>(__order), __scope_tag<_Sco>());
            ),
            NV_IS_HOST, (
                __host::__cxx_atomic_store(&__a->__a_value, __tmp, __order);
            )
        )
    }

We need to duplicate the above code for non-volatile pointers, so that the generated PTX atomics header can correctly process non-volatile atomic types once they are added to codegen.cpp.
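As a rough illustration of the overload problem (not the libcu++ code itself), here is a minimal host-only C++ sketch. Every name in it (`atomic_box`, `store_impl`, `store_volatile_only`, `store`, `path`) is hypothetical and exists only for this example. It assumes the lower layer already has both a volatile and a non-volatile overload, and shows that a volatile-only entry point still forces every call onto the volatile one.

```cpp
// Hypothetical stand-in for the library's atomic storage type.
template <typename T>
struct atomic_box {
    T value;
};

// Records which lower-layer overload was selected (illustration only).
enum class path { plain, with_volatile };

// Lower layer: one overload per qualifier.
template <typename T>
path store_impl(T* p, T v) {
    *p = v;
    return path::plain;
}

template <typename T>
path store_impl(T volatile* p, T v) {
    *p = v;
    return path::with_volatile;
}

// Shape described in the issue: the only entry point takes a volatile
// pointer, so a non-volatile object gains volatile here and the lower
// layer can only ever see `T volatile*`.
template <typename T>
path store_volatile_only(atomic_box<T> volatile* a, T v) {
    return store_impl(&a->value, v);
}

// Proposed shape: duplicate the entry point without volatile so the
// qualifier of the caller's object is preserved all the way down.
template <typename T>
path store(atomic_box<T>* a, T v) {
    return store_impl(&a->value, v);
}

template <typename T>
path store(atomic_box<T> volatile* a, T v) {
    return store_impl(&a->value, v);
}
```

With an `atomic_box<int> plain`, `store_volatile_only(&plain, 1)` selects the volatile lower overload, while `store(&plain, 1)` selects the non-volatile one. A `volatile atomic_box<int>` still goes through the volatile path in both cases.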

Lastly, we need to use LLVM FileCheck to ensure that the generated PTX/SASS matches our expectations. This will give us lasting proof that our atomic_ref implementation is as efficient as possible despite the lack of compiler intrinsics for atomic operations.

How to Reproduce

#1008

Expected behavior

The extraneous store should be removed.

Reproduction link

https://godbolt.org/z/xzcjhY84W

Operating System

No response

nvidia-smi output

No response

NVCC version

No response


gonzalobg · 2024-02-23T18:05:21Z

volatile atomics should include .mmio in their lowering to uphold what CUDA C++ promises about volatile.

jrhemstad · 2024-04-10T16:18:50Z

Closed by #1582

wmaxey added the bug Something isn't working right. label Feb 21, 2024

wmaxey self-assigned this Feb 21, 2024

wmaxey mentioned this issue Mar 8, 2024

Make libcudacxx's codegen part of CI and add it to the project. #1526

Merged

2 tasks

wmaxey mentioned this issue Apr 3, 2024

Add missing non-volatile atomic overloads. #1582

Merged

2 tasks

jrhemstad mentioned this issue Apr 10, 2024

volatile atomics should include .mmio in their lowering to uphold what CUDA C++ promises about volatile #1613

Open

jrhemstad closed this as completed Apr 10, 2024
