Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG]: Fix atomic/atomic_ref volatile overloads. #1424

Closed
1 task done
wmaxey opened this issue Feb 21, 2024 · 2 comments
Closed
1 task done

[BUG]: Fix atomic/atomic_ref volatile overloads. #1424

wmaxey opened this issue Feb 21, 2024 · 2 comments
Assignees
Labels
bug Something isn't working right.

Comments

@wmaxey
Copy link
Member

wmaxey commented Feb 21, 2024

Is this a duplicate?

Type of Bug

Performance

Component

libcu++

Describe the bug

Atomics generate poor SASS due to the fact that there are no viable non-volatile overloads in the PTX and CUDA atomic layers.

For example, in atomic_cuda.h the store dispatch will add volatile to non-volatile pointers eliminating the possibility of passing a non-volatile to a lower dispatch.

template <typename _Tp, int _Sco, bool _Ref>
_LIBCUDACXX_HOST_DEVICE
void __cxx_atomic_store(__cxx_atomic_base_heterogeneous_impl<_Tp, _Sco, _Ref> volatile* __a, _Tp __val, memory_order __order) {
alignas(_Tp) auto __tmp = __val;
NV_DISPATCH_TARGET(
NV_IS_DEVICE, (
__atomic_store_n_cuda(__cxx_get_underlying_device_atomic(__a), __tmp, static_cast<__memory_order_underlying_t>(__order), __scope_tag<_Sco>());
),
NV_IS_HOST, (
__host::__cxx_atomic_store(&__a->__a_value, __tmp, __order);
)
)
}

We need to duplicate the above code with a non-volatile pass to allow the generated PTX atomics header to be able to correctly process non-volatile atomic types when they are added to codegen.cpp.

Lastly we need to use LLVM filecheck to ensure that the PTX/SASS generated matches our expectations. This will give us longstanding proof that our atomic_ref implementation is as efficient as possible despite lack of compiler intrinsics for atomic operations.

How to Reproduce

#1008

Expected behavior

The extraneous store should be removed.

Reproduction link

https://godbolt.org/z/xzcjhY84W

Operating System

No response

nvidia-smi output

No response

NVCC version

No response

@wmaxey wmaxey added the bug Something isn't working right. label Feb 21, 2024
@wmaxey wmaxey self-assigned this Feb 21, 2024
@gonzalobg
Copy link
Collaborator

volatile atomics should include .mmio in their lowering to uphold what CUDA C++ promises about volatile.

@jrhemstad
Copy link
Collaborator

Closed by #1582

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working right.
Projects
Archived in project
Development

No branches or pull requests

3 participants