Is this a duplicate?
Type of Bug
Silent Failure
Component
Not sure
Describe the bug
I noticed that thrust::sort_by_key fails or gives a wrong result in a very specific circumstance, namely when:
thrust::sort_by_key is used in a shared library, let's say library A
thrust::sort is used in a second shared library, let's say library B
both libraries are linked to the main program
thrust::sort_by_key is called with more than 4864 elements (the threshold for selecting a specific sorting algorithm)
With git bisect, I determined that this problem has occurred since NVIDIA/cub@c4299c4, meaning that all thrust/cub 2.x.y versions are affected, but 1.x.y versions are fine.
The problem occurs with GCC under Linux, but not with MSVC under Windows.
When I run it with compute-sanitizer, it reports the following for histogram_kernel and exclusive_sum_kernel:
Program hit cudaErrorMissingConfiguration (error 52) due to "__global__ function call is not configured" on CUDA API call to cudaGetLastError
My best guess at what happens: some symbols in library A and library B get confused during linking, possibly because some functions (like DeviceRadixSortExclusiveSumKernel) don't have ValueT in their template parameter list (ValueT is cub::NullType for thrust::sort and something else for thrust::sort_by_key).
This might also happen with other thrust functions (unconfirmed, but possible, I think).
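The naming side of this guess can be illustrated in plain C++. The sketch below uses hypothetical stand-ins (NullType, exclusive_sum_helper, value_aware_helper, SortHelpers), not CUB's real kernels, and it cannot reproduce the CUDA failure itself. It only shows that a helper whose template arguments omit ValueT is the same instantiation, and therefore the same mangled symbol, for a keys-only sort and a key-value sort, while a helper parameterized on ValueT is not. On Linux, GCC-built shared libraries export symbols (including template instantiations) with default visibility unless told otherwise, and the dynamic linker may resolve a symbol defined in several loaded libraries to a single definition; Windows DLLs export nothing unless asked to, which would be consistent with the MSVC behavior described above.

```cpp
#include <cstddef>

// Hypothetical stand-in for cub::NullType, the ValueT a keys-only sort uses.
struct NullType {};

// Hypothetical analogue of a helper that, like DeviceRadixSortExclusiveSumKernel
// in the description above, has no ValueT template parameter.
template <typename OffsetT>
std::size_t exclusive_sum_helper(OffsetT* bins, int n) {
  OffsetT running = 0;
  for (int i = 0; i < n; ++i) {
    OffsetT count = bins[i];
    bins[i] = running;  // in-place exclusive prefix sum
    running += count;
  }
  return static_cast<std::size_t>(n);
}

// Hypothetical analogue of a helper that does carry ValueT.
template <typename KeyT, typename ValueT, typename OffsetT>
std::size_t value_aware_helper(OffsetT*, int) {
  return sizeof(ValueT);  // body depends on ValueT, so instantiations differ
}

using HelperFn = std::size_t (*)(unsigned*, int);

// The helpers a sort with the given KeyT/ValueT would instantiate
// (OffsetT fixed to unsigned for simplicity).
template <typename KeyT, typename ValueT>
struct SortHelpers {
  static HelperFn scan() { return &exclusive_sum_helper<unsigned>; }
  static HelperFn value_aware() { return &value_aware_helper<KeyT, ValueT, unsigned>; }
};

using KeysOnlySort = SortHelpers<int, NullType>;  // plays the role of thrust::sort
using KeyValueSort = SortHelpers<int, float>;     // plays the role of thrust::sort_by_key

// Both sorts resolve to one instantiation, i.e. one mangled symbol.
bool scan_helper_is_shared() {
  return KeysOnlySort::scan() == KeyValueSort::scan();
}

// ValueT is part of the template arguments, so the instantiations differ.
bool value_helper_is_shared() {
  return KeysOnlySort::value_aware() == KeyValueSort::value_aware();
}
```

In this sketch, scan_helper_is_shared() returns true and value_helper_is_shared() returns false. Within one program the shared instantiation is harmless; the open question in this issue is whether the ValueT-free kernels end up bound to one library's copy when two shared libraries both define them.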
This bug was first noticed in PointCloudLibrary/pcl#5846
How to Reproduce
Here is a minimal reproducible example: thrust_test.zip
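Because the failure is silent, a reproduction needs an explicit correctness check. A minimal host-side sketch, assuming the device results have already been copied back to the host (for example through thrust::host_vector) and that both key and value types support operator<; the function name is_valid_sort_by_key is hypothetical and not part of Thrust. It checks that the output keys are non-decreasing and that the output (key, value) pairs are a permutation of the input pairs; it does not check stability.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical checker: true if (out_keys, out_vals) is a valid result of
// sorting (in_keys, in_vals) by key. Stability is not verified.
template <typename K, typename V>
bool is_valid_sort_by_key(const std::vector<K>& in_keys, const std::vector<V>& in_vals,
                          const std::vector<K>& out_keys, const std::vector<V>& out_vals) {
  // All four sequences must have the same length.
  if (in_keys.size() != in_vals.size() || out_keys.size() != out_vals.size() ||
      in_keys.size() != out_keys.size()) {
    return false;
  }
  // Keys must come out in non-decreasing order.
  if (!std::is_sorted(out_keys.begin(), out_keys.end())) {
    return false;
  }
  // Each value must still travel with its original key: compare the
  // multisets of (key, value) pairs by sorting both copies.
  std::vector<std::pair<K, V>> before, after;
  before.reserve(in_keys.size());
  after.reserve(out_keys.size());
  for (std::size_t i = 0; i < in_keys.size(); ++i) {
    before.emplace_back(in_keys[i], in_vals[i]);
    after.emplace_back(out_keys[i], out_vals[i]);
  }
  std::sort(before.begin(), before.end());
  std::sort(after.begin(), after.end());
  return before == after;
}
```

Running this on more than 4864 elements after calling thrust::sort_by_key from library A would turn the silent failure into a detectable one.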
Expected behavior
thrust::sort_by_key should always give the correct (sorted) result.
Reproduction link
No response
Operating System
Linux (exact version or distro does not matter)
nvidia-smi output
Not relevant for the problem, as far as I can tell
NVCC version
Not relevant, but thrust/cub must be version 2.0.0 or newer (as described above)