Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG]: Regression: Using Thrust OpenMP and CUDA back ends at the same time #2098

Closed
1 task done
dkolsen-pgi opened this issue Jul 29, 2024 · 9 comments · Fixed by #2332 or #2341
Closed
1 task done

[BUG]: Regression: Using Thrust OpenMP and CUDA back ends at the same time #2098

dkolsen-pgi opened this issue Jul 29, 2024 · 9 comments · Fixed by #2332 or #2341
Assignees
Labels
bug Something isn't working right.

Comments

@dkolsen-pgi
Copy link
Collaborator

Is this a duplicate?

Type of Bug

Compile-time Error

Component

Thrust

Describe the bug

It used to be possible, at least through CCCL 2.3.0, to use the Thrust OpenMP and CUDA back ends in the same compilation, choosing which back end to use at each algorithm call by specifying different execution policies, thrust::omp::par and thrust::cuda::par respectively. This was done by defining

#define THRUST_HOST_SYSTEM   THRUST_HOST_SYSTEM_OMP
#define THRUST_DEVICE_SYSTEM THRUST_DEVICE_SYSTEM_CUDA

before including any Thrust headers.

This no longer works with the CCCL main branch. It results in compilation errors. See the reproduction steps below.

This Thrust feature is necessary to implement nvc++ -stdpar=gpu,multicore, which is a much requested feature that we are hoping to implement for the next NVC++ release.

How to Reproduce

Test file:

#define THRUST_HOST_SYSTEM   THRUST_HOST_SYSTEM_OMP
#define THRUST_DEVICE_SYSTEM THRUST_DEVICE_SYSTEM_CUDA
#include <thrust/execution_policy.h>
int main() { }

To reproduce with the NVC++ dev compiler, which uses the latest CCCL main branch.

$ nvc++ -stdpar thrust.cpp
"/home/dolsen/work/pgi/dev/nv/Linux_x86_64/mine/compilers/include-stdpar/thrust/execution_policy.h", line 58: error: namespace "thrust::THRUST_200600_86_NS::system::cpp::detail" has no member "par_t"
  using host_t = thrust::system::cpp::detail::par_t;
                                              ^

"/home/dolsen/work/pgi/dev/nv/Linux_x86_64/mine/compilers/include-stdpar/thrust/execution_policy.h", line 291: error: const variable "thrust::THRUST_200600_86_NS::host" requires an initializer
  static const detail::host_t host;
                                  ^

2 errors detected in the compilation of "thrust.cpp".

It can also be reproduced with NVCC (since many don't have easy access to the NVC++ dev compiler). The paths here are in the NVHPC environment, but it should be easy enough to change the paths to work in the CCCL dev environment.

$ /proj/cuda/12.5/Linux_x86_64/bin/nvcc -x cu -I/proj/cuda/cccl/main/thrust -I/proj/cuda/cccl/main/cub -I /proj/cuda/cccl/main/libcudacxx/include thrust.cpp
/proj/cuda/cccl/main/thrust/thrust/execution_policy.h(58): error: namespace "thrust::THRUST_200600_520_NS::system::cpp::detail" has no member "par_t"
  using host_t = thrust::system::cpp::detail::par_t;
                                              ^

/proj/cuda/cccl/main/thrust/thrust/execution_policy.h(291): error: const variable "thrust::THRUST_200600_520_NS::host" requires an initializer
  static const detail::host_t host;
                                  ^

2 errors detected in the compilation of "thrust.cpp".

Expected behavior

Everything should just work when using Thrust after:

#define THRUST_HOST_SYSTEM   THRUST_HOST_SYSTEM_OMP
#define THRUST_DEVICE_SYSTEM THRUST_DEVICE_SYSTEM_CUDA

Reproduction link

No response

Operating System

No response

nvidia-smi output

No response

NVCC version

No response

@dkolsen-pgi dkolsen-pgi added the bug Something isn't working right. label Jul 29, 2024
@jrhemstad
Copy link
Collaborator

Confirmed reproducer on CCCL main and nvcc 12.4 https://godbolt.org/z/G1cs38xYb

@jrhemstad
Copy link
Collaborator

@alliepiper would you mind looking into this?

@dkolsen-pgi
Copy link
Collaborator Author

This is blocking my implementation of nvc++ -stdpar=gpu,multicore. I was hoping to check in the feature this week, but I can't check it in to the NVHPC dev branch until this issue is fixed in the CCCL main branch. And we can't release the feature, scheduled for HPC SDK 24.9 in late September, until the fix for this issue is in a CCCL release.

So this issue is urgent.

(My apologies for not speaking up about this sooner. I was on vacation for a couple weeks and forgot about this.)

@alliepiper alliepiper self-assigned this Aug 29, 2024
@alliepiper
Copy link
Collaborator

Taking a look now.

The break bisects to 91b78d8.

@alliepiper
Copy link
Collaborator

@bernhardmgruber
Copy link
Contributor

bernhardmgruber commented Aug 30, 2024

If this is an important use case for nvc++, can we extend the CI to cover this scenario? And if it is already covered, do we know why the CI missed this issue? Never mind, just saw the issue: #2333

@dkolsen-pgi
Copy link
Collaborator Author

Even if CCCL had excellent CI coverage for NVC++, it wouldn't have caught this bug. This bug only affects a new feature that I am working on, which is not in production yet. The bug may have been caught if #2333 had been implemented, but it wouldn't have been caught by NVC++ coverage.

@dkolsen-pgi
Copy link
Collaborator Author

Thanks for the fix, Allison. Compilation gets farther along. But it runs into another error that looks like the same problem.

"/home/dolsen/work/pgi/dev/nv/Linux_x86_64/mine/compilers/include-stdpar/thrust/mr/host_memory_resource.h", line 36: error: namespace "thrust::THRUST_200600_86_NS::system::cpp" has no member "memory_resource"
  using host_memory_resource = thrust::system::cpp::memory_resource;
                                                    ^

That line in <thrust/mr/host_memory_resource.h> looks wrong and I assume should have the same fix.

I also found three other uses of thrust::system::cpp::detail::execution_policy in files that are not in thrust/system/cpp. I don't know if thrust::system::cpp::detail::execution_policy exists when the host system is not CPP, so I don't know if these are actual problems. The files in question are thrust/detail/overlapped_copy.h, thrust/system/omp/detail/execution_policy.h, and thrust/system/tbb/detail/execution_policy.h.

@dkolsen-pgi dkolsen-pgi reopened this Aug 30, 2024
bernhardmgruber added a commit to bernhardmgruber/cccl that referenced this issue Aug 30, 2024
This change was erroneously introduced in 91b78d8

Fixes: NVIDIA#2098
@bernhardmgruber
Copy link
Contributor

I looked into the offending commit using git show 91b78d8 | grep -C 3 __THRUST_HOST_SYSTEM_NAMESPACE and I find two occurences where __THRUST_HOST_SYSTEM_NAMESPACE was erroneously replaced by cpp. One of them was fixes in #2332, the other one is proposed here: #2341

I also found three other uses of thrust::system::cpp::detail::execution_policy in files that are not in thrust/system/cpp. I don't know if thrust::system::cpp::detail::execution_policy exists when the host system is not CPP, so I don't know if these are actual problems. The files in question are thrust/detail/overlapped_copy.h, thrust/system/omp/detail/execution_policy.h, and thrust/system/tbb/detail/execution_policy.h.

I looked at those files and annotated the locations with git blame, but those locations were not touched in commit 91b78d8. Notice that the offending commit only rewrote typedef to using.

wmaxey pushed a commit that referenced this issue Sep 4, 2024
This change was erroneously introduced in 91b78d8

Fixes: #2098
wmaxey pushed a commit that referenced this issue Sep 4, 2024
This change was erroneously introduced in 91b78d8

Fixes: #2098
wmaxey added a commit that referenced this issue Sep 10, 2024
* Update CCCL version to 2.6.1

* Do not hardcode Thrust's host system to cpp. (#2332)

* Fix hardcoding __THRUST_HOST_SYSTEM_NAMESPACE to cpp (#2341)

This change was erroneously introduced in 91b78d8

Fixes: #2098

---------

Co-authored-by: Allison Piper <[email protected]>
Co-authored-by: Bernhard Manfred Gruber <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working right.
Projects
Status: In Review
4 participants