CUDA vector_add sample project #2160
🟨 CI finished in 1h 16m: Pass: 96%/56 | Total: 2h 42m | Avg: 2m 54s | Max: 11m 09s | Hits: 96%/2650
| | Project |
|---|---|
| | CCCL Infrastructure |
| | libcu++ |
| | CUB |
| | Thrust |
| +/- | CUDA Experimental |
| | pycuda |

Modifications in project or dependencies?

| | Project |
|---|---|
| | CCCL Infrastructure |
| | libcu++ |
| | CUB |
| | Thrust |
| +/- | CUDA Experimental |
| +/- | pycuda |
🏃 Runner counts (total jobs: 56)
# | Runner |
---|---|
41 | linux-amd64-cpu16 |
9 | linux-amd64-gpu-v100-latest-1 |
4 | linux-arm64-cpu16 |
2 | windows-amd64-cpu16 |
🟨 CI finished in 4h 08m: Pass: 94%/56 | Total: 2h 37m | Avg: 2m 48s | Max: 12m 26s | Hits: 97%/2600
🟨 CI finished in 20h 55m: Pass: 96%/56 | Total: 2h 38m | Avg: 2m 49s | Max: 12m 26s | Hits: 97%/2650
# Default to building for the GPU on the current system
if(NOT DEFINED CMAKE_CUDA_ARCHITECTURES)
  set(CMAKE_CUDA_ARCHITECTURES 86)
endif()
The problem isn't the architecture value. The way this test is set up, it requires a GPU runner, but it is ending up on a CPU runner.
@alliepiper can help you get it sorted.
// Define the kernel launch parameters
constexpr int threadsPerBlock = 256;
int blocksPerGrid = (numElements + threadsPerBlock - 1) / threadsPerBlock;
auto dims = cudax::make_hierarchy(cudax::grid_dims(blocksPerGrid), cudax::block_dims<threadsPerBlock>());
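The launch parameters above use ceiling division so that `blocksPerGrid * threadsPerBlock >= numElements`; the surplus threads in the last block then have to be masked off inside the kernel with a bounds check such as `if (i < numElements)`. The host-only C++ sketch below emulates the kernel's global index computation (`blockIdx.x * blockDim.x + threadIdx.x`) with plain loops to show that every element is handled exactly once. The helpers `blocks_for`, `coverage`, and `covers_each_element_once` are hypothetical illustrations, not part of cudax.

```cpp
#include <cassert>
#include <vector>

// Ceiling division: the smallest number of blocks whose threads cover n elements.
// Hypothetical helper mirroring the blocksPerGrid expression in the sample.
constexpr int blocks_for(int n, int threads_per_block) {
  return (n + threads_per_block - 1) / threads_per_block;
}

// Emulates a 1D launch on the host: every (block, thread) pair computes the
// global index a kernel would (blockIdx.x * blockDim.x + threadIdx.x), and only
// indices below n do any work. Returns how often each element was touched.
std::vector<int> coverage(int n, int threads_per_block) {
  std::vector<int> hits(n, 0);
  const int grid = blocks_for(n, threads_per_block);
  for (int block = 0; block < grid; ++block) {
    for (int thread = 0; thread < threads_per_block; ++thread) {
      const int i = block * threads_per_block + thread;
      if (i < n) {  // the bounds check masks off surplus threads in the last block
        ++hits[i];
      }
    }
  }
  return hits;
}

// True if every element is processed by exactly one emulated thread.
bool covers_each_element_once(int n, int threads_per_block) {
  for (int h : coverage(n, threads_per_block)) {
    if (h != 1) return false;
  }
  return true;
}
```

For example, with `numElements = 50000` and 256 threads per block, `blocks_for` yields 196 blocks (50,176 threads), so the last 176 threads must do nothing.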
With #2001 merged, we could use at_least(numElements) for the grid dimensions.
We could also try to come up with shorthands like that for the entire hierarchy dimensions; they are super common.
How about this for a shorthand:
auto dims = cudax::distribute<256>(numElements);
I like that
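The proposed `distribute<256>(numElements)` shorthand would fold the block size and the ceiling division into a single call. As a rough sketch of what such a helper has to compute, here is a plain C++ stand-in that returns a small struct holding the grid and block sizes. The names `launch_dims` and `distribute` below are hypothetical and do not reproduce any cudax type; a cudax version would presumably return a hierarchy like the one built with `cudax::make_hierarchy` above.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the result of a distribute<N>(n) shorthand:
// just the 1D grid and block sizes, not a cudax hierarchy object.
struct launch_dims {
  std::size_t grid;   // number of blocks
  std::size_t block;  // threads per block (fixed at compile time below)
};

// Hypothetical distribute<ThreadsPerBlock>(n): picks enough blocks of
// ThreadsPerBlock threads to give each of the n elements its own thread.
template <std::size_t ThreadsPerBlock>
constexpr launch_dims distribute(std::size_t n) {
  static_assert(ThreadsPerBlock > 0, "block size must be positive");
  return launch_dims{(n + ThreadsPerBlock - 1) / ThreadsPerBlock, ThreadsPerBlock};
}

// Compile-time checks: the shorthand matches the hand-written arithmetic.
static_assert(distribute<256>(50000).grid == 196, "ceil(50000 / 256) blocks");
static_assert(distribute<256>(256).grid == 1, "an exact multiple needs no extra block");
static_assert(distribute<256>(257).grid == 2, "one leftover element needs one more block");
```

Keeping the block size as a template parameter mirrors `cudax::block_dims<threadsPerBlock>()` in the sample, where the block extent is a compile-time constant and only the grid extent depends on the runtime element count.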
🟨 CI finished in 42m 13s: Pass: 94%/56 | Total: 2h 41m | Avg: 2m 53s | Max: 11m 17s
🟨 CI finished in 1h 00m: Pass: 89%/56 | Total: 2h 38m | Avg: 2m 49s | Max: 11m 19s
@miscco, something seems to be going wrong with the mdspan concepts portability macros with MSVC:
sccache "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.0\bin\nvcc.exe" -forward-unknown-to-host-compiler -DLIBCUDACXX_ENABLE_EXCEPTIONS -DTHRUST_DEVICE_SYSTEM=THRUST_DEVICE_SYSTEM_CUDA -DTHRUST_HOST_SYSTEM=THRUST_HOST_SYSTEM_CPP -IC:\cccl\build\cudax-cpp20\cudax\samples\src\cudax_samples-build\_deps\cccl-src\libcudacxx\lib\cmake\libcudacxx\..\..\..\include -IC:\cccl\build\cudax-cpp20\cudax\samples\src\cudax_samples-build\_deps\cccl-src\thrust\thrust\cmake\..\.. -IC:\cccl\build\cudax-cpp20\cudax\samples\src\cudax_samples-build\_deps\cccl-src\cub\cub\cmake\..\.. -isystem C:\cccl\build\cudax-cpp20\cudax\samples\src\cudax_samples-build\_deps\cccl-src\cudax\lib\cmake\cudax\..\..\..\include -D_WINDOWS -Xcompiler="/W3 /GR /EHsc" -Xcompiler="-MDd -Zi -Ob0 -Od /RTC1" -std=c++17 "--generate-code=arch=compute_86,code=[compute_86,sm_86]" -Xcompiler=/Zc:__cplusplus -Xcompiler=/Zc:preprocessor -MD -MT CMakeFiles\vector_add.dir\vector_add\vector_add.cu.obj -MF CMakeFiles\vector_add.dir\vector_add\vector_add.cu.obj.d -x cu -c C:\cccl\cudax\samples\vector_add\vector_add.cu -o CMakeFiles\vector_add.dir\vector_add\vector_add.cu.obj -Xcompiler=-FdCMakeFiles\vector_add.dir\,-FS
vector_add.cu
C:\cccl\build\cudax-cpp20\cudax\samples\src\cudax_samples-build\_deps\cccl-src\libcudacxx\include\cuda/std/__mdspan/default_accessor.h(75): error C4002: too many arguments for function-like macro invocation '__MDSPAN_PP_CAT_IMPL'
C:\cccl\build\cudax-cpp20\cudax\samples\src\cudax_samples-build\_deps\cccl-src\libcudacxx\include\cuda/std/__mdspan/macros.h(276): note: in expansion of macro '__MDSPAN_TEMPLATE_REQUIRES'
C:\cccl\build\cudax-cpp20\cudax\samples\src\cudax_samples-build\_deps\cccl-src\libcudacxx\include\cuda/std/__mdspan/macros.h(242): note: in expansion of macro '__MDSPAN_PP_CAT'
I get this with
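Warning C4002 means a function-like macro was invoked with more arguments than it declares. A classic way to trigger it, shown here as a generic illustration rather than a diagnosis of the `__MDSPAN_*` macros, is a top-level comma inside a template argument list: the preprocessor does not treat `<...>` as grouping, so `std::pair<int, int>` is split into two arguments. Wrapping such an argument in parentheses and unwrapping it with a variadic helper is one common workaround. The macro names `EXPAND` and `IS_SAME_TYPE` are hypothetical.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

// Unwraps a parenthesized argument: EXPAND (a, b) expands to a, b
#define EXPAND(...) __VA_ARGS__

// A two-parameter macro. A call like IS_SAME_TYPE(std::pair<int, int>, int)
// would pass three arguments (`std::pair<int`, `int>`, `int`), because the
// comma inside <...> is not protected; that is the kind of call MSVC reports
// as C4002. Callers therefore wrap each type in parentheses, and EXPAND strips them.
#define IS_SAME_TYPE(T, U) (std::is_same<EXPAND T, EXPAND U>::value)

constexpr bool pair_matches = IS_SAME_TYPE((std::pair<int, int>), (std::pair<int, int>));
constexpr bool pair_differs = IS_SAME_TYPE((std::pair<int, int>), (std::pair<int, long>));
```

Parentheses are the only delimiters the preprocessor respects when splitting macro arguments, which is why the wrap-and-unwrap pattern shows up in portability layers that pass template expressions through macros.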
Yeah, our… I have a branch with a complete rewrite lying around, but need to implement
It doesn't seem to work with C++20 either: https://github.com/NVIDIA/cccl/actions/runs/10314858890/job/28553983052?pr=2160 halp!
@robertmaynard I rechecked, and it seems that the samples are pulling in CCCL main :(
It is proving difficult to handle for MSVC, and the one we are using in libcu++ is much cleaner. Gets NVIDIA#2160 compiling on MSVC.
🟩 CI finished in 1h 48m: Pass: 100%/56 | Total: 2h 37m | Avg: 2m 48s | Max: 11m 08s | Hits: 80%/102
---------
Co-authored-by: pciolkosz <[email protected]>
Co-authored-by: Michael Schellenberger Costa <[email protected]>
Description
This adds a sample project for cudax, initially populated with the standard CUDA `vector_add` sample. We will morph this into something beautiful using `cudax`.
Closes #2159
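Vector-addition samples of this kind typically fill two input arrays, compute `C[i] = A[i] + B[i]` on the device, and then check the copied-back result on the host. A minimal host-only sketch of that verification step follows, assuming `float` data and an absolute tolerance of `1e-5` (both assumptions for illustration); `verify_vector_add` is a hypothetical name, not a function from the sample.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical host-side check: recompute a[i] + b[i] and compare it with the
// result copied back from the device, within an absolute tolerance.
bool verify_vector_add(const std::vector<float>& a, const std::vector<float>& b,
                       const std::vector<float>& c, float tolerance = 1e-5f) {
  if (a.size() != b.size() || a.size() != c.size()) {
    return false;  // mismatched lengths can never be a valid result
  }
  for (std::size_t i = 0; i < a.size(); ++i) {
    if (std::fabs(a[i] + b[i] - c[i]) > tolerance) {
      return false;
    }
  }
  return true;
}
```

In a sample, this check would run after the device output has been copied back into `c`; an absolute tolerance is reasonable here because a single float addition is computed the same way on host and device for inputs of modest magnitude.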