Add meta dimensions specifiers to cudax::launch #2001

Open
wants to merge 33 commits into
base: main
from

Conversation

pciolkosz
Contributor

This change adds new types, called meta dimensions, for describing dimensions in a hierarchy. These types do not hold specific values; instead, they communicate an intent. Dimensions built from them can later be finalized, which replaces the meta dimensions with values calculated for a specific device and kernel function. Finalization happens automatically inside cudax::launch, but it can also be done manually with the finalize function when the calculated values are needed ahead of time, for example to scale buffers passed into launch.
The last piece is the finalized_t alias template, which gives the type of the finalized hierarchy without calling finalize.
All of the above also works on the kernel_config type, in which case it operates on the hierarchy contained in the configuration.
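
To make the idea concrete, here is a minimal host-only sketch of the finalization pattern described above. It does not use the cudax API: `device_limits`, `concrete_dims`, `best_block_size_tag`, `finalize_dims` and `finalized_dims_t` are hypothetical names invented for illustration, and the "calculated" value is simply read from a struct rather than queried from a device.

```cpp
#include <type_traits>
#include <utility>

// Hypothetical stand-ins; none of these names come from cudax.
struct device_limits // what finalization would derive from a device and kernel
{
  unsigned best_block_size;
};

struct concrete_dims // dimensions that hold an actual value
{
  unsigned count;
};

struct best_block_size_tag // "meta" dimension: carries intent, holds no value
{};

// Concrete dimensions are already final, so they pass through unchanged.
constexpr concrete_dims finalize_dims(const device_limits&, concrete_dims d)
{
  return d;
}

// Meta dimensions are replaced by a value computed for the target.
constexpr concrete_dims finalize_dims(const device_limits& limits, best_block_size_tag)
{
  return concrete_dims{limits.best_block_size};
}

// Type of the finalized dimensions, obtained without calling finalize_dims.
template <typename Dims>
using finalized_dims_t =
  decltype(finalize_dims(std::declval<const device_limits&>(), std::declval<Dims>()));

static_assert(std::is_same_v<finalized_dims_t<best_block_size_tag>, concrete_dims>);
static_assert(std::is_same_v<finalized_dims_t<concrete_dims>, concrete_dims>);
```

In this sketch, a caller that needs the value early would call `finalize_dims` itself and use `.count` to size a buffer, while a launch-like wrapper would call it internally.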

This PR also adds support for hierarchy queries in which the unit is the same level as the level being queried; in that case extents<1, 1, 1> or just 1 is returned.
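
The same-level case can be pictured with a small host-only sketch. The level tags, `simple_hierarchy` and `level_count` below are hypothetical and only model a flat, one-dimensional count, not the cudax hierarchy types or their `extents` results.

```cpp
#include <cstddef>

// Hypothetical level tags, ranked from innermost (0) to outermost (2).
struct thread_level { static constexpr int rank = 0; };
struct block_level  { static constexpr int rank = 1; };
struct grid_level   { static constexpr int rank = 2; };

// Hypothetical flat hierarchy: threads per block and blocks per grid.
struct simple_hierarchy
{
  std::size_t threads_per_block;
  std::size_t blocks_per_grid;
};

// Number of Unit-level entities contained in one Level-level entity.
template <typename Unit, typename Level>
constexpr std::size_t level_count(const simple_hierarchy& h)
{
  static_assert(Unit::rank <= Level::rank, "unit must not be above the level");
  std::size_t result = 1; // Unit == Level: nothing is multiplied in, so 1 is returned
  if constexpr (Unit::rank < 1 && Level::rank >= 1)
  {
    result *= h.threads_per_block;
  }
  if constexpr (Unit::rank < 2 && Level::rank >= 2)
  {
    result *= h.blocks_per_grid;
  }
  return result;
}
```

Treating the same-level query as a valid case that yields 1 lets generic code iterate over level pairs without special-casing the diagonal.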

While working on this change I noticed an issue with how is_invocable interacts with extended lambdas in cudax::launch, where it is used to detect whether the lambda takes the hierarchy/configuration as its first argument. Because of that issue, launch currently forces an extended lambda to accept it as the first argument until a better solution is found.
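
For readers unfamiliar with the detection technique, the following host-only sketch shows the general shape of dispatching on `std::is_invocable_v`. It is not the cudax implementation and does not reproduce the extended-lambda issue; `config`, `call_kernel` and the two sample callables are hypothetical.

```cpp
#include <type_traits>
#include <utility>

struct config // hypothetical stand-in for a hierarchy/configuration object
{
  int block_size;
};

// Calls fn with the configuration prepended when fn accepts it, otherwise without it.
template <typename Fn, typename... Args>
constexpr auto call_kernel(const config& cfg, Fn&& fn, Args&&... args)
{
  if constexpr (std::is_invocable_v<Fn, const config&, Args...>)
  {
    return std::forward<Fn>(fn)(cfg, std::forward<Args>(args)...);
  }
  else
  {
    return std::forward<Fn>(fn)(std::forward<Args>(args)...);
  }
}

// Two sample callables: one takes the configuration first, one does not.
inline constexpr auto with_config    = [](const config& c, int x) { return c.block_size + x; };
inline constexpr auto without_config = [](int x) { return 2 * x; };
```

Note that a callable accepting any argument list, such as a variadic generic lambda, satisfies both forms; in this sketch the configuration-first form then wins because it is tested first.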

There are a couple of smaller TODOs around meta dimensions: adding a safety check to make sure the same function is passed to both finalize and launch when finalize is used, and deciding whether using finalized_t for kernel instantiation is always mandatory or only required when meta dimensions are used.

@pciolkosz pciolkosz requested a review from a team as a code owner July 17, 2024 23:14
@pciolkosz pciolkosz requested a review from wmaxey July 17, 2024 23:14
Contributor

🟨 CI finished in 11m 16s: Pass: 92%/56 | Total: 2h 54m | Avg: 3m 06s | Max: 10m 56s | Hits: 70%/1581
  • 🟨 cudax: Pass: 92%/55 | Total: 2h 43m | Avg: 2m 58s | Max: 8m 21s | Hits: 70%/1581

    🔍 cpu: amd64 🔍
      🔍 amd64              Pass:  92%/51  | Total:  2h 32m | Avg:  2m 59s | Max:  8m 21s | Hits:  71%/1457  
      🟩 arm64              Pass: 100%/4   | Total: 11m 08s | Avg:  2m 47s | Max:  3m 09s | Hits:  64%/124   
    🔍 jobs: Build 🔍
      🔍 Build              Pass:  91%/47  | Total:  2h 09m | Avg:  2m 45s | Max:  8m 21s | Hits:  66%/1333  
      🟩 Test               Pass: 100%/8   | Total: 33m 56s | Avg:  4m 14s | Max:  4m 54s | Hits:  96%/248   
    🟨 cxx
      🟩 Clang9             Pass: 100%/2   | Total:  5m 02s | Avg:  2m 31s | Max:  2m 34s | Hits:  67%/62    
      🟩 Clang10            Pass: 100%/2   | Total:  5m 32s | Avg:  2m 46s | Max:  2m 56s | Hits:  67%/62    
      🟩 Clang11            Pass: 100%/4   | Total:  9m 56s | Avg:  2m 29s | Max:  2m 42s | Hits:  67%/124   
      🟩 Clang12            Pass: 100%/4   | Total:  9m 49s | Avg:  2m 27s | Max:  2m 35s | Hits:  67%/124   
      🟩 Clang13            Pass: 100%/4   | Total:  9m 22s | Avg:  2m 20s | Max:  2m 27s | Hits:  75%/124   
      🟩 Clang14            Pass: 100%/6   | Total: 17m 29s | Avg:  2m 54s | Max:  3m 54s | Hits:  78%/186   
      🟩 Clang15            Pass: 100%/2   | Total:  5m 20s | Avg:  2m 40s | Max:  2m 45s | Hits:  67%/62    
      🟩 Clang16            Pass: 100%/6   | Total: 20m 18s | Avg:  3m 23s | Max:  4m 53s | Hits:  78%/186   
      🟥 GCC9               Pass:   0%/2   | Total:  5m 00s | Avg:  2m 30s | Max:  2m 40s
      🟩 GCC10              Pass: 100%/4   | Total:  9m 19s | Avg:  2m 19s | Max:  2m 23s | Hits:  61%/124   
      🟩 GCC11              Pass: 100%/4   | Total:  9m 56s | Avg:  2m 29s | Max:  2m 47s | Hits:  61%/124   
      🟩 GCC12              Pass: 100%/12  | Total: 36m 59s | Avg:  3m 04s | Max:  4m 54s | Hits:  72%/372   
      🟩 Intel2023.2.0      Pass: 100%/1   | Total:  3m 08s | Avg:  3m 08s | Max:  3m 08s | Hits:  67%/31    
      🟥 MSVC14.36          Pass:   0%/1   | Total:  7m 47s | Avg:  7m 47s | Max:  7m 47s
      🟥 MSVC14.39          Pass:   0%/1   | Total:  8m 21s | Avg:  8m 21s | Max:  8m 21s
    🟨 cxx_family
      🟩 Clang              Pass: 100%/30  | Total:  1h 22m | Avg:  2m 45s | Max:  4m 53s | Hits:  73%/930   
      🟨 GCC                Pass:  90%/22  | Total:  1h 01m | Avg:  2m 47s | Max:  4m 54s | Hits:  67%/620   
      🟩 Intel              Pass: 100%/1   | Total:  3m 08s | Avg:  3m 08s | Max:  3m 08s | Hits:  67%/31    
      🟥 MSVC               Pass:   0%/2   | Total: 16m 08s | Avg:  8m 04s | Max:  8m 21s
    🟨 cudacxx_family
      🟨 nvcc               Pass:  92%/55  | Total:  2h 43m | Avg:  2m 58s | Max:  8m 21s | Hits:  70%/1581  
    🟨 gpu
      🟨 v100               Pass:  92%/55  | Total:  2h 43m | Avg:  2m 58s | Max:  8m 21s | Hits:  70%/1581  
    🟨 ctk
      🟨 12.0               Pass:  91%/23  | Total:  1h 08m | Avg:  2m 58s | Max:  7m 47s | Hits:  71%/651   
      🟨 12.5               Pass:  93%/32  | Total:  1h 34m | Avg:  2m 57s | Max:  8m 21s | Hits:  70%/930   
    🟨 cudacxx
      🟨 nvcc12.0           Pass:  91%/23  | Total:  1h 08m | Avg:  2m 58s | Max:  7m 47s | Hits:  71%/651   
      🟨 nvcc12.5           Pass:  93%/32  | Total:  1h 34m | Avg:  2m 57s | Max:  8m 21s | Hits:  70%/930   
    🟩 sm
      🟩 90                 Pass: 100%/1   | Total:  1m 54s | Avg:  1m 54s | Max:  1m 54s | Hits:  61%/31    
      🟩 90a                Pass: 100%/1   | Total:  2m 21s | Avg:  2m 21s | Max:  2m 21s | Hits:  61%/31    
    🟨 std
      🟨 17                 Pass:  93%/31  | Total:  1h 25m | Avg:  2m 45s | Max:  4m 53s | Hits:  69%/899   
      🟨 20                 Pass:  91%/24  | Total:  1h 17m | Avg:  3m 14s | Max:  8m 21s | Hits:  72%/682   
    
  • 🟩 pycuda: Pass: 100%/1 | Total: 10m 56s | Avg: 10m 56s | Max: 10m 56s

    🟩 cpu
      🟩 amd64              Pass: 100%/1   | Total: 10m 56s | Avg: 10m 56s | Max: 10m 56s
    🟩 ctk
      🟩 12.5               Pass: 100%/1   | Total: 10m 56s | Avg: 10m 56s | Max: 10m 56s
    🟩 cudacxx
      🟩 nvcc12.5           Pass: 100%/1   | Total: 10m 56s | Avg: 10m 56s | Max: 10m 56s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/1   | Total: 10m 56s | Avg: 10m 56s | Max: 10m 56s
    🟩 cxx
      🟩 GCC13              Pass: 100%/1   | Total: 10m 56s | Avg: 10m 56s | Max: 10m 56s
    🟩 cxx_family
      🟩 GCC                Pass: 100%/1   | Total: 10m 56s | Avg: 10m 56s | Max: 10m 56s
    🟩 gpu
      🟩 v100               Pass: 100%/1   | Total: 10m 56s | Avg: 10m 56s | Max: 10m 56s
    🟩 jobs
      🟩 Test               Pass: 100%/1   | Total: 10m 56s | Avg: 10m 56s | Max: 10m 56s
    

👃 Inspect Changes

Modifications in project?

Project
CCCL Infrastructure
libcu++
CUB
Thrust
+/- CUDA Experimental
pycuda

Modifications in project or dependencies?

Project
CCCL Infrastructure
libcu++
CUB
Thrust
+/- CUDA Experimental
+/- pycuda

🏃‍ Runner counts (total jobs: 56)

# Runner
41 linux-amd64-cpu16
9 linux-amd64-gpu-v100-latest-1
4 linux-arm64-cpu16
2 windows-amd64-cpu16

Contributor

🟨 CI finished in 11m 47s: Pass: 96%/56 | Total: 2h 44m | Avg: 2m 56s | Max: 11m 29s | Hits: 83%/1643
  • 🟨 cudax: Pass: 96%/55 | Total: 2h 33m | Avg: 2m 47s | Max: 7m 54s | Hits: 83%/1643

    🔍 cpu: amd64 🔍
      🔍 amd64              Pass:  96%/51  | Total:  2h 22m | Avg:  2m 47s | Max:  7m 54s | Hits:  84%/1519  
      🟩 arm64              Pass: 100%/4   | Total: 11m 03s | Avg:  2m 45s | Max:  3m 02s | Hits:  80%/124   
    🚨 cxx_family: MSVC 🚨
      🟩 Clang              Pass: 100%/30  | Total:  1h 18m | Avg:  2m 37s | Max:  4m 39s | Hits:  86%/930   
      🟩 GCC                Pass: 100%/22  | Total: 56m 00s | Avg:  2m 32s | Max:  3m 57s | Hits:  81%/682   
      🟩 Intel              Pass: 100%/1   | Total:  3m 04s | Avg:  3m 04s | Max:  3m 04s | Hits:  83%/31    
      🔥 MSVC               Pass:   0%/2   | Total: 15m 46s | Avg:  7m 53s | Max:  7m 54s
    🔍 jobs: Build 🔍
      🔍 Build              Pass:  95%/47  | Total:  2h 00m | Avg:  2m 34s | Max:  7m 54s | Hits:  81%/1395  
      🟩 Test               Pass: 100%/8   | Total: 32m 38s | Avg:  4m 04s | Max:  4m 39s | Hits:  96%/248   
    🔍 std: 20 🔍
      🟩 17                 Pass: 100%/31  | Total:  1h 19m | Avg:  2m 34s | Max:  4m 39s | Hits:  83%/961   
      🔍 20                 Pass:  91%/24  | Total:  1h 13m | Avg:  3m 04s | Max:  7m 54s | Hits:  84%/682   
    🟨 cxx
      🟩 Clang9             Pass: 100%/2   | Total:  4m 25s | Avg:  2m 12s | Max:  2m 14s | Hits:  83%/62    
      🟩 Clang10            Pass: 100%/2   | Total:  4m 34s | Avg:  2m 17s | Max:  2m 17s | Hits:  83%/62    
      🟩 Clang11            Pass: 100%/4   | Total:  8m 48s | Avg:  2m 12s | Max:  2m 13s | Hits:  83%/124   
      🟩 Clang12            Pass: 100%/4   | Total:  9m 36s | Avg:  2m 24s | Max:  2m 43s | Hits:  83%/124   
      🟩 Clang13            Pass: 100%/4   | Total:  8m 52s | Avg:  2m 13s | Max:  2m 14s | Hits:  83%/124   
      🟩 Clang14            Pass: 100%/6   | Total: 18m 20s | Avg:  3m 03s | Max:  4m 39s | Hits:  89%/186   
      🟩 Clang15            Pass: 100%/2   | Total:  4m 39s | Avg:  2m 19s | Max:  2m 26s | Hits:  83%/62    
      🟩 Clang16            Pass: 100%/6   | Total: 19m 18s | Avg:  3m 13s | Max:  4m 39s | Hits:  89%/186   
      🟩 GCC9               Pass: 100%/2   | Total:  4m 12s | Avg:  2m 06s | Max:  2m 13s | Hits:  85%/62    
      🟩 GCC10              Pass: 100%/4   | Total:  9m 39s | Avg:  2m 24s | Max:  3m 00s | Hits:  77%/124   
      🟩 GCC11              Pass: 100%/4   | Total:  8m 35s | Avg:  2m 08s | Max:  2m 12s | Hits:  77%/124   
      🟩 GCC12              Pass: 100%/12  | Total: 33m 34s | Avg:  2m 47s | Max:  3m 57s | Hits:  82%/372   
      🟩 Intel2023.2.0      Pass: 100%/1   | Total:  3m 04s | Avg:  3m 04s | Max:  3m 04s | Hits:  83%/31    
      🟥 MSVC14.36          Pass:   0%/1   | Total:  7m 54s | Avg:  7m 54s | Max:  7m 54s
      🟥 MSVC14.39          Pass:   0%/1   | Total:  7m 52s | Avg:  7m 52s | Max:  7m 52s
    🟨 cudacxx_family
      🟨 nvcc               Pass:  96%/55  | Total:  2h 33m | Avg:  2m 47s | Max:  7m 54s | Hits:  83%/1643  
    🟨 gpu
      🟨 v100               Pass:  96%/55  | Total:  2h 33m | Avg:  2m 47s | Max:  7m 54s | Hits:  83%/1643  
    🟨 ctk
      🟨 12.0               Pass:  95%/23  | Total:  1h 04m | Avg:  2m 49s | Max:  7m 54s | Hits:  84%/682   
      🟨 12.5               Pass:  96%/32  | Total:  1h 28m | Avg:  2m 45s | Max:  7m 52s | Hits:  83%/961   
    🟨 cudacxx
      🟨 nvcc12.0           Pass:  95%/23  | Total:  1h 04m | Avg:  2m 49s | Max:  7m 54s | Hits:  84%/682   
      🟨 nvcc12.5           Pass:  96%/32  | Total:  1h 28m | Avg:  2m 45s | Max:  7m 52s | Hits:  83%/961   
    🟩 sm
      🟩 90                 Pass: 100%/1   | Total:  2m 08s | Avg:  2m 08s | Max:  2m 08s | Hits:  77%/31    
      🟩 90a                Pass: 100%/1   | Total:  2m 04s | Avg:  2m 04s | Max:  2m 04s | Hits:  77%/31    
    
  • 🟩 pycuda: Pass: 100%/1 | Total: 11m 29s | Avg: 11m 29s | Max: 11m 29s

    🟩 cpu
      🟩 amd64              Pass: 100%/1   | Total: 11m 29s | Avg: 11m 29s | Max: 11m 29s
    🟩 ctk
      🟩 12.5               Pass: 100%/1   | Total: 11m 29s | Avg: 11m 29s | Max: 11m 29s
    🟩 cudacxx
      🟩 nvcc12.5           Pass: 100%/1   | Total: 11m 29s | Avg: 11m 29s | Max: 11m 29s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/1   | Total: 11m 29s | Avg: 11m 29s | Max: 11m 29s
    🟩 cxx
      🟩 GCC13              Pass: 100%/1   | Total: 11m 29s | Avg: 11m 29s | Max: 11m 29s
    🟩 cxx_family
      🟩 GCC                Pass: 100%/1   | Total: 11m 29s | Avg: 11m 29s | Max: 11m 29s
    🟩 gpu
      🟩 v100               Pass: 100%/1   | Total: 11m 29s | Avg: 11m 29s | Max: 11m 29s
    🟩 jobs
      🟩 Test               Pass: 100%/1   | Total: 11m 29s | Avg: 11m 29s | Max: 11m 29s
    

  template <typename Dims>
- struct dimensions_handler
+ struct dimensions_handler : public base_dimensions_handler
Contributor

Remark: struct base classes are public by default.

Suggested change
- struct dimensions_handler : public base_dimensions_handler
+ struct dimensions_handler : base_dimensions_handler

I don't know why this triggers me enough to write a comment. Feel free to ignore!

Collaborator

We tend to prefer explicitness about these things. Makes the code more accessible to less experienced contributors.
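
The language rule under discussion can be checked with a few lines of standard C++. The type names below are made up for illustration.

```cpp
#include <type_traits>

struct base
{};

struct implicit_public : base // struct: the base is public by default
{};

struct explicit_public : public base // same meaning, spelled out
{};

class class_default : base // class: the base is private by default
{};

// A pointer-to-derived converts to pointer-to-base only through an accessible base.
static_assert(std::is_convertible_v<implicit_public*, base*>);
static_assert(std::is_convertible_v<explicit_public*, base*>);
static_assert(!std::is_convertible_v<class_default*, base*>);
```

Both `struct` spellings behave identically, so the choice between them is purely about readability.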

Comment on lines +82 to +86
template <typename Dims>
inline constexpr bool usable_for_queries = false;

template <typename T, size_t... Extents>
inline constexpr bool usable_for_queries<dimensions<T, Extents...>> = true;
Contributor

Remark: I love variable templates as traits instead of structs. They are shorter and more to the point. They require C++14 though, which is why we don't see many of those around here.
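
For comparison, here is the same trait written both ways. `dims` is a hypothetical stand-in for the `dimensions` type in the diff, and `is_dims` / `is_dims_v` are illustrative names.

```cpp
#include <cstddef>
#include <type_traits>

template <typename T, std::size_t... Extents>
struct dims // hypothetical stand-in for the `dimensions` type
{};

// Struct-based trait: works in C++11, queried with ::value.
template <typename T>
struct is_dims : std::false_type
{};

template <typename T, std::size_t... Extents>
struct is_dims<dims<T, Extents...>> : std::true_type
{};

// Variable-template trait: variable templates need C++14, `inline` variables need C++17.
template <typename T>
inline constexpr bool is_dims_v = false;

template <typename T, std::size_t... Extents>
inline constexpr bool is_dims_v<dims<T, Extents...>> = true;

static_assert(is_dims<dims<int, 1, 2, 3>>::value);
static_assert(is_dims_v<dims<int, 1, 2, 3>>);
static_assert(!is_dims_v<int>);
```

The variable-template form drops the wrapper class and the `::value` suffix at each use site, which is the brevity the comment refers to.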

Contributor

🟨 CI finished in 11m 56s: Pass: 96%/56 | Total: 2h 48m | Avg: 3m 00s | Max: 11m 56s | Hits: 83%/1643
  • 🟨 cudax: Pass: 96%/55 | Total: 2h 36m | Avg: 2m 50s | Max: 8m 27s | Hits: 83%/1643

    🔍 cpu: amd64 🔍
      🔍 amd64              Pass:  96%/51  | Total:  2h 25m | Avg:  2m 50s | Max:  8m 27s | Hits:  83%/1519  
      🟩 arm64              Pass: 100%/4   | Total: 11m 43s | Avg:  2m 55s | Max:  3m 08s | Hits:  80%/124   
    🚨 cxx_family: MSVC 🚨
      🟩 Clang              Pass: 100%/30  | Total:  1h 20m | Avg:  2m 41s | Max:  5m 39s | Hits:  86%/930   
      🟩 GCC                Pass: 100%/22  | Total: 56m 38s | Avg:  2m 34s | Max:  4m 32s | Hits:  80%/682   
      🟩 Intel              Pass: 100%/1   | Total:  2m 52s | Avg:  2m 52s | Max:  2m 52s | Hits:  83%/31    
      🔥 MSVC               Pass:   0%/2   | Total: 16m 24s | Avg:  8m 12s | Max:  8m 27s
    🔍 jobs: Build 🔍
      🔍 Build              Pass:  95%/47  | Total:  2h 02m | Avg:  2m 36s | Max:  8m 27s | Hits:  81%/1395  
      🟩 Test               Pass: 100%/8   | Total: 34m 04s | Avg:  4m 15s | Max:  5m 39s | Hits:  96%/248   
    🔍 std: 20 🔍
      🟩 17                 Pass: 100%/31  | Total:  1h 21m | Avg:  2m 36s | Max:  5m 39s | Hits:  83%/961   
      🔍 20                 Pass:  91%/24  | Total:  1h 15m | Avg:  3m 09s | Max:  8m 27s | Hits:  84%/682   
    🟨 cxx
      🟩 Clang9             Pass: 100%/2   | Total:  4m 35s | Avg:  2m 17s | Max:  2m 18s | Hits:  83%/62    
      🟩 Clang10            Pass: 100%/2   | Total:  4m 48s | Avg:  2m 24s | Max:  2m 29s | Hits:  83%/62    
      🟩 Clang11            Pass: 100%/4   | Total:  9m 09s | Avg:  2m 17s | Max:  2m 23s | Hits:  83%/124   
      🟩 Clang12            Pass: 100%/4   | Total:  9m 43s | Avg:  2m 25s | Max:  2m 38s | Hits:  83%/124   
      🟩 Clang13            Pass: 100%/4   | Total:  9m 23s | Avg:  2m 20s | Max:  2m 32s | Hits:  83%/124   
      🟩 Clang14            Pass: 100%/6   | Total: 17m 54s | Avg:  2m 59s | Max:  4m 23s | Hits:  89%/186   
      🟩 Clang15            Pass: 100%/2   | Total:  4m 36s | Avg:  2m 18s | Max:  2m 19s | Hits:  83%/62    
      🟩 Clang16            Pass: 100%/6   | Total: 20m 41s | Avg:  3m 26s | Max:  5m 39s | Hits:  89%/186   
      🟩 GCC9               Pass: 100%/2   | Total:  4m 23s | Avg:  2m 11s | Max:  2m 13s | Hits:  77%/62    
      🟩 GCC10              Pass: 100%/4   | Total:  8m 52s | Avg:  2m 13s | Max:  2m 18s | Hits:  77%/124   
      🟩 GCC11              Pass: 100%/4   | Total:  8m 54s | Avg:  2m 13s | Max:  2m 16s | Hits:  77%/124   
      🟩 GCC12              Pass: 100%/12  | Total: 34m 29s | Avg:  2m 52s | Max:  4m 32s | Hits:  82%/372   
      🟩 Intel2023.2.0      Pass: 100%/1   | Total:  2m 52s | Avg:  2m 52s | Max:  2m 52s | Hits:  83%/31    
      🟥 MSVC14.36          Pass:   0%/1   | Total:  8m 27s | Avg:  8m 27s | Max:  8m 27s
      🟥 MSVC14.39          Pass:   0%/1   | Total:  7m 57s | Avg:  7m 57s | Max:  7m 57s
    🟨 cudacxx_family
      🟨 nvcc               Pass:  96%/55  | Total:  2h 36m | Avg:  2m 50s | Max:  8m 27s | Hits:  83%/1643  
    🟨 gpu
      🟨 v100               Pass:  96%/55  | Total:  2h 36m | Avg:  2m 50s | Max:  8m 27s | Hits:  83%/1643  
    🟨 ctk
      🟨 12.0               Pass:  95%/23  | Total:  1h 06m | Avg:  2m 52s | Max:  8m 27s | Hits:  83%/682   
      🟨 12.5               Pass:  96%/32  | Total:  1h 30m | Avg:  2m 50s | Max:  7m 57s | Hits:  83%/961   
    🟨 cudacxx
      🟨 nvcc12.0           Pass:  95%/23  | Total:  1h 06m | Avg:  2m 52s | Max:  8m 27s | Hits:  83%/682   
      🟨 nvcc12.5           Pass:  96%/32  | Total:  1h 30m | Avg:  2m 50s | Max:  7m 57s | Hits:  83%/961   
    🟩 sm
      🟩 90                 Pass: 100%/1   | Total:  2m 07s | Avg:  2m 07s | Max:  2m 07s | Hits:  77%/31    
      🟩 90a                Pass: 100%/1   | Total:  2m 01s | Avg:  2m 01s | Max:  2m 01s | Hits:  77%/31    
    
  • 🟩 pycuda: Pass: 100%/1 | Total: 11m 56s | Avg: 11m 56s | Max: 11m 56s

    🟩 cpu
      🟩 amd64              Pass: 100%/1   | Total: 11m 56s | Avg: 11m 56s | Max: 11m 56s
    🟩 ctk
      🟩 12.5               Pass: 100%/1   | Total: 11m 56s | Avg: 11m 56s | Max: 11m 56s
    🟩 cudacxx
      🟩 nvcc12.5           Pass: 100%/1   | Total: 11m 56s | Avg: 11m 56s | Max: 11m 56s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/1   | Total: 11m 56s | Avg: 11m 56s | Max: 11m 56s
    🟩 cxx
      🟩 GCC13              Pass: 100%/1   | Total: 11m 56s | Avg: 11m 56s | Max: 11m 56s
    🟩 cxx_family
      🟩 GCC                Pass: 100%/1   | Total: 11m 56s | Avg: 11m 56s | Max: 11m 56s
    🟩 gpu
      🟩 v100               Pass: 100%/1   | Total: 11m 56s | Avg: 11m 56s | Max: 11m 56s
    🟩 jobs
      🟩 Test               Pass: 100%/1   | Total: 11m 56s | Avg: 11m 56s | Max: 11m 56s
    

Contributor

🟩 CI finished in 16m 11s: Pass: 100%/56 | Total: 2h 52m | Avg: 3m 04s | Max: 12m 31s | Hits: 93%/1693
  • 🟩 cudax: Pass: 100%/55 | Total: 2h 39m | Avg: 2m 54s | Max: 7m 56s | Hits: 93%/1693

    🟩 cpu
      🟩 amd64              Pass: 100%/51  | Total:  2h 27m | Avg:  2m 53s | Max:  7m 56s | Hits:  93%/1569  
      🟩 arm64              Pass: 100%/4   | Total: 12m 32s | Avg:  3m 08s | Max:  3m 19s | Hits:  93%/124   
    🟩 ctk
      🟩 12.0               Pass: 100%/23  | Total:  1h 06m | Avg:  2m 53s | Max:  7m 56s | Hits:  93%/707   
      🟩 12.5               Pass: 100%/32  | Total:  1h 33m | Avg:  2m 54s | Max:  7m 55s | Hits:  93%/986   
    🟩 cudacxx
      🟩 nvcc12.0           Pass: 100%/23  | Total:  1h 06m | Avg:  2m 53s | Max:  7m 56s | Hits:  93%/707   
      🟩 nvcc12.5           Pass: 100%/32  | Total:  1h 33m | Avg:  2m 54s | Max:  7m 55s | Hits:  93%/986   
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/55  | Total:  2h 39m | Avg:  2m 54s | Max:  7m 56s | Hits:  93%/1693  
    🟩 cxx
      🟩 Clang9             Pass: 100%/2   | Total:  5m 03s | Avg:  2m 31s | Max:  2m 35s | Hits:  96%/62    
      🟩 Clang10            Pass: 100%/2   | Total:  4m 58s | Avg:  2m 29s | Max:  2m 33s | Hits:  96%/62    
      🟩 Clang11            Pass: 100%/4   | Total:  9m 43s | Avg:  2m 25s | Max:  2m 30s | Hits:  96%/124   
      🟩 Clang12            Pass: 100%/4   | Total:  9m 45s | Avg:  2m 26s | Max:  2m 29s | Hits:  96%/124   
      🟩 Clang13            Pass: 100%/4   | Total:  9m 41s | Avg:  2m 25s | Max:  2m 33s | Hits:  96%/124   
      🟩 Clang14            Pass: 100%/6   | Total: 17m 16s | Avg:  2m 52s | Max:  3m 50s | Hits:  97%/186   
      🟩 Clang15            Pass: 100%/2   | Total:  4m 49s | Avg:  2m 24s | Max:  2m 26s | Hits:  96%/62    
      🟩 Clang16            Pass: 100%/6   | Total: 20m 16s | Avg:  3m 22s | Max:  4m 19s | Hits:  97%/186   
      🟩 GCC9               Pass: 100%/2   | Total:  4m 34s | Avg:  2m 17s | Max:  2m 20s | Hits:  90%/62    
      🟩 GCC10              Pass: 100%/4   | Total:  9m 18s | Avg:  2m 19s | Max:  2m 23s | Hits:  90%/124   
      🟩 GCC11              Pass: 100%/4   | Total:  9m 18s | Avg:  2m 19s | Max:  2m 24s | Hits:  90%/124   
      🟩 GCC12              Pass: 100%/12  | Total: 36m 10s | Avg:  3m 00s | Max:  4m 42s | Hits:  91%/372   
      🟩 Intel2023.2.0      Pass: 100%/1   | Total:  3m 04s | Avg:  3m 04s | Max:  3m 04s | Hits:  96%/31    
      🟩 MSVC14.36          Pass: 100%/1   | Total:  7m 56s | Avg:  7m 56s | Max:  7m 56s | Hits:  64%/25    
      🟩 MSVC14.39          Pass: 100%/1   | Total:  7m 55s | Avg:  7m 55s | Max:  7m 55s | Hits:  64%/25    
    🟩 cxx_family
      🟩 Clang              Pass: 100%/30  | Total:  1h 21m | Avg:  2m 43s | Max:  4m 19s | Hits:  97%/930   
      🟩 GCC                Pass: 100%/22  | Total: 59m 20s | Avg:  2m 41s | Max:  4m 42s | Hits:  90%/682   
      🟩 Intel              Pass: 100%/1   | Total:  3m 04s | Avg:  3m 04s | Max:  3m 04s | Hits:  96%/31    
      🟩 MSVC               Pass: 100%/2   | Total: 15m 51s | Avg:  7m 55s | Max:  7m 56s | Hits:  64%/50    
    🟩 gpu
      🟩 v100               Pass: 100%/55  | Total:  2h 39m | Avg:  2m 54s | Max:  7m 56s | Hits:  93%/1693  
    🟩 jobs
      🟩 Build              Pass: 100%/47  | Total:  2h 07m | Avg:  2m 43s | Max:  7m 56s | Hits:  93%/1445  
      🟩 Test               Pass: 100%/8   | Total: 32m 00s | Avg:  4m 00s | Max:  4m 42s | Hits:  96%/248   
    🟩 sm
      🟩 90                 Pass: 100%/1   | Total:  2m 10s | Avg:  2m 10s | Max:  2m 10s | Hits:  90%/31    
      🟩 90a                Pass: 100%/1   | Total:  2m 07s | Avg:  2m 07s | Max:  2m 07s | Hits:  90%/31    
    🟩 std
      🟩 17                 Pass: 100%/31  | Total:  1h 21m | Avg:  2m 38s | Max:  4m 14s | Hits:  94%/961   
      🟩 20                 Pass: 100%/24  | Total:  1h 17m | Avg:  3m 14s | Max:  7m 56s | Hits:  92%/732   
    
  • 🟩 pycuda: Pass: 100%/1 | Total: 12m 31s | Avg: 12m 31s | Max: 12m 31s

    🟩 cpu
      🟩 amd64              Pass: 100%/1   | Total: 12m 31s | Avg: 12m 31s | Max: 12m 31s
    🟩 ctk
      🟩 12.5               Pass: 100%/1   | Total: 12m 31s | Avg: 12m 31s | Max: 12m 31s
    🟩 cudacxx
      🟩 nvcc12.5           Pass: 100%/1   | Total: 12m 31s | Avg: 12m 31s | Max: 12m 31s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/1   | Total: 12m 31s | Avg: 12m 31s | Max: 12m 31s
    🟩 cxx
      🟩 GCC13              Pass: 100%/1   | Total: 12m 31s | Avg: 12m 31s | Max: 12m 31s
    🟩 cxx_family
      🟩 GCC                Pass: 100%/1   | Total: 12m 31s | Avg: 12m 31s | Max: 12m 31s
    🟩 gpu
      🟩 v100               Pass: 100%/1   | Total: 12m 31s | Avg: 12m 31s | Max: 12m 31s
    🟩 jobs
      🟩 Test               Pass: 100%/1   | Total: 12m 31s | Avg: 12m 31s | Max: 12m 31s
    

Co-authored-by: Bernhard Manfred Gruber <[email protected]>
@pciolkosz pciolkosz added the CUDA Next Feature intended for the Cuda Next experimental library label Jul 23, 2024
@pciolkosz pciolkosz requested review from a team as code owners August 4, 2024 01:06
@pciolkosz pciolkosz force-pushed the dims_meta_specifier branch 2 times, most recently from 21ba25f to 1b14e37 Compare August 4, 2024 01:22
Contributor

github-actions bot commented Aug 4, 2024

🟩 CI finished in 10m 33s: Pass: 100%/56 | Total: 2h 34m | Avg: 2m 45s | Max: 10m 33s | Hits: 88%/2848
  • 🟩 cudax: Pass: 100%/55 | Total: 2h 23m | Avg: 2m 36s | Max: 7m 38s | Hits: 88%/2848

    🟩 cpu
      🟩 amd64              Pass: 100%/51  | Total:  2h 12m | Avg:  2m 35s | Max:  7m 38s | Hits:  87%/2640  
      🟩 arm64              Pass: 100%/4   | Total: 10m 57s | Avg:  2m 44s | Max:  3m 09s | Hits:  96%/208   
    🟩 ctk
      🟩 12.0               Pass: 100%/23  | Total: 59m 11s | Avg:  2m 34s | Max:  6m 49s | Hits:  86%/1190  
      🟩 12.5               Pass: 100%/32  | Total:  1h 24m | Avg:  2m 38s | Max:  7m 38s | Hits:  89%/1658  
    🟩 cudacxx
      🟩 nvcc12.0           Pass: 100%/23  | Total: 59m 11s | Avg:  2m 34s | Max:  6m 49s | Hits:  86%/1190  
      🟩 nvcc12.5           Pass: 100%/32  | Total:  1h 24m | Avg:  2m 38s | Max:  7m 38s | Hits:  89%/1658  
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/55  | Total:  2h 23m | Avg:  2m 36s | Max:  7m 38s | Hits:  88%/2848  
    🟩 cxx
      🟩 Clang9             Pass: 100%/2   | Total:  4m 38s | Avg:  2m 19s | Max:  2m 27s | Hits:  88%/104   
      🟩 Clang10            Pass: 100%/2   | Total:  4m 28s | Avg:  2m 14s | Max:  2m 24s | Hits:  88%/104   
      🟩 Clang11            Pass: 100%/4   | Total:  8m 38s | Avg:  2m 09s | Max:  2m 16s | Hits:  88%/208   
      🟩 Clang12            Pass: 100%/4   | Total:  8m 41s | Avg:  2m 10s | Max:  2m 20s | Hits:  88%/208   
      🟩 Clang13            Pass: 100%/4   | Total:  9m 16s | Avg:  2m 19s | Max:  2m 25s | Hits:  88%/208   
      🟩 Clang14            Pass: 100%/6   | Total: 16m 07s | Avg:  2m 41s | Max:  3m 42s | Hits:  88%/312   
      🟩 Clang15            Pass: 100%/2   | Total:  4m 04s | Avg:  2m 02s | Max:  2m 07s | Hits:  98%/104   
      🟩 Clang16            Pass: 100%/6   | Total: 18m 23s | Avg:  3m 03s | Max:  3m 52s | Hits:  89%/312   
      🟩 GCC9               Pass: 100%/2   | Total:  3m 56s | Avg:  1m 58s | Max:  2m 05s | Hits:  94%/104   
      🟩 GCC10              Pass: 100%/4   | Total:  7m 45s | Avg:  1m 56s | Max:  2m 12s | Hits:  94%/208   
      🟩 GCC11              Pass: 100%/4   | Total:  7m 24s | Avg:  1m 51s | Max:  2m 06s | Hits:  94%/208   
      🟩 GCC12              Pass: 100%/12  | Total: 32m 44s | Avg:  2m 43s | Max:  3m 33s | Hits:  85%/624   
      🟩 Intel2023.2.0      Pass: 100%/1   | Total:  2m 56s | Avg:  2m 56s | Max:  2m 56s | Hits:  86%/52    
      🟩 MSVC14.36          Pass: 100%/1   | Total:  6m 49s | Avg:  6m 49s | Max:  6m 49s | Hits:  67%/46    
      🟩 MSVC14.39          Pass: 100%/1   | Total:  7m 38s | Avg:  7m 38s | Max:  7m 38s | Hits:  69%/46    
    🟩 cxx_family
      🟩 Clang              Pass: 100%/30  | Total:  1h 14m | Avg:  2m 28s | Max:  3m 52s | Hits:  89%/1560  
      🟩 GCC                Pass: 100%/22  | Total: 51m 49s | Avg:  2m 21s | Max:  3m 33s | Hits:  89%/1144  
      🟩 Intel              Pass: 100%/1   | Total:  2m 56s | Avg:  2m 56s | Max:  2m 56s | Hits:  86%/52    
      🟩 MSVC               Pass: 100%/2   | Total: 14m 27s | Avg:  7m 13s | Max:  7m 38s | Hits:  68%/92    
    🟩 gpu
      🟩 v100               Pass: 100%/55  | Total:  2h 23m | Avg:  2m 36s | Max:  7m 38s | Hits:  88%/2848  
    🟩 jobs
      🟩 Build              Pass: 100%/47  | Total:  1h 54m | Avg:  2m 26s | Max:  7m 38s | Hits:  86%/2432  
      🟩 Test               Pass: 100%/8   | Total: 28m 49s | Avg:  3m 36s | Max:  3m 52s | Hits:  98%/416   
    🟩 sm
      🟩 90                 Pass: 100%/1   | Total:  1m 42s | Avg:  1m 42s | Max:  1m 42s | Hits:  94%/52    
      🟩 90a                Pass: 100%/1   | Total:  1m 58s | Avg:  1m 58s | Max:  1m 58s | Hits:  94%/52    
    🟩 std
      🟩 17                 Pass: 100%/31  | Total:  1h 13m | Avg:  2m 23s | Max:  3m 51s | Hits:  89%/1612  
      🟩 20                 Pass: 100%/24  | Total:  1h 09m | Avg:  2m 53s | Max:  7m 38s | Hits:  87%/1236  
    
  • 🟩 pycuda: Pass: 100%/1 | Total: 10m 33s | Avg: 10m 33s | Max: 10m 33s

    🟩 cpu
      🟩 amd64              Pass: 100%/1   | Total: 10m 33s | Avg: 10m 33s | Max: 10m 33s
    🟩 ctk
      🟩 12.5               Pass: 100%/1   | Total: 10m 33s | Avg: 10m 33s | Max: 10m 33s
    🟩 cudacxx
      🟩 nvcc12.5           Pass: 100%/1   | Total: 10m 33s | Avg: 10m 33s | Max: 10m 33s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/1   | Total: 10m 33s | Avg: 10m 33s | Max: 10m 33s
    🟩 cxx
      🟩 GCC13              Pass: 100%/1   | Total: 10m 33s | Avg: 10m 33s | Max: 10m 33s
    🟩 cxx_family
      🟩 GCC                Pass: 100%/1   | Total: 10m 33s | Avg: 10m 33s | Max: 10m 33s
    🟩 gpu
      🟩 v100               Pass: 100%/1   | Total: 10m 33s | Avg: 10m 33s | Max: 10m 33s
    🟩 jobs
      🟩 Test               Pass: 100%/1   | Total: 10m 33s | Avg: 10m 33s | Max: 10m 33s
    

* a largest size that still allows full occupancy.
* This type is usable only to describe dimensions at block level
*/
struct best_occupancy
Contributor Author

max_occupancy

* @param kernel
* Kernel functor that the configuration are intended for
*/
# ifdef _MSC_VER
Contributor Author

Use CCCL macro here

Contributor

🟨 CI finished in 30m 19s: Pass: 96%/56 | Total: 2h 39m | Avg: 2m 51s | Max: 12m 44s
  • 🟨 cudax: Pass: 96%/55 | Total: 2h 27m | Avg: 2m 40s | Max: 9m 20s

    🔍 cpu: amd64 🔍
      🔍 amd64              Pass:  96%/51  | Total:  2h 19m | Avg:  2m 44s | Max:  9m 20s
      🟩 arm64              Pass: 100%/4   | Total:  7m 23s | Avg:  1m 50s | Max:  1m 53s
    🚨 cxx_family: MSVC 🚨
      🟩 Clang              Pass: 100%/30  | Total:  1h 12m | Avg:  2m 24s | Max:  4m 45s
      🟩 GCC                Pass: 100%/22  | Total: 53m 41s | Avg:  2m 26s | Max:  4m 30s
      🟩 Intel              Pass: 100%/1   | Total:  2m 50s | Avg:  2m 50s | Max:  2m 50s
      🔥 MSVC               Pass:   0%/2   | Total: 18m 00s | Avg:  9m 00s | Max:  9m 20s
    🔍 jobs: Build 🔍
      🔍 Build              Pass:  95%/47  | Total:  1h 54m | Avg:  2m 25s | Max:  9m 20s
      🟩 Test               Pass: 100%/8   | Total: 32m 40s | Avg:  4m 05s | Max:  4m 45s
    🔍 std: 20 🔍
      🟩 17                 Pass: 100%/31  | Total:  1h 15m | Avg:  2m 25s | Max:  4m 45s
      🔍 20                 Pass:  91%/24  | Total:  1h 11m | Avg:  2m 59s | Max:  9m 20s
    🟨 cxx
      🟩 Clang9             Pass: 100%/2   | Total:  4m 11s | Avg:  2m 05s | Max:  2m 12s
      🟩 Clang10            Pass: 100%/2   | Total:  4m 12s | Avg:  2m 06s | Max:  2m 08s
      🟩 Clang11            Pass: 100%/4   | Total:  8m 30s | Avg:  2m 07s | Max:  2m 11s
      🟩 Clang12            Pass: 100%/4   | Total:  8m 26s | Avg:  2m 06s | Max:  2m 16s
      🟩 Clang13            Pass: 100%/4   | Total:  8m 40s | Avg:  2m 10s | Max:  2m 22s
      🟩 Clang14            Pass: 100%/6   | Total: 17m 17s | Avg:  2m 52s | Max:  4m 45s
      🟩 Clang15            Pass: 100%/2   | Total:  4m 36s | Avg:  2m 18s | Max:  2m 19s
      🟩 Clang16            Pass: 100%/6   | Total: 16m 37s | Avg:  2m 46s | Max:  4m 14s
      🟩 GCC9               Pass: 100%/2   | Total:  4m 13s | Avg:  2m 06s | Max:  2m 08s
      🟩 GCC10              Pass: 100%/4   | Total:  8m 32s | Avg:  2m 08s | Max:  2m 20s
      🟩 GCC11              Pass: 100%/4   | Total:  8m 34s | Avg:  2m 08s | Max:  2m 23s
      🟩 GCC12              Pass: 100%/12  | Total: 32m 22s | Avg:  2m 41s | Max:  4m 30s
      🟩 Intel2023.2.0      Pass: 100%/1   | Total:  2m 50s | Avg:  2m 50s | Max:  2m 50s
      🟥 MSVC14.36          Pass:   0%/1   | Total:  8m 40s | Avg:  8m 40s | Max:  8m 40s
      🟥 MSVC14.39          Pass:   0%/1   | Total:  9m 20s | Avg:  9m 20s | Max:  9m 20s
    🟨 cudacxx_family
      🟨 nvcc               Pass:  96%/55  | Total:  2h 27m | Avg:  2m 40s | Max:  9m 20s
    🟨 gpu
      🟨 v100               Pass:  96%/55  | Total:  2h 27m | Avg:  2m 40s | Max:  9m 20s
    🟨 ctk
      🟨 12.0               Pass:  95%/23  | Total:  1h 02m | Avg:  2m 43s | Max:  8m 40s
      🟨 12.5               Pass:  96%/32  | Total:  1h 24m | Avg:  2m 38s | Max:  9m 20s
    🟨 cudacxx
      🟨 nvcc12.0           Pass:  95%/23  | Total:  1h 02m | Avg:  2m 43s | Max:  8m 40s
      🟨 nvcc12.5           Pass:  96%/32  | Total:  1h 24m | Avg:  2m 38s | Max:  9m 20s
    🟩 sm
      🟩 90                 Pass: 100%/1   | Total:  1m 50s | Avg:  1m 50s | Max:  1m 50s
      🟩 90a                Pass: 100%/1   | Total:  1m 55s | Avg:  1m 55s | Max:  1m 55s
    
  • 🟩 pycuda: Pass: 100%/1 | Total: 12m 44s | Avg: 12m 44s | Max: 12m 44s

    🟩 cpu
      🟩 amd64              Pass: 100%/1   | Total: 12m 44s | Avg: 12m 44s | Max: 12m 44s
    🟩 ctk
      🟩 12.5               Pass: 100%/1   | Total: 12m 44s | Avg: 12m 44s | Max: 12m 44s
    🟩 cudacxx
      🟩 nvcc12.5           Pass: 100%/1   | Total: 12m 44s | Avg: 12m 44s | Max: 12m 44s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/1   | Total: 12m 44s | Avg: 12m 44s | Max: 12m 44s
    🟩 cxx
      🟩 GCC13              Pass: 100%/1   | Total: 12m 44s | Avg: 12m 44s | Max: 12m 44s
    🟩 cxx_family
      🟩 GCC                Pass: 100%/1   | Total: 12m 44s | Avg: 12m 44s | Max: 12m 44s
    🟩 gpu
      🟩 v100               Pass: 100%/1   | Total: 12m 44s | Avg: 12m 44s | Max: 12m 44s
    🟩 jobs
      🟩 Test               Pass: 100%/1   | Total: 12m 44s | Avg: 12m 44s | Max: 12m 44s
    

👃 Inspect Changes

Modifications in project?

Project
CCCL Infrastructure
libcu++
CUB
Thrust
+/- CUDA Experimental
pycuda

Modifications in project or dependencies?

Project
CCCL Infrastructure
libcu++
CUB
Thrust
+/- CUDA Experimental
+/- pycuda

🏃‍ Runner counts (total jobs: 56)

# Runner
41 linux-amd64-cpu16
9 linux-amd64-gpu-v100-latest-1
4 linux-arm64-cpu16
2 windows-amd64-cpu16

Contributor

🟨 CI finished in 22m 22s: Pass: 96%/56 | Total: 2h 37m | Avg: 2m 48s | Max: 11m 29s
  • 🟨 cudax: Pass: 96%/55 | Total: 2h 25m | Avg: 2m 38s | Max: 10m 30s

    🔍 cpu: amd64 🔍
      🔍 amd64              Pass:  96%/51  | Total:  2h 18m | Avg:  2m 42s | Max: 10m 30s
      🟩 arm64              Pass: 100%/4   | Total:  7m 12s | Avg:  1m 48s | Max:  1m 51s
    🚨 cxx_family: MSVC 🚨
      🟩 Clang              Pass: 100%/30  | Total:  1h 12m | Avg:  2m 24s | Max:  4m 39s
      🟩 GCC                Pass: 100%/22  | Total: 51m 39s | Avg:  2m 20s | Max:  4m 14s
      🟩 Intel              Pass: 100%/1   | Total:  2m 49s | Avg:  2m 49s | Max:  2m 49s
      🔥 MSVC               Pass:   0%/2   | Total: 18m 48s | Avg:  9m 24s | Max: 10m 30s
    🔍 jobs: Build 🔍
      🔍 Build              Pass:  95%/47  | Total:  1h 53m | Avg:  2m 25s | Max: 10m 30s
      🟩 Test               Pass: 100%/8   | Total: 31m 59s | Avg:  3m 59s | Max:  4m 39s
    🔍 std: 20 🔍
      🟩 17                 Pass: 100%/31  | Total:  1h 13m | Avg:  2m 22s | Max:  4m 39s
      🔍 20                 Pass:  91%/24  | Total:  1h 12m | Avg:  3m 00s | Max: 10m 30s
    🟨 cxx
      🟩 Clang9             Pass: 100%/2   | Total:  4m 22s | Avg:  2m 11s | Max:  2m 22s
      🟩 Clang10            Pass: 100%/2   | Total:  4m 05s | Avg:  2m 02s | Max:  2m 05s
      🟩 Clang11            Pass: 100%/4   | Total:  9m 00s | Avg:  2m 15s | Max:  2m 36s
      🟩 Clang12            Pass: 100%/4   | Total:  8m 16s | Avg:  2m 04s | Max:  2m 10s
      🟩 Clang13            Pass: 100%/4   | Total:  8m 32s | Avg:  2m 08s | Max:  2m 10s
      🟩 Clang14            Pass: 100%/6   | Total: 17m 20s | Avg:  2m 53s | Max:  4m 39s
      🟩 Clang15            Pass: 100%/2   | Total:  4m 24s | Avg:  2m 12s | Max:  2m 18s
      🟩 Clang16            Pass: 100%/6   | Total: 16m 28s | Avg:  2m 44s | Max:  4m 05s
      🟩 GCC9               Pass: 100%/2   | Total:  3m 51s | Avg:  1m 55s | Max:  1m 58s
      🟩 GCC10              Pass: 100%/4   | Total:  8m 42s | Avg:  2m 10s | Max:  2m 21s
      🟩 GCC11              Pass: 100%/4   | Total:  8m 21s | Avg:  2m 05s | Max:  2m 21s
      🟩 GCC12              Pass: 100%/12  | Total: 30m 45s | Avg:  2m 33s | Max:  4m 14s
      🟩 Intel2023.2.0      Pass: 100%/1   | Total:  2m 49s | Avg:  2m 49s | Max:  2m 49s
      🟥 MSVC14.36          Pass:   0%/1   | Total:  8m 18s | Avg:  8m 18s | Max:  8m 18s
      🟥 MSVC14.39          Pass:   0%/1   | Total: 10m 30s | Avg: 10m 30s | Max: 10m 30s
    🟨 cudacxx_family
      🟨 nvcc               Pass:  96%/55  | Total:  2h 25m | Avg:  2m 38s | Max: 10m 30s
    🟨 gpu
      🟨 v100               Pass:  96%/55  | Total:  2h 25m | Avg:  2m 38s | Max: 10m 30s
    🟨 ctk
      🟨 12.0               Pass:  95%/23  | Total:  1h 00m | Avg:  2m 38s | Max:  8m 18s
      🟨 12.5               Pass:  96%/32  | Total:  1h 25m | Avg:  2m 39s | Max: 10m 30s
    🟨 cudacxx
      🟨 nvcc12.0           Pass:  95%/23  | Total:  1h 00m | Avg:  2m 38s | Max:  8m 18s
      🟨 nvcc12.5           Pass:  96%/32  | Total:  1h 25m | Avg:  2m 39s | Max: 10m 30s
    🟩 sm
      🟩 90                 Pass: 100%/1   | Total:  1m 55s | Avg:  1m 55s | Max:  1m 55s
      🟩 90a                Pass: 100%/1   | Total:  1m 56s | Avg:  1m 56s | Max:  1m 56s
    
  • 🟩 pycuda: Pass: 100%/1 | Total: 11m 29s | Avg: 11m 29s | Max: 11m 29s

    🟩 cpu
      🟩 amd64              Pass: 100%/1   | Total: 11m 29s | Avg: 11m 29s | Max: 11m 29s
    🟩 ctk
      🟩 12.5               Pass: 100%/1   | Total: 11m 29s | Avg: 11m 29s | Max: 11m 29s
    🟩 cudacxx
      🟩 nvcc12.5           Pass: 100%/1   | Total: 11m 29s | Avg: 11m 29s | Max: 11m 29s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/1   | Total: 11m 29s | Avg: 11m 29s | Max: 11m 29s
    🟩 cxx
      🟩 GCC13              Pass: 100%/1   | Total: 11m 29s | Avg: 11m 29s | Max: 11m 29s
    🟩 cxx_family
      🟩 GCC                Pass: 100%/1   | Total: 11m 29s | Avg: 11m 29s | Max: 11m 29s
    🟩 gpu
      🟩 v100               Pass: 100%/1   | Total: 11m 29s | Avg: 11m 29s | Max: 11m 29s
    🟩 jobs
      🟩 Test               Pass: 100%/1   | Total: 11m 29s | Avg: 11m 29s | Max: 11m 29s
    

Contributor

🟩 CI finished in 31m 57s: Pass: 100%/56 | Total: 2h 50m | Avg: 3m 02s | Max: 13m 39s | Hits: 71%/106
  • 🟩 cudax: Pass: 100%/55 | Total: 2h 36m | Avg: 2m 50s | Max: 10m 37s | Hits: 71%/106

    🟩 cpu
      🟩 amd64              Pass: 100%/51  | Total:  2h 27m | Avg:  2m 53s | Max: 10m 37s | Hits:  71%/106   
      🟩 arm64              Pass: 100%/4   | Total:  8m 54s | Avg:  2m 13s | Max:  2m 53s
    🟩 ctk
      🟩 12.0               Pass: 100%/23  | Total:  1h 08m | Avg:  2m 59s | Max: 10m 16s | Hits:  71%/53    
      🟩 12.5               Pass: 100%/32  | Total:  1h 27m | Avg:  2m 43s | Max: 10m 37s | Hits:  71%/53    
    🟩 cudacxx
      🟩 nvcc12.0           Pass: 100%/23  | Total:  1h 08m | Avg:  2m 59s | Max: 10m 16s | Hits:  71%/53    
      🟩 nvcc12.5           Pass: 100%/32  | Total:  1h 27m | Avg:  2m 43s | Max: 10m 37s | Hits:  71%/53    
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/55  | Total:  2h 36m | Avg:  2m 50s | Max: 10m 37s | Hits:  71%/106   
    🟩 cxx
      🟩 Clang9             Pass: 100%/2   | Total:  4m 46s | Avg:  2m 23s | Max:  2m 37s
      🟩 Clang10            Pass: 100%/2   | Total:  4m 45s | Avg:  2m 22s | Max:  2m 41s
      🟩 Clang11            Pass: 100%/4   | Total:  8m 33s | Avg:  2m 08s | Max:  2m 20s
      🟩 Clang12            Pass: 100%/4   | Total:  8m 43s | Avg:  2m 10s | Max:  2m 17s
      🟩 Clang13            Pass: 100%/4   | Total:  8m 51s | Avg:  2m 12s | Max:  2m 31s
      🟩 Clang14            Pass: 100%/6   | Total: 20m 18s | Avg:  3m 23s | Max:  5m 54s
      🟩 Clang15            Pass: 100%/2   | Total:  4m 31s | Avg:  2m 15s | Max:  2m 17s
      🟩 Clang16            Pass: 100%/6   | Total: 17m 25s | Avg:  2m 54s | Max:  4m 15s
      🟩 GCC9               Pass: 100%/2   | Total:  3m 59s | Avg:  1m 59s | Max:  2m 11s
      🟩 GCC10              Pass: 100%/4   | Total:  8m 48s | Avg:  2m 12s | Max:  2m 29s
      🟩 GCC11              Pass: 100%/4   | Total:  8m 21s | Avg:  2m 05s | Max:  2m 31s
      🟩 GCC12              Pass: 100%/12  | Total: 33m 41s | Avg:  2m 48s | Max:  4m 49s
      🟩 Intel2023.2.0      Pass: 100%/1   | Total:  2m 47s | Avg:  2m 47s | Max:  2m 47s
      🟩 MSVC14.36          Pass: 100%/1   | Total: 10m 16s | Avg: 10m 16s | Max: 10m 16s | Hits:  71%/53    
      🟩 MSVC14.39          Pass: 100%/1   | Total: 10m 37s | Avg: 10m 37s | Max: 10m 37s | Hits:  71%/53    
    🟩 cxx_family
      🟩 Clang              Pass: 100%/30  | Total:  1h 17m | Avg:  2m 35s | Max:  5m 54s
      🟩 GCC                Pass: 100%/22  | Total: 54m 49s | Avg:  2m 29s | Max:  4m 49s
      🟩 Intel              Pass: 100%/1   | Total:  2m 47s | Avg:  2m 47s | Max:  2m 47s
      🟩 MSVC               Pass: 100%/2   | Total: 20m 53s | Avg: 10m 26s | Max: 10m 37s | Hits:  71%/106   
    🟩 gpu
      🟩 v100               Pass: 100%/55  | Total:  2h 36m | Avg:  2m 50s | Max: 10m 37s | Hits:  71%/106   
    🟩 jobs
      🟩 Build              Pass: 100%/47  | Total:  2h 00m | Avg:  2m 33s | Max: 10m 37s | Hits:  71%/106   
      🟩 Test               Pass: 100%/8   | Total: 36m 07s | Avg:  4m 30s | Max:  5m 54s
    🟩 sm
      🟩 90                 Pass: 100%/1   | Total:  2m 16s | Avg:  2m 16s | Max:  2m 16s
      🟩 90a                Pass: 100%/1   | Total:  2m 20s | Avg:  2m 20s | Max:  2m 20s
    🟩 std
      🟩 17                 Pass: 100%/31  | Total:  1h 18m | Avg:  2m 31s | Max:  5m 38s
      🟩 20                 Pass: 100%/24  | Total:  1h 17m | Avg:  3m 14s | Max: 10m 37s | Hits:  71%/106   
    
  • 🟩 pycuda: Pass: 100%/1 | Total: 13m 39s | Avg: 13m 39s | Max: 13m 39s

    🟩 cpu
      🟩 amd64              Pass: 100%/1   | Total: 13m 39s | Avg: 13m 39s | Max: 13m 39s
    🟩 ctk
      🟩 12.5               Pass: 100%/1   | Total: 13m 39s | Avg: 13m 39s | Max: 13m 39s
    🟩 cudacxx
      🟩 nvcc12.5           Pass: 100%/1   | Total: 13m 39s | Avg: 13m 39s | Max: 13m 39s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/1   | Total: 13m 39s | Avg: 13m 39s | Max: 13m 39s
    🟩 cxx
      🟩 GCC13              Pass: 100%/1   | Total: 13m 39s | Avg: 13m 39s | Max: 13m 39s
    🟩 cxx_family
      🟩 GCC                Pass: 100%/1   | Total: 13m 39s | Avg: 13m 39s | Max: 13m 39s
    🟩 gpu
      🟩 v100               Pass: 100%/1   | Total: 13m 39s | Avg: 13m 39s | Max: 13m 39s
    🟩 jobs
      🟩 Test               Pass: 100%/1   | Total: 13m 39s | Avg: 13m 39s | Max: 13m 39s
    

Labels
CUDA Next: Feature intended for the Cuda Next experimental library
Projects
Status: In Review
Development

Successfully merging this pull request may close these issues.

cuda::launch kernel-launch API Standard abstraction for specifying thread grid hierarchy and dimensions
3 participants