Add meta dimensions specifiers to cudax::launch #2001
base: main
Conversation
This also adds support for empty queries needed for the above
🟨 CI finished in 11m 16s: Pass: 92%/56 | Total: 2h 54m | Avg: 3m 06s | Max: 10m 56s | Hits: 70%/1581
| | Project |
|---|---|
| | CCCL Infrastructure |
| | libcu++ |
| | CUB |
| | Thrust |
| +/- | CUDA Experimental |
| | pycuda |

Modifications in project or dependencies?

| | Project |
|---|---|
| | CCCL Infrastructure |
| | libcu++ |
| | CUB |
| | Thrust |
| +/- | CUDA Experimental |
| +/- | pycuda |

🏃 Runner counts (total jobs: 56)

| # | Runner |
|---|---|
| 41 | linux-amd64-cpu16 |
| 9 | linux-amd64-gpu-v100-latest-1 |
| 4 | linux-arm64-cpu16 |
| 2 | windows-amd64-cpu16 |
🟨 CI finished in 11m 47s: Pass: 96%/56 | Total: 2h 44m | Avg: 2m 56s | Max: 11m 29s | Hits: 83%/1643
template <typename Dims>
- struct dimensions_handler
+ struct dimensions_handler : public base_dimensions_handler
Remark: struct base classes are public by default.
Suggested change:
- struct dimensions_handler : public base_dimensions_handler
+ struct dimensions_handler : base_dimensions_handler
I don't know why this triggers me enough to write a comment. Feel free to ignore!
We tend to prefer explicitness about these things. Makes the code more accessible to less experienced contributors.
cudax/include/cuda/experimental/__hierarchy/level_dimensions.cuh (outdated; resolved)
template <typename Dims>
inline constexpr bool usable_for_queries = false;

template <typename T, size_t... Extents>
inline constexpr bool usable_for_queries<dimensions<T, Extents...>> = true;
Remark: I love variable templates as traits instead of structs. They are shorter and more to the point. They require C++14 though, which is why we don't see many of those around here.
🟨 CI finished in 11m 56s: Pass: 96%/56 | Total: 2h 48m | Avg: 3m 00s | Max: 11m 56s | Hits: 83%/1643
🟩 CI finished in 16m 11s: Pass: 100%/56 | Total: 2h 52m | Avg: 3m 04s | Max: 12m 31s | Hits: 93%/1693
Co-authored-by: Bernhard Manfred Gruber <[email protected]>
Force-pushed from 21ba25f to 1b14e37
🟩 CI finished in 10m 33s: Pass: 100%/56 | Total: 2h 34m | Avg: 2m 45s | Max: 10m 33s | Hits: 88%/2848
* a largest size that still allows full occupancy.
* This type is usable only to describe dimensions at block level
*/
struct best_occupancy
Suggested name: max_occupancy
* @param kernel
* Kernel functor that the configuration is intended for
*/
# ifdef _MSC_VER
Use CCCL macro here
🟨 CI finished in 30m 19s: Pass: 96%/56 | Total: 2h 39m | Avg: 2m 51s | Max: 12m 44s
🟨 CI finished in 22m 22s: Pass: 96%/56 | Total: 2h 37m | Avg: 2m 48s | Max: 11m 29s
🟩 CI finished in 31m 57s: Pass: 100%/56 | Total: 2h 50m | Avg: 3m 02s | Max: 13m 39s | Hits: 71%/106
This change adds new types, called meta dimensions, for describing dimensions in a hierarchy. These types do not hold specific values; instead they communicate an intent. Dimensions built from them can later be finalized, replacing the meta dimensions with concrete values calculated for a specific device and kernel function. Finalization happens automatically inside cudax::launch, but can also be done manually with the finalize function when the calculated values are needed ahead of time, for example to scale buffers passed into launch.
The last piece is the finalized_t alias template, which yields the type of the finalized hierarchy without calling finalize.
All of the above also works on the kernel_config type, in which case it simply operates on the hierarchy contained in the configuration.
This PR also adds support for hierarchy queries where the queried level is itself the unit, in which case extents<1, 1, 1> (or just 1) is returned.
While working on this change I noticed an issue with is_invocable and extended lambdas, as used in cudax::launch to detect whether the lambda takes the hierarchy/configuration as its first argument. Because of that issue, launch currently forces an extended lambda to accept it as the first argument until a better solution is found.
There are a couple of smaller TODOs around meta dimensions, such as adding a safety check that the same function is passed to both finalize and launch, and deciding whether using finalized_t for kernel instantiation is always mandatory or only when meta dimensions are used.