This repository has been archived by the owner on Mar 21, 2024. It is now read-only.
Update CDP support macros for if-target compatibility #486
Merged
alliepiper added the "P0: must have" label (Absolutely necessary. Critical issue, major blocker, etc.) on May 18, 2022
alliepiper added the "release: breaking change" label (Include in "Breaking Changes" section of release notes.) on May 18, 2022
alliepiper changed the title from "WIP if-target CDP refactor" to "Update CDP support macros for if-target compatibility" on May 19, 2022
alliepiper added the "release: notes" label (PR description contains pre-written release notes.) on May 19, 2022
gevtushenko approved these changes on May 24, 2022
This is a significant improvement and a huge amount of work. Thank you for your efforts! There are just a few minor comments below.
The definition of `CUB_RUNTIME_FUNCTION` is modified to be consistently defined across all NVCC compilation passes. Rather than depending on `__CUDA_ARCH__`, its definition now depends only on whether or not RDC is enabled.

A new `CUB_RDC_ENABLED` macro has been added as a replacement for `CUB_RUNTIME_ENABLED`. `CUB_RUNTIME_ENABLED` also depends on the definition of `__CUDA_ARCH__` and should no longer be used in CUB code. It is provided for legacy purposes only. Usages should be replaced by `NV_IF_TARGET` and the new `CUB_RDC_ENABLED` macro.

To summarize the differences between the old and new macros:

- `CUB_RUNTIME_FUNCTION`
  - Old:
    - RDC enabled: Defined to `__host__ __device__`.
    - RDC not enabled:
      - NVCC host pass: Defined to `__host__ __device__`.
      - NVCC device pass: Defined to `__host__`.
  - New:
    - RDC enabled: Defined to `__host__ __device__`.
    - RDC not enabled: Defined to `__host__`.
- `CUB_*_ENABLED`
  - `RUNTIME` (old)
    - RDC enabled: Macro is defined.
    - RDC not enabled:
      - NVCC host pass: Macro is defined.
      - NVCC device pass: Macro is not defined.
  - `RDC` (new)
    - RDC enabled: Macro is defined.
    - RDC not enabled: Macro is not defined.

The most common pattern used with the old macros looked like this:

```
CUB_RUNTIME_FUNCTION
cudaError_t SomeFunction()
{
#ifndef CUB_RUNTIME_ENABLED
  return cudaErrorNotSupported;
#else
  // Actual implementation here
  return cudaSuccess;
#endif
}
```

This pattern can now be simplified to:

```
CUB_RUNTIME_FUNCTION
cudaError_t SomeFunction()
{
  // Actual implementation here
  return cudaSuccess;
}
```
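The old/new comparison above can be condensed into a small sketch. This is not CUB's actual header: `DEMO_RDC_ENABLED`, `DEMO_RUNTIME_FUNCTION`, `demo_rdc_enabled`, `demo_some_function`, and `demo_usage_new_pattern` are hypothetical names, and the sketch assumes RDC can be detected through NVCC's `__CUDACC_RDC__` macro. The empty `__host__` / `__device__` definitions are only there so the block also builds with a plain host compiler.

```cpp
#include <cassert>

// Stand-ins so the sketch also builds without NVCC (assumption: outside
// CUDA compilation the execution space annotations can be treated as empty).
#ifndef __CUDACC__
#  define __host__
#  define __device__
#endif

// Hypothetical mirror of the "new" behavior: the definition depends only on
// whether RDC is enabled, never on __CUDA_ARCH__, so the host and device
// compilation passes see the same annotation.
#if defined(__CUDACC_RDC__)
#  define DEMO_RDC_ENABLED
#  define DEMO_RUNTIME_FUNCTION __host__ __device__
#else
#  define DEMO_RUNTIME_FUNCTION __host__
#endif

// Reports which configuration was selected at compile time.
constexpr bool demo_rdc_enabled()
{
#ifdef DEMO_RDC_ENABLED
  return true;
#else
  return false;
#endif
}

// New-style function: no preprocessor guard around the body. Without RDC the
// function is host-only, so the body can always be compiled and called.
DEMO_RUNTIME_FUNCTION inline int demo_some_function()
{
  // Actual implementation would go here.
  return 0; // stands in for cudaSuccess
}

// Usage: host callers get the real implementation in every configuration.
inline void demo_usage_new_pattern()
{
  assert(demo_some_function() == 0);
}
```

Because the annotation no longer changes between passes, the function has the same signature everywhere, which is what allows the `cudaErrorNotSupported` stub to be dropped.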
alliepiper force-pushed the if_target_cdp branch from a314434 to d825c47 on June 28, 2022 21:06
Some tests already had partial support for CDP testing; this commit uses the %PARAM% system to enable it. The radix sort tests also saw some additional changes that remove excessive testing. More work could be done here.
`SyncStream` should continue to be used for required synchronizations. `DebugSyncStream` should only be used for synchronizations that are non-essential and exist solely to handle the `debug_synchronous` flag of the device algorithms.
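The distinction between the two kinds of synchronization can be illustrated with a minimal sketch. It does not use CUB's real signatures or the CUDA runtime: `DemoStream`, `demo_sync_stream`, `demo_debug_sync_stream`, and `demo_usage_sync` are hypothetical stand-ins, and the stream is reduced to a counter so the example runs on the host alone.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical stand-in for a CUDA stream; it only counts synchronizations
// so the sketch runs without the CUDA runtime.
struct DemoStream
{
  int sync_count = 0;
};

// Required synchronization: always performed (the role played by SyncStream).
inline int demo_sync_stream(DemoStream &stream)
{
  ++stream.sync_count; // a real implementation would synchronize the stream here
  return 0;            // stands in for cudaSuccess
}

// Non-essential synchronization: performed only when the caller requested
// debug_synchronous, and announced with a log message (the role played by
// DebugSyncStream).
inline int demo_debug_sync_stream(DemoStream &stream, bool debug_synchronous)
{
  if (!debug_synchronous)
  {
    return 0; // skipped; correctness must not depend on this synchronization
  }
  std::printf("Synchronizing stream for debug_synchronous\n");
  return demo_sync_stream(stream);
}

// Usage: only the required sync and the explicitly requested debug sync run.
inline void demo_usage_sync()
{
  DemoStream stream;
  demo_debug_sync_stream(stream, false); // no-op
  demo_debug_sync_stream(stream, true);  // logs, then synchronizes
  demo_sync_stream(stream);              // always synchronizes
  assert(stream.sync_count == 2);
}
```

The design point is that removing every debug synchronization must leave the algorithm correct, so anything the result depends on belongs in the unconditional path.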
alliepiper force-pushed the if_target_cdp branch from d825c47 to da7f771 on June 28, 2022 23:40
Goes with NVIDIA/thrust#1661. Updates the CDP/CUDART macros to be consistent across host/device passes and thus compatible with NV_IF_TARGET.
All device algorithm tests with existing CDP tests have been updated to exercise this functionality. Some device algorithm tests do not have CDP tests and will need to be updated in a future PR.
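As a rough illustration of what compatibility with `NV_IF_TARGET` looks like in practice, the sketch below selects the host or device branch with `NV_IF_TARGET` and lets an RDC-only macro decide what the device branch does. It is an assumption-laden sketch, not code from this PR: every `DEMO_*` / `demo_*` name is hypothetical, RDC detection via `__CUDACC_RDC__` is assumed, and when `<nv/target>` is not available the block falls back to a host-only path so it still builds with a plain compiler.

```cpp
#include <cassert>

// <nv/target> ships with the CUDA Toolkit. Fall back to a host-only path when
// it is unavailable so the sketch still builds with a plain compiler.
#if defined(__has_include)
#  if __has_include(<nv/target>)
#    include <nv/target>
#    define DEMO_HAVE_NV_TARGET
#  endif
#endif

#ifndef __CUDACC__
#  define __host__
#  define __device__
#endif

// Hypothetical mirror of CUB_RDC_ENABLED / CUB_RUNTIME_FUNCTION.
#if defined(__CUDACC_RDC__)
#  define DEMO_RDC_ENABLED
#  define DEMO_RUNTIME_FUNCTION __host__ __device__
#else
#  define DEMO_RUNTIME_FUNCTION __host__
#endif

constexpr int demo_success       = 0; // stands in for cudaSuccess
constexpr int demo_not_supported = 1; // stands in for cudaErrorNotSupported

// The device-side result is chosen by RDC alone, outside the NV_IF_TARGET
// call, because preprocessor directives must not appear in macro arguments.
#ifdef DEMO_RDC_ENABLED
#  define DEMO_DEVICE_RESULT demo_success
#else
#  define DEMO_DEVICE_RESULT demo_not_supported
#endif

DEMO_RUNTIME_FUNCTION inline int demo_query()
{
  int result = demo_not_supported;
#ifdef DEMO_HAVE_NV_TARGET
  // Host branch first, device branch second.
  NV_IF_TARGET(NV_IS_HOST,
               (result = demo_success;),
               (result = DEMO_DEVICE_RESULT;));
#else
  result = demo_success; // host-only build: only the host branch exists
#endif
  return result;
}

// Usage: on the host the query succeeds in every configuration.
inline void demo_usage_query()
{
  assert(demo_query() == demo_success);
}
```

Nothing in this sketch tests `__CUDA_ARCH__` directly: target selection is left to `NV_IF_TARGET`, and the RDC-dependent macros have the same definition in every compilation pass.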
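Pre-written Release Notes

Breaking Changes

- The CDP/CUDART support macros have been updated for compatibility with `NV_IF_TARGET`. They are now defined consistently across all host/device compilation passes. This should not affect most usages of these macros, but may require changes for some edge cases.
  - `CUB_RUNTIME_FUNCTION`: Execution space annotations for functions that invoke CUDA Runtime APIs.
    - Old behavior:
      - RDC enabled: Defined to `__host__ __device__`.
      - RDC not enabled:
        - NVCC host pass: Defined to `__host__ __device__`.
        - NVCC device pass: Defined to `__host__`.
    - New behavior:
      - RDC enabled: Defined to `__host__ __device__`.
      - RDC not enabled: Defined to `__host__`.
  - `CUB_RUNTIME_ENABLED`: No change in behavior, but no longer used in CUB. Provided for legacy support only. Legacy behavior:
    - RDC enabled: Macro is defined.
    - RDC not enabled:
      - NVCC host pass: Macro is defined.
      - NVCC device pass: Macro is not defined.
  - `CUB_RDC_ENABLED`: New macro, may be combined with `NV_IF_TARGET` to replace most usages of `CUB_RUNTIME_ENABLED`. Behavior:
    - RDC enabled: Macro is defined.
    - RDC not enabled: Macro is not defined.

Other Enhancements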
- Synchronizations triggered by the `debug_synchronous` flag in the device algorithm layer now print a log message.