Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix ptx usage to account for PTX ISA availability #1359

Merged
merged 9 commits into from
Feb 21, 2024

Conversation

miscco
Copy link
Collaborator

@miscco miscco commented Feb 9, 2024

We encountered CI failures when trying to update rmm.

It turns out that it is indeed valid to use an older CTK to build on a new Hopper machine.

That means we cannot solely guard ptx on the available architectures, but also need to account for the avilability of PTX ISA.

To ensure that this information is globally available, we move the ptx_isa detection into __cccl_config and use the appropriate feature test macros in the code

@miscco miscco requested review from a team as code owners February 9, 2024 11:11
@miscco miscco added libcu++ For all items related to libcu++ backport branch/2.3.x For backporting to the 2.3.x release branch bug: functional labels Feb 9, 2024
Copy link
Contributor

@ahendriksen ahendriksen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. Just one nit.

Overall comment/question on testability:

  • Does our use of __CUDA_MINIMUM_ARCH__ disable the features when compiled for multiple architectures (say SM80+SM90)?
  • Should we test this in CI with all major architectures? Or would our use of __CUDA_MINIMUM_ARCH__ defeat the purpose?

@ahendriksen
Copy link
Contributor

Wait.. We wouldn't be able to catch this anyway. I don't see 11.8 in the CI matrix?

@miscco
Copy link
Collaborator Author

miscco commented Feb 9, 2024

Wait.. We wouldn't be able to catch this anyway. I don't see 11.8 in the CI matrix?

We would not, we need to expand out test matrix for that

@miscco miscco requested a review from a team as a code owner February 9, 2024 13:02
ci/matrix.yaml Outdated Show resolved Hide resolved
ci/matrix.yaml Outdated Show resolved Hide resolved
@miscco miscco force-pushed the fix_ptx_isa_availability branch 5 times, most recently from 181060f to 1be7dee Compare February 12, 2024 10:23
ci/matrix.yaml Outdated Show resolved Hide resolved
@miscco miscco force-pushed the fix_ptx_isa_availability branch 4 times, most recently from 52af97b to 704364b Compare February 14, 2024 20:29
We want this to be globally available
@miscco miscco force-pushed the fix_ptx_isa_availability branch 7 times, most recently from 63a957e to 47f524a Compare February 16, 2024 13:49
@miscco
Copy link
Collaborator Author

miscco commented Feb 16, 2024

I have decided to punt on the CI enhancements, as those are really icky to get right-

Also we most likely wont backport those, so just reducing this to a pure product PR seems fine

@@ -50,7 +50,7 @@ _LIBCUDACXX_BEGIN_NAMESPACE_CUDA_DEVICE_EXPERIMENTAL
// capability 9.0 and above. The check for (!defined(__CUDA_MINIMUM_ARCH__)) is
// necessary to prevent cudafe from ripping out the device functions before
// device compilation begins.
#if (!defined(__CUDA_MINIMUM_ARCH__)) || (defined(__CUDA_MINIMUM_ARCH__) && 900 <= __CUDA_MINIMUM_ARCH__)
#ifdef __cccl_lib_experimental_ctk12_cp_async_exposure
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The function below are not strictly speaking part of the experimental exposure, but the check for the feature is currently the same as the check for availability of cp.async.bulk would be. Not a blocker imho, just want to note this.

Copy link
Contributor

@ahendriksen ahendriksen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. It makes sense to disable the mbarrier.expect_tx and cp.async.bulk tests on nvcc 11, as they aren't supported there.

I think the architecture conditional code is now properly guarded by both PTX ISA version and NV_IF_TARGET + linker error hack.

Copy link
Contributor

@ahendriksen ahendriksen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I noticed on of the tests was failing. Suggested a fix.

Copy link
Contributor

@ahendriksen ahendriksen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code looks good now.

@ahendriksen
Copy link
Contributor

Can we merge this PR?

@miscco miscco merged commit f6903bf into NVIDIA:main Feb 21, 2024
538 checks passed
@miscco miscco deleted the fix_ptx_isa_availability branch February 21, 2024 08:50
Copy link
Contributor

Backport failed for branch/2.3.x, because it was unable to cherry-pick the commit(s).

Please cherry-pick the changes locally.

git fetch origin branch/2.3.x
git worktree add -d .worktree/backport-1359-to-branch/2.3.x origin/branch/2.3.x
cd .worktree/backport-1359-to-branch/2.3.x
git checkout -b backport-1359-to-branch/2.3.x
ancref=$(git merge-base c8dde0ec2e42573069b1add37dfb83c5fc7a1673 555ac64435e6f0d175a09c34c4bdad9fa0ead91d)
git cherry-pick -x $ancref..555ac64435e6f0d175a09c34c4bdad9fa0ead91d

miscco added a commit to miscco/cccl that referenced this pull request Feb 21, 2024
Currently we only guard those instructions based on the available architecture. However, it is also valid to compile with an old toolkit for a new machine.

Consequently we need to strengthen our checks against available PTX ISA
miscco added a commit to miscco/cccl that referenced this pull request Feb 21, 2024
Currently we only guard those instructions based on the available architecture. However, it is also valid to compile with an old toolkit for a new machine.

Consequently we need to strengthen our checks against available PTX ISA
jrhemstad pushed a commit that referenced this pull request Feb 26, 2024
…#1421)

* Fix ptx usage to account for PTX ISA availability (#1359)

Currently we only guard those instructions based on the available architecture. However, it is also valid to compile with an old toolkit for a new machine.

Consequently we need to strengthen our checks against available PTX ISA

* Do not use VLAs in `cp_async_bulk_tensor_*` tests

VLAs are a compiler extension and are correctly errored out by some compilers. As we always know the exact size of the array anyway just swtich to a `cuda::std::array`

Fixes nvbug4476664

* Use proper shared memory size

Authored-by: Allard Hendriksen <[email protected]>

* Fix incorrect linker issue

* Ensure runfail tests do not fail without execution

* Ensure that __cccl_ptx_isa properly guards feature flags
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport branch/2.3.x For backporting to the 2.3.x release branch bug: functional libcu++ For all items related to libcu++
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

4 participants