Fix ptx usage to account for PTX ISA availability #1359

miscco · 2024-02-09T11:11:00Z

We encountered CI failures when trying to update rmm.

It turns out that it is valid to use an older CTK to build on a new Hopper machine.

That means we cannot guard PTX usage solely on the available architectures; we also need to account for the availability of the PTX ISA.

To ensure that this information is globally available, we move the ptx_isa detection into __cccl_config and use the appropriate feature test macros in the code.
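As a rough illustration of the problem described above, the following host-side sketch models the two conditions as constexpr functions. All `demo_*` names are hypothetical and are not the macros this PR adds; the toolkit-to-ISA table covers only CTK 11.8 and 12.0 through 12.3, and newer toolkits are clamped to the newest value the sketch knows about.

```cpp
// Sketch only: the demo_* helpers are hypothetical and are not the macros or
// functions that libcu++ defines.

// Map a CUDA Toolkit version to a PTX ISA version it supports, encoded as
// major * 100 + minor * 10 (for example 780 for PTX ISA 7.8).
constexpr int demo_ptx_isa_for_ctk(int ctk_major, int ctk_minor) {
  return (ctk_major > 12 || (ctk_major == 12 && ctk_minor >= 3)) ? 830  // clamp newer toolkits
       : (ctk_major == 12 && ctk_minor == 2) ? 820
       : (ctk_major == 12 && ctk_minor == 1) ? 810
       : (ctk_major == 12 && ctk_minor == 0) ? 800
       : (ctk_major == 11 && ctk_minor >= 8) ? 780
       : 0;  // older toolkits are not modeled in this sketch
}

// cp.async.bulk needs both an SM 9.0 target and a toolkit with PTX ISA 8.0.
constexpr bool demo_cp_async_bulk_available(int ptx_isa, int minimum_arch) {
  return ptx_isa >= 800 && minimum_arch >= 900;
}

// CTK 11.8 targeting Hopper: the architecture check passes, the ISA check does not.
static_assert(!demo_cp_async_bulk_available(demo_ptx_isa_for_ctk(11, 8), 900),
              "old toolkit on a new machine must not enable the feature");
static_assert(demo_cp_async_bulk_available(demo_ptx_isa_for_ctk(12, 0), 900),
              "CTK 12.0 targeting SM 9.0 enables the feature");
```

An architecture-only guard would accept the first case, which is the configuration that failed in CI.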

ahendriksen

Looks good. Just one nit.

Overall comment/question on testability:

Does our use of __CUDA_MINIMUM_ARCH__ disable the features when compiled for multiple architectures (say SM80+SM90)?
Should we test this in CI with all major architectures? Or would our use of __CUDA_MINIMUM_ARCH__ defeat the purpose?
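To make the question concrete, here is a small host-side model of the guard. It assumes that `__CUDA_MINIMUM_ARCH__` is the smallest of the targeted architectures, as the name suggests; the `demo_*` helpers are hypothetical and only mirror the preprocessor condition.

```cpp
// Sketch only: the demo_* helpers are hypothetical and model the preprocessor
// logic with constexpr functions so that it can be checked on the host.

// Assumption: __CUDA_MINIMUM_ARCH__ is the smallest targeted architecture.
constexpr int demo_minimum_arch(int arch_a, int arch_b) {
  return arch_a < arch_b ? arch_a : arch_b;
}

// The defined-macro half of the guard: 900 <= __CUDA_MINIMUM_ARCH__.
constexpr bool demo_arch_guard(int minimum_arch) {
  return 900 <= minimum_arch;
}

static_assert(demo_arch_guard(demo_minimum_arch(900, 900)),
              "SM90 only: the guarded code is enabled");
static_assert(!demo_arch_guard(demo_minimum_arch(800, 900)),
              "SM80+SM90: the guarded code is disabled");
```

Under that assumption, a build for SM80+SM90 would compile the guarded code out, so a CI job covering all major architectures at once would not exercise it.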

libcudacxx/include/cuda/std/detail/libcxx/include/__cccl/ptx_isa.h

ahendriksen · 2024-02-09T12:18:11Z

Wait.. We wouldn't be able to catch this anyway. I don't see 11.8 in the CI matrix?

miscco · 2024-02-09T12:52:40Z

> Wait.. We wouldn't be able to catch this anyway. I don't see 11.8 in the CI matrix?

We would not; we need to expand our test matrix for that.

miscco · 2024-02-16T13:50:42Z

I have decided to punt on the CI enhancements, as those are really icky to get right.

Also, we most likely won't backport those, so just reducing this to a pure product PR seems fine.

ahendriksen · 2024-02-16T15:49:27Z

libcudacxx/include/cuda/barrier

@@ -50,7 +50,7 @@ _LIBCUDACXX_BEGIN_NAMESPACE_CUDA_DEVICE_EXPERIMENTAL
 // capability 9.0 and above. The check for (!defined(__CUDA_MINIMUM_ARCH__)) is
 // necessary to prevent cudafe from ripping out the device functions before
 // device compilation begins.
-#if (!defined(__CUDA_MINIMUM_ARCH__)) || (defined(__CUDA_MINIMUM_ARCH__) && 900 <= __CUDA_MINIMUM_ARCH__)
+#ifdef __cccl_lib_experimental_ctk12_cp_async_exposure


The functions below are not, strictly speaking, part of the experimental exposure, but the check for that feature is currently the same as the check for availability of cp.async.bulk would be. Not a blocker imho, just want to note this.

ahendriksen

Looks good. It makes sense to disable the mbarrier.expect_tx and cp.async.bulk tests on nvcc 11, as they aren't supported there.

I think the architecture conditional code is now properly guarded by both PTX ISA version and NV_IF_TARGET + linker error hack.
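The "linker error hack" mentioned above can be sketched in plain C++17. This is only an analogy: the real code is CUDA device code that selects the branch with NV_IF_TARGET, whereas the sketch uses `if constexpr` on a template parameter, and every `demo_*` name is hypothetical.

```cpp
// Sketch only: every demo_* name is hypothetical. Plain C++17 `if constexpr`
// stands in for the target selection that the real device code performs.

// Declared but intentionally never defined. Any call that survives into the
// generated code turns into an "undefined reference" error at link time.
void demo_feature_is_not_supported_before_sm_90();

template <int Arch>
int demo_expect_tx() {
  if constexpr (Arch >= 900) {
    return 1;  // stand-in for the real instruction on a supported target
  } else {
    // Discarded for supported targets, so no reference is emitted there.
    demo_feature_is_not_supported_before_sm_90();
    return 0;
  }
}
```

Calling `demo_expect_tx<900>()` links fine, while `demo_expect_tx<800>()` would compile but fail at link time with a symbol name that tells the user what went wrong.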

ahendriksen

I noticed one of the tests was failing. Suggested a fix.

libcudacxx/include/cuda/std/detail/libcxx/include/__cuda/barrier.h

ahendriksen

Code looks good now.

ahendriksen · 2024-02-21T08:20:01Z

Can we merge this PR?

github-actions · 2024-02-21T08:51:04Z

Backport failed for branch/2.3.x, because it was unable to cherry-pick the commit(s).

Please cherry-pick the changes locally.

git fetch origin branch/2.3.x
git worktree add -d .worktree/backport-1359-to-branch/2.3.x origin/branch/2.3.x
cd .worktree/backport-1359-to-branch/2.3.x
git checkout -b backport-1359-to-branch/2.3.x
ancref=$(git merge-base c8dde0ec2e42573069b1add37dfb83c5fc7a1673 555ac64435e6f0d175a09c34c4bdad9fa0ead91d)
git cherry-pick -x $ancref..555ac64435e6f0d175a09c34c4bdad9fa0ead91d

Currently we only guard those instructions based on the available architecture. However, it is also valid to compile with an old toolkit for a new machine. Consequently we need to strengthen our checks against available PTX ISA

…#1421)

* Fix ptx usage to account for PTX ISA availability (#1359)

  Currently we only guard those instructions based on the available architecture. However, it is also valid to compile with an old toolkit for a new machine. Consequently we need to strengthen our checks against available PTX ISA

* Do not use VLAs in `cp_async_bulk_tensor_*` tests

  VLAs are a compiler extension and are correctly errored out by some compilers. As we always know the exact size of the array anyway just switch to a `cuda::std::array`

  Fixes nvbug4476664

* Use proper shared memory size

  Authored-by: Allard Hendriksen <[email protected]>

* Fix incorrect linker issue
* Ensure runfail tests do not fail without execution
* Ensure that __cccl_ptx_isa properly guards feature flags

miscco requested review from a team as code owners February 9, 2024 11:11

miscco requested review from wmaxey, ericniebler, ahendriksen, gevtushenko and jrhemstad February 9, 2024 11:11

miscco added the libcu++, backport branch/2.3.x, and bug: functional labels Feb 9, 2024

ahendriksen reviewed Feb 9, 2024

miscco requested a review from a team as a code owner February 9, 2024 13:02

miscco force-pushed the fix_ptx_isa_availability branch from c75420d to 4744c02 Compare February 9, 2024 13:04

jrhemstad reviewed Feb 9, 2024

miscco force-pushed the fix_ptx_isa_availability branch from 4744c02 to ea8a188 Compare February 9, 2024 16:35

jrhemstad reviewed Feb 9, 2024

miscco force-pushed the fix_ptx_isa_availability branch 5 times, most recently from 181060f to 1be7dee Compare February 12, 2024 10:23

miscco commented Feb 12, 2024

miscco commented Feb 12, 2024

miscco force-pushed the fix_ptx_isa_availability branch 4 times, most recently from 52af97b to 704364b Compare February 14, 2024 20:29

Move __cccl_ptx_isa into __cccl_config

33590ac

We want this to be globally available

miscco added 5 commits February 15, 2024 09:22

Future proof ptx isa detection

52e744d

Add __cccl_lib_cp_async_{bulk_}available convenience macros

41a2c34

Ensure that we can build barrier on old CTK and Hopper

820d670

Add comment on max ISA

00401b9

More test fixes

47f524a

miscco force-pushed the fix_ptx_isa_availability branch 7 times, most recently from 63a957e to 47f524a Compare February 16, 2024 13:49

miscco requested review from jrhemstad and ahendriksen February 16, 2024 13:50

ahendriksen reviewed Feb 16, 2024

ahendriksen approved these changes Feb 16, 2024

Do not introduce linker error when we there is a fallback

c5be115

ahendriksen requested changes Feb 16, 2024

ahendriksen approved these changes Feb 16, 2024

Merge branch 'main' into fix_ptx_isa_availability

555ac64

gevtushenko approved these changes Feb 20, 2024

ahendriksen mentioned this pull request Feb 20, 2024

PTX: Add cuda::ptx:cp_async_bulk_* #1403

Merged

2 tasks

miscco merged commit f6903bf into NVIDIA:main Feb 21, 2024
538 checks passed

miscco deleted the fix_ptx_isa_availability branch February 21, 2024 08:50
