Add `cuda::ptx::*` namespace #574

ahendriksen · 2023-10-17T15:06:48Z

Description

Add the cuda/ptx header and the cuda::ptx namespace.

closes #659

Add the header cuda/ptx
Add the namespace cuda::ptx
Add a single PTX wrapper for mbarrier.arrive.expect_tx to demonstrate implementation, testing, documentation, and internal use by higher-level libcu++ APIs.

Summary of intent:

Provide a stable PTX wrapping API that can be used by libcu++ internal code and included by outside users via #include <cuda/ptx>.
Expose PTX instructions according to the following priorities: (1) instructions that expose new hardware features, (2) instructions that are not (yet) exposed through other APIs, or (3) instructions that are useful for power users.
The PTX instructions are tested to assemble correctly (that is, they do not lead to compilation errors).
The documentation of each PTX instruction is shallow: the PTX ISA documentation is linked for each instruction, and the CTK/CCCL version in which the instruction was introduced is noted. Usage or other documentation is optional.
The exposure uses C++ language features such as type-level parameters to scale across the PTX instruction variants introduced by .sem, .space, .scope, .op, and others.
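
To illustrate the last point, here is a minimal host-only sketch of how type-level parameters can cover several instruction variants with a single template. Everything in the `sketch` namespace is hypothetical and is not the API added by this PR; a real wrapper would emit inline PTX, whereas this stand-in only returns the mnemonic it would select.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

namespace sketch {  // hypothetical names, not the real cuda::ptx API

enum class dot_scope { cta, cluster };
enum class dot_space { shared, cluster };

// Each qualifier value becomes its own type, so templates can dispatch on it.
template <dot_scope S> using scope_t = std::integral_constant<dot_scope, S>;
template <dot_space S> using space_t = std::integral_constant<dot_space, S>;

// Tag objects that callers pass as ordinary function arguments.
constexpr scope_t<dot_scope::cta>     scope_cta{};
constexpr scope_t<dot_scope::cluster> scope_cluster{};
constexpr space_t<dot_space::shared>  space_shared{};
constexpr space_t<dot_space::cluster> space_cluster{};

inline const char* spelling(dot_scope s) {
  return s == dot_scope::cta ? ".cta" : ".cluster";
}
inline const char* spelling(dot_space s) {
  return s == dot_space::shared ? ".shared::cta" : ".shared::cluster";
}

// One template covers every scope/space combination listed above.
// Assumption: the semantics qualifier is fixed to .release for brevity.
template <dot_scope Scope, dot_space Space>
std::string mbarrier_arrive_expect_tx(scope_t<Scope>, space_t<Space>) {
  return std::string("mbarrier.arrive.expect_tx.release") +
         spelling(Scope) + spelling(Space) + ".b64";
}

// Small usage example: the tags select the variant at compile time.
inline void usage_example() {
  assert(mbarrier_arrive_expect_tx(scope_cta, space_shared) ==
         "mbarrier.arrive.expect_tx.release.cta.shared::cta.b64");
}

}  // namespace sketch
```

Adding a new qualifier value in this style means adding one enumerator and one tag object rather than one function per combination.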

Checklist

New or existing tests cover these changes.
The documentation is up to date with these changes.

miscco

I really like this idea. Thanks for bringing it forward.

What I am not 100% sure about is whether we really want to put this in libcu++ or make it its own subproject that just inherits some of the common machinery.

The advantage would be that we would be "separated" from CTK releases, which would make it a bit more explicit that we are not providing any guarantees here.

ahendriksen

Thanks for your comments. I have left responses inline.

ahendriksen

Thanks for the comments. Many have been resolved. I replied inline to the open discussions.

leofang · 2023-10-21T03:52:11Z

This is amazing work. Whatever the team decides, I am happy to see the fruition.

For analogy, this is like CUDA Python providing low-level Python bindings for CUDA C APIs, so that Python users can access CUDA from Python. We are witnessing the advent of low-level C++ binding for PTX APIs (so that C++ users can access CUDA from C++, without writing inline PTX)!

fduguet-nv · 2023-10-24T13:47:16Z

This is a great initiative.
I think it would be very helpful to get such intrinsic style functions to access ptx instructions, especially on the memory model (e.g. ld.acquire.xxx). I would definitely use this API as I find using inline ptx is error-prone and cumbersome.
Also, I tend to believe that this thin layer allows updates in the code in the future at this header-only library level, without requiring us developers to update code at the pace GPU architecture would evolve.

ahendriksen · 2023-11-02T17:08:15Z

I think this PR is ready to merge. All comments have been incorporated. In addition, I have made some small improvements after using the API in barrier.h:

Rename space_shared_cluster to space_cluster. The original name was very close to the PTX naming scheme of shared::cluster, but was unnecessarily long.
Remove state spaces that are unlikely to be used, such as const, sreg, and reg.
Stay even closer to the original PTX exposure by exposing the initial short instructions and also the newer longer variants. The mapping from PTX ISA (CTK 11.0, 11.8, 12.0) to cuda::ptx exposure is documented in the comments of cuda::ptx::mbarrier_arrive.
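
As a rough illustration of the first and third points, the sketch below shows how a short original spelling and a longer qualified spelling can coexist as overloads, and how a short C++ tag name such as `space_cluster` can stand for the PTX qualifier `.shared::cluster`. All names in `overload_sketch` are hypothetical, and the returned strings are only illustrative; the authoritative mapping is the one documented in the comments of cuda::ptx::mbarrier_arrive.

```cpp
#include <cassert>
#include <string>

namespace overload_sketch {  // hypothetical names for illustration only

struct sem_release_t {};
struct scope_cluster_t {};
struct space_cluster_t {};  // short C++ name for the PTX qualifier .shared::cluster

constexpr sem_release_t   sem_release{};
constexpr scope_cluster_t scope_cluster{};
constexpr space_cluster_t space_cluster{};

// Short spelling: no qualifier tags are passed.
inline std::string mbarrier_arrive() {
  return "mbarrier.arrive.shared.b64";
}

// Longer spelling: the qualifiers are selected by tag arguments.
inline std::string mbarrier_arrive(sem_release_t, scope_cluster_t, space_cluster_t) {
  return "mbarrier.arrive.release.cluster.shared::cluster.b64";
}

// Both forms live in the same overload set and are chosen by argument types.
inline void usage_example() {
  assert(mbarrier_arrive() == "mbarrier.arrive.shared.b64");
  assert(mbarrier_arrive(sem_release, scope_cluster, space_cluster) ==
         "mbarrier.arrive.release.cluster.shared::cluster.b64");
}

}  // namespace overload_sketch
```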

jrhemstad · 2023-11-03T21:19:34Z

/backport

github-actions · 2023-11-03T21:19:53Z

Backport failed for branch/2.3.x, because it was unable to cherry-pick the commit(s).

Please cherry-pick the changes locally.

```shell
git fetch origin branch/2.3.x
git worktree add -d .worktree/backport-574-to-branch/2.3.x origin/branch/2.3.x
cd .worktree/backport-574-to-branch/2.3.x
git checkout -b backport-574-to-branch/2.3.x
ancref=$(git merge-base 83b3365bbc12a9db248b67c22052413d41fae97e 9e9fb70b712a6799791997d3c70db6a28a71af72)
git cherry-pick -x $ancref..9e9fb70b712a6799791997d3c70db6a28a71af72
```

* Add `cuda::ptx::*` namespace (#574)
* fixup `___CUDA_VPTX` -> `_CUDA_VPTX` (#664)
* fixup `___CUDA_VPTX` -> `_CUDA_VPTX`
* Fix warning for unused variable in branches that are constexpr disabled.

Co-authored-by: Allard Hendriksen <[email protected]>
Co-authored-by: Wesley Maxey <[email protected]>

ahendriksen added 3 commits October 17, 2023 16:27

Initial proof-of-concept for PTX header

16ad54a

Add docs

9b31cc8

Reformat docs

229704a

miscco reviewed Oct 17, 2023

ahendriksen and others added 2 commits October 17, 2023 18:26

Use PTX wrapper in internal code

dad93de

Apply suggestions from code review

220d475

Co-authored-by: Michael Schellenberger Costa <[email protected]>

ahendriksen commented Oct 18, 2023

Address review comments

ae1a084

miscco reviewed Oct 18, 2023

Apply suggestions from code review

ecbb6fe

Co-authored-by: Michael Schellenberger Costa <[email protected]>

ahendriksen commented Oct 18, 2023

Address review comments

cf19e53

miscco and others added 7 commits October 25, 2023 11:58

Merge branch 'main' into pr/ahendriksen/574

b159338

Fix typo

1d57b02

Add targeting macros and a few more helper functions

21050e8

Add PTX ISA 8.3 macro

986d990

Improve code organization

82d1b85

Format code

e356271

Fix test and ifdefs

bb91eb7

The test would previously not fail when invalid ptx was present. Fixed now.

miscco reviewed Oct 25, 2023

ahendriksen and others added 8 commits October 25, 2023 14:35

Update ptx.md

b514e2d

Use numerical PTX ISA/SM target macros

e351c79

Move bulk of ptx header into detail/ptx.h

9006317

Rename include guards

42710f9

Fix missing includes

4144d43

Remove redundant comment

8a609cd

Rename __as_smem_ptr -> __as_ptr_smem for disambiguation

6953ea0

Use uint32_t

eae5df6

ahendriksen changed the title ~~Add cuda::ptx::* proof-of-concept~~ Add cuda::ptx::* namespace Nov 2, 2023

ahendriksen added 4 commits November 2, 2023 11:50

Rename space_shared_cluster -> space_cluster

bd24265

Ensure PTX test is actually assembled

4f26aa2

Rename test

9555532

Stay closer to original PTX exposure

ffa1f30

Use the original spellings as in PTX ISA 70 and 78 and also expose in C++ as such.

jrhemstad requested review from gevtushenko, wmaxey, miscco and griwes November 2, 2023 17:20

miscco approved these changes Nov 2, 2023

miscco reviewed Nov 2, 2023

gevtushenko approved these changes Nov 2, 2023

jrhemstad added the `backport branch/2.3.x` label (for backporting to the 2.3.x release branch) Nov 2, 2023

miscco and others added 4 commits November 3, 2023 08:22

Merge branch 'main' into pr/ahendriksen/574

90df5a4

Address review feedback

8b03da3

Do not require set arch

614326b

Do not expose remote mbarrier arrive with .cta scope

9e9fb70

ahendriksen mentioned this pull request Nov 3, 2023

[FEA]: First iteration of SM90 PTX instruction exposure #609

Closed

1 task

miscco merged commit bea203d into NVIDIA:main Nov 3, 2023
519 checks passed

jrhemstad mentioned this pull request Nov 3, 2023

PTX wrappers for mbarrier.arrive variants #659

Closed

jrhemstad mentioned this pull request Nov 6, 2023

Backport PR574 to branch/2.3.x #662

Closed

miscco pushed a commit to miscco/cccl that referenced this pull request Nov 6, 2023

Add cuda::ptx::* namespace (NVIDIA#574)

e1bb33c

jrhemstad pushed a commit to jrhemstad/cccl that referenced this pull request Nov 6, 2023

Add cuda::ptx::* namespace (NVIDIA#574)

b99a626

jrhemstad mentioned this pull request Nov 6, 2023

PR commits are cherry-picked instead of the squashed commit korthout/backport-action#342

Closed

jrhemstad pushed a commit to miscco/cccl that referenced this pull request Nov 7, 2023

Add cuda::ptx::* namespace (NVIDIA#574)

ed292f4

jrhemstad removed the `backport branch/2.3.x` label (for backporting to the 2.3.x release branch) Nov 8, 2023
