Add cuda::ptx::* namespace #574

Merged on Nov 3, 2023 (52 commits)

@ahendriksen (Contributor) commented Oct 17, 2023

Description

Add the cuda/ptx header and the cuda::ptx namespace.

closes #659

Summary of intent:

  • Provide a stable PTX wrapping API that can be used by libcu++ internal code and that outside users can include with #include <cuda/ptx>.
  • Expose PTX instructions according to the following priorities: (1) instructions that expose new hardware features, (2) instructions that are not (yet) exposed through other APIs, or (3) instructions that are useful for power users.
  • The PTX instructions are tested to assemble correctly (that is, the tests check that they do not lead to compilation errors).
  • The documentation of each PTX instruction is shallow. The PTX ISA documentation is linked for each instruction, and the CTK/CCCL version in which the instruction was introduced is noted. Usage notes and other documentation are optional.
  • The exposure uses C++ language features, such as type-level parameters, so that exposing the PTX instruction variants introduced by .sem, .space, .scope, .op, and others scales.
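The last point, type-level parameters, can be illustrated with a small host-only sketch. It is not the code from this PR: the namespace `tag_sketch`, the enums, the tag objects, and the function `mbarrier_arrive_mnemonic` are hypothetical names chosen for illustration, and the function only builds the instruction mnemonic as a string where a real wrapper would presumably emit inline PTX.

```cpp
#include <cassert>
#include <string>

namespace tag_sketch {

// Hypothetical enums for the PTX modifiers; not the definitions from <cuda/ptx>.
enum class dot_sem { acquire, release };
enum class dot_scope { cta, cluster };
enum class dot_space { shared, cluster };

// Each modifier value becomes its own type, so it can take part in
// template argument deduction at the call site.
template <dot_sem V>   struct sem_t {};
template <dot_scope V> struct scope_t {};
template <dot_space V> struct space_t {};

// Tag objects that callers pass as ordinary function arguments.
constexpr sem_t<dot_sem::release>     sem_release{};
constexpr scope_t<dot_scope::cta>     scope_cta{};
constexpr scope_t<dot_scope::cluster> scope_cluster{};
constexpr space_t<dot_space::shared>  space_shared{};
constexpr space_t<dot_space::cluster> space_cluster{};

inline const char* name(dot_sem s)   { return s == dot_sem::release ? "release" : "acquire"; }
inline const char* name(dot_scope s) { return s == dot_scope::cta ? "cta" : "cluster"; }
inline const char* name(dot_space s) { return s == dot_space::shared ? "shared::cta" : "shared::cluster"; }

// One template covers every (sem, scope, space) combination.
// This sketch only assembles the mnemonic string for the chosen variant.
template <dot_sem Sem, dot_scope Scope, dot_space Space>
std::string mbarrier_arrive_mnemonic(sem_t<Sem>, scope_t<Scope>, space_t<Space>) {
  // Variants the sketch does not model are rejected at compile time.
  static_assert(Sem == dot_sem::release, "this sketch models only the .release variant");
  return std::string("mbarrier.arrive.") + name(Sem) + "." + name(Scope) + "." +
         name(Space) + ".b64";
}

// Example call site: the tags read much like the PTX modifiers they stand for.
inline void example_usage() {
  const std::string s = mbarrier_arrive_mnemonic(sem_release, scope_cluster, space_cluster);
  assert(s == "mbarrier.arrive.release.cluster.shared::cluster.b64");
}

}  // namespace tag_sketch
```

Calling `tag_sketch::example_usage()` from `main` exercises one variant. The point of the pattern is that adding a new modifier value means adding one tag, not one new function per combination.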

Checklist

  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@miscco (Collaborator) left a comment

I really like this idea. Thanks for bringing it forward.

What I am not 100% sure about is whether we really want to put this in libcu++ or whether we want to make it its own, completely separate subproject that just inherits some of the common machinery.

The advantage would be that we would be "separated" from CTK releases, which would make it a bit more explicit that we are not providing any guarantees here.

@ahendriksen (Contributor Author) left a comment

Thanks for your comments. I have left responses inline.

Co-authored-by: Michael Schellenberger Costa <[email protected]>

@ahendriksen (Contributor Author) left a comment

Thanks for the comments. Many have been resolved. I replied inline to the open discussions.

@leofang (Member) commented Oct 21, 2023

This is amazing work. Whatever the team decides, I am happy to see the fruition.

By analogy, this is like CUDA Python providing low-level Python bindings for the CUDA C APIs, so that Python users can access CUDA from Python. We are witnessing the advent of low-level C++ bindings for PTX APIs (so that C++ users can access CUDA from C++, without writing inline PTX)!

@fduguet-nv commented

This is a great initiative.
I think it would be very helpful to have such intrinsic-style functions for accessing PTX instructions, especially those related to the memory model (e.g. ld.acquire.xxx). I would definitely use this API, as I find inline PTX error-prone and cumbersome.
I also tend to believe that this thin layer allows the code to be updated in the future at the level of this header-only library, without requiring us developers to update our code at the pace at which GPU architectures evolve.

@ahendriksen changed the title from "Add cuda::ptx::* proof-of-concept" to "Add cuda::ptx::* namespace" on Nov 2, 2023

@ahendriksen (Contributor Author) commented

I think this PR is ready to merge. All comments have been incorporated. In addition, I have made some small improvements after using the API in barrier.h:

  • Rename space_shared_cluster to space_cluster. The original name was very close to the PTX naming scheme (shared::cluster), but was unnecessarily long.
  • Remove state spaces that are unlikely to be used, such as const, sreg, and reg.
  • Stay even closer to the original PTX exposure by exposing both the initial short instructions and the newer, longer variants. The mapping from the PTX ISA (CTK 11.0, 11.8, 12.0) to the cuda::ptx exposure is documented in the comments of cuda::ptx::mbarrier_arrive.
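How a short instruction form and its longer variants can sit side by side under one function name can be sketched with plain overloads. This is a host-only illustration under stated assumptions, not the contents of `<cuda/ptx>`: the namespace `overload_sketch`, the tag structs, and `mbarrier_arrive_mnemonic` are hypothetical, and the returned strings only illustrate the idea; the authoritative mapping is the one documented alongside cuda::ptx::mbarrier_arrive.

```cpp
#include <cassert>
#include <string>

namespace overload_sketch {

// Hypothetical tag types; illustrative stand-ins, not the <cuda/ptx> definitions.
struct sem_release_t {};
struct scope_cta_t {};
struct scope_cluster_t {};
struct space_shared_t {};
struct space_cluster_t {};  // shorter name for PTX's shared::cluster state space

inline const char* name(scope_cta_t)     { return "cta"; }
inline const char* name(scope_cluster_t) { return "cluster"; }
inline const char* name(space_shared_t)  { return "shared::cta"; }
inline const char* name(space_cluster_t) { return "shared::cluster"; }

// Short form: no tags, mirroring the older spelling without explicit modifiers.
inline std::string mbarrier_arrive_mnemonic() {
  return "mbarrier.arrive.shared.b64";
}

// Long form: same function name, selected by passing the modifier tags.
// A tag type without a matching name() overload fails to compile.
template <class Scope, class Space>
std::string mbarrier_arrive_mnemonic(sem_release_t, Scope scope, Space space) {
  return std::string("mbarrier.arrive.release.") + name(scope) + "." + name(space) + ".b64";
}

// Example call sites for both forms.
inline void example_usage() {
  assert(mbarrier_arrive_mnemonic() == "mbarrier.arrive.shared.b64");
  assert(mbarrier_arrive_mnemonic(sem_release_t{}, scope_cta_t{}, space_shared_t{}) ==
         "mbarrier.arrive.release.cta.shared::cta.b64");
}

}  // namespace overload_sketch
```

Existing callers of the short form keep compiling unchanged, while new code opts into the explicit modifiers by passing tags.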

@jrhemstad added the "backport branch/2.3.x" label (for backporting to the 2.3.x release branch) on Nov 2, 2023
@miscco miscco merged commit bea203d into NVIDIA:main Nov 3, 2023
519 checks passed

@jrhemstad (Collaborator) commented

/backport

github-actions bot (Contributor) commented Nov 3, 2023

Backport failed for branch/2.3.x, because it was unable to cherry-pick the commit(s).

Please cherry-pick the changes locally.

git fetch origin branch/2.3.x
git worktree add -d .worktree/backport-574-to-branch/2.3.x origin/branch/2.3.x
cd .worktree/backport-574-to-branch/2.3.x
git checkout -b backport-574-to-branch/2.3.x
ancref=$(git merge-base 83b3365bbc12a9db248b67c22052413d41fae97e 9e9fb70b712a6799791997d3c70db6a28a71af72)
git cherry-pick -x $ancref..9e9fb70b712a6799791997d3c70db6a28a71af72

miscco pushed a commit to miscco/cccl that referenced this pull request Nov 6, 2023
jrhemstad pushed a commit to jrhemstad/cccl that referenced this pull request Nov 6, 2023
jrhemstad pushed a commit to miscco/cccl that referenced this pull request Nov 7, 2023
jrhemstad pushed a commit that referenced this pull request Nov 8, 2023
* Add `cuda::ptx::*` namespace (#574)

* fixup `___CUDA_VPTX` -> `_CUDA_VPTX` (#664)

* fixup `___CUDA_VPTX` -> `_CUDA_VPTX`

* Fix warning for unused variable in branches that are constexpr disabled.

---------

Co-authored-by: Allard Hendriksen <[email protected]>
Co-authored-by: Wesley Maxey <[email protected]>
@jrhemstad removed the "backport branch/2.3.x" label (for backporting to the 2.3.x release branch) on Nov 8, 2023
Successfully merging this pull request may close these issues.

PTX wrappers for mbarrier.arrive variants