Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CMake CUDA features #9677

Merged
merged 5 commits into from
Oct 20, 2023
Merged

CMake CUDA features #9677

merged 5 commits into from
Oct 20, 2023

Conversation

chuckatkins
Copy link
Contributor

@chuckatkins chuckatkins commented Oct 14, 2023

This adds a few useful features to the CMake code wrt how it interacts with CUDA:

  1. Remove old policies and conditional branches for CMAKE_VERSION < 3.18
  2. Allow the builtin CMAKE_CUDA_ARCHITECTURES variable to be used instead of GPU_COMPUTE_VER, falling back to GPU_COMPUTE_VER if CMake is too old or CMAKE_CUDA_ARCHITECTURES is not specified.
  3. Add a USE_CUDA_LTO option to enable device-code link-time-optimization.
  4. Allow CUDA_HOST_COMPILER and CUDA_RUNTIME_LIBRARY to be user-overridden.

The above features are implemented in such a way to preserve existing behavior when they are not specified or enabled.

Copy link
Member

@trivialfis trivialfis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for working on modernizing the CMake code! The CMAKE_CUDA_ARCHITECTURES change looks good to me, and I agree the LTO might be useful. But I have a couple questions about related changes in comments.

cmake/Utils.cmake Outdated Show resolved Hide resolved
cmake/Utils.cmake Show resolved Hide resolved
@chuckatkins
Copy link
Contributor Author

@trivialfis I think this is good to go now

@trivialfis
Copy link
Member

Let me take another look tomorrow, thank you for the patience!

CMakeLists.txt Show resolved Hide resolved
CMakeLists.txt Outdated Show resolved Hide resolved
CMakeLists.txt Outdated Show resolved Hide resolved
cmake/Utils.cmake Outdated Show resolved Hide resolved
cmake/Utils.cmake Show resolved Hide resolved
CMakeLists.txt Outdated Show resolved Hide resolved
Copy link
Contributor

@robertmaynard robertmaynard left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great!

@trivialfis
Copy link
Member

Hi, could you please take a look into the errors in the CI?

@chuckatkins
Copy link
Contributor Author

Hi, could you please take a look into the errors in the CI?

Looks like tests/ci_build/prune_libnccl.sh is using some of the modified bits that needed to be changed. Should be fixed now.

@chuckatkins
Copy link
Contributor Author

Hi, could you please take a look into the errors in the CI?

Looks like tests/ci_build/prune_libnccl.sh is using some of the modified bits that needed to be changed. Should be fixed now.

@trivialfis @robertmaynard coincidentally, in fixing this I think this might have come a cross a bug in nvprune and it's argumet parsing. Initially the changes to compute CMAKE_CUDA_ARCHITECTURES would generate something like 60-real;70-real;80 to generate cubin for 60, 70, and 80 and ptx for 80. CMake translates this to nvcc flags --generate-code=arch=compute_60,code=[sm_60] --generate-code=arch=compute_70,code=[sm_70] --generate-code=arch=compute_80,code=[compute_80,sm_80]. According to nvprune --help it's supposed to accept the same --generate-code code flags as nvcc but it chokes on the last --generate-code=arch=compute_80,code=[compute_80,sm_80] with multiple elements in the code= portion. Adjusting the CMake code to instead compute 60-real;70-real;80-real;80-virtual changes the nvcc flags for 80 to --generate-code=arch=compute_80,code=[sm_80] --generate-code=arch=compute_80,code=[compute_80] which is effectively the same thing but parsed by nvprune without issue.

@robertmaynard
Copy link
Contributor

Hi, could you please take a look into the errors in the CI?

Looks like tests/ci_build/prune_libnccl.sh is using some of the modified bits that needed to be changed. Should be fixed now.

@trivialfis @robertmaynard coincidentally, in fixing this I think this might have come a cross a bug in nvprune and it's argumet parsing. Initially the changes to compute CMAKE_CUDA_ARCHITECTURES would generate something like 60-real;70-real;80 to generate cubin for 60, 70, and 80 and ptx for 80. CMake translates this to nvcc flags --generate-code=arch=compute_60,code=[sm_60] --generate-code=arch=compute_70,code=[sm_70] --generate-code=arch=compute_80,code=[compute_80,sm_80]. According to nvprune --help it's supposed to accept the same --generate-code code flags as nvcc but it chokes on the last --generate-code=arch=compute_80,code=[compute_80,sm_80] with multiple elements in the code= portion. Adjusting the CMake code to instead compute 60-real;70-real;80-real;80-virtual changes the nvcc flags for 80 to --generate-code=arch=compute_80,code=[sm_80] --generate-code=arch=compute_80,code=[compute_80] which is effectively the same thing but parsed by nvprune without issue.

I will file a bug on nvprune about this, it does look like a regression of support

Copy link
Member

@trivialfis trivialfis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for your excellent work on enabling LTO and removing the old GPU arch!

@trivialfis trivialfis merged commit 83cdf14 into dmlc:master Oct 20, 2023
25 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants