CMake CUDA features #9677

chuckatkins · 2023-10-14T21:37:58Z

This adds a few useful features to the CMake code with respect to how it interacts with CUDA:

Remove old policies and conditional branches for CMAKE_VERSION < 3.18
Allow the builtin CMAKE_CUDA_ARCHITECTURES variable to be used instead of GPU_COMPUTE_VER, falling back to GPU_COMPUTE_VER if CMake is too old or CMAKE_CUDA_ARCHITECTURES is not specified.
Add a USE_CUDA_LTO option to enable device-code link-time optimization.
Allow CUDA_HOST_COMPILER and CUDA_RUNTIME_LIBRARY to be user-overridden.
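
For readers unfamiliar with the fallback described in the list above, here is a minimal sketch of how it could look. It is not the code from this PR: the "-real" suffix and the placement before enable_language(CUDA) are assumptions made for illustration.

```cmake
# Sketch only, not the PR's implementation.
# This has to run before enable_language(CUDA) / project(... CUDA), because
# CMake 3.18+ otherwise fills in a compiler default for CMAKE_CUDA_ARCHITECTURES.
if(NOT DEFINED CMAKE_CUDA_ARCHITECTURES AND GPU_COMPUTE_VER)
  # Legacy path: translate e.g. GPU_COMPUTE_VER="60;70" into "60-real;70-real".
  set(_sketch_archs "")
  foreach(ver IN LISTS GPU_COMPUTE_VER)
    list(APPEND _sketch_archs "${ver}-real")
  endforeach()
  set(CMAKE_CUDA_ARCHITECTURES "${_sketch_archs}")
endif()

# If neither variable was given, CMAKE_CUDA_ARCHITECTURES stays unset here and
# CMake picks its own default when the CUDA language is enabled.
message(STATUS "CUDA architectures: ${CMAKE_CUDA_ARCHITECTURES}")
```

With this shape, a user-supplied -DCMAKE_CUDA_ARCHITECTURES=... always wins, and GPU_COMPUTE_VER is only consulted when the builtin variable is absent.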

The above features are implemented in such a way as to preserve existing behavior when they are not specified or enabled.
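
As a rough illustration of "preserve existing behavior unless specified", the overrides and the LTO option could be wired up along these lines. The default values, the function name sketch_configure_cuda_target, and the version check are assumptions for this sketch, not taken from the PR.

```cmake
# Sketch only. Defaults are applied only when the user has not set the variable,
# so a -D... on the command line takes precedence. Both variables should be set
# before the CUDA language is enabled.
if(NOT DEFINED CMAKE_CUDA_HOST_COMPILER)
  set(CMAKE_CUDA_HOST_COMPILER "${CMAKE_CXX_COMPILER}")  # assumed default
endif()
if(NOT DEFINED CMAKE_CUDA_RUNTIME_LIBRARY)
  set(CMAKE_CUDA_RUNTIME_LIBRARY Static)                 # assumed default
endif()

# Off by default, so builds that never mention the option are unchanged.
option(USE_CUDA_LTO "Enable link-time optimization for CUDA device code" OFF)

# Hypothetical helper that applies the option to one target.
function(sketch_configure_cuda_target target)
  if(USE_CUDA_LTO)
    # Assumption: device LTO through INTERPROCEDURAL_OPTIMIZATION needs a
    # recent CMake (3.25 or newer, as far as I know).
    if(CMAKE_VERSION VERSION_LESS 3.25)
      message(FATAL_ERROR "USE_CUDA_LTO requires CMake 3.25 or newer")
    endif()
    set_target_properties(${target} PROPERTIES
      CUDA_SEPARABLE_COMPILATION ON
      INTERPROCEDURAL_OPTIMIZATION ON)
  endif()
endfunction()
```

A caller would invoke sketch_configure_cuda_target(<target>) after add_library(); with USE_CUDA_LTO left OFF the function does nothing.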

trivialfis

Thank you for working on modernizing the CMake code! The CMAKE_CUDA_ARCHITECTURES change looks good to me, and I agree the LTO might be useful. But I have a couple questions about related changes in comments.

chuckatkins · 2023-10-17T14:55:15Z

@trivialfis I think this is good to go now

trivialfis · 2023-10-17T16:40:50Z

Let me take another look tomorrow, thank you for the patience!

robertmaynard

Looks great!

trivialfis · 2023-10-19T10:18:18Z

Hi, could you please take a look into the errors in the CI?

chuckatkins · 2023-10-19T13:30:20Z

Looks like tests/ci_build/prune_libnccl.sh is using some of the modified bits that needed to be changed. Should be fixed now.

chuckatkins · 2023-10-19T13:48:58Z

@trivialfis @robertmaynard coincidentally, in fixing this I think I might have come across a bug in nvprune and its argument parsing. Initially the changes to compute CMAKE_CUDA_ARCHITECTURES would generate something like 60-real;70-real;80 to generate cubin for 60, 70, and 80 and PTX for 80. CMake translates this to the nvcc flags --generate-code=arch=compute_60,code=[sm_60] --generate-code=arch=compute_70,code=[sm_70] --generate-code=arch=compute_80,code=[compute_80,sm_80]. According to nvprune --help it's supposed to accept the same --generate-code flags as nvcc, but it chokes on the last one, --generate-code=arch=compute_80,code=[compute_80,sm_80], which has multiple elements in the code= portion. Adjusting the CMake code to instead compute 60-real;70-real;80-real;80-virtual changes the nvcc flags for 80 to --generate-code=arch=compute_80,code=[sm_80] --generate-code=arch=compute_80,code=[compute_80], which is effectively the same thing but is parsed by nvprune without issue.
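
To make the adjusted list format concrete, here is a small sketch that builds it from a list of compute versions. The function name sketch_format_cuda_archs is hypothetical, and the input is assumed to be in ascending order; the PR's actual code may differ.

```cmake
# Sketch only: cubin (-real) for every entry, plus PTX (-virtual) for the
# newest one, written as separate entries so that each generated
# --generate-code flag carries a single element in its code= portion.
function(sketch_format_cuda_archs out_var)
  set(versions ${ARGN})
  if(NOT versions)
    message(FATAL_ERROR "sketch_format_cuda_archs: no compute versions given")
  endif()

  # Assumption: versions are passed in ascending order, so the last is newest.
  list(GET versions -1 newest)

  set(result "")
  foreach(ver IN LISTS versions)
    list(APPEND result "${ver}-real")
  endforeach()
  list(APPEND result "${newest}-virtual")

  set(${out_var} "${result}" PARENT_SCOPE)
endfunction()

sketch_format_cuda_archs(archs 60 70 80)
# archs now holds: 60-real;70-real;80-real;80-virtual
message(STATUS "${archs}")
```

The snippet has no project dependencies, so it can be tried in script mode with cmake -P.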

robertmaynard · 2023-10-19T15:00:30Z

I will file a bug against nvprune about this; it does look like a regression in support.

trivialfis

Thank you for your excellent work on enabling LTO and removing the old GPU arch!

chuckatkins force-pushed the cmake branch from 79952c6 to 344d53e on October 14, 2023 21:47

trivialfis reviewed Oct 16, 2023

chuckatkins force-pushed the cmake branch from 344d53e to 34b36fd on October 16, 2023 15:54

robertmaynard suggested changes Oct 17, 2023

chuckatkins force-pushed the cmake branch from 34b36fd to 7907081 on October 17, 2023 21:01

robertmaynard reviewed Oct 17, 2023

chuckatkins force-pushed the cmake branch from 7907081 to 5e78e0b on October 18, 2023 17:11

robertmaynard approved these changes Oct 18, 2023

chuckatkins added 5 commits October 19, 2023 09:26

cmake: Remove unnecessary policies

e960b67

cmake: Use CMAKE_CUDA_ARCHITECTURES if defined

57386e1

cmake: Use CMAKE_CUDA_HOST_COMPILER if defined

2da2dd7

cmake: Use CMAKE_CUDA_RUNTIME_LIBRARY if defined

0b20f27

cmake: Add an option to enable CUDA device LTO

da64709

chuckatkins force-pushed the cmake branch from 5e78e0b to da64709 on October 19, 2023 13:27

trivialfis approved these changes Oct 20, 2023

trivialfis merged commit 83cdf14 into dmlc:master Oct 20, 2023
25 checks passed
