Build nccl after installing cuda (pytorch#1670)
Fix: pytorch/pytorch#116977

NCCL 2.19.3 binaries do not exist for CUDA 11.8 and CUDA 12.1. According to the release notes at https://docs.nvidia.com/deeplearning/nccl/release-notes/rel_2-19-3.html#rel_2-19-3, only CUDA 12.0, 12.2, and 12.3 are supported.

Hence we build NCCL manually, following this build process:
https://github.com/NVIDIA/nccl/tree/v2.19.3-1?tab=readme-ov-file#build

We want the NCCL version to be exactly the same as the one installed here:
https://github.com/pytorch/pytorch/blob/main/.github/scripts/generate_binary_build_matrix.py#L45
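Since the goal is an exact version match, a check along the following lines could be run after the build. This is only a sketch and not part of this commit: `nccl_header_version` and `check_nccl_version` are hypothetical helper names, and the sketch assumes the installed `nccl.h` carries the usual `NCCL_MAJOR`, `NCCL_MINOR`, and `NCCL_PATCH` defines and lives under `/usr/local/cuda/include/`, where the script copies the headers.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of install_cuda.sh): print the NCCL version
# recorded in an installed nccl.h as MAJOR.MINOR.PATCH.
nccl_header_version() {
    local header="${1:-/usr/local/cuda/include/nccl.h}"
    awk '
        $1 == "#define" && $2 == "NCCL_MAJOR" { major = $3 }
        $1 == "#define" && $2 == "NCCL_MINOR" { minor = $3 }
        $1 == "#define" && $2 == "NCCL_PATCH" { patch = $3 }
        END {
            # Fail if any of the three defines was not found.
            if (major == "" || minor == "" || patch == "") exit 1
            printf "%s.%s.%s\n", major, minor, patch
        }
    ' "$header"
}

# Hypothetical helper: return non-zero when the installed version differs
# from the expected one passed as the first argument.
check_nccl_version() {
    local expected="$1" header="${2:-/usr/local/cuda/include/nccl.h}"
    local actual
    actual="$(nccl_header_version "$header")" || {
        echo "could not read NCCL version from ${header}" >&2
        return 1
    }
    if [ "$actual" != "$expected" ]; then
        echo "NCCL version mismatch: expected ${expected}, found ${actual}" >&2
        return 1
    fi
    echo "NCCL ${actual} OK"
}
```

For example, `check_nccl_version 2.19.3` could be called at the end of `install_118` and `install_121`. The expected version string would still have to be kept in sync with `generate_binary_build_matrix.py` by hand.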
atalman committed Jan 9, 2024
1 parent 68a5236 commit 2bc6df7
Showing 1 changed file with 12 additions and 12 deletions.
common/install_cuda.sh: 12 additions, 12 deletions
@@ -33,13 +33,13 @@ function install_118 {
 rm -rf tmp_cudnn

 # NCCL license: https://docs.nvidia.com/deeplearning/nccl/#licenses
-mkdir tmp_nccl && cd tmp_nccl
-wget -q https://developer.download.nvidia.com/compute/redist/nccl/v2.15.5/nccl_2.15.5-1+cuda11.8_x86_64.txz
-tar xf nccl_2.15.5-1+cuda11.8_x86_64.txz
-cp -a nccl_2.15.5-1+cuda11.8_x86_64/include/* /usr/local/cuda/include/
-cp -a nccl_2.15.5-1+cuda11.8_x86_64/lib/* /usr/local/cuda/lib64/
+# Follow build: https://github.com/NVIDIA/nccl/tree/v2.19.3-1?tab=readme-ov-file#build
+git clone -b v2.19.3-1 --depth 1 https://github.com/NVIDIA/nccl.git
+cd nccl && make -j src.build
+cp -a build/include/* /usr/local/cuda/include/
+cp -a build/lib/* /usr/local/cuda/lib64/
 cd ..
-rm -rf tmp_nccl
+rm -rf nccl

 install_cusparselt_040

@@ -66,13 +66,13 @@ function install_121 {
 rm -rf tmp_cudnn

 # NCCL license: https://docs.nvidia.com/deeplearning/nccl/#licenses
-mkdir tmp_nccl && cd tmp_nccl
-wget -q https://developer.download.nvidia.com/compute/redist/nccl/v2.18.1/nccl_2.18.1-1+cuda12.1_x86_64.txz
-tar xf nccl_2.18.1-1+cuda12.1_x86_64.txz
-cp -a nccl_2.18.1-1+cuda12.1_x86_64/include/* /usr/local/cuda/include/
-cp -a nccl_2.18.1-1+cuda12.1_x86_64/lib/* /usr/local/cuda/lib64/
+# Follow build: https://github.com/NVIDIA/nccl/tree/v2.19.3-1?tab=readme-ov-file#build
+git clone -b v2.19.3-1 --depth 1 https://github.com/NVIDIA/nccl.git
+cd nccl && make -j src.build
+cp -a build/include/* /usr/local/cuda/include/
+cp -a build/lib/* /usr/local/cuda/lib64/
 cd ..
-rm -rf tmp_nccl
+rm -rf nccl

 install_cusparselt_040
