
cudaPackages: multiple outputs for redistributables #240498

Merged

Conversation

ConnorBaker
Contributor

@ConnorBaker ConnorBaker commented Jun 29, 2023

Description of changes

This change creates multiple outputs for CUDA redistributable packages.

We use a script to find out, ahead of time, the outputs each redist package provides. From that, we are able to create multiple outputs for supported redist packages, allowing users to specify exactly which components they require.

Beyond the script which finds outputs ahead of time, there is some custom code involved in making this happen. For example, the way Nixpkgs typically handles multiple outputs involves making dev the default output when available, and adding out to dev's propagatedBuildInputs.

Instead, we make each output independent of the others. If a user wants only to include the headers found in a redist package, they can do so by choosing the dev output. If they want to include dynamic libraries, they can do so by specifying the lib output, or static for static libraries.

To avoid breakages, we continue to provide the out output, which becomes the union of all other outputs, effectively making the split outputs opt-in.
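As a sketch of what this enables (the consuming derivation below is hypothetical; the redistributable attribute and output names follow this PR), a downstream package can depend on exactly the components it needs:

```nix
# Illustrative only: a downstream derivation picking individual outputs
# instead of pulling in each redistributable's entire `out`.
{ stdenv, cudaPackages }:

stdenv.mkDerivation {
  pname = "my-cuda-app"; # hypothetical package
  version = "0.1.0";
  src = ./.;

  nativeBuildInputs = [ cudaPackages.cuda_nvcc ];
  buildInputs = [
    cudaPackages.cuda_cudart.dev # headers only
    cudaPackages.cuda_cudart.lib # dynamic libraries
    cudaPackages.libcublas.lib
    # cudaPackages.libcublas.static  # only if linking statically
  ];
}
```

Selecting `lib` instead of the default `out` is what produces the closure-size reductions reported below.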

Additional changes:

  • Overrides for CUDA 12.x packages to ensure patchelf succeeds.
  • Refactored overrides for CUDA 12.x packages to use only the outputs they require from other CUDA packages.
  • autoPatchelfIgnoreMissingDeps set to ignore libcuda.so.1 specifically instead of all libs, removing the possibility of breakages by accidentally ignoring libraries we should be linking.
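For context, `autoPatchelfIgnoreMissingDeps` accepts a list of sonames in addition to a boolean; a minimal sketch of the narrowing described above:

```nix
# Before: ignore every missing dependency (can silently hide real breakages).
# autoPatchelfIgnoreMissingDeps = true;

# After: ignore only the driver library, which is provided at runtime by the
# NVIDIA driver rather than by any Nix store path.
autoPatchelfIgnoreMissingDeps = [ "libcuda.so.1" ];
```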
Show us the numbers!

All found by building against master and this branch, with --impure and a local Nixpkgs configuration of

{
  allowUnfree = true;
  cudaSupport = true;
  cudaCapabilities = [ "8.9" ];
  cudaForwardCompat = false;
}
  • cuDNN closure/NAR size went from 2.4G to 1.1G.
    • Comparing master's cudnn to this branch's cudnn.lib.
  • Magma closure size went from 2.9G to 1.6G.
    • Comparing master with this branch (same attribute name).
    • Build targeted capability 8.9 (Ada Lovelace) with CUDA 11.8.
    • Mostly as a result of dropping dependencies on static cuBLAS and cuSPARSE.
  • PyTorch closure size went from 9.7G to 8.4G.
    • Comparing master with this branch (same attribute name).
    • Build targeted capability 8.9 (Ada Lovelace) with CUDA 11.8.
    • Much greater gains to be had in migrating away from CUDA Toolkit.
      • Decrease comes entirely from switch to cudnn.lib instead of cudnn.
      • Magma does not contribute to the closure size, because we link against it statically.
Things done
  • Built on platform(s)
    • x86_64-linux
    • aarch64-linux
    • x86_64-darwin
    • aarch64-darwin
  • For non-Linux: Is sandbox = true set in nix.conf? (See Nix manual)
  • Tested, as applicable:
  • Tested compilation of all packages that depend on this change using nix-shell -p nixpkgs-review --run "nixpkgs-review rev HEAD". Note: all changes have to be committed, also see nixpkgs-review usage
  • Tested basic functionality of all binary files (usually in ./result/bin/)
  • 23.11 Release Notes (or backporting 23.05 Release notes)
    • (Package updates) Added a release notes entry if the change is major or breaking
    • (Module updates) Added a release notes entry if the change is significant
    • (Module addition) Added a release notes entry if adding a new NixOS module
  • Fits CONTRIBUTING.md.

@ConnorBaker ConnorBaker added 0.kind: enhancement Add something new 6.topic: cuda Parallel computing platform and API labels Jun 29, 2023
@ConnorBaker ConnorBaker self-assigned this Jun 29, 2023
@ofborg ofborg bot added 10.rebuild-darwin: 0 This PR does not cause any packages to rebuild on Darwin 10.rebuild-linux: 0 This PR does not cause any packages to rebuild on Linux 2.status: merge conflict This PR has merge conflicts with the target branch labels Jun 29, 2023
@SomeoneSerge
Contributor

Wow, this is huge (no pun intended xD)!

I suppose this PR is to be reviewed later, after you mark it ready? Just a comment before then: it seems like this change deals with aarch64 support, cross-compilation support, and splayed outputs support at once. When we start merging this, it might be easier to merge in smaller chunks? I'd also be wary of exposing addBuildInputs&c as attributes: pipe looks like a great improvement compositionality-wise, but it's also important that our codebase doesn't end up looking alien to a randomly sampled nixpkgs maintainer. I think this usage of pipe is kind of similar to that of the module system's lib.mkMerge, which we might be able to move to in time.

@ConnorBaker
Contributor Author

Wow, this is huge (no pun intended xD)!

You're right, there was a fair amount of scope creep as I was working on this. It needs to be split up.

I suppose this PR is to be reviewed later, after you mark it ready?

Yep! I try not to mark PRs as ready for review until I'm near-done or want to make it a collaborative effort/trigger some CI checks.

Just a comment before then: it seems like this change deals with aarch64 support, cross-compilation support, and splayed outputs support at once. When we start merging this, it might be easier to merge in smaller chunks?

Definitely! I can see a few different components in this:

  • Multi-output CUDA redistributables
    • The actual goal of this PR
  • Multi-arch support
    • Support for just Jetson is a specific code-path in the more general multi-arch support feature
  • Cross-compilation support
    • Implemented through splicing

I'd also be wary of exposing addBuildInputs&c as attributes: pipe looks like a great improvement compositionality-wise, but it's also important that our codebase doesn't end up looking alien to a randomly sampled nixpkgs maintainer. I think this usage of pipe is kind of similar to that of the module system's lib.mkMerge, which we might be able to move to in time.

I haven't played much with the module system -- most of the inspiration I took came from the utility functions the Haskell packaging infrastructure has: https://github.com/NixOS/nixpkgs/blob/master/pkgs/development/haskell-modules/lib/compose.nix.

There's a lot of boilerplate in what's being done in the overrides, which is why I started introducing those abstractions.

As I was writing some of this code I caught myself trying to make everything point-free (a bad habit from my Haskell tendencies).

You're right of course that addBuildInputs and friends should be much more limited in scope.
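For readers unfamiliar with the style being discussed, a minimal sketch of the pattern (helper names are illustrative, modeled on the `haskell-modules/lib/compose.nix` approach; not the PR's actual implementation):

```nix
let
  # Hypothetical helper: append extra buildInputs to a derivation.
  addBuildInputs = inputs: drv:
    drv.overrideAttrs (old: {
      buildInputs = (old.buildInputs or [ ]) ++ inputs;
    });

  # Hypothetical helper: append a snippet to postFixup.
  appendPostFixup = snippet: drv:
    drv.overrideAttrs (old: {
      postFixup = (old.postFixup or "") + "\n" + snippet;
    });
in
# lib.pipe threads a package through a list of such transformations,
# avoiding deeply nested overrideAttrs calls.
lib.pipe somePackage [
  (addBuildInputs [ zlib ])
  (appendPostFixup "echo fixup done")
]
```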

@ConnorBaker ConnorBaker force-pushed the feat/cuda-redist-multiple-outputs branch 7 times, most recently from fb5aefe to ff4dc49 Compare July 12, 2023 01:21
@ofborg ofborg bot removed the 2.status: merge conflict This PR has merge conflicts with the target branch label Jul 12, 2023
@ConnorBaker ConnorBaker force-pushed the feat/cuda-redist-multiple-outputs branch 2 times, most recently from 7c518af to 1614286 Compare July 12, 2023 02:42
@ConnorBaker
Contributor Author

Focused only on multiple outputs now and rebased on master.

Prior to stripping the multiple arch work, I made a copy of it here: https://github.com/ConnorBaker/nixpkgs/tree/feat/cuda-redist-multiple-arch.

@ofborg ofborg bot requested review from SomeoneSerge and samuela July 12, 2023 03:12
@ofborg ofborg bot added 11.by: package-maintainer This PR was created by the maintainer of the package it changes 10.rebuild-darwin: 11-100 10.rebuild-linux: 11-100 and removed 10.rebuild-darwin: 0 This PR does not cause any packages to rebuild on Darwin 10.rebuild-linux: 0 This PR does not cause any packages to rebuild on Linux labels Jul 12, 2023
@ofborg ofborg bot requested review from thoughtpolice and mdaiter July 13, 2023 02:48
@SomeoneSerge
Contributor

Personally, I'd like to see this merged sooner rather than later,

☝🏻 ☝🏻 ☝🏻

if we go through with this, part of the critical path for CUDA packaging will live outside of nixpkgs... are we ok with that?

First of all, this is more like an automation or code-generation tool to simplify the maintenance; I wouldn't call it "critical". Sort of like poetry2nix or nix-update (which also live out-of-tree), only it's nvidia2nix. Bash, for that matter, is more critical to us and isn't maintained in-tree. Once we've generated these augmented manifests, the users don't even need to know the tool was ever involved, unless maybe they look to override something. I also like that with the tool being under Connor's account it's easier to treat it as a black box: I was about to complain about pydantic being too heavy, but now that it's isolated I literally don't care.

unless maybe they look to override something

Now that's an important use-case we may be damaging and which we may want to handle later. I suspect these extra JSONs may make it harder to diverge (as in, using overlays and overrides) from the nixpkgs tree.

having the source on my personal account (@ConnorBaker) instead of a company account (https://github.com/tweag) or the community account (https://github.com/nix-community)?

If the script sticks around and evolves further, maybe we can look into moving it to some organization's namespace, for better consolidation and discoverability. Otherwise, I don't mind even a bit you having it under yours, as long as it's permissively licensed and fork-able 🤷🏻

To conclude,

...from my point of view the only thing holding this PR back is that it touches the documentation and I'd like to see that documentation updated first, because I think it could confuse people in its current form

@ConnorBaker
Contributor Author

...from my point of view the only thing holding this PR back is that it touches the documentation and I'd like to see that documentation updated first, because I think it could confuse people in its current form

Any particular portion of the docs you'd want to see updated, @SomeoneSerge? I purposefully didn't want to change things outside of adding a section on update procedures, but I will if you feel it's necessary.

@ConnorBaker ConnorBaker force-pushed the feat/cuda-redist-multiple-outputs branch from cc29e78 to 1c16f6b Compare August 24, 2023 13:44
@ConnorBaker
Contributor Author

Updated with feedback on the docs section (CUDA Toolkit -> CUDA Toolkit runfile installer; explicitly mention cudaPackages.cudatoolkit), rebased, and squashed.

@ConnorBaker ConnorBaker force-pushed the feat/cuda-redist-multiple-outputs branch from aae745d to 2556525 Compare August 28, 2023 13:30
@ConnorBaker
Contributor Author

Rebased, updated, squashed, and force-pushed.

@ConnorBaker
Contributor Author

Result of nixpkgs-review pr 240498 --extra-nixpkgs-config '{ allowUnfree = true; cudaSupport = true; cudaCapabilities = [ "8.9" ]; }' run on x86_64-linux

16 packages marked as broken and skipped:
  • cudaPackages.nvidia_driver
  • cudaPackages.nvidia_driver.bin
  • cudaPackages.nvidia_driver.lib
  • python310Packages.caffeWithCuda
  • python310Packages.caffeWithCuda.bin
  • python310Packages.tensorflowWithCuda
  • python310Packages.tensorflowWithCuda.dist
  • python310Packages.theanoWithCuda
  • python310Packages.theanoWithCuda.dist
  • python311Packages.tensorflowWithCuda
  • python311Packages.tensorflowWithCuda.dist
  • python311Packages.theanoWithCuda
  • python311Packages.theanoWithCuda.dist
  • truecrack-cuda
  • tts
  • tts.dist
22 packages failed to build:
  • caffeWithCuda
  • caffeWithCuda.bin
  • cudaPackages.tensorrt (cudaPackages.tensorrt_8_5_3)
  • cudaPackages.tensorrt.dev (cudaPackages.tensorrt_8_5_3.dev)
  • cudaPackages.tensorrt_8_5_1
  • cudaPackages.tensorrt_8_5_1.dev
  • cudaPackages.tensorrt_8_5_2
  • cudaPackages.tensorrt_8_5_2.dev
  • cudaPackages.tensorrt_8_6_0
  • cudaPackages.tensorrt_8_6_0.dev
  • katagoTensorRT
  • mathematica-cuda
  • python310Packages.tensorrt
  • python310Packages.tensorrt.dist
  • python311Packages.caffeWithCuda
  • python311Packages.caffeWithCuda.bin
  • python311Packages.tensorrt
  • python311Packages.tensorrt.dist
  • python311Packages.torchWithRocm
  • python311Packages.torchWithRocm.dev
  • python311Packages.torchWithRocm.dist
  • python311Packages.torchWithRocm.lib
142 packages built:
  • colmapWithCuda
  • cudaPackages.cuda-samples
  • cudaPackages.cuda_cccl (cudaPackages.cuda_cccl.dev)
  • cudaPackages.cuda_cudart (cudaPackages.cuda_cudart.dev ,cudaPackages.cuda_cudart.lib ,cudaPackages.cuda_cudart.static)
  • cudaPackages.cuda_cuobjdump (cudaPackages.cuda_cuobjdump.bin)
  • cudaPackages.cuda_cupti (cudaPackages.cuda_cupti.dev ,cudaPackages.cuda_cupti.lib ,cudaPackages.cuda_cupti.sample ,cudaPackages.cuda_cupti.static)
  • cudaPackages.cuda_cuxxfilt (cudaPackages.cuda_cuxxfilt.bin ,cudaPackages.cuda_cuxxfilt.dev ,cudaPackages.cuda_cuxxfilt.static)
  • cudaPackages.cuda_demo_suite
  • cudaPackages.cuda_documentation
  • cudaPackages.cuda_gdb (cudaPackages.cuda_gdb.bin)
  • cudaPackages.cuda_memcheck (cudaPackages.cuda_memcheck.bin)
  • cudaPackages.cuda_nsight (cudaPackages.cuda_nsight.bin)
  • cudaPackages.cuda_nvcc (cudaPackages.cuda_nvcc.bin ,cudaPackages.cuda_nvcc.dev ,cudaPackages.cuda_nvcc.static)
  • cudaPackages.cuda_nvdisasm (cudaPackages.cuda_nvdisasm.bin)
  • cudaPackages.cuda_nvml_dev (cudaPackages.cuda_nvml_dev.dev ,cudaPackages.cuda_nvml_dev.lib)
  • cudaPackages.cuda_nvprof (cudaPackages.cuda_nvprof.bin ,cudaPackages.cuda_nvprof.lib)
  • cudaPackages.cuda_nvprune (cudaPackages.cuda_nvprune.bin)
  • cudaPackages.cuda_nvrtc (cudaPackages.cuda_nvrtc.dev ,cudaPackages.cuda_nvrtc.lib ,cudaPackages.cuda_nvrtc.static)
  • cudaPackages.cuda_nvtx (cudaPackages.cuda_nvtx.dev ,cudaPackages.cuda_nvtx.lib)
  • cudaPackages.cuda_nvvp (cudaPackages.cuda_nvvp.bin)
  • cudaPackages.cuda_profiler_api (cudaPackages.cuda_profiler_api.dev)
  • cudaPackages.cuda_sanitizer_api (cudaPackages.cuda_sanitizer_api.bin)
  • cudatoolkit (cudaPackages.cudatoolkit ,cudatoolkit_11)
  • cudatoolkit.doc (cudaPackages.cudatoolkit.doc ,cudatoolkit_11.doc)
  • cudatoolkit.lib (cudaPackages.cudatoolkit.lib ,cudatoolkit_11.lib)
  • cudaPackages.cudnn (cudaPackages.cudnn.dev ,cudaPackages.cudnn.lib ,cudaPackages.cudnn.static ,cudaPackages.cudnn_8_9 ,cudaPackages.cudnn_8_9.dev ,cudaPackages.cudnn_8_9.lib ,cudaPackages.cudnn_8_9.static)
  • cudaPackages.cudnn_8_6 (cudaPackages.cudnn_8_6.dev ,cudaPackages.cudnn_8_6.lib ,cudaPackages.cudnn_8_6.static)
  • cudaPackages.cudnn_8_7 (cudaPackages.cudnn_8_7.dev ,cudaPackages.cudnn_8_7.lib ,cudaPackages.cudnn_8_7.static)
  • cudaPackages.cudnn_8_8 (cudaPackages.cudnn_8_8.dev ,cudaPackages.cudnn_8_8.lib ,cudaPackages.cudnn_8_8.static)
  • cudaPackages.cutensor
  • cudaPackages.cutensor.dev
  • cudaPackages.fabricmanager (cudaPackages.fabricmanager.bin ,cudaPackages.fabricmanager.dev ,cudaPackages.fabricmanager.lib)
  • cudaPackages.libcublas (cudaPackages.libcublas.dev ,cudaPackages.libcublas.lib ,cudaPackages.libcublas.static)
  • cudaPackages.libcufft (cudaPackages.libcufft.dev ,cudaPackages.libcufft.lib ,cudaPackages.libcufft.static)
  • cudaPackages.libcufile (cudaPackages.libcufile.dev ,cudaPackages.libcufile.lib ,cudaPackages.libcufile.sample ,cudaPackages.libcufile.static)
  • cudaPackages.libcurand (cudaPackages.libcurand.dev ,cudaPackages.libcurand.lib ,cudaPackages.libcurand.static)
  • cudaPackages.libcusolver (cudaPackages.libcusolver.dev ,cudaPackages.libcusolver.lib ,cudaPackages.libcusolver.static)
  • cudaPackages.libcusparse (cudaPackages.libcusparse.dev ,cudaPackages.libcusparse.lib ,cudaPackages.libcusparse.static)
  • cudaPackages.libnpp (cudaPackages.libnpp.dev ,cudaPackages.libnpp.lib ,cudaPackages.libnpp.static)
  • cudaPackages.libnvidia_nscq (cudaPackages.libnvidia_nscq.lib)
  • cudaPackages.libnvjpeg (cudaPackages.libnvjpeg.dev ,cudaPackages.libnvjpeg.lib ,cudaPackages.libnvjpeg.static)
  • cudaPackages.nccl
  • cudaPackages.nccl-tests
  • cudaPackages.nccl.dev
  • cudaPackages.nsight_compute (cudaPackages.nsight_compute.bin)
  • cudaPackages.nsight_systems (cudaPackages.nsight_systems.bin)
  • cudaPackages.nvidia_fs
  • cudaPackages.saxpy
  • cudaPackages.setupCudaHook
  • dcgm
  • faissWithCuda
  • faissWithCuda.demos
  • forge
  • gpu-burn
  • gpu-screen-recorder
  • gpu-screen-recorder-gtk
  • gromacsCudaMpi
  • gwe
  • hip-nvidia
  • hip-nvidia.doc
  • katagoWithCuda
  • librealsenseWithCuda
  • librealsenseWithCuda.dev
  • magma (magma-cuda ,magma_2_7_1)
  • magma-cuda-static
  • magma_2_6_2
  • nvidia-thrust-cuda
  • nvtop
  • nvtop-nvidia
  • prometheus-dcgm-exporter
  • python310Packages.bentoml
  • python310Packages.bentoml.dist
  • python310Packages.bitsandbytes
  • python310Packages.bitsandbytes.dist
  • python310Packages.cupy
  • python310Packages.cupy.dist
  • python310Packages.encodec
  • python310Packages.encodec.dist
  • python310Packages.jaxlibWithCuda
  • python310Packages.jaxlibWithCuda.dist
  • python310Packages.numbaWithCuda
  • python310Packages.numbaWithCuda.dist
  • python310Packages.openai-triton
  • python310Packages.openai-triton-bin
  • python310Packages.openai-triton-bin.dist
  • python310Packages.openai-triton.dist
  • python310Packages.pycuda
  • python310Packages.pycuda.dist
  • python310Packages.pynvml
  • python310Packages.pynvml.dist
  • python310Packages.pyrealsense2WithCuda
  • python310Packages.pyrealsense2WithCuda.dev
  • python310Packages.tiny-cuda-nn
  • python310Packages.torchWithCuda
  • python310Packages.torchWithCuda.dev
  • python310Packages.torchWithCuda.dist
  • python310Packages.torchWithCuda.lib
  • python310Packages.torchWithRocm
  • python310Packages.torchWithRocm.dev
  • python310Packages.torchWithRocm.dist
  • python310Packages.torchWithRocm.lib
  • python310Packages.torchaudio-bin
  • python310Packages.torchaudio-bin.dist
  • python310Packages.torchvision-bin
  • python310Packages.torchvision-bin.dist
  • python310Packages.trainer
  • python310Packages.trainer.dist
  • python311Packages.bentoml
  • python311Packages.bentoml.dist
  • python311Packages.bitsandbytes
  • python311Packages.bitsandbytes.dist
  • python311Packages.cupy
  • python311Packages.cupy.dist
  • python311Packages.encodec
  • python311Packages.encodec.dist
  • python311Packages.jaxlibWithCuda
  • python311Packages.jaxlibWithCuda.dist
  • python311Packages.openai-triton
  • python311Packages.openai-triton-bin
  • python311Packages.openai-triton-bin.dist
  • python311Packages.openai-triton.dist
  • python311Packages.pycuda
  • python311Packages.pycuda.dist
  • python311Packages.pynvml
  • python311Packages.pynvml.dist
  • python311Packages.pyrealsense2WithCuda
  • python311Packages.pyrealsense2WithCuda.dev
  • python311Packages.tiny-cuda-nn
  • python311Packages.torchWithCuda
  • python311Packages.torchWithCuda.dev
  • python311Packages.torchWithCuda.dist
  • python311Packages.torchWithCuda.lib
  • python311Packages.torchaudio-bin
  • python311Packages.torchaudio-bin.dist
  • python311Packages.torchvision-bin
  • python311Packages.torchvision-bin.dist
  • python311Packages.trainer
  • python311Packages.trainer.dist
  • tiny-cuda-nn
  • xgboostWithCuda
  • xpraWithNvenc
  • xpraWithNvenc.dist

@ConnorBaker
Contributor Author

Those failures are all the usual suspects and are reproducible on master.

@SomeoneSerge I'm confident this is ready to be merged; have I addressed all your doc concerns?

@samuela any blockers you see which would prevent merging?

@ConnorBaker
Contributor Author

I intend to merge this tomorrow morning (August 31st) at 13:00 UTC barring any strong objections.

@nixos-discourse

This pull request has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/cuda-team-roadmap-update-2023-08-29/32379/1

@ConnorBaker ConnorBaker force-pushed the feat/cuda-redist-multiple-outputs branch from 2556525 to d5e5246 Compare August 31, 2023 03:32
@ofborg ofborg bot requested a review from SomeoneSerge August 31, 2023 03:58
@ConnorBaker ConnorBaker merged commit bd83b4e into NixOS:master Aug 31, 2023
5 checks passed
@ConnorBaker ConnorBaker deleted the feat/cuda-redist-multiple-outputs branch August 31, 2023 13:34
Labels
0.kind: enhancement Add something new 6.topic: cuda Parallel computing platform and API 6.topic: python 8.has: documentation 10.rebuild-darwin: 11-100 10.rebuild-linux: 101-500 11.by: package-maintainer This PR was created by the maintainer of the package it changes