
cudaPackages: multiple outputs for redistributables #240498

Merged

Conversation

ConnorBaker
Contributor

@ConnorBaker ConnorBaker commented Jun 29, 2023

Description of changes

This change creates multiple outputs for CUDA redistributable packages.

We use a script to find out, ahead of time, the outputs each redist package provides. From that, we are able to create multiple outputs for supported redist packages, allowing users to specify exactly which components they require.

Beyond the script which finds outputs ahead of time, there is some custom code involved in making this happen. For example, the way Nixpkgs typically handles multiple outputs involves making dev the default output when available, and adding out to dev's propagatedBuildInputs.

Instead, we make each output independent of the others. If a user wants only to include the headers found in a redist package, they can do so by choosing the dev output. If they want to include dynamic libraries, they can do so by specifying the lib output, or static for static libraries.

To avoid breakages, we continue to provide the out output, which becomes the union of all other outputs, effectively making the split outputs opt-in.
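As a sketch of what this enables (the consuming derivation below is hypothetical; the redistributable attribute and output names follow this PR), a downstream package can depend on exactly the components it needs:

```nix
# Illustrative only: a downstream derivation picking individual outputs
# instead of pulling in each redistributable's entire `out`.
{ stdenv, cudaPackages }:

stdenv.mkDerivation {
  pname = "my-cuda-app"; # hypothetical package
  version = "0.1.0";
  src = ./.;

  nativeBuildInputs = [ cudaPackages.cuda_nvcc ];
  buildInputs = [
    cudaPackages.cuda_cudart.dev # headers only
    cudaPackages.cuda_cudart.lib # dynamic libraries
    cudaPackages.libcublas.lib
    # cudaPackages.libcublas.static  # only if linking statically
  ];
}
```

Selecting `lib` instead of the default `out` is what produces the closure-size reductions reported below.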

Additional changes:

  • Overrides for CUDA 12.x packages to ensure patchelf succeeds.
  • Refactored overrides for CUDA 12.x packages to use only the outputs they require from other CUDA packages.
  • autoPatchelfIgnoreMissingDeps set to ignore libcuda.so.1 specifically instead of all libs, removing the possibility of breakages by accidentally ignoring libraries we should be linking.
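For context, `autoPatchelfIgnoreMissingDeps` accepts a list of sonames in addition to a boolean; a minimal sketch of the narrowing described above:

```nix
# Before: ignore every missing dependency (can silently hide real breakages).
# autoPatchelfIgnoreMissingDeps = true;

# After: ignore only the driver library, which is provided at runtime by the
# NVIDIA driver rather than by any Nix store path.
autoPatchelfIgnoreMissingDeps = [ "libcuda.so.1" ];
```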
Show us the numbers!

All found by building against master and this branch, with --impure and a local Nixpkgs configuration of

{
  allowUnfree = true;
  cudaSupport = true;
  cudaCapabilities = [ "8.9" ];
  cudaForwardCompat = false;
}
  • cuDNN closure/NAR size went from 2.4G to 1.1G.
    • Comparing master's cudnn to this branch's cudnn.lib.
  • Magma closure size went from 2.9G to 1.6G.
    • Comparing master with this branch (same attribute name).
    • Build targeted capability 8.9 (Ada Lovelace) with CUDA 11.8.
    • Mostly as a result of dropping dependencies on static cuBLAS and cuSPARSE.
  • PyTorch closure size went from 9.7G to 8.4G.
    • Comparing master with this branch (same attribute name).
    • Build targeted capability 8.9 (Ada Lovelace) with CUDA 11.8.
    • Much greater gains to be had in migrating away from CUDA Toolkit.
      • Decrease comes entirely from switch to cudnn.lib instead of cudnn.
      • Magma does not contribute to the closure size, because we link against it statically.
Things done
  • Built on platform(s)
    • x86_64-linux
    • aarch64-linux
    • x86_64-darwin
    • aarch64-darwin
  • For non-Linux: Is sandbox = true set in nix.conf? (See Nix manual)
  • Tested, as applicable:
  • Tested compilation of all packages that depend on this change using nix-shell -p nixpkgs-review --run "nixpkgs-review rev HEAD". Note: all changes have to be committed, also see nixpkgs-review usage
  • Tested basic functionality of all binary files (usually in ./result/bin/)
  • 23.11 Release Notes (or backporting 23.05 Release notes)
    • (Package updates) Added a release notes entry if the change is major or breaking
    • (Module updates) Added a release notes entry if the change is significant
    • (Module addition) Added a release notes entry if adding a new NixOS module
  • Fits CONTRIBUTING.md.

@ConnorBaker ConnorBaker added 0.kind: enhancement Add something new 6.topic: cuda Parallel computing platform and API labels Jun 29, 2023
@ConnorBaker ConnorBaker self-assigned this Jun 29, 2023
@ofborg ofborg bot added 10.rebuild-darwin: 0 This PR does not cause any packages to rebuild on Darwin 10.rebuild-linux: 0 This PR does not cause any packages to rebuild on Linux 2.status: merge conflict This PR has merge conflicts with the target branch labels Jun 29, 2023
@SomeoneSerge
Contributor

Wow, this is huge (no pun intended xD)!

I suppose this PR is to be reviewed later, after you mark it ready? Just a comment before then: it seems like this change deals with aarch64 support, cross-compilation support, and splayed outputs support at once. When we start merging this, it might be easier to merge in smaller chunks? I'd also be wary of exposing addBuildInputs&c as attributes: pipe looks like a great improvement compositionality-wise, but it's also important that our codebase doesn't end up looking alien to a randomly sampled nixpkgs maintainer. I think this usage of pipe is kind of similar to that of the module system's lib.mkMerge, which we might be able to move to in time.

@ConnorBaker
Contributor Author

Wow, this is huge (no pun intended xD)!

You're right, there was a fair amount of scope creep as I was working on this. It needs to be split up.

I suppose this PR is to be reviewed later, after you mark it ready?

Yep! I try not to mark PRs as ready for review until I'm near-done or want to make it a collaborative effort/trigger some CI checks.

Just a comment before then: it seems like this change deals with aarch64 support, cross-compilation support, and splayed outputs support at once. When we start merging this, it might be easier to merge in smaller chunks?

Definitely! I can see a few different components in this:

  • Multi-output CUDA redistributables
    • The actual goal of this PR
  • Multi-arch support
    • Support for just Jetson is a specific code-path in the more general multi-arch support feature
  • Cross-compilation support
    • Implemented through splicing

I'd also be wary of exposing addBuildInputs&c as attributes: pipe looks like a great improvement compositionality-wise, but it's also important that our codebase doesn't end up looking alien to a randomly sampled nixpkgs maintainer. I think this usage of pipe is kind of similar to that of the module system's lib.mkMerge, which we might be able to move to in time.

I haven't played much with the module system -- most of the inspiration I took came from the utility functions the Haskell packaging infrastructure has: https://github.com/NixOS/nixpkgs/blob/master/pkgs/development/haskell-modules/lib/compose.nix.

There's a lot of boilerplate in what's being done in the overrides, which is why I started introducing those abstractions.

As I was writing some of this code I caught myself trying to make everything point-free (a bad habit from my Haskell tendencies).

You're right of course that addBuildInputs and friends should be much more limited in scope.
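For readers unfamiliar with the style being discussed, a minimal sketch of the pattern (helper names are illustrative, modeled on the `haskell-modules/lib/compose.nix` approach; not the PR's actual implementation):

```nix
let
  # Hypothetical helper: append extra buildInputs to a derivation.
  addBuildInputs = inputs: drv:
    drv.overrideAttrs (old: {
      buildInputs = (old.buildInputs or [ ]) ++ inputs;
    });

  # Hypothetical helper: append a snippet to postFixup.
  appendPostFixup = snippet: drv:
    drv.overrideAttrs (old: {
      postFixup = (old.postFixup or "") + "\n" + snippet;
    });
in
# lib.pipe threads a package through a list of such transformations,
# avoiding deeply nested overrideAttrs calls.
lib.pipe somePackage [
  (addBuildInputs [ zlib ])
  (appendPostFixup "echo fixup done")
]
```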

@ConnorBaker ConnorBaker force-pushed the feat/cuda-redist-multiple-outputs branch 7 times, most recently from fb5aefe to ff4dc49 Compare July 12, 2023 01:21
@ofborg ofborg bot removed the 2.status: merge conflict This PR has merge conflicts with the target branch label Jul 12, 2023
@ConnorBaker ConnorBaker force-pushed the feat/cuda-redist-multiple-outputs branch 2 times, most recently from 7c518af to 1614286 Compare July 12, 2023 02:42
@ConnorBaker
Contributor Author

Focused only on multiple outputs now and rebased on master.

Prior to stripping the multiple arch work, I made a copy of it here: https://github.com/ConnorBaker/nixpkgs/tree/feat/cuda-redist-multiple-arch.

@ofborg ofborg bot requested review from SomeoneSerge and samuela July 12, 2023 03:12
@ofborg ofborg bot added 11.by: package-maintainer This PR was created by the maintainer of the package it changes 10.rebuild-darwin: 11-100 10.rebuild-linux: 11-100 and removed 10.rebuild-darwin: 0 This PR does not cause any packages to rebuild on Darwin 10.rebuild-linux: 0 This PR does not cause any packages to rebuild on Linux labels Jul 12, 2023
@ofborg ofborg bot requested review from thoughtpolice and mdaiter July 13, 2023 02:48
@SomeoneSerge
Contributor

Personally, I'd like to see this merged sooner rather than later,

☝🏻 ☝🏻 ☝🏻

if we go through with this, part of the critical path for CUDA packaging will live outside of nixpkgs... are we ok with that?

First of all, this is more like an automation or code-generation tool to simplify the maintenance; I wouldn't call it "critical". Sort of like poetry2nix or nix-update (which also live out-of-tree), only it's nvidia2nix. Bash, for that matter, is more critical to us and isn't maintained in-tree. Once we've generated these augmented manifests, the users don't even need to know the tool was ever involved, unless maybe they look to override something. I also like that with the tool being under Connor's account it's easier to treat it as a black box: I was about to complain about pydantic being too heavy, but now that it's isolated I literally don't care.

unless maybe they look to override something

Now that's an important use-case we may be damaging and which we may want to handle later. I suspect these extra JSONs may make it harder to diverge (as in, using overlays and overrides) from the nixpkgs tree.

having the source on my personal account (@ConnorBaker) instead of a company account (https://github.com/tweag) or the community account (https://github.com/nix-community)?

If the script sticks around and evolves further, maybe we can look into moving it to some organization's namespace, for better consolidation and discoverability. Otherwise, I don't mind even a bit you having it under yours, as long as it's permissively licensed and fork-able 🤷🏻

To conclude,

...from my point of view the only thing holding this PR back is that it touches the documentation and I'd like to see that documentation updated first, because I think it could confuse people in its current form

@ConnorBaker
Contributor Author

...from my point of view the only thing holding this PR back is that it touches the documentation and I'd like to see that documentation updated first, because I think it could confuse people in its current form

Any particular portion of the docs you'd want to see updated, @SomeoneSerge? I purposefully didn't want to change things outside of adding a section on update procedures, but I will if you feel it's necessary.

@ConnorBaker ConnorBaker force-pushed the feat/cuda-redist-multiple-outputs branch from cc29e78 to 1c16f6b Compare August 24, 2023 13:44
@ConnorBaker
Contributor Author

Updated with feedback on the docs section (CUDA Toolkit -> CUDA Toolkit runfile installer; explicitly mention cudaPackages.cudatoolkit), rebased, and squashed.

@ConnorBaker ConnorBaker force-pushed the feat/cuda-redist-multiple-outputs branch from aae745d to 2556525 Compare August 28, 2023 13:30
@ConnorBaker
Contributor Author

Rebased, updated, squashed, and force-pushed.

@ConnorBaker
Contributor Author

Result of nixpkgs-review pr 240498 --extra-nixpkgs-config '{ allowUnfree = true; cudaSupport = true; cudaCapabilities = [ "8.9" ]; }' run on x86_64-linux

16 packages marked as broken and skipped:
  • cudaPackages.nvidia_driver
  • cudaPackages.nvidia_driver.bin
  • cudaPackages.nvidia_driver.lib
  • python310Packages.caffeWithCuda
  • python310Packages.caffeWithCuda.bin
  • python310Packages.tensorflowWithCuda
  • python310Packages.tensorflowWithCuda.dist
  • python310Packages.theanoWithCuda
  • python310Packages.theanoWithCuda.dist
  • python311Packages.tensorflowWithCuda
  • python311Packages.tensorflowWithCuda.dist
  • python311Packages.theanoWithCuda
  • python311Packages.theanoWithCuda.dist
  • truecrack-cuda
  • tts
  • tts.dist
22 packages failed to build:
  • caffeWithCuda
  • caffeWithCuda.bin
  • cudaPackages.tensorrt (cudaPackages.tensorrt_8_5_3)
  • cudaPackages.tensorrt.dev (cudaPackages.tensorrt_8_5_3.dev)
  • cudaPackages.tensorrt_8_5_1
  • cudaPackages.tensorrt_8_5_1.dev
  • cudaPackages.tensorrt_8_5_2
  • cudaPackages.tensorrt_8_5_2.dev
  • cudaPackages.tensorrt_8_6_0
  • cudaPackages.tensorrt_8_6_0.dev
  • katagoTensorRT
  • mathematica-cuda
  • python310Packages.tensorrt
  • python310Packages.tensorrt.dist
  • python311Packages.caffeWithCuda
  • python311Packages.caffeWithCuda.bin
  • python311Packages.tensorrt
  • python311Packages.tensorrt.dist
  • python311Packages.torchWithRocm
  • python311Packages.torchWithRocm.dev
  • python311Packages.torchWithRocm.dist
  • python311Packages.torchWithRocm.lib
142 packages built:
  • colmapWithCuda
  • cudaPackages.cuda-samples
  • cudaPackages.cuda_cccl (cudaPackages.cuda_cccl.dev)
  • cudaPackages.cuda_cudart (cudaPackages.cuda_cudart.dev ,cudaPackages.cuda_cudart.lib ,cudaPackages.cuda_cudart.static)
  • cudaPackages.cuda_cuobjdump (cudaPackages.cuda_cuobjdump.bin)
  • cudaPackages.cuda_cupti (cudaPackages.cuda_cupti.dev ,cudaPackages.cuda_cupti.lib ,cudaPackages.cuda_cupti.sample ,cudaPackages.cuda_cupti.static)
  • cudaPackages.cuda_cuxxfilt (cudaPackages.cuda_cuxxfilt.bin ,cudaPackages.cuda_cuxxfilt.dev ,cudaPackages.cuda_cuxxfilt.static)
  • cudaPackages.cuda_demo_suite
  • cudaPackages.cuda_documentation
  • cudaPackages.cuda_gdb (cudaPackages.cuda_gdb.bin)
  • cudaPackages.cuda_memcheck (cudaPackages.cuda_memcheck.bin)
  • cudaPackages.cuda_nsight (cudaPackages.cuda_nsight.bin)
  • cudaPackages.cuda_nvcc (cudaPackages.cuda_nvcc.bin ,cudaPackages.cuda_nvcc.dev ,cudaPackages.cuda_nvcc.static)
  • cudaPackages.cuda_nvdisasm (cudaPackages.cuda_nvdisasm.bin)
  • cudaPackages.cuda_nvml_dev (cudaPackages.cuda_nvml_dev.dev ,cudaPackages.cuda_nvml_dev.lib)
  • cudaPackages.cuda_nvprof (cudaPackages.cuda_nvprof.bin ,cudaPackages.cuda_nvprof.lib)
  • cudaPackages.cuda_nvprune (cudaPackages.cuda_nvprune.bin)
  • cudaPackages.cuda_nvrtc (cudaPackages.cuda_nvrtc.dev ,cudaPackages.cuda_nvrtc.lib ,cudaPackages.cuda_nvrtc.static)
  • cudaPackages.cuda_nvtx (cudaPackages.cuda_nvtx.dev ,cudaPackages.cuda_nvtx.lib)
  • cudaPackages.cuda_nvvp (cudaPackages.cuda_nvvp.bin)
  • cudaPackages.cuda_profiler_api (cudaPackages.cuda_profiler_api.dev)
  • cudaPackages.cuda_sanitizer_api (cudaPackages.cuda_sanitizer_api.bin)
  • cudatoolkit (cudaPackages.cudatoolkit ,cudatoolkit_11)
  • cudatoolkit.doc (cudaPackages.cudatoolkit.doc ,cudatoolkit_11.doc)
  • cudatoolkit.lib (cudaPackages.cudatoolkit.lib ,cudatoolkit_11.lib)
  • cudaPackages.cudnn (cudaPackages.cudnn.dev ,cudaPackages.cudnn.lib ,cudaPackages.cudnn.static ,cudaPackages.cudnn_8_9 ,cudaPackages.cudnn_8_9.dev ,cudaPackages.cudnn_8_9.lib ,cudaPackages.cudnn_8_9.static)
  • cudaPackages.cudnn_8_6 (cudaPackages.cudnn_8_6.dev ,cudaPackages.cudnn_8_6.lib ,cudaPackages.cudnn_8_6.static)
  • cudaPackages.cudnn_8_7 (cudaPackages.cudnn_8_7.dev ,cudaPackages.cudnn_8_7.lib ,cudaPackages.cudnn_8_7.static)
  • cudaPackages.cudnn_8_8 (cudaPackages.cudnn_8_8.dev ,cudaPackages.cudnn_8_8.lib ,cudaPackages.cudnn_8_8.static)
  • cudaPackages.cutensor
  • cudaPackages.cutensor.dev
  • cudaPackages.fabricmanager (cudaPackages.fabricmanager.bin ,cudaPackages.fabricmanager.dev ,cudaPackages.fabricmanager.lib)
  • cudaPackages.libcublas (cudaPackages.libcublas.dev ,cudaPackages.libcublas.lib ,cudaPackages.libcublas.static)
  • cudaPackages.libcufft (cudaPackages.libcufft.dev ,cudaPackages.libcufft.lib ,cudaPackages.libcufft.static)
  • cudaPackages.libcufile (cudaPackages.libcufile.dev ,cudaPackages.libcufile.lib ,cudaPackages.libcufile.sample ,cudaPackages.libcufile.static)
  • cudaPackages.libcurand (cudaPackages.libcurand.dev ,cudaPackages.libcurand.lib ,cudaPackages.libcurand.static)
  • cudaPackages.libcusolver (cudaPackages.libcusolver.dev ,cudaPackages.libcusolver.lib ,cudaPackages.libcusolver.static)
  • cudaPackages.libcusparse (cudaPackages.libcusparse.dev ,cudaPackages.libcusparse.lib ,cudaPackages.libcusparse.static)
  • cudaPackages.libnpp (cudaPackages.libnpp.dev ,cudaPackages.libnpp.lib ,cudaPackages.libnpp.static)
  • cudaPackages.libnvidia_nscq (cudaPackages.libnvidia_nscq.lib)
  • cudaPackages.libnvjpeg (cudaPackages.libnvjpeg.dev ,cudaPackages.libnvjpeg.lib ,cudaPackages.libnvjpeg.static)
  • cudaPackages.nccl
  • cudaPackages.nccl-tests
  • cudaPackages.nccl.dev
  • cudaPackages.nsight_compute (cudaPackages.nsight_compute.bin)
  • cudaPackages.nsight_systems (cudaPackages.nsight_systems.bin)
  • cudaPackages.nvidia_fs
  • cudaPackages.saxpy
  • cudaPackages.setupCudaHook
  • dcgm
  • faissWithCuda
  • faissWithCuda.demos
  • forge
  • gpu-burn
  • gpu-screen-recorder
  • gpu-screen-recorder-gtk
  • gromacsCudaMpi
  • gwe
  • hip-nvidia
  • hip-nvidia.doc
  • katagoWithCuda
  • librealsenseWithCuda
  • librealsenseWithCuda.dev
  • magma (magma-cuda ,magma_2_7_1)
  • magma-cuda-static
  • magma_2_6_2
  • nvidia-thrust-cuda
  • nvtop
  • nvtop-nvidia
  • prometheus-dcgm-exporter
  • python310Packages.bentoml
  • python310Packages.bentoml.dist
  • python310Packages.bitsandbytes
  • python310Packages.bitsandbytes.dist
  • python310Packages.cupy
  • python310Packages.cupy.dist
  • python310Packages.encodec
  • python310Packages.encodec.dist
  • python310Packages.jaxlibWithCuda
  • python310Packages.jaxlibWithCuda.dist
  • python310Packages.numbaWithCuda
  • python310Packages.numbaWithCuda.dist
  • python310Packages.openai-triton
  • python310Packages.openai-triton-bin
  • python310Packages.openai-triton-bin.dist
  • python310Packages.openai-triton.dist
  • python310Packages.pycuda
  • python310Packages.pycuda.dist
  • python310Packages.pynvml
  • python310Packages.pynvml.dist
  • python310Packages.pyrealsense2WithCuda
  • python310Packages.pyrealsense2WithCuda.dev
  • python310Packages.tiny-cuda-nn
  • python310Packages.torchWithCuda
  • python310Packages.torchWithCuda.dev
  • python310Packages.torchWithCuda.dist
  • python310Packages.torchWithCuda.lib
  • python310Packages.torchWithRocm
  • python310Packages.torchWithRocm.dev
  • python310Packages.torchWithRocm.dist
  • python310Packages.torchWithRocm.lib
  • python310Packages.torchaudio-bin
  • python310Packages.torchaudio-bin.dist
  • python310Packages.torchvision-bin
  • python310Packages.torchvision-bin.dist
  • python310Packages.trainer
  • python310Packages.trainer.dist
  • python311Packages.bentoml
  • python311Packages.bentoml.dist
  • python311Packages.bitsandbytes
  • python311Packages.bitsandbytes.dist
  • python311Packages.cupy
  • python311Packages.cupy.dist
  • python311Packages.encodec
  • python311Packages.encodec.dist
  • python311Packages.jaxlibWithCuda
  • python311Packages.jaxlibWithCuda.dist
  • python311Packages.openai-triton
  • python311Packages.openai-triton-bin
  • python311Packages.openai-triton-bin.dist
  • python311Packages.openai-triton.dist
  • python311Packages.pycuda
  • python311Packages.pycuda.dist
  • python311Packages.pynvml
  • python311Packages.pynvml.dist
  • python311Packages.pyrealsense2WithCuda
  • python311Packages.pyrealsense2WithCuda.dev
  • python311Packages.tiny-cuda-nn
  • python311Packages.torchWithCuda
  • python311Packages.torchWithCuda.dev
  • python311Packages.torchWithCuda.dist
  • python311Packages.torchWithCuda.lib
  • python311Packages.torchaudio-bin
  • python311Packages.torchaudio-bin.dist
  • python311Packages.torchvision-bin
  • python311Packages.torchvision-bin.dist
  • python311Packages.trainer
  • python311Packages.trainer.dist
  • tiny-cuda-nn
  • xgboostWithCuda
  • xpraWithNvenc
  • xpraWithNvenc.dist

@ConnorBaker
Contributor Author

Those failures are all the usual suspects and are reproducible on master.

@SomeoneSerge I'm confident this is ready to be merged; have I addressed all your doc concerns?

@samuela any blockers you see which would prevent merging?

@ConnorBaker
Contributor Author

I intend to merge this tomorrow morning (August 31st) at 13:00 UTC barring any strong objections.

@nixos-discourse

This pull request has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/cuda-team-roadmap-update-2023-08-29/32379/1

@ConnorBaker ConnorBaker force-pushed the feat/cuda-redist-multiple-outputs branch from 2556525 to d5e5246 Compare August 31, 2023 03:32
@ofborg ofborg bot requested a review from SomeoneSerge August 31, 2023 03:58
@ConnorBaker ConnorBaker merged commit bd83b4e into NixOS:master Aug 31, 2023
5 checks passed
@ConnorBaker ConnorBaker deleted the feat/cuda-redist-multiple-outputs branch August 31, 2023 13:34
Labels
0.kind: enhancement Add something new 6.topic: cuda Parallel computing platform and API 6.topic: python 8.has: documentation 10.rebuild-darwin: 11-100 10.rebuild-linux: 101-500 11.by: package-maintainer This PR was created by the maintainer of the package it changes