
docs: Questions about maintaining CUDA-related packaging #217780

Open · ConnorBaker opened this issue Feb 23, 2023 · 11 comments

@ConnorBaker (Contributor) commented Feb 23, 2023

Problem

I'm asking these questions here because doing so gives more people a chance to see them and contribute.

For context, I used Docker to the point of tears a while back and don't ever want to go back. I like Nix and Nixpkgs, so I've been trying to contribute to the CUDA ecosystem to get things up to speed. I've made a number of PRs toward that end recently, and doing so has taught me a fair amount about Nix and about the state of CUDA support.

The following is a list of questions I've had knocking around while I've worked on those PRs. I figure this might be a good way to gather some knowledge, informally, before trying to write it up or otherwise synthesize it. I read through the big PR from last year which involved most of the CUDA refactoring (#167016) prior to writing this.

I apologize in advance if I've missed any obvious resources.

Proposal

The following questions and answers should either make their way into a FAQ or be incorporated into the CUDA documentation.

Table of Contents

  • sha256 vs. hash
  • autoAddOpenGLRunpathHook
  • Multi-output derivations
  • Supporting multiple versions of CUDA
  • Version bounds within cudaPackages
  • Version bounds for consumers of cudaPackages
  • Storing "meta" information
  • Misc questions to be organized better later

sha256 vs. hash

Nix expects hashes provided by the hash attribute (e.g., for fetchurl) to be SRI hashes [1]. SRI hashes are self-describing, so it's not necessary to specify the hash algorithm via the attribute name. Where possible, it is preferable to use hash instead of sha256 to avoid confusion and to avoid the need to specify the hash algorithm; there is an ongoing, Nixpkgs-wide effort to do so!
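
As a rough illustration (the URL and hash below are placeholders, not a real package):

```nix
{ fetchurl }:

fetchurl {
  # Placeholder URL; any fetcher that accepts `hash` behaves the same way.
  url = "https://developer.download.nvidia.com/compute/cuda/redist/example.tar.xz";

  # Preferred: `hash` with a self-describing SRI value (the algorithm is part of the string).
  hash = "sha256-AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA=";

  # Older style, still common but being migrated away from:
  # sha256 = "0000000000000000000000000000000000000000000000000000";
}
```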

autoAddOpenGLRunpathHook

Background on autoAddOpenGLRunpathHook, why we need it, and answers, courtesy of @samuela:

Back in the olden days, one had to manually call patchelf to repair executable RUNPATHs. addOpenGLRunpath and subsequently autoAddOpenGLRunpathHook automate this process and remove confusion. What does this mean, and why is any of it necessary? Well, binaries that depend on NVIDIA GPU access dynamically load driver libraries at runtime, especially libcuda.so. This file is part of the kernel driver installation. On most Linux systems the kernel driver is available in a fixed, globally accessible location. The RUNPATH/RPATH is a section in the binary itself that determines where the executable will search for library files when dynamically loading, like LD_LIBRARY_PATH but embedded in the binary. So on conventional systems you just add /foo/bar/nvidia/ to RUNPATH and you're set to go.

However, NixOS operates differently: drivers live in the Nix store, no different from any other kind of package. So how do we load these libraries? The NixOS solution is to symlink a special directory, currently /run/opengl-driver/lib, which points to the current graphics driver. Then you can load libcuda.so from there. But because this differs from external convention, we need to add this path to RUNPATH in order to keep everything running smoothly.

Once upon a time, each package had to do this manually with patchelf. Now we have addOpenGLRunpath and autoAddOpenGLRunpathHook to automate the process for us.

  1. How will I know when I need to use autoAddOpenGLRunpathHook?

    If your binaries are complaining about not being able to load libraries, you probably need one of the hooks.

  2. Does the hook replace the need to manually invoke addOpenGLRunpath?

    Hopefully yes. If you come across a case where it does not work, it's likely a bug and should be reported.
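
For reference, a minimal sketch of wiring the hook into a hypothetical package (name, URL, and hash are placeholders):

```nix
{ stdenv, fetchurl, cudaPackages }:

stdenv.mkDerivation {
  pname = "my-cuda-tool"; # hypothetical
  version = "1.0";
  src = fetchurl {
    url = "https://example.com/my-cuda-tool-1.0.tar.gz"; # placeholder
    hash = "sha256-AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA=";
  };

  nativeBuildInputs = [
    # During fixup, appends /run/opengl-driver/lib to the RUNPATH of ELF files
    # in the outputs, so that libcuda.so can be found at runtime on NixOS.
    cudaPackages.autoAddOpenGLRunpathHook
  ];
}
```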

Multi-output derivations

Part of the goal of multi-output derivations is to allow for more granular control over what is installed. For example, if you only need the static library, you can install that without installing the shared library. This is especially useful for CUDA, where libraries are often dozens, if not hundreds, of megabytes in size.

  1. What name should be used for the output containing static libraries?
  2. Where do include files go?
    • Should they be in dev or out? Does it make sense to split at that level of granularity?
  3. Does it make sense for out to contain files which are available in other outputs?
    • Should out be standalone/able to run without being joined with other outputs from the derivation? For example, should library files required to run a CUDA program be in out, lib, or both?

Added challenges, pointed out by @SomeoneSerge:

  • NVIDIA ships pkg-config files with each of the redistributables
  • However, packages use FindCUDAToolkit.cmake (if we're lucky), FindCUDA.cmake (if we're less lucky), or bazel (if they positively decided to torture us) to find the toolkit
    • None of these are aware of the pkg-config files
    • All of these expect the toolkit to be installed in a single directory (e.g., CUDAToolkit_ROOT or CUDA_HOME)
  • Because these tools expect the toolkit to be installed in a single directory, the symlinkJoin pattern emerges (a sketch follows this list)
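
A rough sketch of that pattern (the redistributable attribute names shown are just examples of what a consumer might need):

```nix
{ symlinkJoin, cudaPackages }:

# Re-create the conventional "everything under one prefix" layout so that
# FindCUDAToolkit.cmake (via CUDAToolkit_ROOT) or CUDA_HOME-style lookups work.
symlinkJoin {
  name = "cuda-redist-joined";
  paths = with cudaPackages; [
    cuda_nvcc
    cuda_cudart
    libcublas
  ];
}
```

Downstream builds can then point CUDAToolkit_ROOT (or CUDA_HOME) at the joined tree instead of at individual outputs.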

Supporting multiple versions of CUDA

  1. What is the impetus to maintain multiple versions of CUDA?
    • Nothing beyond the trade-off of a little extra work for a larger supported install-base.
  2. Is there something Nixpkgs is committed to?
    • There is no established commitment.
  3. Does the Nixpkgs community have a clear idea of which packages rely on which versions of CUDA?
    • It can be determined mechanically by looking at Nixpkgs.

Version bounds within cudaPackages

  1. Different CUDA versions support different compute capabilities. What is the desired way to handle keeping track of this?
  2. For the Nixpkgs community at large, is there a best practice of including information about packages in the derivation vs. auxiliary files?

Version bounds for consumers of cudaPackages

  1. The source of some packages has a hard-coded list of supported compute capabilities. How should we handle such packages?

@SomeoneSerge found that magma specifically allows us to set CMAKE_CUDA_ARCHITECTURES to use architectures outside those in the CMakeLists.txt: https://github.com/NixOS/nixpkgs/pull/218265/files#diff-989b55d62898864bff7cbba951ccdcdf5ff604fc917498863d2fb567efde542fR137.
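
For illustration, an override along these lines (not the actual nixpkgs magma expression) pins the architecture list explicitly:

```nix
# Hypothetical override: build magma only for the capability we care about.
# CMake expects the value without the dot, e.g. "8.6" becomes "86".
magma.overrideAttrs (oldAttrs: {
  cmakeFlags = (oldAttrs.cmakeFlags or [ ]) ++ [
    "-DCMAKE_CUDA_ARCHITECTURES=86"
  ];
})
```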

Storing "meta" information

  1. Information about GPUs, compute capabilities, and which packages support what is extremely useful and critical to ensuring that packages are built correctly. How should we store this information?
    • In the case that we're mirroring a JSON file locally, it should stay a JSON file. (An example is NVIDIA's CUDA redistributable manifest.)
    • In the case that we're generating the information ourselves, we should use Nix.

Misc questions to be organized better later

  1. When we do override the C/C++ compilers by setting the CC/CXX environment variables, that doesn't change binutils, so (in my case) I still see ar/ranlib/ld and friends from gcc12 being used. Is that a problem? I don't know if version bumps to those tools can cause as much damage as libraries compiled with different language standards.
  2. If a package needs to link against libcuda.so specifically, what's the best way to make the linker aware of those stubs? I set LIBRARY_PATH and that seemed to do the trick: https://github.com/NixOS/nixpkgs/pull/218166/files#diff-ab3fb67b115c350953951c7c5aa868e8dd9694460710d2a99b845e7704ce0cf5R76
  3. Is it better to set environment variables in the derivation as env.BLAG = "blarg" (I saw a tree-wide change about using env because of "structuredAttrs"), or to export them in the shell in something like preConfigure? (A small sketch follows this list.)
  4. Do we have any infrastructure (like CI) besides cachix?
  5. What populates our cachix?
  6. What's the storage limit for our cachix (meaning, is the number of derivations we host a result of limited compute, storage, or both)?
  7. If it's not CI populating the cache, what's the process for getting permissions to push to it?
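
Regarding question 3, a small sketch of the two styles (package name and variable are made up):

```nix
{ stdenv }:

stdenv.mkDerivation {
  pname = "env-example"; # hypothetical
  version = "1.0";
  dontUnpack = true;

  # Style 1: the `env` attribute, which the structuredAttrs-motivated tree-wide
  # change prefers for plain string-valued variables.
  env.BLAG = "blarg";

  # Style 2: export the variable yourself in a phase hook.
  preConfigure = ''
    export BLAG=blarg
  '';

  installPhase = ''
    mkdir -p $out
    echo "$BLAG" > $out/blag
  '';
}
```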

cc @NixOS/cuda-maintainers

Footnotes

  1. https://developer.mozilla.org/en-US/docs/Web/Security/Subresource_Integrity

@ConnorBaker (Contributor, author)

@samuela I'd really appreciate your perspective and thoughts on these questions if you have the time!

@fricklerhandwerk (Contributor)

Related: https://discourse.nixos.org/t/nixpkgss-current-development-workflow-is-not-sustainable/18741

@samuela (Member) commented Feb 23, 2023

Hey @ConnorBaker, thanks for your interest in CUDA development and contributions so far! These are all fair questions. I'll try to answer as many as possible.

First of all, for some context, the @NixOS/cuda-maintainers team was created here. We maintain a few things: thanks to the generosity of @domenkozar and Cachix we have a build cache. @SomeoneSerge has built some great infrastructure that regularly builds packages like TF/JAX/PyTorch/etc with cudaSupport = true and pushes the results into the cache. I maintain nixpkgs-upkeep, a CI system that fills in some gaps left by Hydra, nixpkgs' flagship CI. Since CUDA does not fall under the strict definition of "free" software, Hydra refuses to test any packages that use it. nixpkgs-upkeep regularly builds a subset of packages with cudaSupport = true and auto-reports failures (which are more common than you'd think!). Most importantly, we review PRs, write documentation, and provide support.

Folks can reach us via @NixOS/cuda-maintainers or on matrix chat in the #cuda:nixos.org channel.

Onto the questions!

On sha256 vs. hash

There is an ongoing, Nixpkgs-wide migration to using SRI hashes and the hash argument, which is intended to be more future-proof. More info here: https://nixos.wiki/wiki/Nix_Hash

How will I know when I need to use autoAddOpenGLRunpathHook?

Back in the olden days, one had to manually call patchelf to repair executable RUNPATHs. addOpenGLRunpath and subsequently autoAddOpenGLRunpathHook automate this process and remove confusion. What does this mean, and why is any of it necessary? Well, binaries that depend on NVIDIA GPU access dynamically load driver libraries at runtime, especially libcuda.so. This file is part of the kernel driver installation. On most Linux systems the kernel driver is available in a fixed, globally accessible location. The RUNPATH/RPATH is a section in the binary itself that determines where the executable will search for library files when dynamically loading, like LD_LIBRARY_PATH but embedded in the binary. So on conventional systems you just add /foo/bar/nvidia/ to RUNPATH and you're set to go.

However, NixOS operates differently: drivers live in the Nix store, no different from any other kind of package. So how do we load these libraries? The NixOS solution is to symlink a special directory, currently /run/opengl-driver/lib, which points to the current graphics driver. Then you can load libcuda.so from there. But because this differs from external convention, we need to add this path to RUNPATH in order to keep everything running smoothly.

Once upon a time, each package had to do this manually with patchelf. Now we have addOpenGLRunpath and autoAddOpenGLRunpathHook to automate the process for us. tl;dr: if your binaries are complaining about not being able to load libraries, you probably need one of the hooks.

Does the hook replace the need to manually invoke addOpenGLRunpath?

Hopefully yes. If you come across a case where it does not work, it's likely a bug and should be reported.

Are there cases where it should not be used? See last comment on #167016 (comment).

I'm not sure I understand what you mean by the link... the last comment is from @nixos-discourse. In any case, if your package builds and runs without it, then there's no need.

On multi-output derivations

These are good questions, but outside my wheelhouse... I wonder if nixpkgs has a recommended path for these things?

What is the impetus to maintain multiple versions of CUDA? Is there something Nixpkgs is committed to? Does the Nixpkgs community have a clear idea of which packages rely on which versions of CUDA?

Personally I'm in favor of culling old versions of CUDA, but previously there was some pushback that the cost of keeping them was relatively small. You can find out which packages use which versions by searching the nixpkgs dependency tree (or source). As much as possible we try to keep packages on the mainline cudaPackages, so older CUDA versions should not have any consumers.

Different CUDA versions support different compute capabilities. What is the desired way to handle keeping track of this?

Prior art is to just keep a list, but we're open to different solutions here. We're always looking for refactors that simplify things.

Information about GPUs, compute capabilities, and which packages support what is extremely useful and critical to ensuring that packages are built correctly. How should we store this information?

This is constantly evolving. Just having everything written down in code is an improvement to begin with. I believe Nix to be the best language for that, keeping it Nix all the way down. We'll probably keep refactoring and refining things here over time. I'm sure that more opportunities for refactors will present themselves down the road!
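
For instance, a hypothetical shape for such data, kept directly in Nix (fields and entries are illustrative):

```nix
# Map compute capability -> metadata that package expressions can consume.
{
  "8.6" = {
    archName = "Ampere";
    minCudaVersion = "11.1"; # first CUDA release with sm_86 support
  };
  "9.0" = {
    archName = "Hopper";
    minCudaVersion = "11.8"; # first CUDA release with sm_90 support
  };
}
```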

@SomeoneSerge (Contributor) commented Feb 27, 2023

On multi-output derivations

Where do static libraries go?

Ideally, we would like static libraries to go into separate outputs.
Use case: a downstream package links against a shared library, its output is part of the runtime closure, but we don't need any static .a libraries at runtime. In the case of cudaPackages this means gigabytes of useless extra weight, which is particularly bad for Docker and Singularity images built with Nix.

An annoying detail: when we split static libraries out in cudaPackages, we might find we need symlinkJoin more often because of build scripts that expect "everything to be in one directory".

EDIT: Tracking in #224533
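
A rough sketch of what such a split could look like in a derivation (output names, URL, and glob are illustrative):

```nix
{ stdenv, fetchurl }:

stdenv.mkDerivation {
  pname = "example-cuda-lib"; # hypothetical
  version = "1.0";
  src = fetchurl {
    url = "https://example.com/example-cuda-lib-1.0.tar.xz"; # placeholder
    hash = "sha256-AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA=";
  };

  # Keep .a archives in their own output so runtime closures (and the Docker/
  # Singularity images built from them) do not have to pay for them.
  outputs = [ "out" "dev" "static" ];

  postInstall = ''
    # moveToOutput is provided by the multiple-outputs setup hook.
    moveToOutput "lib/*.a" "$static"
  '';
}
```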

@SomeoneSerge (Contributor) commented Feb 27, 2023

Further on multiple outputs and downstream packages expecting "things in one place"

It seems that NVIDIA ships at least pkg-config files with all of the redist packages.
In practice, however, people use FindCUDAToolkit.cmake if we're lucky, FindCUDA.cmake if we're less lucky, or bazel if they positively decided to torture us 🙃. They all discover CUDA components through variables that point at a single throw-everything-in-one-location directory, e.g. CUDAToolkit_ROOT, CUDA_HOME, etc.

It would be nice to have individual CMake targets/components for different pieces of CUDA so that we could avoid symlinkJoin. I think we can consider at least opening an issue at https://github.com/NVIDIA/build-system-archive-import-examples/issues if we come up with a reasonable way to organize these things.
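
Until something like that exists, the consumer side of the workaround tends to look like this sketch (package and attribute names are illustrative):

```nix
{ stdenv, cmake, symlinkJoin, cudaPackages }:

let
  # Throw the pieces we need into a single prefix, since the CMake/bazel
  # machinery only understands one root directory.
  cudaJoined = symlinkJoin {
    name = "cuda-joined";
    paths = with cudaPackages; [ cuda_nvcc cuda_cudart libcublas ];
  };
in
stdenv.mkDerivation {
  pname = "downstream-cuda-consumer"; # hypothetical
  version = "1.0";
  src = ./.; # placeholder

  nativeBuildInputs = [ cmake ];

  # Single location that FindCUDAToolkit.cmake knows how to search.
  cmakeFlags = [ "-DCUDAToolkit_ROOT=${cudaJoined}" ];
}
```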

(Maybe @kmittman would have some hints?)

@kmittman commented Feb 27, 2023

Hi @SomeoneSerge
Unfortunately I don't really have a solution for the assumption that all of the components are installed in one place, i.e. /usr/local/cuda/. It's been that way for so long that decoupling the install paths has been problematic; I can attest to it: I changed the libcublas install path in CUDA 10.1 and, due to popular demand, had to revert that change in a later version.

EDIT: a keyboard shortcut submitted this comment early.

As I was saying, it is non-trivial and there are rpath constraints as well. But yes, please go ahead and file a request in that GitHub repo. Perhaps creating a CMake "stub" per component would be beneficial? Otherwise open to suggestions.

@ConnorBaker (Contributor, author)

From @SomeoneSerge

RE: Caching "single build that supports all capabilities" vs "multiple builds that support individual cuda architectures"

Couldn't find an issue tracking this, so I'll drop a message here.
The more precise argument in favour of building for individual capabilities is easier maintenance and nixpkgs development.
When working on master it's desirable to only build for your own arch, but currently it means a cache-miss for transitive dependencies.
For example, you work on torchvision and you import nixpkgs with config.cudaCapabilities = [ "8.6" ]. Snap! You're rebuilding pytorch, you cancel, you write a custom shell that overrides torchvision specifically, you remove asserts, etc.

Alternative world: cuda-maintainers.cachix.org has a day-old pytorch build for 8.6, a build for 7.5, a build for 6.0, etc.
Extra: faster nixpkgs-review, assuming fewer default capabilities.

Individual builds:

  • More builds, but they're lighter
  • Can re-use the cache when working on master
  • Hard to choose default capabilities that would fit most users and not cost too much

All-platforms build:

  • Less compute in total, but jobs are fat and sometimes drain the build machine
  • Simpler UX for end-users
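
For reference, the per-architecture import from the torchvision example above is just (a sketch; the CUDA packages additionally need allowUnfree):

```nix
# shell.nix (sketch): build CUDA consumers only for your own GPU's capability.
let
  pkgs = import <nixpkgs> {
    config = {
      allowUnfree = true;
      cudaSupport = true;
      cudaCapabilities = [ "8.6" ];
    };
  };
in
pkgs.mkShell {
  packages = [ pkgs.python3Packages.torchvision ];
}
```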

ConnorBaker self-assigned this Mar 9, 2023
@nixos-discourse

This issue has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/tweag-nix-dev-update-45/26397/1

@SomeoneSerge (Contributor) commented Mar 17, 2023

RE: Caching "single build that supports all capabilities" vs "multiple builds that support individual cuda architectures"

CC #221564

@SomeoneSerge (Contributor) commented Mar 17, 2023

RE: Caching "single build that supports all capabilities" vs "multiple builds that support individual cuda architectures"

Since the actual motivation is that we want our binary cache to be 1) useful (minimize cache-misses), and 2) affordable, we should also:

EDIT: Being addressed in #224068

@nixos-discourse

This issue has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/need-freelancer-for-nix-packaging-for-machine-learning-dependencies-with-cuda/47268/4
