docs: Questions about maintaining CUDA-related packaging #217780
Comments
@samuela I'd really appreciate your perspective and thoughts on these questions if you have the time!
Hey @ConnorBaker, thanks for your interest in CUDA development and contributions so far! These are all fair questions; I'll try to answer as many as possible. First of all, for some context: the @NixOS/cuda-maintainers team was created here. We maintain a few things:

- Thanks to the generosity of @domenkozar and Cachix, we have a build cache.
- @SomeoneSerge has built some great infrastructure that regularly builds packages like TF/JAX/PyTorch with CUDA support enabled.

Folks can reach us via @NixOS/cuda-maintainers or on Matrix chat in the #cuda:nixos.org channel. Onto the questions!
There is an ongoing, Nixpkgs-wide migration to using SRI hashes and the `hash` attribute.
Back in the olden days, one had to manually call patchelf to repair executable RUNPATHs. NixOS operates differently from conventional distributions: drivers live in the Nix store, no different than any other kind of package. So how do we load these libraries? The NixOS solution is to symlink a special directory, currently `/run/opengl-driver/lib`, which contains the driver libraries, into the RUNPATH of binaries that need them. Once upon a time, each package had to do this manually with patchelf. Now we have `autoAddOpenGLRunpathHook`, which takes care of it automatically.
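To make that concrete, here is a minimal sketch of the two styles for a hypothetical package; it assumes the hook names as they currently exist in nixpkgs:

```nix
{ stdenv, cudaPackages, addOpenGLRunpath }:

stdenv.mkDerivation {
  pname = "my-cuda-app"; # hypothetical package
  version = "0.1.0";
  src = ./.;

  # Automatic: the hook extends the RUNPATH of ELF files in the outputs
  # during fixup so they can find the driver under /run/opengl-driver/lib.
  nativeBuildInputs = [ cudaPackages.autoAddOpenGLRunpathHook ];

  # Manual alternative: call addOpenGLRunpath on specific files yourself.
  # nativeBuildInputs = [ addOpenGLRunpath ];
  # postFixup = ''
  #   addOpenGLRunpath $out/bin/my-cuda-app
  # '';
}
```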
Hopefully yes. If you come across a case where it does not work, it's likely a bug and should be reported.
I'm not sure if I understand what you mean by the link... the last comment is from @nixos-discourse. In any case, if your package builds and runs without it, then there's no need.
These are good questions, but outside my wheelhouse... I wonder if nixpkgs has a recommended path for these things?
Personally I'm in favor of culling old versions of CUDA, but previously there was some pushback that the cost of keeping them was relatively small. You can find out which packages use which versions by searching the nixpkgs dependency tree (or source). As much as possible we try to keep packages on the mainline `cudaPackages` version.
Prior art is to just keep a list, but we're open to different solutions here. We're always looking for refactors that simplify things.
This is constantly evolving. Just having everything written down in code is an improvement to begin with. I believe Nix to be the best language for that, keeping it Nix-all-the-way-down. We'll probably keep refactoring and refining things here over time; I'm sure that more opportunities for refactors will present themselves down the road!
Ideally, we would like static libraries to go into separate outputs. An annoying detail arises when splitting static libraries out into their own output.

EDIT: Tracking in #224533
Further on multiple outputs and downstream packages expecting "things in one place": it seems that NVidia ships at least pkg-config files with all of the redist packages. It would be nice to have individual CMake targets/components for the different pieces of CUDA, so that we could avoid the `symlinkJoin` pattern.

(Maybe @kmittman would have some hints?)
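For reference, a sketch of the `symlinkJoin` pattern under discussion, assuming a downstream consumer that insists on a single `CUDAToolkit_ROOT` (the package selection is illustrative):

```nix
{ stdenv, symlinkJoin, cudaPackages }:

let
  # Re-assemble the split redist packages into one directory tree.
  cudatoolkit-joined = symlinkJoin {
    name = "cudatoolkit-joined";
    paths = with cudaPackages; [ cuda_nvcc cuda_cudart libcublas ];
  };
in
stdenv.mkDerivation {
  pname = "downstream-consumer"; # hypothetical package
  version = "0.1.0";
  src = ./.;

  # Point CMake's FindCUDAToolkit at the joined tree.
  cmakeFlags = [ "-DCUDAToolkit_ROOT=${cudatoolkit-joined}" ];
}
```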
Hi @SomeoneSerge. (EDIT: some keyboard shortcut submitted early.) As I was saying, it is non-trivial and there are rpath constraints as well. But yes, please go ahead and file a request in that GitHub repo. Perhaps creating a CMake "stub" per component would be beneficial? Otherwise open to suggestions.
This issue has been mentioned on NixOS Discourse. There might be relevant details there: https://discourse.nixos.org/t/tweag-nix-dev-update-45/26397/1
RE: Caching "single build that supports all capabilities" vs "multiple builds that support individual cuda architectures" CC #221564 |
RE: Caching "single build that supports all capabilities" vs "multiple builds that support individual cuda architectures" Since the actual motivation is that we want our binary cache to be 1) useful (minimize cache-misses), and 2) affordable, we should also:
EDIT: Being addressed in #224068 |
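To make the trade-off concrete, here is a sketch of how a per-architecture build can be requested, assuming the `cudaCapabilities` nixpkgs config option:

```nix
# Instantiate nixpkgs for exactly one GPU architecture. A narrower
# capability list means smaller builds and closures, but one cache
# entry per architecture if we build them all separately.
import <nixpkgs> {
  config = {
    allowUnfree = true;
    cudaSupport = true;
    cudaCapabilities = [ "8.6" ];
  };
}
```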
This issue has been mentioned on NixOS Discourse. There might be relevant details there:
Problem
I'm asking these questions here because it may give more people the opportunity to see them and contribute.

For context, I used Docker to the point of tears a while back and don't ever want to go back. I like Nix and Nixpkgs, so I've been trying to contribute to the CUDA ecosystem to get things up to speed. I've made a number of PRs toward that end recently, and doing so has taught me a fair amount about Nix and about the state of CUDA support.
The following are a list of questions I've had knocking around while I've worked on the PRs. I figure this might be a good way to gather some knowledge, informally, before trying to write it up or otherwise synthesize it. I read through the big PR from last year which involved most of the CUDA refactoring (#167016) prior to writing this.
I apologize in advance if I've missed any obvious resources.
Proposal
The following questions and answers should either make their way into a FAQ or be incorporated into the CUDA documentation.
Table of Contents

- `sha256` vs. `hash`
- `autoAddOpenGLRunpathHook`
- Multi-output derivations
- Supporting multiple versions of CUDA
  - Version bounds within `cudaPackages`
  - Version bounds for consumers of `cudaPackages`
- Storing "meta" information
- Misc questions to be organized better later
`sha256` vs. `hash`
Nix expects hashes provided by the `hash` attribute (e.g., for `fetchurl`) to be SRI hashes[^1]. SRI hashes are self-describing, so it's not necessary to specify the hash algorithm via the attribute name. Where possible, it is preferable to use `hash` instead of `sha256`, both to avoid confusion and to avoid the need to name the hash algorithm in the attribute; there is an ongoing, Nixpkgs-wide effort to do so!
`autoAddOpenGLRunpathHook`

Background on `autoAddOpenGLRunpathHook`, why we need it, and answers, courtesy of @samuela:

- When should a package use `autoAddOpenGLRunpathHook`?
- When should a package use `addOpenGLRunpath`?

Multi-output derivations
Part of the goal of multi-output derivations is to allow for more granular control over what is installed. For example, if you only need the static library, you can install that without installing the shared library. This is especially useful for CUDA, where libraries are often dozens, if not hundreds, of megabytes in size.
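A minimal sketch of what such a split can look like; the package is hypothetical, and `moveToOutput` is the helper provided by the multiple-outputs setup hook:

```nix
{ stdenv }:

stdenv.mkDerivation {
  pname = "libexample"; # hypothetical package
  version = "1.0.0";
  src = ./.;

  # Declaring multiple outputs lets consumers install only what they need.
  outputs = [ "out" "lib" "static" "dev" ];

  postInstall = ''
    # Relocate the large static archives into their own output.
    moveToOutput "lib/*.a" "$static"
  '';
}
```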
- Currently, `static` and `dev` are used for this purpose.
- `dev` or `out`? Does it make sense to split at that granular a level?
- Is it acceptable for `out` to contain files which are available in other outputs?
- Should `out` be standalone, i.e., able to run without being joined with other outputs from the derivation? For example, should library files required to run a CUDA program be in `out`, `lib`, or both?
, or both?Added challenges, pointed out by @SomeoneSerge:
pkg-config
files with each of the redistributablesFindCUDAToolkit.cmake
(if we're lucky),FindCUDA.cmake
(if we're less lucky), or bazel (if they positively decided to torture us) to find the toolkitpkg-config
filesCUDAToolkit_ROOT
orCUDA_HOME
)symlinkJoin
pattern emergesSupporting multiple versions of CUDA
Version bounds within `cudaPackages`

- For example, `cudnn`.
Version bounds for consumers of `cudaPackages`

Examples include `torch` and `magma`:

- `torch`: https://github.com/pytorch/pytorch/blob/v1.13.1/torch/utils/cpp_extension.py#L1751
- `magma` for CUDA: https://bitbucket.org/icl/magma/src/16ae283407366881829b5f6055e3a1179fdb89dd/CMakeLists.txt#lines-175
- `magma` for ROCm: https://bitbucket.org/icl/magma/src/16ae283407366881829b5f6055e3a1179fdb89dd/CMakeLists.txt#lines-386
- `torch` in nixpkgs: https://github.com/NixOS/nixpkgs/pull/217367/files#diff-59c22b0fc67d897077e55030166ca816d19c80b7767b2ad486bc0aaa2a772115
- `magma` in nixpkgs: https://github.com/NixOS/nixpkgs/pull/217410/files#diff-8477c70c80bcca19eae2995acb33196e4b6cb57588f3c6eb2cea0860e5be7633

@SomeoneSerge found that `magma` specifically allows us to set `CMAKE_CUDA_ARCHITECTURES` to use architectures outside those in the `CMakeLists.txt`: https://github.com/NixOS/nixpkgs/pull/218265/files#diff-989b55d62898864bff7cbba951ccdcdf5ff604fc917498863d2fb567efde542fR137
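For instance, a sketch of pinning that architecture list from a Nix expression (the values are illustrative; 70/80/86 correspond to capabilities 7.0, 8.0, and 8.6):

```nix
{
  # Passed to a magma-like CMake build.
  cmakeFlags = [
    "-DCMAKE_CUDA_ARCHITECTURES=70;80;86"
  ];
}
```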
Storing "meta" information

Misc questions to be organized better later
- For the driver stubs, `libcuda.so` specifically: what's the best way to make the linker aware of those stubs? I set `LIBRARY_PATH` and that seemed to do the trick: https://github.com/NixOS/nixpkgs/pull/218166/files#diff-ab3fb67b115c350953951c7c5aa868e8dd9694460710d2a99b845e7704ce0cf5R76
- Is it preferable to set environment variables with `env.BLAG = "blarg"` (I saw a tree-wide change about using `env` because of "structuredAttrs") in the derivation, or to export them in the shell, in something like `preConfigure`? (See the sketches after this list.)
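On the stubs question, a hypothetical sketch of the `LIBRARY_PATH` approach; the exact package layout is an assumption (in the redist packaging, the stubs ship with `cuda_cudart`):

```nix
{ stdenv, cudaPackages }:

stdenv.mkDerivation {
  pname = "needs-libcuda-at-link-time"; # hypothetical package
  version = "0.1.0";
  src = ./.;

  # The cc-wrapper consults LIBRARY_PATH at link time, so the linker can
  # resolve -lcuda against the stub rather than a real driver.
  env.LIBRARY_PATH = "${cudaPackages.cuda_cudart}/lib/stubs";
}
```

And on `env` vs. exporting in a phase, a sketch contrasting the two styles, using the placeholder names from the question:

```nix
{ stdenv }:

stdenv.mkDerivation {
  pname = "env-example"; # hypothetical package
  version = "0.1.0";
  src = ./.;

  # Style 1: `env` makes the variable part of the derivation environment
  # and stays compatible with __structuredAttrs.
  env.BLAG = "blarg";

  # Style 2: export it only for the phases that need it.
  preConfigure = ''
    export BLAG=blarg
  '';
}
```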
Checklist
cc @NixOS/cuda-maintainers
Footnotes
[^1]: https://developer.mozilla.org/en-US/docs/Web/Security/Subresource_Integrity