
cudaPackages_12_2.cudatoolkit: init at 12.2.0 #240457

Merged


Dessix (Contributor) commented Jun 29, 2023

Description of changes

Update cudaPackages: 12.1.1 -> 12.2.0

https://docs.nvidia.com/cuda/archive/12.1.1/ -> https://docs.nvidia.com/cuda/cuda-toolkit-release-notes/index.html

Things done
  • Built on platform(s)
    • x86_64-linux
    • aarch64-linux
    • x86_64-darwin
    • aarch64-darwin
  • For non-Linux: Is sandbox = true set in nix.conf? (See Nix manual)
  • Tested, as applicable:
  • Tested compilation of all packages that depend on this change using nix-shell -p nixpkgs-review --run "nixpkgs-review rev HEAD". Note: all changes have to be committed, also see nixpkgs-review usage
  • Tested basic functionality of all binary files (usually in ./result/bin/)
  • 23.11 Release Notes (or backporting 23.05 Release notes)
    • (Package updates) Added a release notes entry if the change is major or breaking
    • (Module updates) Added a release notes entry if the change is significant
    • (Module addition) Added a release notes entry if adding a new NixOS module
  • Fits CONTRIBUTING.md.

ofborg (bot) added labels "10.rebuild-darwin: 0" (This PR does not cause any packages to rebuild on Darwin) and "10.rebuild-linux: 0" (This PR does not cause any packages to rebuild on Linux) Jun 29, 2023
SomeoneSerge (Contributor) commented Jun 29, 2023

Hi! This PR currently adds a runfile-based cudaPackages_12_2.cudatoolkit (which we're trying to phase out), but not the redist packages like cudaPackages_12_2.cuda_nvcc.
We'd have to add the redist part before merging, at least because changing the default cudaPackages_12 from 12.1 to 12.2 effectively "removes" cudaPackages_12.cuda_nvcc, which used to come from 12.0.

(I'll follow up shortly with pointers on where to find the missing stuff)

Dessix force-pushed the dev/dessix/update-cudatoolkit-12.2 branch 2 times, most recently from 564faa4 to a64b1ac, June 29, 2023 09:08
Dessix (Contributor, Author) commented Jun 29, 2023

> (I'll follow up shortly with pointers on where to find the missing stuff)

I believe I found some of it; I hadn't yet seen your comment and stumbled across the manifest files while browsing around the filesystem. I've added that, but I'm not sure what else may be missing.

For what it's worth, this PR is open to maintainer edits, so feel free to change whatever you need, if it's easier than commenting.

SomeoneSerge added the label "6.topic: cuda" (Parallel computing platform and API) Jun 29, 2023
SomeoneSerge (Contributor) commented

Great! The manifests list the sets of arguments that build-cuda-redist-package.nix is invoked with, and you've already found extension.nix, where everything gets wired together.
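As a rough illustration of "manifests list sets of arguments", the Python sketch below walks a manifest and collects one argument set per package for a given platform. It is not the Nix code in extension.nix; the manifest layout (a `version` field plus one object per platform holding `relative_path` and `sha256`) is assumed from NVIDIA's redistrib JSON files, and `redist_args` and the trimmed sample manifest are hypothetical.

```python
import json

# Base URL taken from the download commands in this thread; manifest
# `relative_path` values are assumed to be relative to it.
BASE_URL = "https://developer.download.nvidia.com/compute/cuda/redist/"

# Trimmed, hypothetical excerpt in the assumed manifest layout.
SAMPLE_MANIFEST = json.loads("""
{
  "release_label": "12.2.0",
  "cuda_nvcc": {
    "version": "12.2.91",
    "linux-x86_64": {
      "relative_path": "cuda_nvcc/linux-x86_64/cuda_nvcc-linux-x86_64-12.2.91-archive.tar.xz",
      "sha256": "d703af09bea3999b40c8c04095a78ec76ccf02b11094059300a3a9c2bd52ec61"
    }
  }
}
""")


def redist_args(manifest, platform):
    """Return one argument set per package that ships an archive for `platform`."""
    args = []
    for pname, entry in manifest.items():
        # Top-level metadata such as "release_label" is a plain string; skip it,
        # along with packages that have no archive for this platform.
        if not isinstance(entry, dict) or platform not in entry:
            continue
        release = entry[platform]
        args.append({
            "pname": pname,
            "version": entry["version"],
            "url": BASE_URL + release["relative_path"],
            "sha256": release["sha256"],
        })
    return args


if __name__ == "__main__":
    for a in redist_args(SAMPLE_MANIFEST, "linux-x86_64"):
        print(a["pname"], a["version"], a["url"])
```

In Nixpkgs the equivalent loop happens at evaluation time, which is one reason the manifests need to be readable without building anything first.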

Check-list

  • Verified the JSON:
❯ wget https://developer.download.nvidia.com/compute/cuda/redist/redistrib_12.2.0.json -O pkgs/development/compilers/cudatoolkit/redist/manifests/redistrib_12.2.0.json
❯ git diff
pkgs/development/compilers/cudatoolkit/redist/manifests/redistrib_12.2.0.json
│1151│}   │1151│}
\ No newline at end of file

An extra \n?

  • Verifying the new package set:
❯ NIXPKGS_ALLOW_UNFREE=1 nix build --impure github:Dessix/nixpkgs/dev/dessix/update-cudatoolkit-12.2#cudaPackages_12_2.cuda_nvcc
error: hash mismatch in fixed-output derivation '/nix/store/yjk6l9qaaxnhi0ii7sims8bc7xa733nn-cuda_nvcc-linux-x86_64-12.2.91-archive.tar.xz.drv':
         specified: sha256-yJnFjQfbvjN6vq0BM3oq7HKn/UYAuVg4sEQzXIUZnGs=
            got:    sha256-1wOvCb6jmZtAyMBAlaeOx2zPArEQlAWTAKOpwr1S7GE=
error: 1 dependencies of derivation '/nix/store/mydj6kvjwj71i6snhyfsnnrrkw80qvh6-cuda_nvcc-12.2.91.drv' failed to build

Now that's odd.
CC @kmittman: we get a hash mismatch between the manifest and the actual tarballs.
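The two hashes in the error are in Nix's SRI notation (`sha256-` followed by the base64 encoding of the raw digest), while the manifest stores hex digests. A small Python sketch for converting between the two makes it easy to confirm that "specified" is the manifest's `sha256` value and "got" is the hash of the file that was actually downloaded. The helper names are hypothetical.

```python
import base64


def hex_to_sri(hex_digest, algo="sha256"):
    """Convert a hex digest (as stored in the manifest) to Nix's SRI form."""
    raw = bytes.fromhex(hex_digest)
    return f"{algo}-{base64.b64encode(raw).decode('ascii')}"


def sri_to_hex(sri):
    """Convert an SRI hash such as 'sha256-...' back to a lowercase hex digest."""
    _algo, _, b64 = sri.partition("-")
    return base64.b64decode(b64).hex()


if __name__ == "__main__":
    # Hex sha256 listed for cuda_nvcc linux-x86_64 in the manifest at the time.
    print(hex_to_sri("c899c58d07dbbe337abead01337a2aec72a7fd4600b95838b044335c85199c6b"))
```

Applied to the manifest's hex digest, `hex_to_sri` yields the "specified" hash from the error; `sri_to_hex` turns the "got" hash back into the value `sha256sum` prints for the tarball.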

SomeoneSerge (Contributor) commented

❯ wget https://developer.download.nvidia.com/compute/cuda/redist/cuda_nvcc/linux-x86_64/cuda_nvcc-linux-x86_64-12.2.91-archive.tar.xz
❯ sha256sum cuda_nvcc-linux-x86_64-12.2.91-archive.tar.xz
d703af09bea3999b40c8c04095a78ec76ccf02b11094059300a3a9c2bd52ec61  cuda_nvcc-linux-x86_64-12.2.91-archive.tar.xz
❯ cat redistrib_12.2.0.json | jq .cuda_nvcc.'"linux-x86_64"'.sha256
"c899c58d07dbbe337abead01337a2aec72a7fd4600b95838b044335c85199c6b"
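The `wget` / `sha256sum` / `jq` check above can also be scripted. Here is a minimal Python sketch. It assumes the manifest layout `manifest[package][platform]["sha256"]` and hashes the archive in chunks, so large tarballs do not have to fit in memory. `file_chunks`, `digest_hex`, and `check_archive` are hypothetical helpers.

```python
import hashlib
import json


def file_chunks(path, size=1 << 20):
    """Yield a file's contents in fixed-size chunks."""
    with open(path, "rb") as f:
        yield from iter(lambda: f.read(size), b"")


def digest_hex(chunks, algo="sha256"):
    """Hash an iterable of byte chunks and return the hex digest."""
    h = hashlib.new(algo)
    for chunk in chunks:
        h.update(chunk)
    return h.hexdigest()


def check_archive(manifest, package, platform, actual_hex):
    """Compare an archive's sha256 with the manifest; return (ok, expected)."""
    expected = manifest[package][platform]["sha256"]
    return actual_hex == expected, expected


if __name__ == "__main__":
    with open("redistrib_12.2.0.json") as f:
        manifest = json.load(f)
    actual = digest_hex(file_chunks("cuda_nvcc-linux-x86_64-12.2.91-archive.tar.xz"))
    ok, expected = check_archive(manifest, "cuda_nvcc", "linux-x86_64", actual)
    print("OK" if ok else f"MISMATCH: manifest {expected}, file {actual}")
```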

Dessix (Contributor, Author) commented Jun 29, 2023

@SomeoneSerge Without the newline, the EditorConfig test gate failed. Should we solve this by fetching the manifests instead of checking them in?
As for the hash failures on manifest-referenced content, I have no idea.

SomeoneSerge (Contributor) commented

@Dessix, we keep the manifests in the repository because that way we can use their contents during Nix evaluation without relying on the "Import-From-Derivation" (IFD) feature.

> Without the newline, the EditorConfig test gate failed

Oh, OK, let's just keep it.
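Because the upstream manifest has no final newline and the EditorConfig check requires one, the committed copy can never be byte-identical to the download. Here is a minimal Python sketch for comparing the two while tolerating exactly one trailing newline. The helper names and the local download path are hypothetical.

```python
from pathlib import Path


def strip_final_newline(data):
    """Drop one trailing newline byte, if present."""
    return data[:-1] if data.endswith(b"\n") else data


def same_modulo_final_newline(committed, upstream):
    """True if the two byte strings differ at most by a single final newline."""
    return strip_final_newline(committed) == strip_final_newline(upstream)


if __name__ == "__main__":
    committed = Path("pkgs/development/compilers/cudatoolkit/redist/manifests/redistrib_12.2.0.json")
    upstream = Path("redistrib_12.2.0.json")  # hypothetical local copy of the download
    print(same_modulo_final_newline(committed.read_bytes(), upstream.read_bytes()))
```

Only one newline is stripped on purpose, so any other whitespace difference still counts as a mismatch.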

> As for the hash failures on manifest-referenced content, I have no idea.

Let's wait for upstream to comment.

Dessix (Contributor, Author) commented Jun 29, 2023

@SomeoneSerge

>> As for the hash failures on manifest-referenced content, I have no idea.

> Let's wait for upstream to comment

Interestingly, the md5 for that file does match (b3b07d9b0b874c09a0cafadfc32b8760), but the sha256 does not. I wonder how that happened.
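To see exactly which of a manifest entry's digests agree with a downloaded archive, you can compute all of them in a single pass over the data. Here is a minimal Python sketch. It assumes each platform entry carries hex `md5` and `sha256` fields, as NVIDIA's redist manifests appear to. `compare_digests` is a hypothetical helper, and the chunks can come from reading the archive in blocks.

```python
import hashlib


def compare_digests(chunks, release, algos=("md5", "sha256")):
    """Hash the data once for every algorithm present in `release`.

    Returns a dict mapping each algorithm name to whether the computed hex
    digest equals the value recorded in the manifest entry.
    """
    hashers = {a: hashlib.new(a) for a in algos if a in release}
    for chunk in chunks:
        # Feed every chunk to all hashers so the file is read only once.
        for h in hashers.values():
            h.update(chunk)
    return {a: h.hexdigest() == release[a].lower() for a, h in hashers.items()}
```

A result of `{"md5": True, "sha256": False}` would correspond to the situation described above, where the entry's md5 was correct but its sha256 was not.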

kmittman commented

Hi,
Thanks for catching this; I'm working on a fix ASAP.

kmittman commented

Apologies for the delay; this is resolved now: https://developer.download.nvidia.com/compute/cuda/redist/redistrib_12.2.0.json
I left a comment about what happened in NVIDIA/build-system-archive-import-examples#8.

Dessix force-pushed the dev/dessix/update-cudatoolkit-12.2 branch from a64b1ac to f52519e, June 30, 2023 01:52
SomeoneSerge (Contributor) commented Jun 30, 2023

  • Seems to build:
    ❯ NIXPKGS_ALLOW_UNFREE=1 nix build --impure github:Dessix/nixpkgs/dev/dessix/update-cudatoolkit-12.2#cudaPackages_12_2.cuda_nvcc
    ❯
    
  • JSON matches what I download manually
  • Commit message: previous commits seem to have followed the pattern `cudaPackages_12: init at 12.0.0`, `cudaPackages_12_1: init at 12.1.1`, `cudnn: don't break cudaPackages`, etc.
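The naming pattern can be checked mechanically. Here is a small Python sketch that recognizes only the `<attribute>: init at <version>` form. The regex is an assumption about what counts as an attribute path and a version, not an official nixpkgs rule, and `parse_init_title` is a hypothetical helper.

```python
import re

# Assumed shape: a dotted attribute path, then ": init at ", then a dotted
# numeric version. Other title forms (e.g. "cudnn: don't break ...") are ignored.
INIT_TITLE = re.compile(r"^(?P<attr>[A-Za-z0-9_.-]+): init at (?P<version>\d+(?:\.\d+)*)$")


def parse_init_title(title):
    """Return (attribute, version) for an 'init at' commit title, else None."""
    m = INIT_TITLE.match(title)
    return (m["attr"], m["version"]) if m else None
```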

Dessix force-pushed the dev/dessix/update-cudatoolkit-12.2 branch from f52519e to 5264cff, June 30, 2023 19:29
Dessix (Contributor, Author) commented Jun 30, 2023

I've split the remapping of cudaPackages_12 to cudaPackages_12_2 out into a separate PR, so it can be reverted easily if that becomes necessary. The commit message now matches the pattern you described.

SomeoneSerge (Contributor) left a comment

Thank you @Dessix!

figsoda added the label "12.approvals: 1" (This PR was reviewed and approved by one reputable person) Jul 1, 2023
Dessix changed the title from "cudaPackages_12_2.cudatoolkit: Add Cuda 12.2" to "cudaPackages_12_2.cudatoolkit: init at 12.2.0" Jul 1, 2023
ConnorBaker (Contributor) commented

@Dessix can you reproduce these? I'm most concerned with the ones under Failures.

With ~/.config/nixpkgs/config.nix:

{
  allowUnfree = true;
  cudaSupport = true;
  cudaCapabilities = [ "8.9" ];
  cudaForwardCompat = false;
}

Running:

nix build --impure --keep-going -L .#legacyPackages.x86_64-linux.cudaPackages_12_2.{autoAddOpenGLRunpathHook,backendStdenv,cuda_cccl,cuda_cudart,cuda_cuobjdump,cuda_cupti,cuda_cuxxfilt,cuda_demo_suite,cuda_documentation,cuda_gdb,cuda_nsight,cuda_nvcc,cuda_nvdisasm,cuda_nvml_dev,cuda_nvprof,cuda_nvprune,cuda_nvrtc,cuda_nvtx,cuda_nvvp,cuda_opencl,cuda_profiler_api,cuda_sanitizer_api,cudatoolkit,cutensor,fabricmanager,libcublas,libcufft,libcufile,libcurand,libcusolver,libcusparse,libnpp,libnvidia_nscq,libnvjitlink,libnvjpeg,nccl,nsight_compute,nsight_systems,nvidia_fs}

I see the following (all but those under Failures are expected):

ConnorBaker self-requested a review July 3, 2023 13:55
Dessix force-pushed the dev/dessix/update-cudatoolkit-12.2 branch from 5264cff to b164517, July 3, 2023 19:37
Fixed NixOS#239557 via autoPatchelf to `qt6.(...)` packages.
Dessix force-pushed the dev/dessix/update-cudatoolkit-12.2 branch from 7a21b2c to 532a7d3, July 3, 2023 21:35
Dessix (Contributor, Author) commented Jul 3, 2023

@ConnorBaker All the failures mentioned above were already present before this PR, for 12.0 and 12.1. They appear to be caused by the CUDA team's in-progress rework that replaces cudatoolkit with the individual per-component packages, which, judging by the above, seems to be incomplete.

I've reworked the PR slightly to un-vendor Qt 6, which will at least partially solve #239557, but several items (CollectX, neko) remain; these may be candidates for removal altogether.

This PR brings 12.2 to the same state as 12.1 and 12.0. Further fixes are likely better suited to the CUDA team's broader rework, as I was unable to untangle how build-cuda-redist-package.nix relates (if at all) to common.nix.

ofborg (bot) requested review from SomeoneSerge and samuela July 3, 2023 22:06
ofborg (bot) added labels "10.rebuild-darwin: 11-100" and "10.rebuild-linux: 11-100" and removed labels "10.rebuild-darwin: 0" (This PR does not cause any packages to rebuild on Darwin) and "10.rebuild-linux: 0" (This PR does not cause any packages to rebuild on Linux) Jul 3, 2023
ConnorBaker (Contributor) commented

In that case, looks good to me! Thank you for taking a look at that, @Dessix.

ConnorBaker merged commit 1bbdebb into NixOS:master Jul 4, 2023
Dessix deleted the dev/dessix/update-cudatoolkit-12.2 branch July 4, 2023 06:29
Labels
  • 6.topic: cuda (Parallel computing platform and API)
  • 10.rebuild-darwin: 11-100
  • 10.rebuild-linux: 11-100
  • 12.approvals: 1 (This PR was reviewed and approved by one reputable person)
Projects
Status: Done

5 participants