
cudaPackages_{10*,11*}: warn about upcoming removal #342112

Merged
merged 2 commits into NixOS:master from push-lykmynuyxowq on Oct 3, 2024

Conversation

@emilazy (Member) commented Sep 15, 2024

Description of changes

We currently package all CUDA versions from 10.0 onwards. In some cases, CUDA is the only thing preventing us from removing old versions of GCC. Since we currently don’t deprecate or remove CUDA versions, this will be an increasing drag on compiler maintenance in Nixpkgs going forward unless we establish a sensible policy. After discussing this with @SomeoneSerge in the context of old versions of GCC, I learned that there was already a desire to remove at least versions prior to 11.3, as those versions were only packaged in the old “runfile” format, but that it was blocked on someone doing the work to warn about the upcoming deprecation for a release cycle.

This change adds a release note and warnings indicating that CUDA 10.x and 11.x will be removed in Nixpkgs 25.05, about 8 months from now.

I chose this version cut‐off because these versions of CUDA require GCC < 12. GCC releases a major version every year, and seems to support about four releases at a time, releasing the last update to the oldest version and marking it as unsupported on their site around the time of the release of the next major version. Therefore, by the time of the 25.05 release, we should expect GCC 15 to be released and GCC 11 to become unsupported. Adding a warning and communicating the policy of only shipping CUDA versions that work with supported compilers in the release notes means that we should be able to clean up old versions as required without any issue or extensive deprecation period in future, without obligating us to do so if there is a strongly compelling reason to be more lenient. That should help solve both shipping an indefinitely‐growing list of CUDA versions and an indefinitely‐growing list of GCC and LLVM versions.

As I’m not a user of CUDA myself, I can’t be sure of how sensible this version support policy is, but I think it’s fair to say that it’s reasonable for Nixpkgs to choose not to maintain compiler versions that are unsupported upstream just for the sake of versions of CUDA that are also unmaintained. CUDA 11.x has not received an update for two years already, and would only become unsupported in Nixpkgs in over half a year’s time.

CUDA 10.x is currently unused in‐tree except for the unmaintained Caffe and NVIDIA DCGM, which depends on multiple CUDA versions solely so that it can provide plugins for those versions. The latest DCGM version has already removed support for CUDA 10.x and is just awaiting an update in Nixpkgs. They maintain a list of supported versions to build plugins for in their CMake build system, so it should be simple enough for us to only build support for the versions of CUDA that we support in Nixpkgs.

From what I can tell, CUDA 11.x is currently used by only a handful of packages other than DCGM; the full per-package breakdown is in the commit message quoted later in this conversation.

Given the years since CUDA 11.x received any update upstream, and the seemingly very limited set of packages that truly require it, I think the policy of being able to drop versions that require unsupported compilers starting from the next Nixpkgs release is a reasonable one, but of course I’m open to feedback from the CUDA maintainers about this.

Result of nixpkgs-review run on x86_64-linux

1 package blacklisted:
  • nixos-install-tools
1 package built:
  • disko

Things done

  • Built on platform(s)
    • x86_64-linux
    • aarch64-linux
    • x86_64-darwin
    • aarch64-darwin
  • For non-Linux: Is sandboxing enabled in nix.conf? (See Nix manual)
    • sandbox = relaxed
    • sandbox = true
  • Tested, as applicable:
  • Tested compilation of all packages that depend on this change using nix-shell -p nixpkgs-review --run "nixpkgs-review rev HEAD". Note: all changes have to be committed, also see nixpkgs-review usage
  • Tested basic functionality of all binary files (usually in ./result/bin/)
  • 24.11 Release Notes (or backporting 23.11 and 24.05 Release notes)
    • (Package updates) Added a release notes entry if the change is major or breaking
    • (Module updates) Added a release notes entry if the change is significant
    • (Module addition) Added a release notes entry if adding a new NixOS module
  • Fits CONTRIBUTING.md.

Add a 👍 reaction to pull requests you find important.

@emilazy emilazy requested review from RossComputerGuy and a team September 15, 2024 17:29
@github-actions github-actions bot added the 6.topic: nixos, 8.has: documentation, and 8.has: changelog labels on Sep 15, 2024
@@ -123,4 +123,6 @@ let
fixedPoints.extends composedExtension passthruFunction
);
in
cudaPackages
lib.warnIf (lib.versionOlder cudaVersion "12.0" && config.allowAliases)
Contributor

Suggested change:
- lib.warnIf (lib.versionOlder cudaVersion "12.0" && config.allowAliases)
+ lib.warnIf (lib.versionOlder cudaVersion "11.4" && config.allowAliases)

is safe

Member Author

In the PR/commit message I gave reasons for choosing 12.0 as a baseline rather than 11.4. By the time of 25.05’s release, 11.x will all depend on GCC versions that are unsupported upstream, which could result in old GCC versions sticking around solely for obsolete versions of CUDA.

Picking the policy listed in the release notes currently in this PR also helps us establish expectations for whether we will be obligated in future to keep an indefinitely‐growing list of versions of CUDA that haven’t been updated in years and only work with compilers that haven’t been updated in years and don’t have upstream support.

I think it’s better to set those expectations now, unless there’s a strong reason not to do it like this. My survey of the affected packages suggested that there’s only one package in Nixpkgs that is currently pinned to CUDA 11.x and that may not support 12.x (but I don’t know whether it does or not).

Member Author

Of course we should probably get TensorFlow off the unnecessary 11.x pin before actually merging this to avoid annoying users. I’m not sure I’m the best person for that job, though.

@SomeoneSerge (Contributor) commented Sep 15, 2024

I see. I must say I couldn't find any explicit information about gcc11's projected end of life, but I agree we don't need to support it any longer than upstream does.

Regarding updates for cuda11, maybe it doesn't need any (EDIT: well, except for supporting modern toolchains 🤡): the stack is somewhat rigid and stable, slow to evolve, the newer releases tend to just add support for new devices/features, and most applications (hpc) aren't security-critical...

Regarding users of cuda11, I'm afraid we (I) were rather hasty about discarding e.g. 1<=torch<2 which is used by a substantial amount of research code, and I don't think one could easily build it against cuda12.

Regarding tensorflow... nobody wants to deal with bazel

Contributor

In either case, thanks for picking this up

Contributor

I’d hope that it’s as simple as just giving it a higher version

With a compatible cuDNN version in place (8.9?), it's probably worth a try.

Member Author

Okay, I’ll try giving it a shot soon. One thing confuses me: we list cuDNN 8.9 as supporting CUDA 12.2 at most, but TensorFlow pairs it with CUDA 12.3. What’s going on there? (And are those versions minimum requirements or precise pinned versions? TensorFlow only pins the major version right now.)

Contributor

Consider using our handy cudaPackages.cudaOlder instead of the calls to lib :)
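
For readers following along, a minimal sketch of what that substitution might look like, assuming cudaOlder is in scope where the package set is assembled; the actual expression in the PR may read differently, and the warning text here is only illustrative:

```nix
# cudaOlder is shorthand for lib.versionOlder applied to this package set's
# own cudaVersion, so the guard no longer needs to name cudaVersion itself.
lib.warnIf (cudaOlder "12.0" && config.allowAliases)
  "cudaPackages versions older than 12.0 will be removed in Nixpkgs 25.05"  # illustrative message
  cudaPackages
```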

Contributor

Let's proofread, I guess.

we list cuDNN 8.9 as supporting CUDA 12.2 at most,

Seems to check out:

https://docs.nvidia.com/deeplearning/cudnn/archives/cudnn-897/support-matrix/index.html


@emilazy (Member Author) commented Sep 15, 2024

Looks like the guard to make sure this doesn’t break eval doesn’t work for ofborg. Looking into it.

@emilazy (Member Author) commented Sep 16, 2024

(I still don’t know why ofborg-eval is unhappy, even after digging into the ofborg source code and running outpaths.nix manually myself. If anyone has any ideas let me know; otherwise I’ll keep digging later.)

@emilazy (Member Author) commented Sep 20, 2024

Now that #334996 is merged, I’ll take a look at the TensorFlow source build and maybe a couple of the other stragglers over the next few days.

@SomeoneSerge Do you consider the evaluation warning here vital, or is the release note enough in your view? Unfortunately, trying to dig into why this is breaking CI led to ofborg code nobody really understands, and it remains unclear to me why it’s happening even with the enableAliases condition. I can look into it further if you think it’s important, but I wanted to double‐check before putting a bunch more effort into it.

@SomeoneSerge (Contributor)

Do you consider the evaluation warning here vital, or is the release note enough in your view?

I'm inclined to push for the warnings. I'm confident consumers notice warnings in a timely manner, less so about release notes. Maybe it would be good to transition to a protocol where release notes are an expected notice and the only expected notice, but we'd need to be more vocal about that (maybe mention this in the warnings when deprecating the next few features, jaja)

unclear to me why it’s happening even with the enableAliases condition.

Oh actually, isn't it that we have to remove all the in-tree consumers of the aliased packages? I think it's OK to have package-set attributes wrapped in warnIf, but it's not OK if any of the outPath computations depend on that attribute (I might be making this up)
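
A tiny sketch of the mechanism being guessed at here (illustrative only, not the PR's code, and assuming `<nixpkgs>` is on NIX_PATH): lib.warnIf only emits its message when the wrapped value is forced, so a warning-wrapped package set stays silent until something, such as an outPath computation, actually pulls it in.

```nix
# example.nix: evaluate with `nix-instantiate --eval example.nix`.
# The warning prints the moment `wrapped.foo` forces the wrapped attribute set.
let
  lib = import <nixpkgs/lib>;
  wrapped = lib.warnIf true "this package set is deprecated" { foo = "bar"; };
in
wrapped.foo
```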

@SomeoneSerge (Contributor)

, I’ll take a look at the TensorFlow source build and maybe a couple of the other stragglers over the next few days.

Please don't burn out

@emilazy (Member Author) commented Sep 20, 2024

Oh actually, isn't it that we have to remove all the in-tree consumers of the aliased packages? I think it's OK to have a package set attributes wrapped in warnIf, but it's not OK if any of the outPath computations depend on that attribute (I might be making this up)

So the reason for the weird enableAliases condition is that AIUI all the Nixpkgs checks that make sure there are no warnings also disable aliases (to check that aliases aren’t being used in‐tree). Basically it’s a gross hack to deliver a warning to the majority of end users without breaking CI. The correct solution would be the problems infrastructure (see also #338267), but it doesn’t exist yet.

The problem is that apparently the assumption isn’t true, or something else really weird is happening? But it’s going to take some diving to figure out.

The alternative would be to deal with all the reverse dependencies first, but that’s not a great idea since e.g. we still need DCGM to carry the dependencies for the next release cycle to maintain support.

Please don't burn out

Oh don’t worry you’re stuck with me, and I’ll never stop removing things 😆 Though I’ll probably just look at TensorFlow (and maybe TensorRT since it seems important?) and leave the rest to others once the deadline is set, since I’m really not qualified to poke at this stuff and don’t use it myself.

@SomeoneSerge (Contributor)

Basically it’s a gross hack to deliver a warning to the majority of end users without breaking CI. The correct solution would be the problems infrastructure (see also #338267), but it doesn’t exist yet.

yes-chad.jpg

@emilazy (Member Author) commented Sep 24, 2024

New, grosser warning hack just dropped; let’s see if ofborg is happy now.

I didn’t realize the TensorFlow source build is out of date, so I’ve started poking at it but am indeed in hell right now. I think I can probably get it working, but am I right to assume that all the serious CUDA users will be using the binary TensorFlow package anyway and so it doesn’t hugely matter?

I also learned that Caffe is currently stuck on CUDA 10. BVLC/caffe#7082 implies that it might work to just bump the CUDA version. If not, though, I’m not sure there’s anything we can really do since it’s been unmaintained for so many years. The package is in a pretty bad state already.

@emilazy (Member Author) commented Sep 24, 2024

ofborg also evaluates unfree packages, so this is probably back to the drawing board.

The idea of only shipping one GCC version, like many other distros, was floated in the Staging room on Matrix and I had to shut it down on the grounds of CUDA support. I’m curious, though – given that Nvidia don’t seem to be too behind on GCC support in the latest minor updates, what’s the motivation for using a specific minor version of CUDA that leads us to decide to ship all of them? Is it just a matter of which versions have been verified to work with software? But TensorFlow supports only a specific minor version and yet #334996 just pinned it to the major. If we only shipped one minor version of a given CUDA major release, then we could feasibly ship only one to three GCC versions (default GCC, latest GCC, GCC required by the latest CUDA – some of which may be the same version), which would reduce the burden on GCC maintainers further, but I don’t know enough about CUDA to tell whether this is a completely insane proposal or not :)

@lilyinstarlight (Member)

Conditioning on config.allowAliases is the right knob for this. It looks like two release attrs, tests.pkg-config.defaultPkgConfigPackages.tensorflow.{aarch64,x86_64}-linux, re-evaluate nixpkgs with a reset config, causing this issue.

It looks like the pkg-config tests re-import nixpkgs without preserving critical options like allowAliases when they probably should:

# defaultPkgConfigPackages test needs a Nixpkgs with allowUnsupportedPlatform
# in order to filter out the unsupported packages without throwing any errors
# tryEval would be too fragile, masking different problems as if they're
# unsupported platform problems.
allPkgs = import ../default.nix {
  system = stdenv.hostPlatform.system;
  localSystem = stdenv.buildPlatform.system;
  config = {
    allowUnsupportedSystem = true;
  };
  overlays = [];
};

Use nix-instantiate --eval pkgs/top-level/release.nix --arg nixpkgsArgs '{ config.allowAliases = false; }' -A tests.pkg-config.defaultPkgConfigPackages.tensorflow.x86_64-linux to reproduce the warning ofborg gets.
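
For illustration, a minimal sketch of the kind of fix this suggests, forwarding the outer alias setting through the re-import; the availability of the outer `config` variable and the exact plumbing are assumptions here, not the actual patch:

```nix
allPkgs = import ../default.nix {
  system = stdenv.hostPlatform.system;
  localSystem = stdenv.buildPlatform.system;
  config = {
    allowUnsupportedSystem = true;
    # Hypothetical: carry the outer evaluation's alias setting through, so an
    # allowAliases = false run (as ofborg does) stays alias-free in here too.
    allowAliases = config.allowAliases or true;
  };
  overlays = [];
};
```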

@ConnorBaker (Contributor)

ofborg also evaluates unfree packages, so this is probably back to the drawing board.

The idea of only shipping one GCC version, like many other distros, was floated in the Staging room on Matrix and I had to shut it down on the grounds of CUDA support. I’m curious, though – given that Nvidia don’t seem to be too behind on GCC support in the latest minor updates, what’s the motivation for using a specific minor version of CUDA that leads us to decide to ship all of them? Is it just a matter of which versions have been verified to work with software? But TensorFlow supports only a specific minor version and yet #334996 just pinned it to the major. If we only shipped one minor version of a given CUDA major release, then we could feasibly ship only one to three GCC versions (default GCC, latest GCC, GCC required by the latest CUDA – some of which may be the same version), which would reduce the burden on GCC maintainers further, but I don’t know enough about CUDA to tell whether this is a completely insane proposal or not :)

From my understanding, the range of CUDA packages made available has been driven by what people want to package and the hardware they want to use. From CUDA 11.4, NVIDIA has provided JSON manifests and made packages available as individual tarballs, which allowed me to automate a great deal of packaging.

I think it's unrealistic to offer only a single minor version of CUDA in Nixpkgs, given popular consumers (e.g., OpenCV, PyTorch, Tensorflow, JAX, etc.) each specify their own range of supported versions for a release. Additionally, packages can break in spectacular or devious ways when using versions outside their support matrix (numerical accuracy regressions making me realize I should be glad to get runtime failures).

I feel it's also important to discuss extensibility; if we don't maintain a range of CUDA versions in-tree, how easy is it for downstream users to add it themselves? Currently, I imagine that would look like re-implementing most of the CUDA packaging we have in-tree. One of the reasons I moved to using the module system to create the CUDA package sets was to expose an interface to allow downstream users to supply their own manifests, effectively extending whatever we currently support. This is still something that is very much in progress, and I'd feel much more comfortable deprecating the majority of the CUDA package sets we make available if it were ready right now.

@emilazy (Member Author) commented Sep 25, 2024

Will rebase this to fix the eval issues soon; thanks @lilyinstarlight!

I think it's unrealistic to offer only a single minor version of CUDA in Nixpkgs, given popular consumers (e.g., OpenCV, PyTorch, Tensorflow, JAX, etc.) each specify their own range of supported versions for a release. Additionally, packages can break in spectacular or devious ways when using versions outside their support matrix (numerical accuracy regressions making me realize I should be glad to get runtime failures).

Thanks, that makes a lot of sense. It seems like the best compromise here is to stick with the proposed policy of “we’ll ship compiler versions that upstream supports and that are required for something important”, then, so that we can continue to ship a range of CUDA 12.x minor versions.

I feel it's also important to discuss extensibility; if we don't maintain a range of CUDA versions in-tree, how easy is it for downstream users to add it themselves? Currently, I imagine that would look like re-implementing most of the CUDA packaging we have in-tree. One of the reasons I moved to using the module system to create the CUDA package sets was to expose an interface to allow downstream users to supply their own manifests, effectively extending whatever we currently support. This is still something that is very much in progress, and I'd feel much more comfortable deprecating the majority of the CUDA package sets we make available if it were ready right now.

The problem is that these old CUDA versions don’t work with supported compilers, right? So if and when those compilers are removed from Nixpkgs, anyone packaging these unmaintained CUDA versions out‐of‐tree would also have to repackage old versions of GCC, which is pretty gnarly – that’s exactly why it’s an unreasonable maintenance burden on GCC maintainers currently. But for anyone who is up for handling that, I think the maintenance effort of having to vendor some of the CUDA packaging code isn’t too bad compared to that. For most users who, for whatever reason, can’t use a CUDA version that Nvidia and the wider ecosystem is officially supporting and updating, and who therefore don’t need updates, I think pinning an older Nixpkgs release is likely to be the better approach.

That said, this would only introduce a deprecation warning and so users would still be able to use those versions on 24.11 while work is done on the CUDA side to make it more modular and extensible in the unstable branch. It’s just that right now, these old, unmaintained versions of CUDA are acting as a significant drag on maintainers of core packages like GCC. Old CUDA versions are the only significant thing depending on unsupported versions of GCC with no viable plan to migrate them to supported ones.

There are currently five GCC versions that it would be possible and desired to remove support for if not for that. You can see in my pull request to remove GCC 6 how much additional code, complexity, and vendored patches these old versions require, including shipping versions of multiple old libraries just to keep those GCCs alive; indeed, a significant portion of the total complexity of the current GCC packaging relates to support of these old versions. The need to support these old GCCs has also led to additional work and complexity for the upcoming rework of Darwin support. We had a recent flurry of removing these old and almost entirely unused compiler versions, but I had to stop the train at #342242 when it reached GCC 7 because of these old CUDA versions. It might not be realistic to only carry one version of GCC and one version of CUDA, but I don’t think we can let a long tail of old CUDA versions be a burden on the rest of Nixpkgs for an indefinite period.

I appreciate that CUDA is a really important ecosystem with a lot of users, but given my extensive survey of what versions the in‐tree users of CUDA support, I think deprecating support for versions older than 12.0 over the course of a release cycle is a reasonable trade‐off; it is unreasonable to expect Nixpkgs maintainers to keep unsupported compilers going indefinitely just for CUDA versions that even NVIDIA no longer cares to maintain.

This was enabling aliases on ofborg.
We currently package all CUDA versions from 10.0 onwards. In
some cases, CUDA is the only thing preventing us from removing old
versions of GCC. Since we currently don’t deprecate or remove CUDA
versions, this will be an increasing drag on compiler maintenance in
Nixpkgs going forward unless we establish a sensible policy. After
discussing this with @SomeoneSerge in the context of old versions
of GCC, I learned that there was already a desire to remove at least
versions prior to 11.3, as those versions were only packaged in the
old “runfile” format, but that it was blocked on someone doing
the work to warn about the upcoming deprecation for a release cycle.

This change adds a release note and warnings indicating that CUDA 10.x
and 11.x will be removed in Nixpkgs 25.05, about 8 months from now.

I chose this version cut‐off because these versions of CUDA require
GCC < 12. GCC releases a major version every year, and seems to
support about four releases at a time, releasing the last update to
the oldest version and marking it as unsupported on their site around
the time of the release of the next major version. Therefore, by the
time of the 25.05 release, we should expect GCC 15 to be released
and GCC 11 to become unsupported. Adding a warning and communicating
the policy of only shipping CUDA versions that work with supported
compilers in the release notes means that we should be able to
clean up old versions as required without any issue or extensive
deprecation period in future, without obligating us to do so if there
is a strongly compelling reason to be more lenient. That should help
solve both shipping an indefinitely‐growing list of CUDA versions
and an indefinitely‐growing list of GCC and LLVM versions.

As I’m not a user of CUDA myself, I can’t be sure of how sensible
this version support policy is, but I think it’s fair to say that
it’s reasonable for Nixpkgs to choose not to maintain compiler
versions that are unsupported upstream just for the sake of versions
of CUDA that are also unmaintained. CUDA 11.x has not received an
update for two years already, and would only become unsupported in
Nixpkgs in over half a year’s time.

CUDA 10.x is currently unused in‐tree except for the unmaintained
Caffe and NVIDIA DCGM, which depends on multiple CUDA versions solely
so that it can provide plugins for those versions. The latest DCGM
version has already removed support for CUDA 10.x and is just awaiting
an update in Nixpkgs. They maintain a list of supported versions to
build plugins for in their CMake build system, so it should be simple
enough for us to only build support for the versions of CUDA that we
support in Nixpkgs.

From what I can tell, CUDA 11.x is currently used by the following
packages other than DCGM:

* `catboost`, because of
  <catboost/catboost#2540>. It looks like
  upstream has since redesigned this part of their build system, so
  perhaps the problem is no longer present, or would be easier to fix.

* `magma_2_6_2`, an old version from before upstream added CUDA
  12 support. This seems okay to break to me; that version is not
  maintained and will never be updated for new CUDA versions, and
  the CUDA support is optional.

* `paddlepaddle`, which, uh, also requires OpenSSL 1.1 of all
  things. <PaddlePaddle/Paddle#67571>
  states that PaddlePaddle supports up to 12.3.

* `python3Packages.cupy`, which is listed as “possibly incompatible
  with cutensor 2.0 that comes with `cudaPackages_12`”. I’m
  not sure what the “possibly” means here, but according to
  <https://github.com/cupy/cupy/tree/v13.3.0?tab=readme-ov-file#installation>
  they ship binary wheels using CUDA 12.x so I think this should
  be fine.

* `python3Packages.tensorrt`, which supports CUDA 12.x going by
  <https://github.com/NVIDIA/TensorRT/blob/release/10.4/CMakeLists.txt#L111>.

* TensorFlow, which has a link to
  <https://www.tensorflow.org/install/source#gpu> above the
  `python3Packages.tensorflow-bin` definition, but that page lists
  the versions we package as supporting CUDA 12.x.

Given the years since CUDA 11.x received any update upstream, and the
seemingly very limited set of packages that truly require it, I think
the policy of being able to drop versions that require unsupported
compilers starting from the next Nixpkgs release is a reasonable
one, but of course I’m open to feedback from the CUDA maintainers
about this.
@emilazy (Member Author) commented Sep 26, 2024

Hopefully this should work. If it does I think this should be ready to merge, although my TensorFlow question in #342112 (comment) is still relevant.

@ofborg ofborg bot added the 10.rebuild-darwin: 0 and 10.rebuild-linux: 1-10 labels on Sep 26, 2024
@hacker1024 (Member)

I didn’t realize the TensorFlow source build is out of date, so I’ve started poking at it but am indeed in hell right now. I think I can probably get it working, but am I right to assume that all the serious CUDA users will be using the binary TensorFlow package anyway and so it doesn’t hugely matter?

The binary package is alright, but there are still a few problems that make it less-than-ideal:

  • Many derivations in Nixpkgs use tensorflow and not tensorflow-bin, requiring a Python overlay like tensorflow = tensorflow-bin to be manually added (see the sketch after this list)
  • Nixpkgs config options like cudaForwardCompat and cudaCapabilities do not apply, leading to a larger package size than what is often necessary
  • The binary package is not configurable, so things like XLA, GCP, NCCL and TensorRT support cannot be turned on or off
  • On the Jetson side of things, NVIDIA do not provide a wide range of wheels. I'm using tensorflow-bin with Python 3.11 and 3.12, for example, which required manually building TensorFlow on Ubuntu first as NVIDIA only go up to 3.10.
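
For reference, a sketch of the kind of overlay the first bullet refers to, using the pythonPackagesExtensions hook so the substitution applies to every Python package set; this is a minimal illustration under my own assumptions, not anything shipped in this PR:

```nix
# overlay.nix: add to nixpkgs with
#   import <nixpkgs> { overlays = [ (import ./overlay.nix) ]; }
final: prev: {
  pythonPackagesExtensions = prev.pythonPackagesExtensions ++ [
    (pyFinal: pyPrev: {
      # Point the from-source attribute at the prebuilt binary package.
      tensorflow = pyPrev.tensorflow-bin;
    })
  ];
}
```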

@emilazy (Member Author) commented Sep 26, 2024

ofborg is happy 🎉

Thanks for the response @hacker1024. I agree that having a working source build of TensorFlow is a good thing and that we should have it. Would it be reasonable to say, though, that since the TensorFlow source build is multiple versions behind, if nobody manages to wrangle it into shape by 25.05 it’d be okay to set tensorflow = tensorflow-bin; for now until someone can revive it? I have a local work‐in‐progress commit trying to update the source package to the newest TensorFlow and CUDA 12.x, but I just wanted to get a sense of how useful the from‐source package is in its present, cobweb‐covered state.

@hacker1024 (Member) commented Sep 26, 2024

Would it be reasonable to say, though, that since the TensorFlow source build is multiple versions behind, if nobody manages to wrangle it into shape by 25.05 it’d be okay to set tensorflow = tensorflow-bin; for now until someone can revive it?

This sounds reasonable to me. Sadly, I think the only real way to find out how much it matters to Nixpkgs users as a whole is to do this and see who complains.

Note that some Keras packages need patching with this change due to some sort of package name discrepancy. I'll hopefully be able to share the patches later this week, but IIRC they're just some simple substitutions in setup.py files.

@emilazy (Member Author) commented Sep 26, 2024

Sounds reasonable. I think it should be okay to ship this as‐is, then, since we are only warning and not removing right now. I’ll try to send a follow‐up PR for the TensorFlow source build, and I’ll try to handle TensorRT if someone else doesn’t beat me to it, but I can’t guarantee I’ll be able to wrangle TensorFlow successfully; in any case, it would be best if someone with an investment in it maintained it, since I’m ending up as the maintainer of last resort for too many things anyway 😅

@emilazy (Member Author) commented Sep 30, 2024

@SomeoneSerge @ConnorBaker gentle ping :)

@wegank wegank added the 12.approvals: 1 label on Oct 1, 2024
@emilazy emilazy merged commit 7dc35d8 into NixOS:master Oct 3, 2024
26 checks passed
@emilazy emilazy deleted the push-lykmynuyxowq branch October 3, 2024 12:48