Newer NVIDIA driver #30

Open
ThomasA opened this issue Mar 15, 2023 · 2 comments
Labels
enhancement New feature or request

Comments

@ThomasA
Contributor

ThomasA commented Mar 15, 2023

Users are starting to use CUDA 12. Our current NVIDIA driver in AI Cloud is version 470.161.03 (470.141.03 on the DGX-2 servers). CUDA 12 requires at least version 525.60.13 of the NVIDIA driver. This means users are unable to run containers depending on CUDA 12 in AI Cloud.
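
For illustration, the version requirement above can be expressed as a small check. This is only a sketch: the function names `parse_version` and `supports_cuda12` and the host-group labels are hypothetical, and the version strings are the ones quoted in this issue.

```python
from typing import Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn a dotted driver version such as '470.161.03' into (470, 161, 3)."""
    return tuple(int(part) for part in text.strip().split("."))


# Minimum driver version for CUDA 12, as stated in this issue.
CUDA12_MIN_DRIVER = parse_version("525.60.13")


def supports_cuda12(driver_version: str) -> bool:
    """Return True if the driver version meets the CUDA 12 minimum."""
    # Tuples compare element by element, so (470, 161, 3) < (525, 60, 13).
    return parse_version(driver_version) >= CUDA12_MIN_DRIVER


# Driver versions quoted in this issue; the dictionary keys are illustrative labels.
installed = {
    "DGX-2 servers": "470.141.03",
    "remaining nodes": "470.161.03",
}

for host_group, version in installed.items():
    status = "ok for CUDA 12" if supports_cuda12(version) else "too old for CUDA 12"
    print(f"{host_group}: driver {version} is {status}")
```

On a node, the installed version string could be obtained with `nvidia-smi --query-gpu=driver_version --format=csv,noheader` and passed to `supports_cuda12`.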

I think we should upgrade the driver in AI Cloud. Continuing the same line of installed packages on the respective nodes, the appropriate package on the DGX-2 (DGX-A100?) servers seems to be "nvidia-driver-525-server" and on the remaining nodes "nvidia-headless-525".

Could we upgrade this in the service window on the 28th of March?

ThomasA added the enhancement label Mar 15, 2023
@fasmide
Contributor

fasmide commented Mar 16, 2023

Upgrading this on the non-DGX OS machines would be pretty straightforward. We can prepare and test these changes in the staging environment, so even though the packages from Nvidia are somewhat black boxes, we can at least test them before potentially ruining the hosts.

The DGX machines, however, I'm not so sure about. We have no way of testing updates, and the amount of Nvidia tooling and Nvidia-applied hacks installed on these hosts is massive.

@ThomasA
Contributor Author

ThomasA commented Jun 20, 2023

This was fixed in #31 on the non-DGX servers during the late March service window.
We hope to fix it on the DGX servers on the 27th of June.
