Users are starting to use CUDA 12. Our current NVIDIA driver in AI Cloud is version 470.161.03 (470.141.03 on the DGX-2 servers). CUDA 12 requires at least version 525.60.13 of the NVIDIA driver. This means users are unable to run containers depending on CUDA 12 in AI Cloud.
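To make the version gap concrete, here is a minimal sketch of the comparison, assuming driver versions are always dotted integers. The function names are illustrative and not part of any existing tooling. The `nvidia-smi` query is only meant to run on a node that has the driver installed.

```python
import subprocess

# Minimum driver version for CUDA 12, as stated in this issue.
CUDA12_MIN_DRIVER = "525.60.13"


def parse_version(version: str) -> tuple[int, ...]:
    """Turn a dotted version such as '470.161.03' into (470, 161, 3)."""
    return tuple(int(part) for part in version.strip().split("."))


def supports_cuda12(driver_version: str, minimum: str = CUDA12_MIN_DRIVER) -> bool:
    """Return True if the driver version is at least the given minimum."""
    return parse_version(driver_version) >= parse_version(minimum)


def installed_driver_version() -> str:
    """Ask nvidia-smi for the driver version (requires a node with the driver)."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=True,
    )
    # One line is printed per GPU; all GPUs on a node share the same driver.
    return result.stdout.splitlines()[0].strip()


if __name__ == "__main__":
    # Versions quoted in this issue, plus the required minimum itself.
    for version in ("470.161.03", "470.141.03", "525.60.13"):
        print(version, supports_cuda12(version))
```

On a compute node, `supports_cuda12(installed_driver_version())` would report whether that node can run CUDA 12 containers.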
I think we should upgrade the driver in AI Cloud. Staying with the package lines already installed on the respective nodes, the appropriate package on the DGX-2 (DGX-A100?) servers seems to be "nvidia-driver-525-server", and on the remaining nodes "nvidia-headless-525".
Could we upgrade this in the service window on the 28th of March?
Upgrading this on the non-DGX OS machines would be pretty straightforward. We can prepare and test these changes in the staging environment, so even though the packages from NVIDIA are somewhat of a black box, we can at least test them before potentially ruining the hosts.
The DGX machines, however, I'm not so sure about. We have no way of testing updates there, and the amount of NVIDIA tooling and NVIDIA-applied hacks installed on these hosts is massive.