Failed to run ``gpustat --debug'': pynvml.NVMLError_LibraryNotFound: NVML Shared Library Not Found #90

Closed
hongyi-zhao opened this issue Aug 25, 2020 · 18 comments

Comments

@hongyi-zhao

hongyi-zhao commented Aug 25, 2020

Hi,

On Ubuntu 20.04 with Python 3.8.3, I failed to run gpustat --debug, as shown below:

$ gpustat --debug
Error on querying NVIDIA devices. Use --debug flag for details
Traceback (most recent call last):
  File "/home/werner/.pyenv/versions/3.8.3/envs/socks5-haproxy/lib/python3.8/site-packages/pynvml.py", line 644, in _LoadNvmlLibrary
    nvmlLib = CDLL("libnvidia-ml.so.1")
  File "/home/werner/.pyenv/versions/3.8.3/lib/python3.8/ctypes/__init__.py", line 373, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: libnvidia-ml.so.1: cannot open shared object file: No such file or directory

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/werner/.pyenv/versions/3.8.3/envs/socks5-haproxy/lib/python3.8/site-packages/gpustat/__main__.py", line 19, in print_gpustat
    gpu_stats = GPUStatCollection.new_query()
  File "/home/werner/.pyenv/versions/3.8.3/envs/socks5-haproxy/lib/python3.8/site-packages/gpustat/core.py", line 281, in new_query
    N.nvmlInit()
  File "/home/werner/.pyenv/versions/3.8.3/envs/socks5-haproxy/lib/python3.8/site-packages/pynvml.py", line 608, in nvmlInit
    _LoadNvmlLibrary()
  File "/home/werner/.pyenv/versions/3.8.3/envs/socks5-haproxy/lib/python3.8/site-packages/pynvml.py", line 646, in _LoadNvmlLibrary
    _nvmlCheckReturn(NVML_ERROR_LIBRARY_NOT_FOUND)
  File "/home/werner/.pyenv/versions/3.8.3/envs/socks5-haproxy/lib/python3.8/site-packages/pynvml.py", line 310, in _nvmlCheckReturn
    raise NVMLError(ret)
pynvml.NVMLError_LibraryNotFound: NVML Shared Library Not Found
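
The traceback comes down to a failed load of `libnvidia-ml.so.1`, the shared library that pynvml tries to open with `CDLL`. As a minimal sketch, the same load can be attempted directly with `ctypes` to check whether the library is visible to Python. The helper name `nvml_library_available` is hypothetical and is not part of pynvml or gpustat.

```python
import ctypes


def nvml_library_available(name: str = "libnvidia-ml.so.1") -> bool:
    """Return True if the shared library `name` can be loaded by ctypes.

    Mirrors the CDLL(...) call shown in the traceback. This is an
    illustrative helper, not part of pynvml or gpustat.
    """
    try:
        ctypes.CDLL(name)
    except OSError:
        # Raised when the dynamic loader cannot find or open the library.
        return False
    return True


if __name__ == "__main__":
    if nvml_library_available():
        print("NVML library found")
    else:
        print("NVML library not found; is the NVIDIA driver installed?")
```

If this reports that the library is missing, gpustat will most likely fail in the same way until the driver is installed correctly.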

@Stonesjtu
Collaborator

What's the output of nvidia-smi?

@hongyi-zhao
Author

I haven't installed any NVIDIA-related drivers, tools, or utilities on this machine, so the nvidia-smi command is not currently available.

@Stonesjtu
Collaborator

Stonesjtu commented Aug 25, 2020 via email

@hongyi-zhao
Author

Thanks a lot for your explanation. I'll try it and report back if necessary.

@radhikasethi2011

radhikasethi2011 commented Aug 28, 2020

What's the output of nvidia-smi

[image: nvidia-smi output]
I had the same issue; this is the output of nvidia-smi, @Stonesjtu.

@hongyi-zhao
Author

hongyi-zhao commented Aug 28, 2020

The problem has been solved. The cause was that I didn't have a correct installation of CUDA and the NVIDIA driver. Now it works smoothly. See the following for details:

$  gpustat --debug
X10DAi-01                  Fri Aug 28 15:15:31 2020  450.51.06
[0] GeForce RTX 2070 SUPER | 41'C,   5 % |   291 /  7977 MB | gdm(35M) werner(132M) werner(111M)
$ nvidia-smi 
Fri Aug 28 15:15:43 2020       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.51.06    Driver Version: 450.51.06    CUDA Version: 11.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GeForce RTX 207...  On   | 00000000:02:00.0  On |                  N/A |
| 30%   41C    P8    17W / 215W |    294MiB /  7977MiB |      2%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      1933      G   /usr/lib/xorg/Xorg                 35MiB |
|    0   N/A  N/A      2631      G   /usr/lib/xorg/Xorg                135MiB |
|    0   N/A  N/A      4075      G   /usr/bin/gnome-shell              111MiB |
+-----------------------------------------------------------------------------+

@Stonesjtu
Collaborator

@radhikasethi2011 Your problem looks like a Windows compatibility issue in PyNVML. I don't have a Windows GPU machine at hand, so I'm afraid I cannot fix it myself. Could you take a look at this link (https://forum.faceswap.dev/viewtopic.php?t=14)?

@wookayin What do you think about adding a Windows support section to the documentation?

@wookayin
Owner

wookayin commented Aug 28, 2020

This is indeed a useful data point: on Windows, nvidia-smi works but PyNVML cannot load the shared library (@radhikasethi2011's case; this is the first time I've seen it). On Ubuntu it was probably fine (@hongyi-zhao's case). I'm not sure why, but the link you posted says:

The most likely issue for this is that you have Windows drivers installed through Windows Update/Windows Store.

So we should provide instructions saying that the drivers should be obtained from the NVIDIA website. @radhikasethi2011, can you confirm whether this is the case for you and whether it solves your issue?

I will add some notes to the README and more informative error messages (which will ship in the next release, though).

@wookayin wookayin added this to the 1.0 milestone Aug 28, 2020
@wookayin
Owner

In another issue, #86, @eusoubrasileiro used a workaround of copying nvml.dll from Windows\System32 to the site-packages folder. This looks like a Python-path-related problem, and the workaround is only a quick fix, but I hope it helps.

@radhikasethi2011

@Stonesjtu @wookayin I updated my NVIDIA driver, but nothing changed. I will uninstall it, reinstall it from the NVIDIA website, and update here soon.

@wookayin
Owner

Did you mean you updated your driver through the Windows installer?

@radhikasethi2011

@wookayin No, through the NVIDIA website. I will try the workaround.

@garcolazo

This was my solution; I hope it helps someone:

pynvml looks for nvml.dll in "C:\Program Files\NVIDIA Corporation\NVSMI" and "C:\Windows\System32", but the new installer puts the file in "C:\Windows\System32\DriverStore\FileRepository\nv_dispi.inf_amd64_aXXXXXXXXXXXXXX". Just copy the DLL from "FileRepository" to the "Program Files" location.

If there is no "NVSMI" folder inside "C:\Program Files\NVIDIA Corporation", create one and put the DLL inside.

The nvml.dll in System32 is 596 KB, while the file inside "FileRepository" is 1051 KB. If there is already an nvml.dll inside "Program Files" but it is the 596 KB version, replace it with the 1051 KB one.

Make sure to right-click and copy the file rather than drag and move it. Moving takes the original file out of "FileRepository", and you will not have the privileges to copy it back or undo the move.
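
For anyone unsure which copy of nvml.dll they have, here is a rough sketch that walks a folder and lists every nvml.dll it finds with its size, largest first, so the two versions mentioned above can be told apart. The function name `find_nvml_candidates` is made up for illustration, and the DriverStore path is the one quoted in this comment; the exact subfolder name differs between machines.

```python
import os
from typing import List, Tuple


def find_nvml_candidates(root: str, filename: str = "nvml.dll") -> List[Tuple[str, int]]:
    """Walk `root` and return (path, size_in_bytes) for every file named
    `filename` (compared case-insensitively), largest file first."""
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for entry in filenames:
            if entry.lower() == filename.lower():
                full_path = os.path.join(dirpath, entry)
                matches.append((full_path, os.path.getsize(full_path)))
    matches.sort(key=lambda item: item[1], reverse=True)
    return matches


if __name__ == "__main__":
    # Path taken from the comment above; adjust if Windows lives elsewhere.
    driver_store = r"C:\Windows\System32\DriverStore\FileRepository"
    for path, size in find_nvml_candidates(driver_store):
        print(f"{size / 1024:8.0f} KB  {path}")
```

The script only reads the folder, so it does not need elevated privileges and cannot move the original file by accident.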

@shirishkz

(Quoting @garcolazo's solution above.)

Thanks a ton. I was running into this issue earlier while working with some PyTorch/fastai models. Now it seems fine. Thanks again.

@eduardatmadenn

(Quoting @garcolazo's solution above.)

This worked perfectly, thank you!

@nikky4D

nikky4D commented May 30, 2021

(Quoting @garcolazo's solution and @shirishkz's reply above.)

This works for me with a slight change: nvml.dll is now located in C:\Windows\System32\DriverStore\FileRepository\nvrzui.inf_amd64_8df10ddaac270452

@jungwon-choi

You can solve this issue as follows:

  1. Search for the "nvml.dll" file in "C:\Windows\System32\DriverStore\FileRepository"
  2. Copy the "nvml.dll" file to "C:\Program Files\NVIDIA Corporation\NVSMI" (create the NVSMI folder yourself if it does not exist)
  3. Done
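
The copy step above can be sketched in Python as follows. This is only an illustration under stated assumptions: `copy_nvml` is a hypothetical helper name, the source path must be the nvml.dll you actually found on your own machine, and writing under "C:\Program Files" will typically require an elevated (administrator) prompt.

```python
import os
import shutil


def copy_nvml(source_file: str, target_dir: str) -> str:
    """Copy `source_file` into `target_dir`, creating the folder if needed.

    Returns the path of the new copy. The original file is left in place,
    which matches the advice to copy rather than move the DLL.
    """
    os.makedirs(target_dir, exist_ok=True)
    return shutil.copy2(source_file, target_dir)
```

A call would look like `copy_nvml(found_path, r"C:\Program Files\NVIDIA Corporation\NVSMI")`, where `found_path` stands for the full path of the nvml.dll located in step 1.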

@wookayin
Owner

wookayin commented Sep 4, 2022

Let me close this issue now that v1.0 has been released. I believe the new version of pynvml should have no problem, but if anyone runs into a similar issue on Windows, please create a new issue. Thanks.

9 participants