Cannot start VM on proxmox 7.2 with profile override #101

Open
UntouchedWagons opened this issue Jul 14, 2022 · 14 comments

Comments

@UntouchedWagons

I'm following Jeff of Craft Computing's vgpu unlock tutorial on a spare machine running Proxmox 7.2 using a GTX 1070 with 8GB of VRAM. I've installed nvidia driver 510.73.06, made the service configs and made a custom profile as follows:

[profile.nvidia-54]
num_displays = 1
display_width = 1280
display_height = 1024
max_pixels = 1310720
cuda_enabled = 1
frl_enabled = 60
framebuffer = 984263338
pci_id = 0x17F011A0
pci_device_id = 0x17F0

I've attached the GPU to my VM running Windows 10 Pro x64 and chose profile 54. But when I start the VM I get this error:

kvm: -device vfio-pci,sysfsdev=/sys/bus/pci/devices/0000:01:00.0/00000000-0000-0000-0000-000000000100,id=hostpci0,bus=pci.0,addr=0x10: warning: vfio 00000000-0000-0000-0000-000000000100: Could not enable error recovery for the device
kvm: -device vfio-pci,sysfsdev=/sys/bus/pci/devices/0000:01:00.0/00000000-0000-0000-0000-000000000100,id=hostpci0,bus=pci.0,addr=0x10: vfio 00000000-0000-0000-0000-000000000100: failed to read device config space: Bad address
TASK ERROR: start failed: QEMU exited with code 1

Now, if I reboot and choose one of the other profiles that won't use more VRAM than is available, the VM will start, but I cannot install any drivers because the PCI IDs are wrong.

I'd join the vgpu_unlock discord linked in Jeff's tutorial but the invite is expired.

@Epiphrin

Epiphrin commented Jul 15, 2022

Your settings are wrong, because the Craft Computing tutorial was not written for Proxmox 7.2.
That error means there is a problem with the settings in /etc/vgpu_unlock/profile_override.toml:

wrong: framebuffer = 984263338
right: framebuffer = 0x3B000000

# Other options:
# 1GB: 0x3B000000
# 2GB: 0x76000000
# 3GB: 0xB1000000
# 4GB: 0xEC000000
# 8GB: 0x1D8000000
# 16GB: 0x3B0000000
# These numbers may not be accurate for you, but you can always calculate the right number like this:
# The amount of VRAM in your VM = `framebuffer` + `framebuffer_reservation`
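
As a rough sketch of the arithmetic in the comment above: the listed values are all whole multiples of the 1GB entry, so the table can be regenerated from it. The helper names below are hypothetical, and the `framebuffer_reservation` values are only what the quoted formula implies if "1GB" is taken to mean 2^30 bytes. That interpretation is an assumption, not something confirmed in this thread.

```python
# Sketch only: regenerate the framebuffer values listed above.
# Assumption: "N GB" of VRAM means N * 2**30 bytes, and the total is
# split as framebuffer + framebuffer_reservation (formula quoted above).

GIB = 1024 ** 3
FRAMEBUFFER_1GB = 0x3B000000  # the 1GB value quoted in this thread


def framebuffer_for(gib: int) -> int:
    """Scale the 1GB value linearly, as the listed values do."""
    return gib * FRAMEBUFFER_1GB


def reservation_for(gib: int) -> int:
    """Reservation implied by: VRAM = framebuffer + framebuffer_reservation."""
    return gib * GIB - framebuffer_for(gib)


if __name__ == "__main__":
    for gib in (1, 2, 3, 4, 8, 16):
        print(
            f"{gib}GB: framebuffer = 0x{framebuffer_for(gib):X}, "
            f"framebuffer_reservation = 0x{reservation_for(gib):X}"
        )
```

Under these assumptions, the 1GB profile leaves 0x5000000 bytes (80 MiB) for the reservation. As the comment itself warns, the right numbers may differ on other setups.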

This tutorial is written for Proxmox 7.2:
https://gitlab.com/polloloco/vgpu-proxmox

@UntouchedWagons
Author

UntouchedWagons commented Jul 15, 2022

Okay, that did the trick: the VM is able to start and I was able to install the drivers, but now I'm getting the Code 43 error. I thought NVIDIA stopped that nonsense?

[Edit] If I shut down the VM, it won't start again because of the same error as at the start. I have to reboot the host to fix it.

@Epiphrin

Epiphrin commented Jul 15, 2022

Try it with the Quadro P6000 (Pascal) pci_id; maybe that works.

pci_id = 0x1B3011A0
pci_device_id = 0x1B30

What is printed to the screen if you run the following command?
mdevctl types

@UntouchedWagons
Author

I just had a thought: when I was following Jeff's instructions, I modified the source code of the NVIDIA driver by adding the compiler directives, but I don't recall ever telling dkms to recompile the driver, unless something did that for me automatically.

@UntouchedWagons
Author

# mdevctl types
0000:01:00.0
  nvidia-156
    Available instances: 12
    Device API: vfio-pci
    Name: GRID P40-2B
    Description: num_heads=4, frl_config=45, framebuffer=2048M, max_resolution=5120x2880, max_instance=12
  nvidia-215
    Available instances: 12
    Device API: vfio-pci
    Name: GRID P40-2B4
    Description: num_heads=4, frl_config=45, framebuffer=2048M, max_resolution=5120x2880, max_instance=12
  nvidia-241
    Available instances: 24
    Device API: vfio-pci
    Name: GRID P40-1B4
    Description: num_heads=4, frl_config=45, framebuffer=1024M, max_resolution=5120x2880, max_instance=24
  nvidia-283
    Available instances: 6
    Device API: vfio-pci
    Name: GRID P40-4C
    Description: num_heads=1, frl_config=60, framebuffer=4096M, max_resolution=4096x2160, max_instance=6
  nvidia-284
    Available instances: 4
    Device API: vfio-pci
    Name: GRID P40-6C
    Description: num_heads=1, frl_config=60, framebuffer=6144M, max_resolution=4096x2160, max_instance=4
  nvidia-285
    Available instances: 3
    Device API: vfio-pci
    Name: GRID P40-8C
    Description: num_heads=1, frl_config=60, framebuffer=8192M, max_resolution=4096x2160, max_instance=3
  nvidia-286
    Available instances: 2
    Device API: vfio-pci
    Name: GRID P40-12C
    Description: num_heads=1, frl_config=60, framebuffer=12288M, max_resolution=4096x2160, max_instance=2
  nvidia-287
    Available instances: 1
    Device API: vfio-pci
    Name: GRID P40-24C
    Description: num_heads=1, frl_config=60, framebuffer=24576M, max_resolution=4096x2160, max_instance=1
  nvidia-46
    Available instances: 24
    Device API: vfio-pci
    Name: GRID P40-1Q
    Description: num_heads=4, frl_config=60, framebuffer=1024M, max_resolution=5120x2880, max_instance=24
  nvidia-47
    Available instances: 12
    Device API: vfio-pci
    Name: GRID P40-2Q
    Description: num_heads=4, frl_config=60, framebuffer=2048M, max_resolution=7680x4320, max_instance=12
  nvidia-48
    Available instances: 8
    Device API: vfio-pci
    Name: GRID P40-3Q
    Description: num_heads=4, frl_config=60, framebuffer=3072M, max_resolution=7680x4320, max_instance=8
  nvidia-49
    Available instances: 6
    Device API: vfio-pci
    Name: GRID P40-4Q
    Description: num_heads=4, frl_config=60, framebuffer=4096M, max_resolution=7680x4320, max_instance=6
  nvidia-50
    Available instances: 4
    Device API: vfio-pci
    Name: GRID P40-6Q
    Description: num_heads=4, frl_config=60, framebuffer=6144M, max_resolution=7680x4320, max_instance=4
  nvidia-51
    Available instances: 3
    Device API: vfio-pci
    Name: GRID P40-8Q
    Description: num_heads=4, frl_config=60, framebuffer=8192M, max_resolution=7680x4320, max_instance=3
  nvidia-52
    Available instances: 2
    Device API: vfio-pci
    Name: GRID P40-12Q
    Description: num_heads=4, frl_config=60, framebuffer=12288M, max_resolution=7680x4320, max_instance=2
  nvidia-53
    Available instances: 1
    Device API: vfio-pci
    Name: GRID P40-24Q
    Description: num_heads=4, frl_config=60, framebuffer=24576M, max_resolution=7680x4320, max_instance=1
  nvidia-54
    Available instances: 24
    Device API: vfio-pci
    Name: GRID P40-1A
    Description: num_heads=1, frl_config=60, framebuffer=1024M, max_resolution=1280x1024, max_instance=24
  nvidia-55
    Available instances: 12
    Device API: vfio-pci
    Name: GRID P40-2A
    Description: num_heads=1, frl_config=60, framebuffer=2048M, max_resolution=1280x1024, max_instance=12
  nvidia-56
    Available instances: 8
    Device API: vfio-pci
    Name: GRID P40-3A
    Description: num_heads=1, frl_config=60, framebuffer=3072M, max_resolution=1280x1024, max_instance=8
  nvidia-57
    Available instances: 6
    Device API: vfio-pci
    Name: GRID P40-4A
    Description: num_heads=1, frl_config=60, framebuffer=4096M, max_resolution=1280x1024, max_instance=6
  nvidia-58
    Available instances: 4
    Device API: vfio-pci
    Name: GRID P40-6A
    Description: num_heads=1, frl_config=60, framebuffer=6144M, max_resolution=1280x1024, max_instance=4
  nvidia-59
    Available instances: 3
    Device API: vfio-pci
    Name: GRID P40-8A
    Description: num_heads=1, frl_config=60, framebuffer=8192M, max_resolution=1280x1024, max_instance=3
  nvidia-60
    Available instances: 2
    Device API: vfio-pci
    Name: GRID P40-12A
    Description: num_heads=1, frl_config=60, framebuffer=12288M, max_resolution=1280x1024, max_instance=2
  nvidia-61
    Available instances: 1
    Device API: vfio-pci
    Name: GRID P40-24A
    Description: num_heads=1, frl_config=60, framebuffer=24576M, max_resolution=1280x1024, max_instance=1
  nvidia-62
    Available instances: 24
    Device API: vfio-pci
    Name: GRID P40-1B
    Description: num_heads=4, frl_config=45, framebuffer=1024M, max_resolution=5120x2880, max_instance=24

@Epiphrin

Epiphrin commented Jul 15, 2022

Maybe try it with the Tesla P40 pci_id:
pci_id = 0x1B3811D9
pci_device_id = 0x1B38

@Epiphrin

Epiphrin commented Jul 15, 2022

Use profile nvidia-50. You need to use a Q profile!

I got vGPU working with an RTX 2060.
Good luck getting the 1070 running.
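
For picking out the Q profiles from a long `mdevctl types` listing like the one above, a small filter can help. This is only a sketch: the function names are hypothetical, the sample text is three entries copied from the output posted earlier, and the parser assumes the indentation-based layout shown there.

```python
import re

# Three entries copied from the `mdevctl types` output earlier in this thread.
SAMPLE = """\
0000:01:00.0
  nvidia-49
    Available instances: 6
    Device API: vfio-pci
    Name: GRID P40-4Q
    Description: num_heads=4, frl_config=60, framebuffer=4096M, max_resolution=7680x4320, max_instance=6
  nvidia-50
    Available instances: 4
    Device API: vfio-pci
    Name: GRID P40-6Q
    Description: num_heads=4, frl_config=60, framebuffer=6144M, max_resolution=7680x4320, max_instance=4
  nvidia-54
    Available instances: 24
    Device API: vfio-pci
    Name: GRID P40-1A
    Description: num_heads=1, frl_config=60, framebuffer=1024M, max_resolution=1280x1024, max_instance=24
"""


def parse_mdev_types(text: str) -> list[dict]:
    """Collect one dict per 'nvidia-NN' entry with its 'Key: value' fields."""
    profiles = []
    current = None
    for line in text.splitlines():
        stripped = line.strip()
        if re.fullmatch(r"nvidia-\d+", stripped):
            current = {"type": stripped}
            profiles.append(current)
        elif current is not None and ": " in stripped:
            key, value = stripped.split(": ", 1)
            current[key] = value
    return profiles


def q_profiles(profiles: list[dict]) -> list[dict]:
    """Keep only entries whose Name ends in 'Q' (e.g. 'GRID P40-4Q')."""
    return [p for p in profiles if p.get("Name", "").endswith("Q")]


if __name__ == "__main__":
    for p in q_profiles(parse_mdev_types(SAMPLE)):
        print(p["type"], p["Name"], "instances:", p["Available instances"])
```

To run it against a real host, the sample string could be replaced with the captured output of `mdevctl types`.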

@UntouchedWagons
Author

Which pci ids should I use with nvidia-50? Jeff gave barely any information about how to choose what profile to use.

@Epiphrin

You need one of the Q profiles ("-1Q", "-2Q", "-3Q", ...). Select the profile with the instance count you need.

Try the following IDs; maybe one of them works.

Tesla P40 pci_id:
pci_id = 0x1B3811D9
pci_device_id = 0x1B38

Quadro P6000 (Pascal) pci_id:
pci_id = 0x1B3011A0
pci_device_id = 0x1B30
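
One pattern visible in every pair quoted in this thread is that the upper 16 bits of `pci_id` equal `pci_device_id`. The sketch below checks that, which can catch a copy-paste mismatch before the host is rebooted. The function names are hypothetical, and reading the lower 16 bits as a subsystem ID is an assumption that this thread does not confirm.

```python
# Sketch only: sanity-check pci_id / pci_device_id pairs from this thread.
# Assumption: the lower 16 bits of pci_id are a subsystem ID.


def split_pci_id(pci_id: int) -> tuple[int, int]:
    """Return (upper 16 bits, lower 16 bits) of a 32-bit pci_id."""
    return (pci_id >> 16) & 0xFFFF, pci_id & 0xFFFF


def is_consistent(pci_id: int, pci_device_id: int) -> bool:
    """True if the upper half of pci_id matches pci_device_id."""
    return split_pci_id(pci_id)[0] == pci_device_id


# Pairs quoted in this thread: (pci_id, pci_device_id).
PAIRS = {
    "original profile": (0x17F011A0, 0x17F0),
    "Quadro P6000": (0x1B3011A0, 0x1B30),
    "Tesla P40": (0x1B3811D9, 0x1B38),
}

if __name__ == "__main__":
    for name, (pci_id, device_id) in PAIRS.items():
        upper, lower = split_pci_id(pci_id)
        print(f"{name}: upper=0x{upper:04X} lower=0x{lower:04X} "
              f"consistent={is_consistent(pci_id, device_id)}")
```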

@UntouchedWagons
Author

UntouchedWagons commented Jul 15, 2022

If I use nvidia-50 with the Tesla P40 IDs, I get "This device cannot start. (Code 10)" in Device Manager. If I try the Quadro P6000 (Pascal) IDs, the VM won't start, so I'll have to reboot the host again.

@Epiphrin

Epiphrin commented Jul 15, 2022

Maybe this blog helps? The author got a GTX 1060 running using the Tesla P40 vGPU unlock profile.
https://wvthoog.nl/proxmox-7-vgpu-v2/

@UntouchedWagons
Author

Nah, I can't get it to work. Is there a list of proper Quadro cards I can use this vGPU stuff with?

@Epiphrin

Epiphrin commented Jul 16, 2022

https://docs.google.com/document/d/1pzrWJ9h-zANCtyqRgS7Vzla0Y8Ea2-5z2HEi4X75d2Q/edit
With consumer-grade GPUs such as the GTX and RTX series, it comes down to trial and testing: whether it runs at all, how stable it is, and with which drivers.
I got the RTX 2060 12GB working with the NVIDIA 510 drivers on Proxmox 7.2 with the 5.15 kernel, following the guide at https://gitlab.com/polloloco/vgpu-5.15

| Nvidia vGPU card | GPU chip | vGPU unlock supported |
|---|---|---|
| Tesla M10 | GM107 x4 | Most Maxwell 1.0 cards |
| Tesla M60 | GM204 x2 | Most Maxwell 2.0 cards |
| Tesla P40 | GP102 | Most Pascal cards |
| Tesla V100 16GB | GV100 | Titan V, Quadro GV100 |
| Quadro RTX 6000 | TU102 | Most Turing cards |
| RTX A6000 | GA102 | Ampere is not supported |

@alx696

alx696 commented Aug 10, 2022

@UntouchedWagons Do not set framebuffer = 984263338! If you set this, you will get errors such as:

  • failed to read device config space: Bad address
  • failed to get region 0 info: Input/output error

The errors occur randomly, so it may work if you try a few more times.
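
For comparison, the rejected decimal value can be viewed in hexadecimal next to the value suggested earlier in this thread. The sketch below only shows how the two numbers differ. Whether the 1 MiB alignment it checks is the actual reason for the failure is an assumption and is not established here.

```python
# Sketch only: compare the two framebuffer values mentioned in this thread.
# Assumption: 1 MiB alignment is checked purely as an illustration.

REJECTED = 984263338    # value from the original profile
SUGGESTED = 0x3B000000  # value suggested earlier in this thread
MIB = 1 << 20


def describe(value: int) -> str:
    """Format a value as hex, its size in MiB, and whether it is MiB-aligned."""
    aligned = value % MIB == 0
    return f"{hex(value)} = {value / MIB:.2f} MiB, MiB-aligned: {aligned}"


if __name__ == "__main__":
    print("rejected: ", describe(REJECTED))
    print("suggested:", describe(SUGGESTED))
```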
