AMD GPU passthrough support #109

Closed
gardenali opened this issue Mar 30, 2024 · 24 comments
Labels
enhancement (New feature or request), help wanted (Extra attention is needed)

Comments

@gardenali

It's great to have intel and nvidia support, but I'm missing the AMD option.

Thank you!

@lks-hrsch

I am also at the point where I need AMD GPU support to move my last service to jailmaker. Can someone explain what is needed to enable GPU support, or what you have done for Intel and NVIDIA? I am open to implementing and evaluating the feature on my system.

@easyfab

easyfab commented Mar 30, 2024

Does it work if you manually add --bind=/dev/dri?

e.g.: jlmkr create myjail --bind=/dev/dri

@jeefberkey

The intel gpu setting does just that

@Jip-Hop
Owner

Jip-Hop commented Mar 31, 2024

Jailmaker has intel and nvidia GPU support because these drivers are provided by the TrueNAS SCALE host OS. I think adding support for a dedicated AMD GPU in jailmaker is not trivial, if possible at all without modifying the host OS. Since I have no dedicated GPU in my TrueNAS server I can't investigate this. Feel free to investigate though. @lks-hrsch you could have a look at the python code of jlmkr.py to see what the intel and nvidia GPU passthrough options do.

@easyfab

easyfab commented Mar 31, 2024

Isn't AMD support in the TrueNAS host OS?

lspci -k | grep amdgpu
Kernel driver in use: amdgpu
Kernel modules: amdgpu

For info, I tried with a 5700G APU; adding --bind=/dev/dri seems to give me access to the iGPU in jailmaker.
I don't know if it works with a dGPU.

Edit, to complete:

root@myjail:~# vainfo
error: can't connect to X server!
libva info: VA-API version 1.17.0
libva info: Trying to open /usr/lib/x86_64-linux-gnu/dri/radeonsi_drv_video.so
libva info: Found init function __vaDriverInit_1_17
libva info: va_openDriver() returns 0
vainfo: VA-API version: 1.17 (libva 2.12.0)
vainfo: Driver version: Mesa Gallium driver 22.3.6 for AMD Radeon Graphics (renoir, LLVM 15.0.6, DRM 3.54, 6.6.16-production+truenas)
vainfo: Supported profile and entrypoints
VAProfileMPEG2Simple : VAEntrypointVLD
VAProfileMPEG2Main : VAEntrypointVLD
VAProfileVC1Simple : VAEntrypointVLD
VAProfileVC1Main : VAEntrypointVLD
VAProfileVC1Advanced : VAEntrypointVLD
VAProfileH264ConstrainedBaseline: VAEntrypointVLD
VAProfileH264ConstrainedBaseline: VAEntrypointEncSlice
VAProfileH264Main : VAEntrypointVLD
VAProfileH264Main : VAEntrypointEncSlice
VAProfileH264High : VAEntrypointVLD
VAProfileH264High : VAEntrypointEncSlice
VAProfileHEVCMain : VAEntrypointVLD
VAProfileHEVCMain : VAEntrypointEncSlice
VAProfileHEVCMain10 : VAEntrypointVLD
VAProfileHEVCMain10 : VAEntrypointEncSlice
VAProfileJPEGBaseline : VAEntrypointVLD
VAProfileVP9Profile0 : VAEntrypointVLD
VAProfileVP9Profile2 : VAEntrypointVLD
VAProfileNone : VAEntrypointVideoProc
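
A listing like the one above can also be checked programmatically. The following is a minimal sketch, assuming the output format shown here (one `VAProfile… : VAEntrypoint…` pair per line); `parse_vainfo` and `encode_profiles` are hypothetical helper names, not part of jailmaker or libva.

```python
import re

# Matches lines such as "VAProfileH264Main : VAEntrypointEncSlice".
_PAIR = re.compile(r"^\s*(VAProfile\w+)\s*:\s*(VAEntrypoint\w+)\s*$")

def parse_vainfo(text):
    """Return {profile: [entrypoints]} for every profile line in vainfo output."""
    profiles = {}
    for line in text.splitlines():
        m = _PAIR.match(line)
        if m:
            profiles.setdefault(m.group(1), []).append(m.group(2))
    return profiles

def encode_profiles(text):
    """Profiles that expose the EncSlice (hardware encode) entrypoint."""
    return sorted(p for p, eps in parse_vainfo(text).items()
                  if "VAEntrypointEncSlice" in eps)

# A few lines copied from the listing above.
sample = """\
VAProfileH264Main : VAEntrypointVLD
VAProfileH264Main : VAEntrypointEncSlice
VAProfileHEVCMain : VAEntrypointVLD
VAProfileHEVCMain : VAEntrypointEncSlice
VAProfileVP9Profile0 : VAEntrypointVLD
"""
print(encode_profiles(sample))
```

Profiles that only list `VAEntrypointVLD` can be decoded in hardware but not encoded, which matters for transcoding workloads.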


@Jip-Hop
Owner

Jip-Hop commented Mar 31, 2024

Yes I think there's a difference between AMD iGPU and dGPU but I'd be happy to be proven wrong.

@Jip-Hop added the invalid and help wanted labels on Apr 6, 2024
@Jip-Hop
Owner

Jip-Hop commented Apr 6, 2024

I've labeled this issue as invalid and help wanted. I think iGPU is already supported (Intel or AMD). You'd have to set gpu_passthrough_intel=1 in your config file for that. I realize now this naming is confusing in this case...

Regarding AMD dedicated GPUs, as far as I know those aren't supported on the SCALE host system and therefore jailmaker can't support them either.

Since I don't have an AMD GPU, I could use help to confirm that this issue is indeed invalid. Either way, I won't be working on a solution for this issue, and I recommend either switching to an NVIDIA GPU or implementing a solution (if possible) and providing a pull request.

@lks-hrsch

I apologize for not getting back to you sooner, but I can confirm that it's already working for an AMD iGPU. I also don't currently have a dGPU for testing.

@maeehart

Hey! I just want to add that passing the AMD GPU does work, but one also needs to bind /dev/kfd. That is, a command like
./jlmkr.py create --distro=ubuntu --release=jammy ubuntu --bind=/dev/dri --bind=/dev/kfd
works. After this, I had a few problems with the permissions on these files. However, I can now run llama3 in a jail using an AMD GPU.
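
The bind flags in the command above could be assembled by a small helper. This is a minimal sketch rather than jailmaker code: `amd_bind_args` and `AMD_DEVICE_NODES` are hypothetical names, and it assumes the two device paths mentioned in this comment are all that needs to be bound.

```python
import os

# Device paths named in this thread. /dev/kfd is the amdgpu compute (KFD)
# interface used by ROCm; /dev/dri holds the DRM card and render nodes.
AMD_DEVICE_NODES = ("/dev/dri", "/dev/kfd")

def amd_bind_args(nodes=AMD_DEVICE_NODES, exists=os.path.exists):
    """Return one --bind=<path> argument per device path present on the host."""
    return [f"--bind={path}" for path in nodes if exists(path)]

if __name__ == "__main__":
    # Prints the flags to append to a create command like the one quoted above.
    print(" ".join(amd_bind_args()))
```

Skipping paths that do not exist avoids asking for a bind mount of a missing /dev/kfd on hosts where the compute interface is not available.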

@Jip-Hop
Owner

Jip-Hop commented Apr 22, 2024

@maeehart is that a dedicated AMD GPU you're using (which model)? If so then that's good news and we can close this ticket as completed.

@maeehart

Yes, it is a 6900 XT, i.e., a dedicated AMD GPU. We could still make a PR for AMD support so that one could use the GPU just by adding a gpu_passthrough_amd flag.

@Jip-Hop
Owner

Jip-Hop commented Apr 23, 2024

Ah yes that's a good idea. Could you provide the PR?

@maeehart

I can do it during the weekend. I will need to see if I can do something about the permissions.

@Jip-Hop added the enhancement label and removed the invalid label on Apr 24, 2024
@dalgibbard
Contributor

dalgibbard commented May 9, 2024

Though you didn't specify what your permission issues are; is it fixed if you add:
--property=DeviceAllow="/dev/kfd rw"
?

I ask, since I do something similar for CoralTPU passthrough, which looks like:

--bind='/dev/ttyUSB0'
--property=DeviceAllow="/dev/ttyUSB0 rwm"
--property=DeviceAllow="char-drm rwm"
--property=DeviceAllow=/dev/bus/usb

Though I haven't spent enough time in the land of nspawn to really work out if all of these are necessary/correct lol

Edit: This made me want to go look up what "rwm" is vs just "rw", and the "m" means:

"m" (Mknod): Allows the creation of device nodes using mknod. Device nodes are special files in Unix-like operating systems that represent device interfaces. With this permission, the container can create new device nodes within its filesystem, enabling access to devices that were not initially available. This is useful for dynamically creating device nodes as needed by containerized applications.

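The `r`, `w` and `m` letters discussed above can be decoded with a few lines of code. This is a minimal sketch that only interprets the string format (a device specifier followed by an optional combination of `r`, `w`, `m`); `parse_device_allow` is a hypothetical helper, and the sketch says nothing about whether passing such a property to jailmaker is a good idea.

```python
# Meaning of each access letter accepted by systemd's DeviceAllow= setting.
ACCESS = {"r": "read", "w": "write", "m": "mknod"}

def parse_device_allow(value):
    """Split 'SPECIFIER [rwm]' into (specifier, [access names])."""
    parts = value.split()
    if not parts or len(parts) > 2:
        raise ValueError(f"unexpected DeviceAllow value: {value!r}")
    spec = parts[0]
    # If no letters are given, return an empty list; what systemd assumes
    # in that case is not modelled here.
    letters = parts[1] if len(parts) == 2 else ""
    unknown = set(letters) - set(ACCESS)
    if unknown:
        raise ValueError(f"unknown access letters: {sorted(unknown)}")
    return spec, [ACCESS[c] for c in "rwm" if c in letters]

print(parse_device_allow("/dev/kfd rw"))
print(parse_device_allow("char-drm rwm"))
```
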
@Jip-Hop
Owner

Jip-Hop commented May 9, 2024

Please have a look in the jlmkr.py code and search for DeviceAllow. I think adding this explicitly will actually cause issues instead of solving them.

@jere-co

jere-co commented Jun 17, 2024

Any updates on the AMD dGPU support?

@Jip-Hop
Owner

Jip-Hop commented Jun 25, 2024

> Hey! I just want to add that passing the AMD gpu does work, but one needs to also bind /dev/kfd. That is, the command like

@maeehart are you sure it was the AMD GPU being used (and not the one in the CPU because you also added --bind=/dev/dri)? I assume the AMD GPU should be usable without --bind=/dev/dri, at least this is the case for an NVIDIA GPU. Which commands did you run in the jail to test the AMD GPU?

I have an AMD RX 580 GPU in a test TrueNAS server but couldn't yet get it working in an ubuntu jail. I tried debugging with mpv --hwdec=auto video_filename from this arch resource.

@maeehart

Hey! I am sure that it is the AMD GPU. I have been running ollama in the jail and confirmed that the GPU is in use by watching the rocm-smi output for some time. However, I have not had the time to redo the setup so that I could add a proper script to this repo, and I am sorry about that. I remember that I had to bind both /dev/dri and /dev/kfd and then modify their permissions to allow writing to these files (chmod ...).

@Jip-Hop
Owner

Jip-Hop commented Jun 25, 2024

Instead of messing with permissions of /dev/kfd can't you run the process in your jail under the same user/group which already owns /dev/kfd?
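
To see which user and group own the device nodes before choosing an account to run under, something like the following could be run inside the jail. It is a minimal sketch using only the Python standard library; `describe_node`, `can_use` and `device_nodes` are hypothetical helper names, and the paths are the ones mentioned in this thread.

```python
import os
import stat

def describe_node(path):
    """Return the owner uid, owner gid and permission string for a path."""
    st = os.stat(path)
    return {"uid": st.st_uid, "gid": st.st_gid,
            "mode": stat.filemode(st.st_mode)}

def can_use(path):
    """True if the current process may read and write the path."""
    return os.access(path, os.R_OK | os.W_OK)

def device_nodes():
    """/dev/kfd plus every entry under /dev/dri, if that directory exists."""
    nodes = ["/dev/kfd"]
    if os.path.isdir("/dev/dri"):
        nodes += sorted(os.path.join("/dev/dri", n)
                        for n in os.listdir("/dev/dri"))
    return nodes

if __name__ == "__main__":
    for node in device_nodes():
        if os.path.exists(node):
            print(node, describe_node(node), "rw ok:", can_use(node))
        else:
            print(node, "not present")
```

If the reported gid matches a group that exists in the jail, adding the service user to that group is one way to follow the suggestion above without changing the file modes.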

@maeehart

I think that that is a much better idea.

@Jip-Hop
Owner

Jip-Hop commented Jul 9, 2024

Reportedly AMD GPU passthrough works:

 ./jlmkr.py create --distro=ubuntu --release=jammy ubuntu --bind=/dev/dri --bind=/dev/kfd

Adding a dedicated AMD GPU passthrough config option, with corresponding flag for the create command, seems like overkill when a single additional bind mount is enough (especially since AMD GPU passthrough reportedly relies on /dev/dri being mounted which gpu_passthrough_intel=1 takes care of).
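
The reasoning above can be modelled in a few lines: the existing Intel option covers /dev/dri, so only /dev/kfd is left to add. This is a simplified sketch, not the logic in jlmkr.py; the `extra_binds` key and the `gpu_bind_args` function are hypothetical, and only `gpu_passthrough_intel` is taken from this thread.

```python
def gpu_bind_args(config):
    """Build --bind arguments from a simplified, dict-based config."""
    paths = []
    # Per this thread, gpu_passthrough_intel=1 takes care of mounting /dev/dri.
    if str(config.get("gpu_passthrough_intel", "0")) == "1":
        paths.append("/dev/dri")
    # Hypothetical key holding any additional bind mounts, e.g. /dev/kfd.
    for path in config.get("extra_binds", ()):
        if path not in paths:  # avoid binding the same path twice
            paths.append(path)
    return [f"--bind={p}" for p in paths]

print(gpu_bind_args({"gpu_passthrough_intel": "1",
                     "extra_binds": ["/dev/kfd"]}))
```
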

@Jip-Hop closed this as completed on Jul 9, 2024
@tyvsmith

tyvsmith commented Jul 14, 2024

I suggest documenting this more widely, for example in the primary readme, or revisiting the decision not to include a flag (even if it's simple). I started investigating jailmaker to handle my k3s -> Docker conversion for TrueNAS SCALE, and it wasn't clear whether AMD GPU passthrough was supported at all until I found this ticket and read the comments.

@Jip-Hop
Owner

Jip-Hop commented Jul 15, 2024

Updated the readme!

@krupinskika

Can confirm hardware transcoding for AMD 5700G works.


10 participants