cuCtxCreate triggers NV_ERR_INVALID_ADDRESS with H100 CC-mode on #9921
Comments
Thanks for the detailed report! I don't have access to a H100 right now unfortunately. I tried reproducing this on a T4 (needed to substitute

I believe the nvtrace at https://github.com/derpsteb/libcuda-debug/blob/gvisor-bugreport/cuMemory/cuCtxCreate_nvtrace.log is from inside the container? The address range base=0x7fb87e200000, length=0x3ab000 does seem to be mapped inside the container (because the application makes the preceding mmap(2) call with the same arguments). But the application address space is different from the sentry address space. Since the sentry is making the ioctl(2) call to the host, it is the sentry address space that the host driver evaluates. nvproxy passes those arguments to the host without any translation (since

Could you try to repro this and, apart from the nvtrace, could you also grab the
Awesome, thanks for looking into this so quickly! The log you referenced is from the native case, i.e. no gVisor, no sandbox. Sorry for not labeling this more clearly. We weren't able to run nvtrace inside gVisor yet, probably because it uses ptrace. But it seems like we fixed that by implementing pwrite; it works now. The logs you requested:
We also added a log to the uvm_validate function in the kernel driver to print the arguments it sees. These are the addresses:
Another thing that I didn't realize so far: the second mmap returns success. In the native case there is a mapping afterwards. Inside gVisor, the mmap call is also successful, but the mapping is missing. nvtrace doesn't print the mmap return value, but I checked by (a) looking at strace/gVisor's strace output and (b) adding prints to the kernel code, to make sure we are not hitting this case inside the driver. We don't. We also wondered if the uvm mmap translation might break permission expectations, since gVisor changes the permissions of the call during translation. But the previous mmap is mapped with the same permissions as in the native case, which is working. So it seems like gVisor is behaving correctly here.
Thanks for the logs. The

@nixprime Do you know why application mmap(2) of

Similarly, none of the application mmaps of
Application mmap()s of

Can you also collect
Thanks for getting back on this.
I had been using https://github.com/geohot/cuda_ioctl_sniffer to "sniff" ioctls made to the Nvidia devices. I would run this on the target GPU binary directly on the host (without gVisor) and collect the output. I have also built a parser (written in Go) against the nvproxy package. It parses the output of the sniffer, figures out which commands/ioctls/classes are not implemented in nvproxy, and prints out the diff of what needs to be implemented. However, it seems like https://github.com/geohot/cuda_ioctl_sniffer is not actively maintained, and the output of the sniffer is a little garbled (I have had to patch it in various places to make it palatable for the parser). It also seems like the sniffer segfaults with R550+ drivers (soon to release). I think a more sustainable path forward would be to build this in-house and add it to

So we could have an LD-preloadable binary like

cc @luiscape
Thanks @ayushr2, we have our own fork of the sniffer with minor patches: https://github.com/modal-labs/cuda_ioctl_sniffer. Having this tool in

We haven't invested in internal tooling that much at all yet, but

For now I'll lean on our fork of the sniffer to sort out these H100 compatibility issues.
Thanks a lot for the writeup. Would love to contribute our tooling here; we are currently aligning internally on this. nvtrace uses ptrace to intercept syscalls and then parses the args. It's a modified xfstrace. It has worked very well for us.
@thundergolfer I sent our sniffer patch to modal-labs/cuda_ioctl_sniffer#1.

@derpsteb Awesome! Looking forward to it! We could also just pull in the necessary components from a https://github.com/edgelesssys repository if it resides there. I will just write my parser against it, and the parser will be linked into the nvproxy package to provide accurate info. If nvtrace is written in Go, one benefit of having it here would be that you can easily integrate it with packages like
Just a quick update on the tooling @ayushr2 described above: we now have a simple tool to intercept Nvidia ioctl calls in tools/ioctl_sniffer. Right now it simply runs a GPU binary unsandboxed and reports any ioctls/commands/classes that nvproxy doesn't support, but we plan to expand its functionality in the next few months. Hope this tool can be of use!
Description
Hey everyone,
we are currently trying to utilize nvproxy with an H100 GPU that has its confidential computing mode enabled. However, when trying to create a context on the GPU, libcuda ends up in an endless loop. You can find the two syscalls that loop highlighted here. When running inside gVisor, uvm_validate_va_range returns NV_ERR_INVALID_ADDRESS. The other logfiles in the libcuda-debug repo are stacktraces and process memory mappings from native execution. The backtrace at the mmap is actually the same inside gVisor and natively.

To get to this point we had to apply a few patches to gVisor. You can find them here. I am not very familiar with gVisor, so those patches may already be faulty. Please review them before you dive deep into any debugging on your side.
Do you have any idea what could be the problem here? I would also appreciate any hints describing your dev setup for developing nvproxy, since that could help our efforts right now. We already built a custom strace that decodes the ioctl cmds.
Steps to reproduce
Build the patched runsc and the test binary:

```shell
# Build runsc from the patched branch:
git clone [email protected]:derpsteb/gvisor.git && cd gvisor
git checkout h100-cc-mode
make copy TARGETS=runsc DESTINATION=bin/ && sudo cp ./bin/runsc /usr/local/bin

# Build the test binary:
git clone [email protected]:derpsteb/libcuda-debug.git && cd libcuda-debug
git checkout gvisor-bugreport
cd cuMemory && make
```

Run `./cuMemory` natively first. It is expected that `cuInit` takes a while:

```shell
docker run -ti --gpus=all -v $(realpath ./cuMemory):/cuMemory nvcr.io/nvidia/cuda:12.2.2-devel-ubuntu22.04 /cuMemory
```

Stop it with ctrl+c or run `pkill`, then run the same container inside gVisor:

```shell
docker run -ti --runtime=runsc --gpus=all -v $(realpath ./cuMemory):/cuMemory nvcr.io/nvidia/cuda:12.2.2-devel-ubuntu22.04 /cuMemory
```
runsc version
docker version (if using docker)
```
Client: Docker Engine - Community
 Version:           25.0.0
 API version:       1.44
 Go version:        go1.21.6
 Git commit:        e758fe5
 Built:             Thu Jan 18 17:09:49 2024
 OS/Arch:           linux/amd64
 Context:           default

Server: Docker Engine - Community
 Engine:
  Version:          25.0.0
  API version:      1.44 (minimum version 1.24)
  Go version:       go1.21.6
  Git commit:       615dfdf
  Built:            Thu Jan 18 17:09:49 2024
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.6.27
  GitCommit:        a1496014c916f9e62104b33d1bb5bd03b0858e59
 runc:
  Version:          1.1.11
  GitCommit:        v1.1.11-0-g4bccb38
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0
```
uname
Linux guest 6.2.0-36-generic #37~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Mon Oct 9 15:34:04 UTC 2 x86_64 x86_64 x86_64 GNU/Linux
kubectl (if using Kubernetes)
No response
repo state (if built from source)
release-20231218.0-13-g1bc75a281
runsc debug logs (if available)
https://gist.github.com/derpsteb/0533a5b9acd1bf21938cf0245dbbd0cb Because it's so long, I also shortened the looping section at the end; you can spot it by searching for the mmap with length `0x3ab000`.