New snapshot - CUDA error: the provided PTX was compiled with an unsupported toolchain. #62
Yes, the native bindings have been updated to CUDA 12.3 as of bytedeco/javacpp-presets@7d56a2c. Could you check if you have a recent Nvidia driver installed? At least 535. I ran into this issue too with a 530 driver, but after updating it started working again. You can check with nvidia-smi. See also this thread and the CUDA Compatibility docs. There's some discussion about using upstream libtorch in bytedeco/javacpp-presets#1426, which is compiled against CUDA 12.1, but it's unclear if this is feasible. Welcome to CUDA dependency hell 🙈
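A quick way to see what the JVM process actually gets is to ask the cudart bindings directly. This is a minimal sketch, assuming the org.bytedeco cuda presets are on the classpath; CUDA encodes versions as major*1000 + minor*10, so 12020 means 12.2 (driver branch 535) and 12030 means 12.3 (545):

```scala
// Minimal sketch, assuming the org.bytedeco "cuda" presets are on the classpath.
// Prints the highest CUDA version the installed driver supports and the version
// of the CUDA runtime that actually got loaded.
import org.bytedeco.cuda.global.cudart.{cudaDriverGetVersion, cudaRuntimeGetVersion}

@main def cudaVersions(): Unit =
  val driver  = Array(0)
  val runtime = Array(0)
  cudaDriverGetVersion(driver)
  cudaRuntimeGetVersion(runtime)
  def fmt(v: Int) = s"${v / 1000}.${v % 1000 / 10}"
  // With the CUDA 12.3 presets, a driver reporting less than 12.3 can trigger
  // the "PTX was compiled with an unsupported toolchain" error discussed here.
  println(s"Driver supports CUDA up to ${fmt(driver(0))}")
  println(s"Loaded CUDA runtime is ${fmt(runtime(0))}")
```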
@sbrunk Thanks for the help. I have checked the compatibility matrix. The command output:
All seems to be good. I also have:
Could this be an issue of simply not finding the library?
Strange. I did an update & upgrade and rebooted. Now I get this (no libtorch warning):
Hmm, my assumption about the driver might have been wrong (it might be needed too, but perhaps it's not enough). While testing an update to PyTorch 2.1, I just compiled libtorch from source using CUDA 12.3. I then tried to run some tests on the GPU and ran into the error you're seeing. I went back to main and the error still occurs. This is a machine with an RTX A6000 GPU (Ampere, compute capability 8.6).
@hmf what's the GPU you're testing this on?
@sbrunk I have an NVIDIA GeForce RTX 3090 in the VM. EDIT: Now I get the same error in the VM as in the dev container, for both.
According to https://developer.nvidia.com/cuda-gpus, the RTX 3090 has compute capability 8.6, just like the RTX A6000, supporting my assumption that we might need to update the supported compute capabilities. @HGuillemet do you think this makes sense?
Hmm, yeah, that's a bit weird. If I recall correctly, I've seen the error too, and then it did work again (after upgrading the driver) until I tested again today. Perhaps check your driver version again. I changed the issue title BTW, because the other message is just a warning and is not causing issues here.
I figured I would try to install CUDA Toolkit 12.3 in a dev container just to see what happens. This is the error I get when I try it out:
Looking at the cuDNN support matrix, I see that the latest supported CUDA Toolkit version is 12.2. I am assuming that libtorch compilation requires and links against CUDA and cuDNN. So how were the JavaCPP binaries created? In the build script I see variables pointing to a cuDNN installation. Maybe that is part of the problem.
I built libtorch from source against CUDA 12.3 and with added support for compute capabilities 8.0 and 8.6, but I'm still running into the same issue. I didn't check cuDNN before, but now I'm seeing that my version is also for 12.2; not sure if this is an issue though. I'll try to compile libtorch against CUDA 12.1 next. For these tests, I'm using the Ubuntu CUDA/cuDNN packages whose installation is described here: https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html#ubuntu
So the cuDNN version is:
I have a hunch this might not be the issue. I think that if you don't add the capability, you simply don't take full advantage of the hardware, but it should still execute without problems. As I said, this seemed to work with CUDA 11.8 (and the respective cuDNN).
Hope that works. But then 12.2 is also a possible candidate, right?
I used the same instructions in the bare VM, with the same negative results.
I would also say that the "CUDA error: the provided PTX was compiled with an unsupported toolchain" is related to the CUDA driver version. But with version 535 it should not happen. You can try running with the libtorch binaries downloadable from pytorch.org instead of the libraries bundled with JavaCPP. It's certainly true that we'd better avoid all problems related to PTX (this error, long startup time when compiling PTX, missing features, size of the CUDA cache...) and provide cubins for all major architectures, but we are limited to only 1 or 2 architectures when compiling the PyTorch presets because of the compilation time it requires and GitHub Actions limits. That's why I'd like to include the libtorch binaries provided by pytorch.org in the JavaCPP presets. But I still need to test what is doable.
I've done some tests. I've put the results into the 2.1.0 update PR thread bytedeco/javacpp-presets#1426 (comment).
@sbrunk Can you tell me what the status of this is? The Bytedeco snapshots seem to be the same version. I have a CUDA 12.2 driver on the bare-metal machine. Tests with Toolkit 11.8, 12.2, 12.1, and 12.3 in the dev container don't seem to work (they use the 12.2 driver). What is the plan for Storch? Move all CUDA to 12.3 (drivers 545.XX.YY)? At this moment, can I assume that the current snapshot won't work with CUDA 12.2 + 535.XXX.YY drivers?
Yes, but it should start working again once we update to PyTorch 2.1 (#64), which brings in fixes to the JavaCPP bindings. See bytedeco/javacpp-presets#1426 (comment). The main reason why I haven't pushed the update yet is that PyTorch 2.1 adds new data types, which we need to add as well.
@sbrunk Thanks for the information.
Hope that does not entail too many changes on my side.
I don't think so. The upgrade seems to be quite smooth so far. I've started working on integrating the data types, and hopefully we'll be able to get it ready in the next few days.
@sbrunk I was able to compile and run the LeNet example. I did this in both Mill and SBT, using a clone of the latest master. Unfortunately, the GPU does not seem to be detected. Here is what I get with SBT on a second run:
sbt:root> examples/runMain LeNetApp
[info] running (fork) LeNetApp
[info] Using device: Device(CPU,-1)
[info] Epoch: 1 | Batch: 0 | Training loss: 2.2934 | Eval loss: 2.3007 | Eval accuracy: 0.1028
[info] Epoch: 1 | Batch: 200 | Training loss: 0.7505 | Eval loss: 0.7085 | Eval accuracy: 0.7803
[info] Epoch: 1 | Batch: 400 | Training loss: 0.3910 | Eval loss: 0.4682 | Eval accuracy: 0.8660
[info] Epoch: 1 | Batch: 600 | Training loss: 0.2330 | Eval loss: 0.3810 | Eval accuracy: 0.8841
[info] Epoch: 1 | Batch: 800 | Training loss: 0.5794 | Eval loss: 0.3249 | Eval accuracy: 0.9006
[info] Epoch: 1 | Batch: 1000 | Training loss: 0.3650 | Eval loss: 0.2753 | Eval accuracy: 0.9142
[info] Epoch: 1 | Batch: 1200 | Training loss: 0.3777 | Eval loss: 0.2485 | Eval accuracy: 0.9245
[info] Epoch: 1 | Batch: 1400 | Training loss: 0.3084 | Eval loss: 0.2167 | Eval accuracy: 0.9312
[info] Epoch: 1 | Batch: 1600 | Training loss: 0.2847 | Eval loss: 0.2006 | Eval accuracy: 0.9381
[info] Epoch: 1 | Batch: 1800 | Training loss: 0.1045 | Eval loss: 0.1804 | Eval accuracy: 0.9414
[info] Epoch: 2 | Batch: 0 | Training loss: 0.1411 | Eval loss: 0.1729 | Eval accuracy: 0.9434
[info] Epoch: 2 | Batch: 200 | Training loss: 0.2209 | Eval loss: 0.1674 | Eval accuracy: 0.9449
[info] Epoch: 2 | Batch: 400 | Training loss: 0.2056 | Eval loss: 0.1520 | Eval accuracy: 0.9509
[info] Epoch: 2 | Batch: 600 | Training loss: 0.2192 | Eval loss: 0.1472 | Eval accuracy: 0.9538
[info] Epoch: 2 | Batch: 800 | Training loss: 0.0281 | Eval loss: 0.1298 | Eval accuracy: 0.9568
[info] Epoch: 2 | Batch: 1000 | Training loss: 0.1213 | Eval loss: 0.1239 | Eval accuracy: 0.9595
[info] Epoch: 2 | Batch: 1200 | Training loss: 0.1853 | Eval loss: 0.1166 | Eval accuracy: 0.9628
[info] Epoch: 2 | Batch: 1400 | Training loss: 0.0517 | Eval loss: 0.1129 | Eval accuracy: 0.9659
[info] Epoch: 2 | Batch: 1600 | Training loss: 0.0980 | Eval loss: 0.1028 | Eval accuracy: 0.9671
[info] Epoch: 2 | Batch: 1800 | Training loss: 0.1265 | Eval loss: 0.0968 | Eval accuracy: 0.9693
[info] Epoch: 3 | Batch: 0 | Training loss: 0.1374 | Eval loss: 0.0927 | Eval accuracy: 0.9706
[info] Epoch: 3 | Batch: 200 | Training loss: 0.2114 | Eval loss: 0.0883 | Eval accuracy: 0.9722
[info] Epoch: 3 | Batch: 400 | Training loss: 0.0732 | Eval loss: 0.0893 | Eval accuracy: 0.9711
[info] Epoch: 3 | Batch: 600 | Training loss: 0.1169 | Eval loss: 0.0814 | Eval accuracy: 0.9743
[info] Epoch: 3 | Batch: 800 | Training loss: 0.1020 | Eval loss: 0.0783 | Eval accuracy: 0.9752
[info] Epoch: 3 | Batch: 1000 | Training loss: 0.0209 | Eval loss: 0.0789 | Eval accuracy: 0.9751
[info] Epoch: 3 | Batch: 1200 | Training loss: 0.0231 | Eval loss: 0.0762 | Eval accuracy: 0.9756
[info] Epoch: 3 | Batch: 1400 | Training loss: 0.0405 | Eval loss: 0.0747 | Eval accuracy: 0.9762
[info] Epoch: 3 | Batch: 1600 | Training loss: 0.0203 | Eval loss: 0.0704 | Eval accuracy: 0.9773
[info] Epoch: 3 | Batch: 1800 | Training loss: 0.0165 | Eval loss: 0.0651 | Eval accuracy: 0.9800
[info] Epoch: 4 | Batch: 0 | Training loss: 0.0210 | Eval loss: 0.0682 | Eval accuracy: 0.9789
[info] Epoch: 4 | Batch: 200 | Training loss: 0.1051 | Eval loss: 0.0620 | Eval accuracy: 0.9812
[info] Epoch: 4 | Batch: 400 | Training loss: 0.1084 | Eval loss: 0.0628 | Eval accuracy: 0.9801
[info] Epoch: 4 | Batch: 600 | Training loss: 0.0055 | Eval loss: 0.0624 | Eval accuracy: 0.9802
[info] Epoch: 4 | Batch: 800 | Training loss: 0.0484 | Eval loss: 0.0593 | Eval accuracy: 0.9820
[info] Epoch: 4 | Batch: 1000 | Training loss: 0.0068 | Eval loss: 0.0591 | Eval accuracy: 0.9820
[info] Epoch: 4 | Batch: 1200 | Training loss: 0.1645 | Eval loss: 0.0563 | Eval accuracy: 0.9829
[info] Epoch: 4 | Batch: 1400 | Training loss: 0.0205 | Eval loss: 0.0565 | Eval accuracy: 0.9827
[info] Epoch: 4 | Batch: 1600 | Training loss: 0.0576 | Eval loss: 0.0536 | Eval accuracy: 0.9836
[info] Epoch: 4 | Batch: 1800 | Training loss: 0.0070 | Eval loss: 0.0540 | Eval accuracy: 0.9835
[info] Epoch: 5 | Batch: 0 | Training loss: 0.0614 | Eval loss: 0.0535 | Eval accuracy: 0.9833
[info] Epoch: 5 | Batch: 200 | Training loss: 0.0102 | Eval loss: 0.0514 | Eval accuracy: 0.9846
[info] Epoch: 5 | Batch: 400 | Training loss: 0.1576 | Eval loss: 0.0535 | Eval accuracy: 0.9819
[info] Epoch: 5 | Batch: 600 | Training loss: 0.1332 | Eval loss: 0.0520 | Eval accuracy: 0.9837
[info] Epoch: 5 | Batch: 800 | Training loss: 0.0251 | Eval loss: 0.0537 | Eval accuracy: 0.9820
[info] Epoch: 5 | Batch: 1000 | Training loss: 0.2369 | Eval loss: 0.0542 | Eval accuracy: 0.9835
[info] Epoch: 5 | Batch: 1200 | Training loss: 0.3197 | Eval loss: 0.0494 | Eval accuracy: 0.9856
[info] Epoch: 5 | Batch: 1400 | Training loss: 0.0147 | Eval loss: 0.0529 | Eval accuracy: 0.9831
[info] Epoch: 5 | Batch: 1600 | Training loss: 0.0183 | Eval loss: 0.0539 | Eval accuracy: 0.9832
[info] Epoch: 5 | Batch: 1800 | Training loss: 0.1724 | Eval loss: 0.0500 | Eval accuracy: 0.9844
[success] Total time: 41 s, completed Nov 27, 2023, 8:59:31 AM
Here is what nvidia-smi reports:
The above was tested in a clean dev container. Are the versions of CUDA and the NVIDIA driver OK? EDIT: Same tests and results on my laptop on bare metal (it has about the same setup).
@hmf Just to be sure: when trying to run with sbt, did you enable the GPU in the sbt build?
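A minimal sketch of what such a GPU toggle can look like in an sbt build; the enableGPU key name, artifact ids, and version string below are illustrative assumptions rather than the exact Storch build definition:

```scala
// Illustrative sketch of a GPU switch in build.sbt; the key name, artifact ids,
// and the version string are assumptions, not necessarily the actual Storch build.
val enableGPU = settingKey[Boolean]("enable or disable GPU support")
val pytorchPresetVersion = "2.1.0-1.5.10-SNAPSHOT" // placeholder preset version

ThisBuild / enableGPU := true // defaults to false; flip to true to pull in CUDA

// Select the GPU-enabled LibTorch platform artifact only when the flag is set.
libraryDependencies += {
  if (enableGPU.value) "org.bytedeco" % "pytorch-platform-gpu" % pytorchPresetVersion
  else "org.bytedeco" % "pytorch-platform" % pytorchPresetVersion
}
```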
It defaults to false currently. If I set it to true, the build switches to the GPU-enabled LibTorch and adds the CUDA dependencies. The example checks whether a CUDA device is available and falls back to the CPU otherwise. It doesn't hurt to also delete your JavaCPP cache.
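For reference, the runtime check in the examples amounts to something like the following sketch, assuming Storch's torch.cuda.isAvailable and Device constants; the Device(CPU,-1) in the log above is exactly this CPU fallback being taken:

```scala
// Sketch of the device selection used by the examples: prefer CUDA when the
// native CUDA backend is available, otherwise fall back to the CPU.
import torch.Device
import torch.Device.{CPU, CUDA}

val device: Device = if torch.cuda.isAvailable then CUDA else CPU
println(s"Using device: $device") // "Device(CPU,-1)" means the GPU was not picked up
```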
@sbrunk You are correct that I did not set the GPU flag in sbt. I then set the GPU flag, and that failed. So, as per your suggestion, I removed the JavaCPP cache. Could you confirm that the JavaCPP libraries are correct? I have the following results: EDIT: I think there is something wrong with my OS. I will have to dig into it.
TIA
Could you check the output of nvidia-smi or nvtop to see if it is actually using the GPU?
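A short-lived run can be easy to miss in nvidia-smi, so a throwaway loop that keeps the CUDA device busy helps. This is only a sketch and assumes the usual Storch tensor API (rand, to, elementwise operators, device):

```scala
// Sketch: generate sustained work on the CUDA device so nvidia-smi / nvtop
// shows memory allocated and non-zero utilization for this JVM process.
import torch.Device.{CPU, CUDA}

@main def gpuLoad(): Unit =
  val device = if torch.cuda.isAvailable then CUDA else CPU
  println(s"Running on $device")
  val a = torch.rand(Seq(2048, 2048)).to(device)
  val b = torch.rand(Seq(2048, 2048)).to(device)
  var acc = a
  for _ <- 1 to 1000 do acc = acc * b + a // simple elementwise work in a loop
  println(acc.device) // should report the CUDA device if the GPU is really in use
```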
@sbrunk You are correct.
While working on #61, I rebuilt my dev container. The build failed. Upon investigation, I see that the old snapshot no longer exists. We now have:
12.3-8.9-1.5.10-SNAPSHOT/ Wed Oct 25 13:34:15 UTC 2023
maven-metadata.xml Wed Oct 25 14:13:45 UTC 2023 393
maven-metadata.xml.md5 Wed Oct 25 14:13:45 UTC 2023 32
maven-metadata.xml.sha1 Wed Oct 25 14:13:45 UTC 2023 40
maven-metadata.xml.sha256 Wed Oct 25 14:13:45 UTC 2023 64
maven-metadata.xml.sha512 Wed Oct 25 14:13:45 UTC 2023 128
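For reference, depending on that snapshot from sbt means coordinates along these lines; this is a sketch, and whether the redist artifact (or different classifiers) is actually what Storch needs is an assumption to verify:

```scala
// Sketch of Ivy/sbt coordinates matching the 12.3-8.9-1.5.10-SNAPSHOT listing above
// (CUDA 12.3, cuDNN 8.9, JavaCPP 1.5.10); requires the Sonatype snapshots resolver.
resolvers += "Sonatype OSS Snapshots" at "https://oss.sonatype.org/content/repositories/snapshots"

libraryDependencies ++= Seq(
  "org.bytedeco" % "cuda-platform" % "12.3-8.9-1.5.10-SNAPSHOT",
  "org.bytedeco" % "cuda-platform-redist" % "12.3-8.9-1.5.10-SNAPSHOT" // full CUDA libs, large download
)
```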
I changed the library (Ivy) dependencies. Now the build is OK, but execution fails. I suspect the problem may be due to the use of CUDA 12.3. I now get the following message when running an app:
Can anyone confirm this is the most probable cause? Of course, the issue now is installing the CUDA libraries in the dev container. Only 12.2 is currently supported. Is installing 12.3 the only way forward?
TIA