diffusion_mnist broken throws CUDNNError: CUDNN_STATUS_BAD_PARAM (code 3) #367
Some additional information:
Follow-up on this:
I was still getting the error and suspected a mismatch between the CUDA 11.7 artifacts (the most recent provided by CUDA.jl) and the system's driver (though this shouldn't cause issues).
Now I am using the system-wide CUDA toolkit and CUDNN libraries; both versions match according to
Still the same error.
This produced the following trace:
I did a few more tests.
Regarding vgg_cifar10: the first time I started a Julia session and updated packages, I got CUDNN_STATUS_INTERNAL_ERROR (code 4). But after restarting the Julia session and trying again, this model trains just fine. This is not the case with diffusion_mnist, which never works and always throws CUDNN_STATUS_BAD_PARAM (code 3). diffusion_mnist also trains fine on CPU.
If you're able to narrow it down to the particular conv layer which is throwing the error, we can try to replicate those conditions in an MWE.
The model is broken because it uses negative padding, which is not supported on GPU, but the CPU version works as the author intended.
I don't know whether this is the same in the original model; I doubt it, because PyTorch does not support negative padding (though there have been discussions about supporting it), and it is not supported by NVIDIA cuDNN either.
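For anyone hitting this: a "negative" padding of `p` on each border is equivalent to cropping `p` elements off each border of the input before a valid (no-padding) convolution, which is a possible GPU-friendly rewrite since cuDNN only accepts padding >= 0. A minimal conceptual sketch in NumPy (not the model's code; `conv1d_valid` and `conv1d_negative_pad` are hypothetical helper names, and the idea carries over to 2D convs in Flux/PyTorch):

```python
import numpy as np

def conv1d_valid(x, k):
    """Plain 'valid' cross-correlation: no padding, stride 1."""
    n = len(x) - len(k) + 1
    return np.array([np.dot(x[i:i + len(k)], k) for i in range(n)])

def conv1d_negative_pad(x, k, pad):
    """Emulate negative padding (pad < 0) by cropping the input first,
    then running an ordinary valid convolution."""
    assert pad < 0, "this helper only emulates negative padding"
    crop = -pad
    return conv1d_valid(x[crop:len(x) - crop], k)

x = np.arange(10.0)
k = np.array([1.0, 0.0, -1.0])
# pad=-1 crops one element per side (indices 1..8), then convolves:
print(conv1d_negative_pad(x, k, pad=-1))  # → [-2. -2. -2. -2. -2. -2.]
```

In other words, replacing a negatively padded conv layer with an explicit crop followed by a `pad=0` conv should give identical outputs while staying within what cuDNN supports.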