CUDA Error #57

Closed · Pandabear314 opened this issue Aug 13, 2021 · 12 comments

@Pandabear314

While attempting to utilize AlphaZero for Tetris, I keep running into this error when running on the GPU. I have reproduced it on two separate machines, and it happens consistently when launching a checkpoint evaluation. I am wondering if someone has insight into what might be causing this.

Repo: https://gitlab.com/samdickinson314/tetrisai

To reproduce, run include("runner.jl"). The error appears right after:

    Launching a checkpoint evaluation

CUDNNError: CUDNN_STATUS_EXECUTION_FAILED (code 8)
Stacktrace:
  [1] throw_api_error(res::CUDA.CUDNN.cudnnStatus_t)
    @ CUDA.CUDNN C:\Users\dickisp1\.julia\packages\CUDA\CtvPY\lib\cudnn\error.jl:22
  [2] macro expansion
    @ C:\Users\dickisp1\.julia\packages\CUDA\CtvPY\lib\cudnn\error.jl:39 [inlined]
  [3] cudnnActivationForward(handle::Ptr{Nothing}, activationDesc::CUDA.CUDNN.cudnnActivationDescriptor, alpha::Base.RefValue{Float32}, xDesc::CUDA.CUDNN.cudnnTensorDescriptor, x::CUDA.CuArray{Float32, 4}, beta::Base.RefValue{Float32}, yDesc::CUDA.CUDNN.cudnnTensorDescriptor, y::CUDA.CuArray{Float32, 4})
    @ CUDA.CUDNN C:\Users\dickisp1\.julia\packages\CUDA\CtvPY\lib\utils\call.jl:26
  [4] #cudnnActivationForwardAD#657
    @ C:\Users\dickisp1\.julia\packages\CUDA\CtvPY\lib\cudnn\activation.jl:48 [inlined]
  [5] #cudnnActivationForwardWithDefaults#656
    @ C:\Users\dickisp1\.julia\packages\CUDA\CtvPY\lib\cudnn\activation.jl:42 [inlined]
  [6] #cudnnActivationForward!#653
    @ C:\Users\dickisp1\.julia\packages\CUDA\CtvPY\lib\cudnn\activation.jl:22 [inlined]
  [7] #35
    @ C:\Users\dickisp1\.julia\packages\NNlibCUDA\Oc2CZ\src\cudnn\activations.jl:13 [inlined]
  [8] materialize(bc::Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{4}, Nothing, typeof(NNlib.relu), Tuple{CUDA.CuArray{Float32, 4}}})
    @ NNlibCUDA C:\Users\dickisp1\.julia\packages\NNlibCUDA\Oc2CZ\src\cudnn\activations.jl:30
  [9] (::Flux.BatchNorm{typeof(NNlib.relu), CUDA.CuArray{Float32, 1}, Float32, CUDA.CuArray{Float32, 1}})(x::CUDA.CuArray{Float32, 4}, cache::Nothing)
    @ Flux.CUDAint C:\Users\dickisp1\.julia\packages\Flux\Zz9RI\src\cuda\cudnn.jl:9
 [10] BatchNorm
    @ C:\Users\dickisp1\.julia\packages\Flux\Zz9RI\src\cuda\cudnn.jl:6 [inlined]
 [11] applychain(fs::Tuple{Flux.BatchNorm{typeof(NNlib.relu), CUDA.CuArray{Float32, 1}, Float32, CUDA.CuArray{Float32, 1}}, Flux.Chain{Tuple{Flux.SkipConnection{Flux.Chain{Tuple{Flux.Conv{2, 2, typeof(identity), CUDA.CuArray{Float32, 4}, CUDA.CuArray{Float32, 1}}, Flux.BatchNorm{typeof(NNlib.relu), CUDA.CuArray{Float32, 1}, Float32, CUDA.CuArray{Float32, 1}}, Flux.Conv{2, 2, typeof(identity), CUDA.CuArray{Float32, 4}, CUDA.CuArray{Float32, 1}}, Flux.BatchNorm{typeof(identity), CUDA.CuArray{Float32, 1}, Float32, CUDA.CuArray{Float32, 1}}}}, typeof(+)}, AlphaZero.FluxLib.var"#15#16"}}, Flux.Chain{Tuple{Flux.SkipConnection{Flux.Chain{Tuple{Flux.Conv{2, 2, typeof(identity), CUDA.CuArray{Float32, 4}, CUDA.CuArray{Float32, 1}}, Flux.BatchNorm{typeof(NNlib.relu), CUDA.CuArray{Float32, 1}, Float32, CUDA.CuArray{Float32, 1}}, Flux.Conv{2, 2, typeof(identity), CUDA.CuArray{Float32, 4}, CUDA.CuArray{Float32, 1}}, Flux.BatchNorm{typeof(identity), CUDA.CuArray{Float32, 1}, Float32, CUDA.CuArray{Float32, 1}}}}, typeof(+)}, AlphaZero.FluxLib.var"#15#16"}}, Flux.Chain{Tuple{Flux.SkipConnection{Flux.Chain{Tuple{Flux.Conv{2, 2, typeof(identity), CUDA.CuArray{Float32, 4}, CUDA.CuArray{Float32, 1}}, Flux.BatchNorm{typeof(NNlib.relu), CUDA.CuArray{Float32, 1}, Float32, CUDA.CuArray{Float32, 1}}, Flux.Conv{2, 2, typeof(identity), CUDA.CuArray{Float32, 4}, CUDA.CuArray{Float32, 1}}, Flux.BatchNorm{typeof(identity), CUDA.CuArray{Float32, 1}, Float32, CUDA.CuArray{Float32, 1}}}}, typeof(+)}, AlphaZero.FluxLib.var"#15#16"}}, Flux.Chain{Tuple{Flux.SkipConnection{Flux.Chain{Tuple{Flux.Conv{2, 2, typeof(identity), CUDA.CuArray{Float32, 4}, CUDA.CuArray{Float32, 1}}, Flux.BatchNorm{typeof(NNlib.relu), CUDA.CuArray{Float32, 1}, Float32, CUDA.CuArray{Float32, 1}}, Flux.Conv{2, 2, typeof(identity), CUDA.CuArray{Float32, 4}, CUDA.CuArray{Float32, 1}}, Flux.BatchNorm{typeof(identity), CUDA.CuArray{Float32, 1}, Float32, CUDA.CuArray{Float32, 1}}}}, typeof(+)}, AlphaZero.FluxLib.var"#15#16"}}, Flux.Chain{Tuple{Flux.SkipConnection{Flux.Chain{Tuple{Flux.Conv{2, 2, typeof(identity), CUDA.CuArray{Float32, 4}, CUDA.CuArray{Float32, 1}}, Flux.BatchNorm{typeof(NNlib.relu), CUDA.CuArray{Float32, 1}, Float32, CUDA.CuArray{Float32, 1}}, Flux.Conv{2, 2, typeof(identity), CUDA.CuArray{Float32, 4}, CUDA.CuArray{Float32, 1}}, Flux.BatchNorm{typeof(identity), CUDA.CuArray{Float32, 1}, Float32, CUDA.CuArray{Float32, 1}}}}, typeof(+)}, AlphaZero.FluxLib.var"#15#16"}}}, x::CUDA.CuArray{Float32, 4}) (repeats 2 times)
    @ Flux C:\Users\dickisp1\.julia\packages\Flux\Zz9RI\src\layers\basic.jl:37
 [12] (::Flux.Chain{Tuple{Flux.Conv{2, 2, typeof(identity), CUDA.CuArray{Float32, 4}, CUDA.CuArray{Float32, 1}}, Flux.BatchNorm{typeof(NNlib.relu), CUDA.CuArray{Float32, 1}, Float32, CUDA.CuArray{Float32, 1}}, Flux.Chain{Tuple{Flux.SkipConnection{Flux.Chain{Tuple{Flux.Conv{2, 2, typeof(identity), CUDA.CuArray{Float32, 4}, CUDA.CuArray{Float32, 1}}, Flux.BatchNorm{typeof(NNlib.relu), CUDA.CuArray{Float32, 1}, Float32, CUDA.CuArray{Float32, 1}}, Flux.Conv{2, 2, typeof(identity), CUDA.CuArray{Float32, 4}, CUDA.CuArray{Float32, 1}}, Flux.BatchNorm{typeof(identity), CUDA.CuArray{Float32, 1}, Float32, CUDA.CuArray{Float32, 1}}}}, typeof(+)}, AlphaZero.FluxLib.var"#15#16"}}, Flux.Chain{Tuple{Flux.SkipConnection{Flux.Chain{Tuple{Flux.Conv{2, 2, typeof(identity), CUDA.CuArray{Float32, 4}, CUDA.CuArray{Float32, 1}}, Flux.BatchNorm{typeof(NNlib.relu), CUDA.CuArray{Float32, 1}, Float32, CUDA.CuArray{Float32, 1}}, Flux.Conv{2, 2, typeof(identity), CUDA.CuArray{Float32, 4}, CUDA.CuArray{Float32, 1}}, Flux.BatchNorm{typeof(identity), CUDA.CuArray{Float32, 1}, Float32, CUDA.CuArray{Float32, 1}}}}, typeof(+)}, AlphaZero.FluxLib.var"#15#16"}}, Flux.Chain{Tuple{Flux.SkipConnection{Flux.Chain{Tuple{Flux.Conv{2, 2, typeof(identity), CUDA.CuArray{Float32, 4}, CUDA.CuArray{Float32, 1}}, Flux.BatchNorm{typeof(NNlib.relu), CUDA.CuArray{Float32, 1}, Float32, CUDA.CuArray{Float32, 1}}, Flux.Conv{2, 2, typeof(identity), CUDA.CuArray{Float32, 4}, CUDA.CuArray{Float32, 1}}, Flux.BatchNorm{typeof(identity), CUDA.CuArray{Float32, 1}, Float32, CUDA.CuArray{Float32, 1}}}}, typeof(+)}, AlphaZero.FluxLib.var"#15#16"}}, Flux.Chain{Tuple{Flux.SkipConnection{Flux.Chain{Tuple{Flux.Conv{2, 2, typeof(identity), CUDA.CuArray{Float32, 4}, CUDA.CuArray{Float32, 1}}, Flux.BatchNorm{typeof(NNlib.relu), CUDA.CuArray{Float32, 1}, Float32, CUDA.CuArray{Float32, 1}}, Flux.Conv{2, 2, typeof(identity), CUDA.CuArray{Float32, 4}, CUDA.CuArray{Float32, 1}}, Flux.BatchNorm{typeof(identity), CUDA.CuArray{Float32, 1}, Float32, CUDA.CuArray{Float32, 1}}}}, typeof(+)}, AlphaZero.FluxLib.var"#15#16"}}, Flux.Chain{Tuple{Flux.SkipConnection{Flux.Chain{Tuple{Flux.Conv{2, 2, typeof(identity), CUDA.CuArray{Float32, 4}, CUDA.CuArray{Float32, 1}}, Flux.BatchNorm{typeof(NNlib.relu), CUDA.CuArray{Float32, 1}, Float32, CUDA.CuArray{Float32, 1}}, Flux.Conv{2, 2, typeof(identity), CUDA.CuArray{Float32, 4}, CUDA.CuArray{Float32, 1}}, Flux.BatchNorm{typeof(identity), CUDA.CuArray{Float32, 1}, Float32, CUDA.CuArray{Float32, 1}}}}, typeof(+)}, AlphaZero.FluxLib.var"#15#16"}}}})(x::CUDA.CuArray{Float32, 4})
    @ Flux C:\Users\dickisp1\.julia\packages\Flux\Zz9RI\src\layers\basic.jl:39
 [13] forward(nn::ResNet, state::CUDA.CuArray{Float32, 4})
    @ AlphaZero.FluxLib C:\Users\dickisp1\.julia\packages\AlphaZero\Onn8G\src\networks\flux.jl:142
 [14] forward_normalized(nn::ResNet, state::CUDA.CuArray{Float32, 4}, actions_mask::CUDA.CuArray{Float32, 2})
    @ AlphaZero.Network C:\Users\dickisp1\.julia\packages\AlphaZero\Onn8G\src\networks\network.jl:264
 [15] evaluate_batch(nn::ResNet, batch::Vector{NamedTuple{(:board, :current_piece, :next_piece, :score, :pieces_placed, :seed), Tuple{StaticArrays.SMatrix{22, 10, Bool, 220}, Int64, Int64, Int64, Int64, Int64}}})
    @ AlphaZero.Network C:\Users\dickisp1\.julia\packages\AlphaZero\Onn8G\src\networks\network.jl:312
 [16] fill_and_evaluate(net::ResNet, batch::Vector{NamedTuple{(:board, :current_piece, :next_piece, :score, :pieces_placed, :seed), Tuple{StaticArrays.SMatrix{22, 10, Bool, 220}, Int64, Int64, Int64, Int64, Int64}}}; batch_size::Int64, fill_batches::Bool)
    @ AlphaZero C:\Users\dickisp1\.julia\packages\AlphaZero\Onn8G\src\simulations.jl:32
 [17] (::AlphaZero.var"#36#37"{Int64, Bool, ResNet})(batch::Vector{NamedTuple{(:board, :current_piece, :next_piece, :score, :pieces_placed, :seed), Tuple{StaticArrays.SMatrix{22, 10, Bool, 220}, Int64, Int64, Int64, Int64, Int64}}})
    @ AlphaZero C:\Users\dickisp1\.julia\packages\AlphaZero\Onn8G\src\simulations.jl:54
 [18] macro expansion
    @ C:\Users\dickisp1\.julia\packages\AlphaZero\Onn8G\src\batchifier.jl:68 [inlined]
 [19] macro expansion
    @ C:\Users\dickisp1\.julia\packages\AlphaZero\Onn8G\src\util.jl:20 [inlined]
 [20] (::AlphaZero.Batchifier.var"#2#4"{Int64, AlphaZero.var"#36#37"{Int64, Bool, ResNet}, Channel{Any}})()
    @ AlphaZero.Batchifier C:\Users\dickisp1\.julia\packages\ThreadPools\ROFEh\src\macros.jl:261

Interrupted by the user

@SheldonCurtiss

Is this on a local machine?
I'd suggest rebooting, which has fixed this for me in the past. If that doesn't help, make sure you have the proper drivers installed, which can be quite a nightmare in some cases.

@Pandabear314 (Author)

Yes, I have rebooted both machines and reinstalled/recompiled all my Julia packages to clear out any bad versions. The two machines (one a laptop, the other a desktop) have different NVIDIA GPUs and drivers, so I do not think it is a driver issue, though that can never be ruled out. I will try messing with that in the meantime.

@jonathan-laurent (Owner)

What version of CUDA.jl are you using? Can I see the result of CUDA.versioninfo() on your machine?
Also, are you using the Knet or the Flux backend?

I have seen many different problems resulting in code 8 errors, including out-of-memory errors (are you sure you have enough memory on your GPU to accommodate your network?) and bugs in CUDA.jl (AlphaZero.jl is a stress test for CUDA.jl).
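
A quick way to check both of these from the REPL, assuming a recent CUDA.jl, is:

    using CUDA

    # Print the GPU, driver, and library versions AlphaZero.jl will use.
    CUDA.versioninfo()

    # Report how much device memory is currently used and reserved by CUDA.jl.
    CUDA.memory_status()

If memory_status() already shows the card nearly full before evaluation starts, an out-of-memory failure inside cuDNN becomes the most likely explanation for the code 8 error.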

@jonathan-laurent (Owner)

PS: I love the idea of using AlphaZero on Tetris!
I am looking forward to hearing more about your experiment and I would happily accept a PR to add Tetris to AlphaZero.Examples.

@Pandabear314 (Author)

From CUDA.versioninfo():

CUDA toolkit 11.3.1, artifact installation
CUDA driver 11.4.0
NVIDIA driver 471.68.0

Libraries:
- CUBLAS: 11.5.1
- CURAND: 10.2.4
- CUFFT: 10.4.2
- CUSOLVER: 11.1.2
- CUSPARSE: 11.6.0
- CUPTI: 14.0.0
- NVML: 11.0.0+471.68
- CUDNN: 8.20.2 (for CUDA 11.4.0)
- CUTENSOR: 1.3.0 (for CUDA 11.2.0)

Toolchain:
- Julia: 1.6.1
- LLVM: 11.0.1
- PTX ISA support: 3.2, 4.0, 4.1, 4.2, 4.3, 5.0, 6.0, 6.1, 6.3, 6.4, 6.5, 7.0
- Device capability support: sm_35, sm_37, sm_50, sm_52, sm_53, sm_60, sm_61, sm_62, sm_70, sm_72, sm_75, sm_80

1 device:
  0: NVIDIA GeForce MX130 (sm_50, 1.963 GiB / 2.000 GiB available)

This is from my laptop, so it does not have a lot of memory, but the GPU on my desktop has 8 GB. Based on the stack trace, I'm pretty sure I am using the Flux backend.

I had tried to reduce memory usage by reducing the number of boards stored, not the network size. I can try that next.

@jonathan-laurent (Owner)

I see nothing wrong with your versioninfo().
The most useful hyperparameters to tweak if you lack GPU memory are the network size and the batch size.
Reducing the number of boards stored in memory mostly affects RAM usage (and rather marginally in most cases, I would guess).
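
For reference, the inference batch size lives in the self-play (and arena) simulation parameters. The sketch below is illustrative only: the field names follow the connect-four example shipped with AlphaZero.jl and may differ in your version.

    # Illustrative fragment: lower batch_size (and num_workers) first when
    # cuDNN fails with a code 8 / out-of-memory error during evaluation.
    self_play = SelfPlayParams(
      sim=SimParams(
        num_games=5000,
        num_workers=64,
        batch_size=32,
        use_gpu=true,
        reset_every=2,
        flip_probability=0.0,
        alternate_colors=false),
      mcts=MctsParams(
        num_iters_per_turn=600,
        cpuct=2.0))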

There are scripts in the script/profile directory that you can use to profile inference and self-play. They could be helpful in figuring out the best hyperparameters.

Note that it is also possible that what you are observing comes from a problem with CUDA.jl, as I've seen it happen in the past.

@SheldonCurtiss

This is 100% due to memory constraints on the GPU. I agree with the suggestion to lower the batch size. How much VRAM do you have? I'm not sure, but I'd assume the size of your vectorized states affects this as well.

Recently I've been trying to get the most out of both my CPU and GPU, and in my experience it's typically very much a trial-and-error balancing act.

@SheldonCurtiss

Another thing to note is that AlphaZero.jl appears to preallocate all available GPU VRAM, so total VRAM usage is not a good way to measure.
I'm not sure whether there's a way to disable that or to measure actual usage the way TensorFlow does.
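
If the caching actually comes from CUDA.jl's memory pool rather than AlphaZero.jl itself (an assumption on my part), the following can help distinguish reserved memory from memory that is really in use:

    # Assumption: the "preallocation" is CUDA.jl's caching pool. Disabling the
    # pool (at a performance cost) makes nvidia-smi reflect real usage:
    #
    #   JULIA_CUDA_MEMORY_POOL=none julia --project -e 'include("runner.jl")'

    using CUDA
    CUDA.reclaim()        # return cached but unused pool memory to the driver
    CUDA.memory_status()  # report how much device memory is actually allocated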

@Pandabear314 (Author)

Thank you all for your help so far. I just need to get some simple results this week, so I'll run this on the CPU for now, but I will be back in a couple of weeks to work through this and then maybe set up a PR for Tetris.

@jonathan-laurent (Owner)

If you want to get results on CPU, you probably need to simplify the problem somehow (for example by looking at a smaller grid). I suspect that the original Tetris is too complicated for AlphaZero to learn in a reasonable amount of time without a GPU. That being said, I may be wrong here. In any case, you will need to use a much smaller network if you want to train your agent on CPU.
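
As a rough starting point, a much smaller residual network could look like the sketch below; ResNetHP and its field names are taken from the connect-four example and may need adapting to your setup.

    # Illustrative sketch: a deliberately small ResNet for CPU experiments.
    netparams = NetLib.ResNetHP(
      num_filters=32,              # the connect-four example uses 128
      num_blocks=3,                # fewer residual blocks, much cheaper on CPU
      conv_kernel_size=(3, 3),
      num_policy_head_filters=8,
      num_value_head_filters=8,
      batch_norm_momentum=0.1)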

@jonathan-laurent (Owner)

@Pandabear314 One thing you may also want to do is update all dependencies using Pkg.update(). Indeed, based on your version info, you are not using the latest version of CUDNN.
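
For completeness, the update and a follow-up check of the loaded CUDNN version are just:

    using Pkg
    Pkg.update()          # pull the latest compatible CUDA.jl, Flux, AlphaZero, ...

    using CUDA
    CUDA.versioninfo()    # confirm which CUDNN build the updated CUDA.jl now loads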

@Pandabear314 (Author)

@SheldonCurtiss was correct: the batch size was the culprit behind me running out of VRAM, and everything runs correctly once I reduce it.

Also, the PR may take some time, as I will have to reformulate how Tetris is run by AlphaZero; my current implementation does not learn, but I still have a few ideas to try.
