Excessive allocations when running on multiple threads #1429
The problem definitely seems to be with how memory is created on threads. By running on the same set of threads every time, the problem seems to disappear. I am not sure how useful this solution is for my actual use case, but it might give some insight into the problem.

```julia
using CUDA
import CUDA.CUDNN: cudnnConvolutionForward

const W1 = cu(randn(5, 5, 3, 6))

function inference(imgs)
    out = cudnnConvolutionForward(W1, imgs)
    return maximum(Array(out))
end

# Channel to put work on
const RecieverChannel = Channel{Tuple{CuArray{Float32, 4, CUDA.Mem.DeviceBuffer}, Channel{Float32}}}()

function listen_channel()
    while true
        imgs, res_channel = take!(RecieverChannel)
        res = inference(imgs)
        put!(res_channel, res)
    end
end

# Create a fixed set of tasks (here 2) to do the GPU computations
tsks = [Threads.@spawn listen_channel() for i in 1:2]

# Ask one of the worker tasks to do the job
function inference_caller(imgs)
    res_channel = Channel{Float32}()
    put!(RecieverChannel, (imgs, res_channel))
    take!(res_channel)
end

imgs = cu(randn(28, 28, 3, 1000))
N = 100
for k in 1:20
    for j in 1:10
        res = Any[nothing for i in 1:N]
        for i in 1:N  # We spawn a lot of work
            res[i] = Threads.@spawn inference_caller(imgs)
        end
        s = sum(fetch.(res))
        println(s)
    end
end
```
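To see whether the GPU memory pool keeps growing between iterations, a minimal sketch using standard CUDA.jl utilities could be called inside the outer loop. The `report_gpu_memory` helper name is only illustrative, not part of the original MWE:

```julia
using CUDA

# Illustrative helper: print current GPU memory usage, optionally after
# releasing cached pool memory back to the driver.
function report_gpu_memory(; reclaim::Bool = false)
    GC.gc()                    # let Julia finalize unreachable CuArrays
    reclaim && CUDA.reclaim()  # hand cached pool memory back to the driver
    CUDA.memory_status()       # prints used/free device memory
end

# e.g. in the MWE above:
# for k in 1:20
#     ...
#     report_gpu_memory()
# end
```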
Hmm, I cannot reproduce. On Linux, CUDA.jl#master, Julia 1.7.2 with 32 threads, trying your original example:

Running it in the REPL, i.e. not in a […]
The problem was not solved by changing CUDA.jl versions or changing system drivers. However, the problem does not seem to be reproducible on other machines. I tried it myself on AWS without any problems.
Very strange. Using the same Julia binaries everywhere? Number of threads Julia was launched with?
Yes, I have tried several different configurations but I am only able to reproduce it on that machine. Not sure if it could be specific to the GPU model or even related to the specific hardware.
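For reference, the configuration details being compared here (Julia build, thread count, CUDA.jl and driver versions) can be printed with standard calls, for example:

```julia
using CUDA, InteractiveUtils

versioninfo()             # Julia version, OS, and thread count
CUDA.versioninfo()        # CUDA.jl, driver/runtime versions, visible GPUs
@show Threads.nthreads()  # number of Julia threads in this session
```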
It's been a while since there's been activity here, so I'm going to close this. If this still happens on latest master, don't hesitate to open a new issue with an updated MWE.
I'm trying to run a neural network (using Flux) on a production server. I have no problems when running everything in one thread, but memory allocations on the GPU start increasing when I put it behind a Mux.jl server, and eventually I get a CUDA out-of-memory error (or another CUDA-related error). This happens regardless of whether I try to run the requests serially or in parallel, even with a lock around the GPU computations.
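(The "lock around the GPU computations" mentioned above would look roughly like the sketch below; `GPU_LOCK`, `model`, and `handle_request` are illustrative names, not the actual server code.)

```julia
using CUDA

# Illustrative only: serialize all GPU work behind a single lock so that
# at most one request touches the GPU at a time.
const GPU_LOCK = ReentrantLock()

function handle_request(model, imgs)
    lock(GPU_LOCK) do
        out = model(cu(imgs))   # run the network on the GPU
        return Array(out)       # copy the result back to host memory
    end
end
```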
The following example runs without problems without `Threads.@spawn`, and never allocates above 1367 MiB on the GPU. However, when the computations are run in a separate thread as in this example, it keeps allocating memory on every call until it eventually crashes somehow.

MWE:
Running on a new project with only CUDA v3.8.3 as a dependency.
Before running loop first time:
After 1:
After 2:
After 3:
After 4 (and crash):
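(The actual readings are not included above. The issue does not show how they were taken; one way to capture per-checkpoint numbers like these is to query the driver directly, for example:)

```julia
# Illustration only: read current GPU memory usage from nvidia-smi.
read_gpu_memory() = read(`nvidia-smi --query-gpu=memory.used,memory.total --format=csv,noheader`, String)

println("Before running loop first time: ", read_gpu_memory())
```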