Memory allocation becomes very slow when reserved bytes is large #1540
Comments
Can you try lowering the reserve amount? https://github.com/JuliaGPU/CUDA.jl/blob/5efcee664ff50cfa1e14ad9ca4dfe6f600fabb10/src/pool.jl#L87-L88
It seems to help in the sense that the number of reserved bytes is reduced whenever the maximum is reached, but I still get long stalls when that happens. I'm not sure what a good value for it is, though.
julia> attribute!(memory_pool(device()), CUDA.MEMPOOL_ATTR_RELEASE_THRESHOLD, UInt64(9_000_000_000))
julia> testloop(20) # All went well this time
[ Info: time: 0.003 reserved: 2.1, used : 2.1
[ Info: time: 0.017 reserved: 3.2, used : 3.2
[ Info: time: 0.021 reserved: 4.3, used : 4.3
[ Info: time: 0.017 reserved: 5.4, used : 5.4
[ Info: time: 0.012 reserved: 6.4, used : 6.4
[ Info: time: 0.007 reserved: 7.5, used : 7.5
[ Info: time: 0.011 reserved: 8.6, used : 8.6
[ Info: time: 0.004 reserved: 9.7, used : 9.7
[ Info: time: 0.009 reserved: 11.0, used : 11.0
[ Info: time: 0.56 reserved: 9.0, used : 2.1
[ Info: time: 0.0 reserved: 9.0, used : 3.2
[ Info: time: 0.0 reserved: 9.0, used : 4.3
[ Info: time: 0.004 reserved: 9.0, used : 5.4
[ Info: time: 0.001 reserved: 9.0, used : 6.4
[ Info: time: 0.001 reserved: 9.0, used : 7.5
[ Info: time: 0.001 reserved: 9.0, used : 8.6
[ Info: time: 0.008 reserved: 9.7, used : 9.7
[ Info: time: 0.016 reserved: 11.0, used : 11.0
[ Info: time: 1.4 reserved: 9.0, used : 2.1
[ Info: time: 0.0 reserved: 9.0, used : 3.2
julia> testloop(20)
[ Info: time: 0.0 reserved: 9.0, used : 2.1
[ Info: time: 0.001 reserved: 9.0, used : 3.2
[ Info: time: 0.0 reserved: 9.0, used : 4.3
[ Info: time: 0.001 reserved: 9.0, used : 5.4
[ Info: time: 0.001 reserved: 9.0, used : 6.4
[ Info: time: 0.001 reserved: 9.0, used : 7.5
[ Info: time: 0.0 reserved: 9.0, used : 8.6
[ Info: time: 0.006 reserved: 9.7, used : 9.7
[ Info: time: 0.018 reserved: 11.0, used : 11.0
[ Info: time: 93.0 reserved: 9.0, used : 2.1 # OUCH!
[ Info: time: 0.0 reserved: 9.0, used : 3.2
[ Info: time: 0.0 reserved: 9.0, used : 4.3
[ Info: time: 0.0 reserved: 9.0, used : 5.4
[ Info: time: 0.005 reserved: 9.0, used : 6.4
[ Info: time: 0.001 reserved: 9.0, used : 7.5
[ Info: time: 0.001 reserved: 9.0, used : 8.6
[ Info: time: 0.004 reserved: 9.7, used : 9.7
[ Info: time: 0.008 reserved: 11.0, used : 11.0
[ Info: time: 1.7 reserved: 9.0, used : 2.1
[ Info: time: 0.001 reserved: 9.0, used : 3.2
julia> attribute!(memory_pool(device()), CUDA.MEMPOOL_ATTR_RELEASE_THRESHOLD, UInt64(0))
julia> gcreclaim()
julia> testloop(20)
[ Info: time: 0.004 reserved: 2.1, used : 2.1
[ Info: time: 0.02 reserved: 3.2, used : 3.2
[ Info: time: 0.016 reserved: 4.3, used : 4.3
[ Info: time: 0.006 reserved: 5.4, used : 5.4
[ Info: time: 0.008 reserved: 6.4, used : 6.4
[ Info: time: 0.007 reserved: 7.5, used : 7.5
[ Info: time: 0.003 reserved: 8.6, used : 8.6
[ Info: time: 0.007 reserved: 9.7, used : 9.7
[ Info: time: 0.004 reserved: 11.0, used : 11.0
[ Info: time: 21.0 reserved: 2.1, used : 2.1 # Still happens :(
[ Info: time: 0.005 reserved: 3.2, used : 3.2
[ Info: time: 0.009 reserved: 4.3, used : 4.3
[ Info: time: 0.014 reserved: 5.4, used : 5.4
[ Info: time: 0.006 reserved: 6.4, used : 6.4
[ Info: time: 0.003 reserved: 7.5, used : 7.5
[ Info: time: 0.005 reserved: 8.6, used : 8.6
[ Info: time: 0.004 reserved: 9.7, used : 9.7
[ Info: time: 0.008 reserved: 11.0, used : 11.0
[ Info: time: 3.5 reserved: 2.1, used : 2.1
[ Info: time: 0.012 reserved: 3.2, used : 3.2
Memory handling and GC integration have changed significantly, so I don't think this issue as reported here is still relevant. If the problem persists on CUDA.jl#master, feel free to open a new issue!
Describe the bug
Memory allocation often becomes very slow when the amount of reserved bytes is large. Doing a manual GC and reclaim seems to prevent the issue from occurring. Tested on 3.10.1 and master.
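For reference, here is a minimal sketch of what such a manual GC-and-reclaim step could look like. The gcreclaim() helper that appears in the REPL log above is not shown in this extract, so treating it as the following wrapper is an assumption:

using CUDA

# Assumed helper: force a full Julia GC pass so unreferenced CuArrays are
# finalized, then ask CUDA.jl's memory pool to return cached memory to the driver.
gcreclaim() = (GC.gc(true); CUDA.reclaim())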
To reproduce
The Minimal Working Example (MWE) for this bug (attached as the collapsed sections below; a rough sketch follows after the list):
Run to completion
Abort during stall
With manual GC and reclaim
Manifest.toml
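The actual MWE scripts live in the attachments listed above and are not reproduced in this extract. Purely as an approximation, a test loop consistent with the log output might look roughly like the following; the function name testloop comes from the log, but the ~1 GiB allocation size, the point at which references are dropped, and the memory-pool attribute getter calls are all assumptions:

using CUDA

# Rough, assumed reconstruction of the test loop: allocate a ~1 GiB CuArray per
# iteration, time the allocation, and report the pool's reserved/used bytes in GB.
# The attribute getter mirrors the attribute! setter shown in the comments above;
# its exact signature here is an assumption.
function testloop(n)
    bufs = CuArray{Float32,1}[]
    for _ in 1:n
        t = @elapsed push!(bufs, CUDA.zeros(Float32, 2^28))   # 2^28 Float32s ≈ 1 GiB (a guess)
        length(bufs) >= 10 && empty!(bufs)                    # drop references so earlier arrays become garbage
        pool = memory_pool(device())
        reserved = attribute(UInt64, pool, CUDA.MEMPOOL_ATTR_RESERVED_MEM_CURRENT) / 1e9
        used     = attribute(UInt64, pool, CUDA.MEMPOOL_ATTR_USED_MEM_CURRENT) / 1e9
        @info "time: $(round(t, digits=3)) reserved: $(round(reserved, digits=1)), used : $(round(used, digits=1))"
    end
end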
Expected behavior
Using manual GC and reclaim should not be required.
Version info
Details on Julia:
Details on CUDA:
Additional context
The MWE seems to trigger mostly when the amount of used bytes is large too, but in the real application the slowdown starts to happen when the amount of reserved bytes is large. Interrupting the program during a stall gives the same stacktrace in both the MWE and the real application, so it is probably the same cause.