Memory allocation becomes very slow when reserved bytes is large #1540

Closed
DrChainsaw opened this issue Jun 8, 2022 · 3 comments
Labels
bug Something isn't working

Comments

@DrChainsaw

Describe the bug

Memory allocation often becomes very slow when the number of reserved bytes is large. Running a manual GC followed by a reclaim seems to prevent the issue from occurring. Tested on 3.10.1 and master.

To reproduce

The Minimal Working Example (MWE) for this bug:

using CUDA


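function testloop(n; gc=false)
    x = cu(randn(Float32, 2^14, 2^14))  # 2^28 Float32 values, about 1.1 GB on the GPU
    t = time()
    for _ in 1:n
        x .+ x  # allocates a new result array on every iteration; the result is discarded
        t = memstatetime(t, gc)
    end
end

function memstatetime(t0, gc)
    if gc && CUDA.MemoryInfo().pool_reserved_bytes > 9e9  # optional manual cleanup above 9 GB reserved
        @info "Do GC"
        gcreclaim()
    end

    t1 = time()
    @info "time: $(round(t1-t0; sigdigits=2)) reserved: $(round(CUDA.MemoryInfo().pool_reserved_bytes / 1e9; sigdigits=2)), used : $(round(CUDA.MemoryInfo().pool_used_bytes / 1e9; sigdigits=2))"
    t1  # returned so the next call measures the time since this one
end

function gcreclaim()
    GC.gc()         # let Julia collect unreachable CuArrays
    CUDA.reclaim()  # release cached pool memory
end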
Run to completion
julia> testloop(20)
[ Info: time: 0.0 reserved: 2.1, used : 2.1
[ Info: time: 0.027 reserved: 3.2, used : 3.2
[ Info: time: 0.013 reserved: 4.3, used : 4.3
[ Info: time: 0.009 reserved: 5.4, used : 5.4
[ Info: time: 0.008 reserved: 6.4, used : 6.4
[ Info: time: 0.008 reserved: 7.5, used : 7.5
[ Info: time: 0.009 reserved: 8.6, used : 8.6
[ Info: time: 0.01 reserved: 9.7, used : 9.7
[ Info: time: 0.007 reserved: 11.0, used : 11.0
[ Info: time: 0.56 reserved: 12.0, used : 2.1
[ Info: time: 0.001 reserved: 12.0, used : 3.2
[ Info: time: 0.0 reserved: 12.0, used : 4.3
[ Info: time: 0.001 reserved: 12.0, used : 5.4
[ Info: time: 0.001 reserved: 12.0, used : 6.4
[ Info: time: 0.001 reserved: 12.0, used : 7.5
[ Info: time: 0.001 reserved: 12.0, used : 8.6
[ Info: time: 0.0 reserved: 12.0, used : 9.7
[ Info: time: 0.001 reserved: 12.0, used : 11.0
[ Info: time: 45.0 reserved: 12.0, used : 2.1 # <= This is the bad one!
[ Info: time: 0.0 reserved: 12.0, used : 3.2
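The step size in the log above is consistent with one new array being allocated per iteration. As a rough check, here is a plain-Julia sketch that needs no GPU; `bytes_per_array` and `expected_gb` are illustrative names, not part of CUDA.jl. It computes the size of one 2^14 × 2^14 `Float32` array and rounds it the same way the log does:

```julia
# Size of one 2^14 x 2^14 Float32 array, in bytes and in the GB units used by the log.
bytes_per_array = sizeof(Float32) * 2^14 * 2^14
gb_per_array = bytes_per_array / 1e9

# Memory held by k live arrays (x plus k-1 broadcast results), rounded like the log output.
expected_gb(k) = round(k * gb_per_array; sigdigits=2)

println(bytes_per_array)      # 1073741824, i.e. exactly 1 GiB
println(expected_gb.(2:10))   # [2.1, 3.2, 4.3, 5.4, 6.4, 7.5, 8.6, 9.7, 11.0]
```

These values match the "used" column up to the point where it drops back to 2.1. That would be in line with the pool filling an 11 GiB device before unreferenced results are collected.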
Abort during stall
julia> testloop(20)
[ Info: time: 0.003 reserved: 2.1, used : 2.1
[ Info: time: 0.013 reserved: 3.2, used : 3.2
[ Info: time: 0.016 reserved: 4.3, used : 4.3
[ Info: time: 0.013 reserved: 5.4, used : 5.4
[ Info: time: 0.01 reserved: 6.4, used : 6.4
[ Info: time: 0.006 reserved: 7.5, used : 7.5
[ Info: time: 0.007 reserved: 8.6, used : 8.6
[ Info: time: 0.008 reserved: 9.7, used : 9.7
[ Info: time: 0.006 reserved: 11.0, used : 11.0
^C
ERROR: InterruptException:
Stacktrace:
  [1] try_yieldto(undo::typeof(Base.ensure_rescheduled))
    @ Base .\task.jl:812
  [2] wait()
    @ Base .\task.jl:872
  [3] wait(c::Base.GenericCondition{Base.Threads.SpinLock})
    @ Base .\condition.jl:123
  [4] _wait(t::Task)
    @ Base .\task.jl:293
  [5] sync_end(c::Channel{Any})
    @ Base .\task.jl:361
  [6] macro expansion
    @ .\task.jl:400 [inlined]
  [7] nonblocking_synchronize
    @ E:\Programs\julia\.julia\packages\CUDA\GGwVa\lib\cudadrv\stream.jl:164 [inlined]
  [8] synchronize(stream::CuStream; blocking::Nothing)
    @ CUDA E:\Programs\julia\.julia\packages\CUDA\GGwVa\lib\cudadrv\stream.jl:128
  [9] synchronize
    @ E:\Programs\julia\.julia\packages\CUDA\GGwVa\lib\cudadrv\stream.jl:122 [inlined]
 [10] macro expansion
    @ E:\Programs\julia\.julia\packages\CUDA\GGwVa\src\pool.jl:246 [inlined]
 [11] macro expansion
    @ .\timing.jl:299 [inlined]
 [12] #_alloc#170
    @ E:\Programs\julia\.julia\packages\CUDA\GGwVa\src\pool.jl:313 [inlined]
 [13] #alloc#169
    @ E:\Programs\julia\.julia\packages\CUDA\GGwVa\src\pool.jl:299 [inlined]
 [14] alloc
    @ E:\Programs\julia\.julia\packages\CUDA\GGwVa\src\pool.jl:295 [inlined]
 [15] CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}(#unused#::UndefInitializer, dims::Tuple{Int64, Int64})
    @ CUDA E:\Programs\julia\.julia\packages\CUDA\GGwVa\src\array.jl:42
 [16] CuArray
    @ E:\Programs\julia\.julia\packages\CUDA\GGwVa\src\array.jl:125 [inlined]
 [17] CuArray
    @ E:\Programs\julia\.julia\packages\CUDA\GGwVa\src\array.jl:136 [inlined]
 [18] similar
    @ .\abstractarray.jl:829 [inlined]
 [19] similar
    @ .\abstractarray.jl:828 [inlined]
 [20] similar
    @ E:\Programs\julia\.julia\packages\CUDA\GGwVa\src\broadcast.jl:11 [inlined]
 [21] copy(bc::Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{2}, Tuple{Base.OneTo{Int64}, Base.OneTo{Int64}}, typeof(+), Tuple{CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}})
    @ GPUArrays E:\Programs\julia\.julia\packages\GPUArrays\Zecv7\src\host\broadcast.jl:47
 [22] materialize
    @ .\broadcast.jl:860 [inlined]
 [23] testloop(n::Int64; gc::Bool)
    @ Main E:\swproj\CUDAMemMwe\memmwe.jl:8
 [24] testloop(n::Int64)
    @ Main E:\swproj\CUDAMemMwe\memmwe.jl:5
 [25] top-level scope
    @ REPL[14]:1
With manual GC and reclaim
julia> testloop(20; gc=true)
[ Info: time: 0.0 reserved: 2.1, used : 2.1
[ Info: time: 0.02 reserved: 3.2, used : 3.2
[ Info: time: 0.008 reserved: 4.3, used : 4.3
[ Info: time: 0.004 reserved: 5.4, used : 5.4
[ Info: time: 0.004 reserved: 6.4, used : 6.4
[ Info: time: 0.004 reserved: 7.5, used : 7.5
[ Info: time: 0.003 reserved: 8.6, used : 8.6
[ Info: Do GC
[ Info: time: 0.42 reserved: 1.1, used : 1.1
[ Info: time: 0.012 reserved: 2.1, used : 2.1
[ Info: time: 0.008 reserved: 3.2, used : 3.2
[ Info: time: 0.014 reserved: 4.3, used : 4.3
[ Info: time: 0.005 reserved: 5.4, used : 5.4
[ Info: time: 0.003 reserved: 6.4, used : 6.4
[ Info: time: 0.003 reserved: 7.5, used : 7.5
[ Info: time: 0.004 reserved: 8.6, used : 8.6
[ Info: Do GC
[ Info: time: 0.45 reserved: 1.1, used : 1.1
[ Info: time: 0.018 reserved: 2.1, used : 2.1
[ Info: time: 0.011 reserved: 3.2, used : 3.2
[ Info: time: 0.01 reserved: 4.3, used : 4.3
[ Info: time: 0.014 reserved: 5.4, used : 5.4
Manifest.toml

# This file is machine-generated - editing it directly is not advised

julia_version = "1.7.3"
manifest_format = "2.0"

[[deps.AbstractFFTs]]
deps = ["ChainRulesCore", "LinearAlgebra"]
git-tree-sha1 = "6f1d9bc1c08f9f4a8fa92e3ea3cb50153a1b40d4"
uuid = "621f4979-c628-5d54-868e-fcf4e3e8185c"
version = "1.1.0"

[[deps.Adapt]]
deps = ["LinearAlgebra"]
git-tree-sha1 = "af92965fb30777147966f58acb05da51c5616b5f"
uuid = "79e6a3ab-5dfb-504d-930d-738a2a938a0e"
version = "3.3.3"

[[deps.ArgTools]]
uuid = "0dad84c5-d112-42e6-8d28-ef12dabb789f"

[[deps.Artifacts]]
uuid = "56f22d72-fd6d-98f1-02f0-08ddc0907c33"

[[deps.BFloat16s]]
deps = ["LinearAlgebra", "Printf", "Random", "Test"]
git-tree-sha1 = "a598ecb0d717092b5539dbbe890c98bac842b072"
uuid = "ab4f0b2a-ad5b-11e8-123f-65d77653426b"
version = "0.2.0"

[[deps.Base64]]
uuid = "2a0f44e3-6c83-55bd-87e4-b1978d98bd5f"

[[deps.CEnum]]
git-tree-sha1 = "eb4cb44a499229b3b8426dcfb5dd85333951ff90"
uuid = "fa961155-64e5-5f13-b03f-caf6b980ea82"
version = "0.4.2"

[[deps.CUDA]]
deps = ["AbstractFFTs", "Adapt", "BFloat16s", "CEnum", "CompilerSupportLibraries_jll", "ExprTools", "GPUArrays", "GPUCompiler", "LLVM", "LazyArtifacts", "Libdl", "LinearAlgebra", "Logging", "Printf", "Random", "Random123", "RandomNumbers", "Reexport", "Requires", "SparseArrays", "SpecialFunctions", "TimerOutputs"]
git-tree-sha1 = "a5dca49524292fc9d7c5b5c42942dcc2ebb5b852"
repo-rev = "master"
repo-url = "https://github.com/JuliaGPU/CUDA.jl.git"
uuid = "052768ef-5323-5732-b1bb-66c8b64840ba"
version = "3.10.1"

[[deps.ChainRulesCore]]
deps = ["Compat", "LinearAlgebra", "SparseArrays"]
git-tree-sha1 = "9489214b993cd42d17f44c36e359bf6a7c919abf"
uuid = "d360d2e6-b24c-11e9-a2a3-2a2ae2dbcce4"
version = "1.15.0"

[[deps.ChangesOfVariables]]
deps = ["ChainRulesCore", "LinearAlgebra", "Test"]
git-tree-sha1 = "1e315e3f4b0b7ce40feded39c73049692126cf53"
uuid = "9e997f8a-9a97-42d5-a9f1-ce6bfc15e2c0"
version = "0.1.3"

[[deps.Compat]]
deps = ["Base64", "Dates", "DelimitedFiles", "Distributed", "InteractiveUtils", "LibGit2", "Libdl", "LinearAlgebra", "Markdown", "Mmap", "Pkg", "Printf", "REPL", "Random", "SHA", "Serialization", "SharedArrays", "Sockets", "SparseArrays", "Statistics", "Test", "UUIDs", "Unicode"]
git-tree-sha1 = "9be8be1d8a6f44b96482c8af52238ea7987da3e3"
uuid = "34da2185-b29b-5c13-b0c7-acf172513d20"
version = "3.45.0"

[[deps.CompilerSupportLibraries_jll]]
deps = ["Artifacts", "Libdl"]
uuid = "e66e0078-7015-5450-92f7-15fbd957f2ae"

[[deps.Dates]]
deps = ["Printf"]
uuid = "ade2ca70-3891-5945-98fb-dc099432e06a"

[[deps.DelimitedFiles]]
deps = ["Mmap"]
uuid = "8bb1440f-4735-579b-a4ab-409b98df4dab"

[[deps.Distributed]]
deps = ["Random", "Serialization", "Sockets"]
uuid = "8ba89e20-285c-5b6f-9357-94700520ee1b"

[[deps.DocStringExtensions]]
deps = ["LibGit2"]
git-tree-sha1 = "b19534d1895d702889b219c382a6e18010797f0b"
uuid = "ffbed154-4ef7-542d-bbb7-c09d3a79fcae"
version = "0.8.6"

[[deps.Downloads]]
deps = ["ArgTools", "FileWatching", "LibCURL", "NetworkOptions"]
uuid = "f43a241f-c20a-4ad4-852c-f6b1247861c6"

[[deps.ExprTools]]
git-tree-sha1 = "56559bbef6ca5ea0c0818fa5c90320398a6fbf8d"
uuid = "e2ba6199-217a-4e67-a87a-7c52f15ade04"
version = "0.1.8"

[[deps.FileWatching]]
uuid = "7b1f6079-737a-58dc-b8bc-7a2ca5c1b5ee"

[[deps.GPUArrays]]
deps = ["Adapt", "LLVM", "LinearAlgebra", "Printf", "Random", "Serialization", "Statistics"]
git-tree-sha1 = "c783e8883028bf26fb05ed4022c450ef44edd875"
uuid = "0c68f7d7-f131-5f86-a1c3-88cf8149b2d7"
version = "8.3.2"

[[deps.GPUCompiler]]
deps = ["ExprTools", "InteractiveUtils", "LLVM", "Libdl", "Logging", "TimerOutputs", "UUIDs"]
git-tree-sha1 = "d8c5999631e1dc18d767883f621639c838f8e632"
uuid = "61eb1bfa-7361-4325-ad38-22787b887f55"
version = "0.15.2"

[[deps.InteractiveUtils]]
deps = ["Markdown"]
uuid = "b77e0a4c-d291-57a0-90e8-8db25a27a240"

[[deps.InverseFunctions]]
deps = ["Test"]
git-tree-sha1 = "c6cf981474e7094ce044168d329274d797843467"
uuid = "3587e190-3f89-42d0-90ee-14403ec27112"
version = "0.1.6"

[[deps.IrrationalConstants]]
git-tree-sha1 = "7fd44fd4ff43fc60815f8e764c0f352b83c49151"
uuid = "92d709cd-6900-40b7-9082-c6be49f344b6"
version = "0.1.1"

[[deps.JLLWrappers]]
deps = ["Preferences"]
git-tree-sha1 = "abc9885a7ca2052a736a600f7fa66209f96506e1"
uuid = "692b3bcd-3c85-4b1f-b108-f13ce0eb3210"
version = "1.4.1"

[[deps.LLVM]]
deps = ["CEnum", "LLVMExtra_jll", "Libdl", "Printf", "Unicode"]
git-tree-sha1 = "e7e9184b0bf0158ac4e4aa9daf00041b5909bf1a"
uuid = "929cbde3-209d-540e-8aea-75f648917ca0"
version = "4.14.0"

[[deps.LLVMExtra_jll]]
deps = ["Artifacts", "JLLWrappers", "LazyArtifacts", "Libdl", "Pkg", "TOML"]
git-tree-sha1 = "771bfe376249626d3ca12bcd58ba243d3f961576"
uuid = "dad2f222-ce93-54a1-a47d-0025e8a3acab"
version = "0.0.16+0"

[[deps.LazyArtifacts]]
deps = ["Artifacts", "Pkg"]
uuid = "4af54fe1-eca0-43a8-85a7-787d91b784e3"

[[deps.LibCURL]]
deps = ["LibCURL_jll", "MozillaCACerts_jll"]
uuid = "b27032c2-a3e7-50c8-80cd-2d36dbcbfd21"

[[deps.LibCURL_jll]]
deps = ["Artifacts", "LibSSH2_jll", "Libdl", "MbedTLS_jll", "Zlib_jll", "nghttp2_jll"]
uuid = "deac9b47-8bc7-5906-a0fe-35ac56dc84c0"

[[deps.LibGit2]]
deps = ["Base64", "NetworkOptions", "Printf", "SHA"]
uuid = "76f85450-5226-5b5a-8eaa-529ad045b433"

[[deps.LibSSH2_jll]]
deps = ["Artifacts", "Libdl", "MbedTLS_jll"]
uuid = "29816b5a-b9ab-546f-933c-edad1886dfa8"

[[deps.Libdl]]
uuid = "8f399da3-3557-5675-b5ff-fb832c97cbdb"

[[deps.LinearAlgebra]]
deps = ["Libdl", "libblastrampoline_jll"]
uuid = "37e2e46d-f89d-539d-b4ee-838fcccc9c8e"

[[deps.LogExpFunctions]]
deps = ["ChainRulesCore", "ChangesOfVariables", "DocStringExtensions", "InverseFunctions", "IrrationalConstants", "LinearAlgebra"]
git-tree-sha1 = "09e4b894ce6a976c354a69041a04748180d43637"
uuid = "2ab3a3ac-af41-5b50-aa03-7779005ae688"
version = "0.3.15"

[[deps.Logging]]
uuid = "56ddb016-857b-54e1-b83d-db4d58db5568"

[[deps.Markdown]]
deps = ["Base64"]
uuid = "d6f4376e-aef5-505a-96c1-9c027394607a"

[[deps.MbedTLS_jll]]
deps = ["Artifacts", "Libdl"]
uuid = "c8ffd9c3-330d-5841-b78e-0817d7145fa1"

[[deps.Mmap]]
uuid = "a63ad114-7e13-5084-954f-fe012c677804"

[[deps.MozillaCACerts_jll]]
uuid = "14a3606d-f60d-562e-9121-12d972cd8159"

[[deps.NetworkOptions]]
uuid = "ca575930-c2e3-43a9-ace4-1e988b2c1908"

[[deps.OpenBLAS_jll]]
deps = ["Artifacts", "CompilerSupportLibraries_jll", "Libdl"]
uuid = "4536629a-c528-5b80-bd46-f80d51c5b363"

[[deps.OpenLibm_jll]]
deps = ["Artifacts", "Libdl"]
uuid = "05823500-19ac-5b8b-9628-191a04bc5112"

[[deps.OpenSpecFun_jll]]
deps = ["Artifacts", "CompilerSupportLibraries_jll", "JLLWrappers", "Libdl", "Pkg"]
git-tree-sha1 = "13652491f6856acfd2db29360e1bbcd4565d04f1"
uuid = "efe28fd5-8261-553b-a9e1-b2916fc3738e"
version = "0.5.5+0"

[[deps.Pkg]]
deps = ["Artifacts", "Dates", "Downloads", "LibGit2", "Libdl", "Logging", "Markdown", "Printf", "REPL", "Random", "SHA", "Serialization", "TOML", "Tar", "UUIDs", "p7zip_jll"]
uuid = "44cfe95a-1eb2-52ea-b672-e2afdf69b78f"

[[deps.Preferences]]
deps = ["TOML"]
git-tree-sha1 = "47e5f437cc0e7ef2ce8406ce1e7e24d44915f88d"
uuid = "21216c6a-2e73-6563-6e65-726566657250"
version = "1.3.0"

[[deps.Printf]]
deps = ["Unicode"]
uuid = "de0858da-6303-5e67-8744-51eddeeeb8d7"

[[deps.REPL]]
deps = ["InteractiveUtils", "Markdown", "Sockets", "Unicode"]
uuid = "3fa0cd96-eef1-5676-8a61-b3b8758bbffb"

[[deps.Random]]
deps = ["SHA", "Serialization"]
uuid = "9a3f8284-a2c9-5f02-9a11-845980a1fd5c"

[[deps.Random123]]
deps = ["Random", "RandomNumbers"]
git-tree-sha1 = "afeacaecf4ed1649555a19cb2cad3c141bbc9474"
uuid = "74087812-796a-5b5d-8853-05524746bad3"
version = "1.5.0"

[[deps.RandomNumbers]]
deps = ["Random", "Requires"]
git-tree-sha1 = "043da614cc7e95c703498a491e2c21f58a2b8111"
uuid = "e6cf234a-135c-5ec9-84dd-332b85af5143"
version = "1.5.3"

[[deps.Reexport]]
git-tree-sha1 = "45e428421666073eab6f2da5c9d310d99bb12f9b"
uuid = "189a3867-3050-52da-a836-e630ba90ab69"
version = "1.2.2"

[[deps.Requires]]
deps = ["UUIDs"]
git-tree-sha1 = "838a3a4188e2ded87a4f9f184b4b0d78a1e91cb7"
uuid = "ae029012-a4dd-5104-9daa-d747884805df"
version = "1.3.0"

[[deps.SHA]]
uuid = "ea8e919c-243c-51af-8825-aaa63cd721ce"

[[deps.Serialization]]
uuid = "9e88b42a-f829-5b0c-bbe9-9e923198166b"

[[deps.SharedArrays]]
deps = ["Distributed", "Mmap", "Random", "Serialization"]
uuid = "1a1011a3-84de-559e-8e89-a11a2f7dc383"

[[deps.Sockets]]
uuid = "6462fe0b-24de-5631-8697-dd941f90decc"

[[deps.SparseArrays]]
deps = ["LinearAlgebra", "Random"]
uuid = "2f01184e-e22b-5df5-ae63-d93ebab69eaf"

[[deps.SpecialFunctions]]
deps = ["ChainRulesCore", "IrrationalConstants", "LogExpFunctions", "OpenLibm_jll", "OpenSpecFun_jll"]
git-tree-sha1 = "a9e798cae4867e3a41cae2dd9eb60c047f1212db"
uuid = "276daf66-3868-5448-9aa4-cd146d93841b"
version = "2.1.6"

[[deps.Statistics]]
deps = ["LinearAlgebra", "SparseArrays"]
uuid = "10745b16-79ce-11e8-11f9-7d13ad32a3b2"

[[deps.TOML]]
deps = ["Dates"]
uuid = "fa267f1f-6049-4f14-aa54-33bafae1ed76"

[[deps.Tar]]
deps = ["ArgTools", "SHA"]
uuid = "a4e569a6-e804-4fa4-b0f3-eef7a1d5b13e"

[[deps.Test]]
deps = ["InteractiveUtils", "Logging", "Random", "Serialization"]
uuid = "8dfed614-e22c-5e08-85e1-65c5234f0b40"

[[deps.TimerOutputs]]
deps = ["ExprTools", "Printf"]
git-tree-sha1 = "7638550aaea1c9a1e86817a231ef0faa9aca79bd"
uuid = "a759f4b9-e2f1-59dc-863e-4aeb61b1ea8f"
version = "0.5.19"

[[deps.UUIDs]]
deps = ["Random", "SHA"]
uuid = "cf7118a7-6976-5b1a-9a39-7adc72f591a4"

[[deps.Unicode]]
uuid = "4ec0a83e-493e-50e2-b9ac-8f72acf5a8f5"

[[deps.Zlib_jll]]
deps = ["Libdl"]
uuid = "83775a58-1f1d-513f-b197-d71354ab007a"

[[deps.libblastrampoline_jll]]
deps = ["Artifacts", "Libdl", "OpenBLAS_jll"]
uuid = "8e850b90-86db-534c-a0d3-1478176c7d93"

[[deps.nghttp2_jll]]
deps = ["Artifacts", "Libdl"]
uuid = "8e850ede-7688-5339-a07c-302acd2aaf8d"

[[deps.p7zip_jll]]
deps = ["Artifacts", "Libdl"]
uuid = "3f19e933-33d8-53b3-aaab-bd5110c3b7a0"

Expected behavior

Using manual GC and reclaim should not be required.

Version info

Details on Julia:

# please post the output of:
julia> versioninfo()
Julia Version 1.7.3
Commit 742b9abb4d (2022-05-06 12:58 UTC)
Platform Info:
  OS: Windows (x86_64-w64-mingw32)
  CPU: Intel(R) Core(TM) i7-5820K CPU @ 3.30GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-12.0.1 (ORCJIT, haswell)
Environment:
  JULIA_DEPOT_PATH = E:/Programs/julia/.julia
  JULIA_EDITOR = code
  JULIA_NUM_THREADS = 1

Details on CUDA:

# please post the output of:
julia> CUDA.versioninfo()
CUDA toolkit 11.7, artifact installation
Unknown NVIDIA driver, for CUDA 11.6
CUDA driver 11.6

Libraries:
- CUBLAS: 11.10.1
- CURAND: 10.2.10
- CUFFT: 10.7.1
- CUSOLVER: 11.3.5
- CUSPARSE: 11.7.3
- CUPTI: 17.0.0
- NVML: missing
- CUDNN: 8.30.2 (for CUDA 11.5.0)
- CUTENSOR: 1.4.0 (for CUDA 11.5.0)

Toolchain:
- Julia: 1.7.3
- LLVM: 12.0.1
- PTX ISA support: 3.2, 4.0, 4.1, 4.2, 4.3, 5.0, 6.0, 6.1, 6.3, 6.4, 6.5, 7.0
- Device capability support: sm_35, sm_37, sm_50, sm_52, sm_53, sm_60, sm_61, sm_62, sm_70, sm_72, sm_75, sm_80

1 device:
  0: NVIDIA GeForce RTX 2080 Ti (sm_75, 0 bytes / 11.000 GiB available)

Additional context

The MWE mostly seems to trigger the stall when used bytes is also large, but in the real application the stalls start once reserved bytes is large. Interrupting the program during a stall seems to give the same stacktrace in both the MWE and the real application, so the cause is probably the same.
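Since the stacktrace points at the allocation path rather than the broadcast kernel, one way to narrow this down might be to time a bare allocation on its own. This is a minimal sketch, assuming a functional CUDA device; `time_alloc` is a hypothetical helper, not part of CUDA.jl:

```julia
using CUDA

# Hypothetical helper (not part of CUDA.jl): time one uninitialized device allocation.
function time_alloc(dims...)
    t0 = time()
    buf = CuArray{Float32}(undef, dims...)  # goes through the undef constructor seen in the stacktrace
    return time() - t0, buf
end

elapsed, buf = time_alloc(2^14, 2^14)
info = CUDA.MemoryInfo()
@info "alloc took $(round(elapsed; sigdigits=2)) s, reserved: $(round(info.pool_reserved_bytes / 1e9; sigdigits=2)) GB"
```

If a call like this stalls for seconds while reserved bytes is high, that would suggest the time is spent in the allocator and not in the computation.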

@DrChainsaw DrChainsaw added the bug Something isn't working label Jun 8, 2022
@maleadt
Member

maleadt commented Jun 8, 2022

Can you try lowering the reserve amount: https://github.com/JuliaGPU/CUDA.jl/blob/5efcee664ff50cfa1e14ad9ca4dfe6f600fabb10/src/pool.jl#L87-L88=
Doing so will force CUDA to trim the memory pool at various synchronization events.

@DrChainsaw
Author

This seems to help in the sense that the number of reserved bytes is reduced whenever the maximum is reached, but I still get long stalls when it is reached.

I'm not sure what a good value for it is, though. 0?

julia> attribute!(memory_pool(device()), CUDA.MEMPOOL_ATTR_RELEASE_THRESHOLD, UInt64(9_000_000_000))

julia> testloop(20) # All went well this time
[ Info: time: 0.003 reserved: 2.1, used : 2.1
[ Info: time: 0.017 reserved: 3.2, used : 3.2
[ Info: time: 0.021 reserved: 4.3, used : 4.3
[ Info: time: 0.017 reserved: 5.4, used : 5.4
[ Info: time: 0.012 reserved: 6.4, used : 6.4
[ Info: time: 0.007 reserved: 7.5, used : 7.5
[ Info: time: 0.011 reserved: 8.6, used : 8.6
[ Info: time: 0.004 reserved: 9.7, used : 9.7
[ Info: time: 0.009 reserved: 11.0, used : 11.0
[ Info: time: 0.56 reserved: 9.0, used : 2.1
[ Info: time: 0.0 reserved: 9.0, used : 3.2
[ Info: time: 0.0 reserved: 9.0, used : 4.3
[ Info: time: 0.004 reserved: 9.0, used : 5.4
[ Info: time: 0.001 reserved: 9.0, used : 6.4
[ Info: time: 0.001 reserved: 9.0, used : 7.5
[ Info: time: 0.001 reserved: 9.0, used : 8.6
[ Info: time: 0.008 reserved: 9.7, used : 9.7
[ Info: time: 0.016 reserved: 11.0, used : 11.0
[ Info: time: 1.4 reserved: 9.0, used : 2.1
[ Info: time: 0.0 reserved: 9.0, used : 3.2

julia> testloop(20)
[ Info: time: 0.0 reserved: 9.0, used : 2.1
[ Info: time: 0.001 reserved: 9.0, used : 3.2
[ Info: time: 0.0 reserved: 9.0, used : 4.3
[ Info: time: 0.001 reserved: 9.0, used : 5.4
[ Info: time: 0.001 reserved: 9.0, used : 6.4
[ Info: time: 0.001 reserved: 9.0, used : 7.5
[ Info: time: 0.0 reserved: 9.0, used : 8.6
[ Info: time: 0.006 reserved: 9.7, used : 9.7
[ Info: time: 0.018 reserved: 11.0, used : 11.0
[ Info: time: 93.0 reserved: 9.0, used : 2.1  # OUCH!
[ Info: time: 0.0 reserved: 9.0, used : 3.2
[ Info: time: 0.0 reserved: 9.0, used : 4.3
[ Info: time: 0.0 reserved: 9.0, used : 5.4
[ Info: time: 0.005 reserved: 9.0, used : 6.4
[ Info: time: 0.001 reserved: 9.0, used : 7.5
[ Info: time: 0.001 reserved: 9.0, used : 8.6
[ Info: time: 0.004 reserved: 9.7, used : 9.7
[ Info: time: 0.008 reserved: 11.0, used : 11.0
[ Info: time: 1.7 reserved: 9.0, used : 2.1
[ Info: time: 0.001 reserved: 9.0, used : 3.2

julia> attribute!(memory_pool(device()), CUDA.MEMPOOL_ATTR_RELEASE_THRESHOLD, UInt64(0))

julia> gcreclaim()

julia> testloop(20)
[ Info: time: 0.004 reserved: 2.1, used : 2.1
[ Info: time: 0.02 reserved: 3.2, used : 3.2
[ Info: time: 0.016 reserved: 4.3, used : 4.3
[ Info: time: 0.006 reserved: 5.4, used : 5.4
[ Info: time: 0.008 reserved: 6.4, used : 6.4
[ Info: time: 0.007 reserved: 7.5, used : 7.5
[ Info: time: 0.003 reserved: 8.6, used : 8.6
[ Info: time: 0.007 reserved: 9.7, used : 9.7
[ Info: time: 0.004 reserved: 11.0, used : 11.0
[ Info: time: 21.0 reserved: 2.1, used : 2.1 # Still happens :(
[ Info: time: 0.005 reserved: 3.2, used : 3.2
[ Info: time: 0.009 reserved: 4.3, used : 4.3
[ Info: time: 0.014 reserved: 5.4, used : 5.4
[ Info: time: 0.006 reserved: 6.4, used : 6.4
[ Info: time: 0.003 reserved: 7.5, used : 7.5
[ Info: time: 0.005 reserved: 8.6, used : 8.6
[ Info: time: 0.004 reserved: 9.7, used : 9.7
[ Info: time: 0.008 reserved: 11.0, used : 11.0
[ Info: time: 3.5 reserved: 2.1, used : 2.1
[ Info: time: 0.012 reserved: 3.2, used : 3.2

@maleadt
Member

maleadt commented Apr 27, 2024

Memory handling and GC integration have changed significantly, so I don't think this issue as reported here is still relevant. If the problem persists on CUDA.jl#master, feel free to open a new issue!

@maleadt maleadt closed this as completed Apr 27, 2024