Downloading artifact: CUDA110 when using DiffEqFlux #542

Closed
ayaoyao214 opened this issue Nov 12, 2020 · 13 comments

ayaoyao214 commented Nov 12, 2020

Hi, I am currently working with DiffEqFlux, and here is a problem that has been troubling me for quite a while.
When I type `using DiffEqFlux`, it takes a long time to precompile and sometimes fails to compile.

This is what I get:
(screenshot omitted)
Then if I press Ctrl+C to exit, it tries to download other versions, like this:
(screenshot omitted)

But sometimes it works fine, which confuses me.
Here is my package status. I am using Julia 1.5.2, by the way.
(screenshot omitted)

Thank you very much.


maleadt commented Nov 12, 2020

This is not a bug, but the CUDA artifacts being downloaded. These artifacts are large, so downloading them takes a while. It should only happen when actually making CUDA calls, though, so if this really happens during `using DiffEqFlux`, that's a bug in that package.

I tried it myself, and the culprit is Flux:

julia> using CUDA
[ Info: Precompiling CUDA [052768ef-5323-5732-b1bb-66c8b64840ba]

julia> using DiffEqFlux
[ Info: Precompiling DiffEqFlux [aae7a2af-3d4f-5e19-a356-7da93b79d9d0]
ERROR: LoadError: InitError: 
Stacktrace:
 [1] error() at ./error.jl:42
 [2] __runtime_init__() at /home/tim/Julia/pkg/CUDA/src/initialization.jl:110
 [3] macro expansion at /home/tim/Julia/pkg/CUDA/src/initialization.jl:32 [inlined]
 [4] macro expansion at ./lock.jl:183 [inlined]
 [5] _functional(::Bool) at /home/tim/Julia/pkg/CUDA/src/initialization.jl:26
 [6] functional(::Bool) at /home/tim/Julia/pkg/CUDA/src/initialization.jl:19
 [7] functional at /home/tim/Julia/pkg/CUDA/src/initialization.jl:18 [inlined]
 [8] __init__() at /home/tim/Julia/depot/packages/Flux/q3zeA/src/Flux.jl:54

cc @DhairyaLGandhi

@maleadt maleadt closed this as completed Nov 12, 2020

maleadt commented Nov 17, 2020

Just to clarify, it's not really a bug since CUDA.functional() will only download if you actually have a GPU. So this is likely the wanted behavior.

@ayaoyao214 replied:

> Just to clarify, it's not really a bug since CUDA.functional() will only download if you actually have a GPU. So this is likely the wanted behavior.

Hi, thank you very much for replying, but this confuses me. I do have a GPU, but I wasn't running on it. Do you mean that as long as I have a GPU, when I do `using DiffEqFlux`, CUDA.functional() is still going to download?
If that is true, do I just need to wait until the download is finished? Thanks a lot.


maleadt commented Nov 17, 2020

> Do you mean that as long as I have a GPU, when I do `using DiffEqFlux`, CUDA.functional() is still going to download?

Correct. There's no other way to guarantee that CUDA is functional without actually downloading the required libraries, so the download has to happen then. It should only happen once, and finish within 10 minutes or so. If it doesn't, either you have internet trouble or the PkgServer is acting up.

Note that if you have CUDA installed locally, you can always set JULIA_CUDA_USE_BINARYBUILDER=false in your environment to avoid using and downloading artifacts, but that's not generally recommended.
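As a concrete sketch of that workaround (assuming a local CUDA toolkit is already installed and on the usual paths), the variable can be exported in the shell before starting Julia:

```shell
# Use a locally installed CUDA toolkit instead of downloading artifacts.
# Only makes sense if CUDA is already installed on this machine.
export JULIA_CUDA_USE_BINARYBUILDER=false
# julia   # start Julia in this same shell so it inherits the variable
echo "$JULIA_CUDA_USE_BINARYBUILDER"
```

The variable must be set in the environment Julia is launched from; setting it inside an already-running session has no effect on artifact selection.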

@ayaoyao214 replied:

> > Do you mean that as long as I have a GPU, when I do `using DiffEqFlux`, CUDA.functional() is still going to download?
>
> Correct. There's no other way to guarantee that CUDA is functional without actually downloading the required libraries, so the download has to happen then. It should only happen once, and finish within 10 minutes or so. If it doesn't, either you have internet trouble or the PkgServer is acting up.
>
> Note that if you have CUDA installed locally, you can always set JULIA_CUDA_USE_BINARYBUILDER=false in your environment to avoid using and downloading artifacts, but that's not generally recommended.

Sure, got it. I'll run it, wait, and see what happens. Thanks again for your help!


arash-banadaki commented Jan 5, 2022

I use Julia version 1.7.0 (2021-11-30) on a machine with a CUDA-enabled card.
I see that Julia keeps downloading the same artifacts over and over whenever I run code that invokes CUDA functionality on the GPU.
To clarify: if you run the following code, exit Julia, come back, and run the same code again, you have to wait for Julia to download all the CUDA artifacts again. Is this behavior intended?

using CUDA, LinearAlgebra
function cuinv(m::Matrix{T}) where T
    A = CuArray(m)
    B = CuArray(Matrix{T}(I(size(A,1))))
    A, ipiv = CUDA.CUSOLVER.getrf!(A)
    Matrix{T}(CUDA.CUSOLVER.getrs!('N', A, ipiv, B))
end

A = rand(100,100)
B = cuinv(A)


maleadt commented Jan 5, 2022

It is not. Do you have limited space available in your home folder? The artifacts should be saved to ~/.julia/artifacts. This probably isn't a CUDA.jl bug.
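A quick way to check both points from the shell (a sketch assuming the default depot location; JULIA_DEPOT_PATH overrides it if set):

```shell
# Inspect the Julia artifact store and the free space on its partition.
DEPOT="${JULIA_DEPOT_PATH:-$HOME/.julia}"
ls "$DEPOT/artifacts" 2>/dev/null | head    # installed artifact hashes, if any
du -sh "$DEPOT/artifacts" 2>/dev/null       # total size of downloaded artifacts
df -h "$DEPOT" | tail -1                    # free space on that partition
```

If the directory is missing or the partition is nearly full, downloads can silently fail or be discarded, which would explain repeated re-downloads.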

@arash-banadaki replied:

I looked under ~/.julia/artifacts. Under one of the directories (the directory name looks like a hash), I found the CUDA libraries needed to run the above code, e.g. libcusolver.so. Here is the list of files found there:

libcublasLt.so@		   libcufftw.so@		libcusolver.so.10@
libcublasLt.so.10@	   libcufftw.so.10@		libcusolver.so.10.3.0.89*
libcublasLt.so.10.2.2.89*  libcufftw.so.10.1.2.89*	libcusparse.so@
libcublas.so@		   libcupti.so@			libcusparse.so.10@
libcublas.so.10@	   libcupti.so.10.2@		libcusparse.so.10.3.1.89*
libcublas.so.10.2.2.89*    libcupti.so.10.2.75*		libnvToolsExt.so@
libcudadevrt.a		   libcurand.so@		libnvToolsExt.so.1@
libcudart.so@		   libcurand.so.10@		libnvToolsExt.so.1.0.0*
libcudart.so.10.2@	   libcurand.so.10.1.2.89*	libnvvm.so@
libcudart.so.10.2.89*	   libcusolverMg.so@		libnvvm.so.3@
libcufft.so@		   libcusolverMg.so.10@		libnvvm.so.3.3.0*
libcufft.so.10@		   libcusolverMg.so.10.3.0.89*
libcufft.so.10.1.2.89*	   libcusolver.so@

Despite the above files being in my artifacts folder (whether or not Julia is running), the download happens every time I exit and restart Julia.
I do have limited space in my home directory, but there is still space left.


maleadt commented Jan 6, 2022

There are different CUDA artifacts, so that may not be the correct one. Try calling CUDA.ptxas() to see which one is in use. From CUDA.jl's directory, starting Julia with --project, you can then manually try importing the artifact (specifying the CUDA version you see in CUDA.versioninfo()):

$ julia
julia> using CUDA

julia> CUDA.ptxas()
"/home/tim/Julia/depot/artifacts/1ada8b4cd8083610e1e41fe9d699f5a451977aeb/bin/ptxas"

julia> CUDA.versioninfo()
CUDA toolkit 11.5, artifact installation

CUDA.jl$ julia --project
julia> using Artifacts

julia> platform = Base.BinaryPlatforms.HostPlatform()
Linux x86_64 {cxxstring_abi=cxx11, julia_version=1.7.1, libc=glibc, libgfortran_version=5.0.0, libstdcxx_version=3.4.29}

julia> platform.tags["cuda"] = "11.5"
"11.5"

julia> @artifact_str("CUDA", platform)
"/home/tim/Julia/depot/artifacts/1ada8b4cd8083610e1e41fe9d699f5a451977aeb"

# this didn't download

If that redownloads, that's an Artifacts/Pkg bug.

@arash-banadaki replied:

Here is the ptxas() and versioninfo() output:

julia> CUDA.ptxas()
"~/.julia/artifacts/eaa17e7c15ad1a27356fa2e5002f64c3096588e3/bin/ptxas"

julia> CUDA.versioninfo()
CUDA toolkit 10.2, artifact installation
NVIDIA driver 495.29.5, for CUDA 11.5
CUDA driver 11.5

Libraries: 
- CUBLAS: 10.2.2
- CURAND: 10.1.2
- CUFFT: 10.1.2
- CUSOLVER: 10.3.0
- CUSPARSE: 10.3.1
- CUPTI: 12.0.0
- NVML: 11.0.0+495.29.5
  Downloaded artifact: CUDNN
  Downloaded artifact: CUDNN
- CUDNN: missing
  Downloaded artifact: CUTENSOR
- CUTENSOR: 1.3.0 (for CUDA 10.2.0)

Toolchain:
- Julia: 1.7.0
- LLVM: 12.0.1
- PTX ISA support: 3.2, 4.0, 4.1, 4.2, 4.3, 5.0, 6.0, 6.1, 6.3, 6.4, 6.5
- Device capability support: sm_35, sm_37, sm_50, sm_52, sm_53, sm_60, sm_61, sm_62, sm_70, sm_72, sm_75

1 device:
  0: Tesla T4 (sm_75, 14.517 GiB / 14.756 GiB available)

I am having trouble finding a directory named CUDA.jl on my machine, so I did the closest thing: I found the CUDA.jl file in ~/.julia/packages/CUDA/M4jkK/src/, cd'd into that directory, and ran the commands you specified:

julia> using Artifacts

julia> platform = Base.BinaryPlatforms.HostPlatform()
Linux x86_64 {cxxstring_abi=cxx11, julia_version=1.7.0, libc=glibc, libgfortran_version=5.0.0}

julia> platform.tags["cuda"] = "11.5"
"11.5"

julia> @artifact_str("CUDA", platform)
ERROR: Cannot locate artifact 'CUDA' for x86_64-linux-gnu-libgfortran5-cxx11-cuda+11.5-julia_version+1.7.0 in '~/.julia/packages/CUDA/M4jkK/Artifacts.toml'
Stacktrace:
 [1] error(s::String)
   @ Base ./error.jl:33
 [2] artifact_slash_lookup(name::String, artifact_dict::Dict{String, Any}, artifacts_toml::String, platform::Base.BinaryPlatforms.Platform)
   @ Artifacts /usr/local/julia-1.7.0/share/julia/stdlib/v1.7/Artifacts/src/Artifacts.jl:608
 [3] top-level scope
   @ /usr/local/julia-1.7.0/share/julia/stdlib/v1.7/Artifacts/src/Artifacts.jl:688


maleadt commented Jan 6, 2022

Check out CUDA.jl from its source repository.
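For anyone else following along: "check out CUDA.jl from its source repository" means cloning the package rather than using the read-only copy under ~/.julia/packages (the JuliaGPU GitHub URL below is the standard one):

```shell
# Clone CUDA.jl so you get its full project, including Artifacts.toml,
# then start Julia with that project active.
git clone https://github.com/JuliaGPU/CUDA.jl.git
cd CUDA.jl
# julia --project   # then run the @artifact_str steps from the comment above
```

Starting Julia with `--project` from the clone makes `@artifact_str` resolve against CUDA.jl's own Artifacts.toml.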


arash-banadaki commented Jan 7, 2022

Got it, OK. I am following your instructions exactly. Please let me know if I am missing something:

julia> using LazyArtifacts, Artifacts

julia> platform = Base.BinaryPlatforms.HostPlatform()
Linux x86_64 {cxxstring_abi=cxx11, julia_version=1.7.0, libc=glibc, libgfortran_version=5.0.0}

julia> platform.tags["cuda"] = "11.5"
"11.5"

julia> @artifact_str("CUDA", platform)
  Downloaded artifact: CUDA
  Downloaded artifact: CUDA
ERROR: Unable to automatically install 'CUDA' from '~/.julia/CUDA.jl/Artifacts.toml'
Stacktrace:
 [1] error(s::String)
   @ Base ./error.jl:33
 [2] ensure_artifact_installed(name::String, meta::Dict{String, Any}, artifacts_toml::String; platform::Base.BinaryPlatforms.Platform, verbose::Bool, quiet_download::Bool, io::Base.TTY)
   @ Pkg.Artifacts /usr/local/julia-1.7.0/share/julia/stdlib/v1.7/Pkg/src/Artifacts.jl:441
 [3] ensure_artifact_installed(name::String, artifacts_toml::String; platform::Base.BinaryPlatforms.Platform, pkg_uuid::Nothing, verbose::Bool, quiet_download::Bool, io::Base.TTY)
   @ Pkg.Artifacts /usr/local/julia-1.7.0/share/julia/stdlib/v1.7/Pkg/src/Artifacts.jl:404
 [4] _artifact_str(__module__::Module, artifacts_toml::String, name::SubString{String}, path_tail::String, artifact_dict::Dict{String, Any}, hash::Base.SHA1, platform::Base.BinaryPlatforms.Platform, lazyartifacts::Any)
   @ Artifacts /usr/local/julia-1.7.0/share/julia/stdlib/v1.7/Artifacts/src/Artifacts.jl:547
 [5] invokelatest(::Any, ::Any, ::Vararg{Any}; kwargs::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
   @ Base ./essentials.jl:716
 [6] invokelatest(::Any, ::Any, ::Vararg{Any})
   @ Base ./essentials.jl:714
 [7] top-level scope
   @ /usr/local/julia-1.7.0/share/julia/stdlib/v1.7/Artifacts/src/Artifacts.jl:689

It seems that not all of the artifacts being downloaded are needed for the code I submitted.
For example, in my original question I am merely calling CUDA.CUSOLVER.getrf!, yet it ends up downloading libcufft.so and libcusparse for some reason.


maleadt commented Jan 7, 2022

CUFFT and CUSPARSE are part of CUDA, as is CUSOLVER, so that's expected.

The installation failure is unfamiliar to me and probably unrelated to CUDA.jl. Maybe try asking around on #helpdesk (on Slack) or filing a bug against Pkg.jl.
