CUDA_Driver_jll's lazy artifacts cause a precompilation-time warning #2415

Closed
oscardssmith opened this issue Jun 13, 2024 · 10 comments
Labels: bug (Something isn't working), help wanted (Extra attention is needed), upstream (Somebody else's problem)

Comments

@oscardssmith

julia> using CUDA
 │ Package CUDA not found, but a package named CUDA is available from a registry. 
 │ Install package?
 │   (docs) pkg> add CUDA 
 └ (y/n/o) [y]: y
   Resolving package versions...
   Installed CUDA_Runtime_Discovery ─ v0.3.3
   Installed SentinelArrays ───────── v1.4.3
   Installed LLVM ─────────────────── v7.2.1
   Installed UnsafeAtomicsLLVM ────── v0.1.4
   Installed CUDA_Driver_jll ──────── v0.9.0+0
   Installed GPUArrays ────────────── v10.2.0
   Installed KernelAbstractions ───── v0.9.20
   Installed CUDA_Runtime_jll ─────── v0.14.0+1
   Installed PrettyTables ─────────── v2.3.2
   Installed GPUCompiler ──────────── v0.26.5
   Installed CUDA ─────────────────── v5.4.2
    Updating `~/.julia/dev/Ferrite/docs/Project.toml`
  [052768ef] + CUDA v5.4.2
    Updating `~/.julia/dev/Ferrite/docs/Manifest.toml`
  Downloaded artifact: CUDA_Driver

[pid 611155] waiting for IO to finish:
 Handle type        uv_handle_t->data
 timer              0x1631cf0->0x7f50886d75b0
This means that a package has started a background task or event source that has not finished running. For precompilation to complete successfully, the event source needs to be closed explicitly. See the developer documentation on fixing precompilation hangs for more help.

It looks like something here is missing a wait call.
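For readers unfamiliar with the pattern being referred to, here is a minimal sketch of what "missing a wait call" usually means in Julia. The helper name `run_and_wait` is hypothetical and is not taken from CUDA.jl, CUDA_Driver_jll, or Pkg; it only illustrates that a task started in the background should be waited on before the surrounding code returns.

```julia
# Hypothetical helper for illustration only; not code from any package in this issue.
# It starts `work` as a background task and blocks until that task has finished.
function run_and_wait(work::Function)
    task = @async work()
    # Without this call, the function could return while the task is still
    # running, which is the situation the precompilation warning describes.
    wait(task)
    return fetch(task)
end

# Usage: the result is only returned once the background work is done.
answer = run_and_wait(() -> (sleep(0.01); 42))
```

Whether the actual culprit here is a task, a timer, or something else is not established at this point in the thread.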

@oscardssmith oscardssmith added the bug Something isn't working label Jun 13, 2024
@maleadt
Member

maleadt commented Jun 13, 2024

You'll have to debug this yourself, as I've never encountered it, either locally or on any CI job.

@maleadt maleadt added the needs information Further information is requested label Jun 13, 2024
@KSepetanc

The error is definitely there. I get it every time locally. A simple pkg> add CUDA should reproduce it.
What info do you need?

@KSepetanc

Please see the log in issue 2428.
It also contains the "waiting for IO to finish" message.

@maleadt
Member

maleadt commented Jul 2, 2024

This is just a warning, right? Are you experiencing any issues because of it?

There's a section in the docs on this warning: https://docs.julialang.org/en/v1/devdocs/precompile_hang/

@KSepetanc

AFAIK it is only a warning. I am not aware of any issue actually caused by it.

@maleadt
Member

maleadt commented Jul 2, 2024

OK, that's good. I thought I hadn't encountered this, but when testing in a fresh environment it does appear. I guess it only happens when precompiling CUDA_Driver_jll or so, which is not recompiled frequently, and because it doesn't cause a hang it slipped through the cracks.

@maleadt maleadt added help wanted Extra attention is needed and removed needs information Further information is requested labels Jul 2, 2024
@maleadt maleadt changed the title precompile gives "waiting for IO to finish" error CUDA_Driver_jll's lazy artifacts cause a precompilation-time warning Jul 4, 2024
@maleadt
Member

maleadt commented Jul 4, 2024

Looks like this can be isolated to CUDA_Driver_jll:

❯ JULIA_DEPOT_PATH=$(mktemp -d) julia -e 'using Pkg; Pkg.add("CUDA_Driver_jll")'
  Installing known registries into `/tmp/tmp.28PQv5Mx6w`
    Updating registry at `/tmp/tmp.28PQv5Mx6w/registries/General.toml`
   Resolving package versions...
   Installed CUDA_Driver_jll ─ v0.9.1+0
   Installed JLLWrappers ───── v1.5.0
   Installed Preferences ───── v1.4.3
    Updating `/tmp/tmp.28PQv5Mx6w/environments/v1.10/Project.toml`
  [4ee394cb] + CUDA_Driver_jll v0.9.1+0
    Updating `/tmp/tmp.28PQv5Mx6w/environments/v1.10/Manifest.toml`
  [692b3bcd] + JLLWrappers v1.5.0
  [21216c6a] + Preferences v1.4.3
  [4ee394cb] + CUDA_Driver_jll v0.9.1+0
  [0dad84c5] + ArgTools v1.1.1
  [56f22d72] + Artifacts
  [2a0f44e3] + Base64
  [ade2ca70] + Dates
  [f43a241f] + Downloads v1.6.0
  [7b1f6079] + FileWatching
  [b77e0a4c] + InteractiveUtils
  [4af54fe1] + LazyArtifacts
  [b27032c2] + LibCURL v0.6.4
  [76f85450] + LibGit2
  [8f399da3] + Libdl
  [56ddb016] + Logging
  [d6f4376e] + Markdown
  [ca575930] + NetworkOptions v1.2.0
  [44cfe95a] + Pkg v1.10.0
  [de0858da] + Printf
  [3fa0cd96] + REPL
  [9a3f8284] + Random
  [ea8e919c] + SHA v0.7.0
  [9e88b42a] + Serialization
  [6462fe0b] + Sockets
  [fa267f1f] + TOML v1.0.3
  [a4e569a6] + Tar v1.10.0
  [cf7118a7] + UUIDs
  [4ec0a83e] + Unicode
  [deac9b47] + LibCURL_jll v8.4.0+0
  [e37daf67] + LibGit2_jll v1.6.4+0
  [29816b5a] + LibSSH2_jll v1.11.0+1
  [c8ffd9c3] + MbedTLS_jll v2.28.2+1
  [14a3606d] + MozillaCACerts_jll v2023.1.10
  [83775a58] + Zlib_jll v1.2.13+1
  [8e850ede] + nghttp2_jll v1.52.0+1
  [3f19e933] + p7zip_jll v17.4.0+2
Precompiling project...
  Progress [================================>        ]  4/5
  ◐ CUDA_Driver_jll Waiting for background task / IO / timer. Interrupt to inspect
  1 dependency had output during precompilation:
┌ CUDA_Driver_jll
│   Downloading artifact: CUDA_Driver
│
│  [pid 18663] waiting for IO to finish:
│   Handle type        uv_handle_t->data
│   timer              0x174876e0->0x72c02a961450
│  This means that a package has started a background task or event source that has not finished running. For precompilation to complete successfully, the event source needs to be closed explicitly. See the developer documentation on fixing precompilation hangs for more help.
│
│  [18663] signal (2): Interrupt
│  in expression starting at none:0
│  epoll_wait at /usr/lib/libc.so.6 (unknown line)
│  uv__io_poll at /workspace/srcdir/libuv/src/unix/epoll.c:236
│  uv_run at /workspace/srcdir/libuv/src/unix/core.c:400
│  ijl_task_get_next at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/src/partr.c:478

Although that package has a very complicated __init__ function, that's not to blame. The problem seems to stem from the artifacts being lazy, and can be reproduced with the following minimal set-up:

Artifacts.toml

[[CUDA_Driver]]
arch = "x86_64"
git-tree-sha1 = "a86b67fd924e2a8c72d376d301a34b2364281978"
lazy = true
libc = "glibc"
os = "linux"

    [[CUDA_Driver.download]]
    sha256 = "350b076a65dc548226a91cb53a029647c1264ef6099379eb8f0e5f95dcaa0a15"
    url = "https://github.com/JuliaBinaryWrappers/CUDA_Driver_jll.jl/releases/download/CUDA_Driver-v0.9.1+0/CUDA_Driver.v0.9.1.x86_64-linux-gnu.tar.gz"

src/CUDA_Driver_jll.jl

# Use baremodule to shave off a few KB from the serialized `.ji` file
baremodule CUDA_Driver_jll
using Base
using Base: UUID
using LazyArtifacts
import JLLWrappers

JLLWrappers.@generate_main_file_header("CUDA_Driver")
JLLWrappers.@generate_main_file("CUDA_Driver", UUID("4ee394cb-3365-5eb0-8335-949819d2adfc"))
end  # module CUDA_Driver_jll

src/wrappers/x86_64-linux-gnu.jl

# Autogenerated wrapper script for CUDA_Driver_jll for x86_64-linux-gnu
export libcuda_compat, libcuda_debugger, libnvidia_nvvm, libnvidia_ptxjitcompiler

JLLWrappers.@generate_wrapper_header("CUDA_Driver")
JLLWrappers.@declare_library_product(libcuda_compat, "libcuda.so.1")
JLLWrappers.@declare_library_product(libcuda_debugger, "libcudadebugger.so.1")
JLLWrappers.@declare_library_product(libnvidia_nvvm, "libnvidia-nvvm.so.4")
JLLWrappers.@declare_library_product(libnvidia_ptxjitcompiler, "libnvidia-ptxjitcompiler.so.1")
function __init__()
    JLLWrappers.@generate_init_header()
    JLLWrappers.@init_library_product(
        libcuda_compat,
        "lib/libcuda.so",
        nothing,
    )
end  # __init__()

@giordano @KristofferC Is there a known precompilation issue with the JLL-generated code and lazy artifacts?

@giordano

giordano commented Jul 4, 2024

I'm not aware of any specific issue, but note that moving download from installation-time to precompile-time is the whole point of lazy artifacts, so this is basically expected. Slow networks/large artifacts will cause a wait during precompilation just because it takes time to download them (MKL is often a culprit).

@maleadt
Member

maleadt commented Jul 4, 2024

> moving download from installation-time to precompile-time is the whole point of lazy artifacts, so this is basically expected

The download finishes in a couple of seconds; the warning occurs much later. I'd guess that some Downloads.jl-related task/timer is left running.
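To make the guess about a leftover timer concrete, here is a minimal sketch of how a timer can outlive the work it accompanies and how closing it avoids that. The function `with_heartbeat` is hypothetical and is not the Downloads.jl implementation; it only shows the general shape of the problem with a `Base.Timer`.

```julia
# Hypothetical illustration; this is NOT how Downloads.jl is implemented.
# A repeating timer runs alongside some work, e.g. to drive a progress display.
function with_heartbeat(work::Function)
    timer = Timer(_ -> nothing, 0.01; interval = 0.01)
    try
        return work(), timer
    finally
        # If this close were missing, the libuv timer handle would stay open
        # after the work finished, and a precompilation process would report
        # it as an event source that is still running.
        close(timer)
    end
end

result, timer = with_heartbeat(() -> 1 + 1)
```

The `try`/`finally` form is used so that the timer is closed even if the work throws.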

In addition, the use of lazy artifacts here is reasonable: we only want to download CUDA artifacts if the system supports CUDA (well, for the Driver JLL they can become eager, but other CUDA JLLs probably suffer from something similar). That shouldn't cause a "scary" error at precompilation time. Even if it's "expected", it only results in bugs like this one getting filed.

@maleadt
Member

maleadt commented Jul 4, 2024

Seems fixed by making CUDA_Driver_jll's artifacts non-lazy.
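For anyone who wants to check whether another JLL is affected in the same way, the following sketch lists which entries in an Artifacts.toml are marked `lazy = true`. The helper `lazy_artifact_names` is a hypothetical name introduced here, and the second artifact in the sample (`Eager`, with a placeholder hash) is made up for contrast; only the TOML standard library is assumed.

```julia
using TOML

# Hypothetical helper: return the sorted names of artifacts marked `lazy = true`.
function lazy_artifact_names(toml_text::AbstractString)
    parsed = TOML.parse(toml_text)
    names = String[]
    for (name, entry) in parsed
        # Platform-specific artifacts are arrays of tables ([[Name]]);
        # platform-independent ones are a single table ([Name]).
        variants = entry isa AbstractVector ? entry : [entry]
        if any(v -> get(v, "lazy", false) === true, variants)
            push!(names, name)
        end
    end
    return sort(names)
end

# Sample input: the first entry is trimmed from the reproduction above,
# the second is an invented non-lazy entry with a placeholder hash.
sample = """
[[CUDA_Driver]]
arch = "x86_64"
git-tree-sha1 = "a86b67fd924e2a8c72d376d301a34b2364281978"
lazy = true
os = "linux"

[Eager]
git-tree-sha1 = "0000000000000000000000000000000000000000"
"""

lazy_names = lazy_artifact_names(sample)
```

An artifact without a `lazy` key is treated as eager here, which matches the default described in the Pkg artifacts documentation.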
