
AMDGPU does not load any more without error when installation not functional #685

Closed
omlins opened this issue Oct 9, 2024 · 3 comments · Fixed by #686

Comments

@omlins

omlins commented Oct 9, 2024

Over the years, it has proven very useful for GPU packages to load without error even when the installation is not functional. Now, the following error occurs when no ROCm installation is present:

┌ Error: ROCm discovery failed!
│ Discovered ROCm path: .
│ Use `ROCM_PATH` env variable to specify ROCm directory.
│ 
│   exception =
│    could not load symbol "hipRuntimeGetVersion":
│    /opt/hostedtoolcache/julia/1.11.0/x64/bin/julia: undefined symbol: hipRuntimeGetVersion
│    Stacktrace:
│      [1] _hip_runtime_version()
│        @ AMDGPU.ROCmDiscovery ~/.julia/packages/AMDGPU/jLWP2/src/discovery/discovery.jl:44
│      [2] __init__()
│        @ AMDGPU.ROCmDiscovery ~/.julia/packages/AMDGPU/jLWP2/src/discovery/discovery.jl:95
│      [3] run_module_init(mod::Module, i::Int64)
│        @ Base ./loading.jl:1336
│      [4] register_restored_modules(sv::Core.SimpleVector, pkg::Base.PkgId, path::String)
│        @ Base ./loading.jl:1324
│      [5] _include_from_serialized(pkg::Base.PkgId, path::String, ocachepath::String, depmods::Vector{Any}, ignore_native::Nothing; register::Bool)
│        @ Base ./loading.jl:1213
│      [6] _include_from_serialized (repeats 2 times)
│        @ ./loading.jl:1169 [inlined]
│      [7] _require_search_from_serialized(pkg::Base.PkgId, sourcepath::String, build_id::UInt128, stalecheck::Bool; reasons::Dict{String, Int64}, DEPOT_PATH::Vector{String})
│        @ Base ./loading.jl:1975
│      [8] _require(pkg::Base.PkgId, env::String)
│        @ Base ./loading.jl:2435
│      [9] __require_prelocked(uuidkey::Base.PkgId, env::String)
│        @ Base ./loading.jl:2300
│     [10] #invoke_in_world#3
│        @ ./essentials.jl:1088 [inlined]
│     [11] invoke_in_world
│        @ ./essentials.jl:1085 [inlined]
│     [12] _require_prelocked(uuidkey::Base.PkgId, env::String)
│        @ Base ./loading.jl:2287
│     [13] macro expansion
│        @ ./loading.jl:2226 [inlined]
│     [14] macro expansion
│        @ ./lock.jl:273 [inlined]
│     [15] __require(into::Module, mod::Symbol)
│        @ Base ./loading.jl:2183
│     [16] #invoke_in_world#3
│        @ ./essentials.jl:1088 [inlined]
│     [17] invoke_in_world
│        @ ./essentials.jl:1085 [inlined]
│     [18] require(into::Module, mod::Symbol)
│        @ Base ./loading.jl:2176
│     [19] top-level scope
│        @ ~/work/ParallelStencil.jl/ParallelStencil.jl/test/ParallelKernel/test_allocators.jl:16
│     [20] include(mod::Module, _path::String)
│        @ Base ./Base.jl:557
│     [21] exec_options(opts::Base.JLOptions)
│        @ Base ./client.jl:323
│     [22] _start()
│        @ Base ./client.jl:531
└ @ AMDGPU.ROCmDiscovery ~/.julia/packages/AMDGPU/jLWP2/src/discovery/discovery.jl:113

It can be observed, for example, in the ParallelStencil CI: https://github.com/omlins/ParallelStencil.jl/actions/runs/11251525928/job/31282759336?pr=169#step:6:578

As a result, this breaks all of our CI...
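The load-without-error property described above can be illustrated with a minimal sketch. It assumes a package whose `__init__` probes the GPU runtime, catches any failure, and records it instead of rethrowing, so that `using` always succeeds and callers query a `functional()` flag (similar in spirit to `CUDA.functional()` in CUDA.jl). The module name `MyGPUPkg`, the helper `discover_runtime`, and the library name `libamdhip64` are illustrative assumptions, not AMDGPU.jl's actual internals or the fix from #686.

```julia
# Hypothetical sketch: a GPU package that stays loadable when the runtime is missing.
# `MyGPUPkg`, `discover_runtime` and "libamdhip64" are illustrative assumptions.
module MyGPUPkg

using Libdl

# Set during `__init__`; stays `false` unless runtime discovery succeeds.
const RUNTIME_OK = Ref(false)
# Holds the exception from a failed discovery, or `nothing`.
const RUNTIME_ERROR = Ref{Any}(nothing)

# Hypothetical discovery step: open the runtime library and look up one symbol.
function discover_runtime(libname::AbstractString, sym::Symbol)
    handle = Libdl.dlopen(libname)  # throws if the library cannot be found
    Libdl.dlsym(handle, sym)        # throws if the symbol is missing
    return handle
end

function __init__()
    try
        discover_runtime("libamdhip64", :hipRuntimeGetVersion)
        RUNTIME_OK[] = true
    catch err
        # Record the failure instead of rethrowing, so loading the module still succeeds.
        RUNTIME_ERROR[] = err
        RUNTIME_OK[] = false
        @debug "GPU runtime discovery failed" exception = (err, catch_backtrace())
    end
end

# Callers check this before using the GPU, e.g. to skip GPU tests in CI.
functional() = RUNTIME_OK[]

end # module
```

On a machine without the runtime library, loading this module succeeds and `MyGPUPkg.functional()` returns `false`, so a test suite can skip its GPU tests instead of failing at load time.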

@luraess
Collaborator

luraess commented Oct 9, 2024

@pxl-th, is this a feature or a bug introduced by the latest discovery refactor?

@pxl-th
Collaborator

pxl-th commented Oct 9, 2024

That's definitely a bug... I'll open a PR with fixes.

@omlins
Author

omlins commented Oct 9, 2024

Thanks @pxl-th!

3 participants