UndefVarError: lib not defined when training a connect four agent #5
Comments
This looks like a Knet error. I can find several instances of "lib not defined" errors after a quick Google search, such as denizyuret/Knet.jl#411. Would you mind telling me what version of Knet you are using? To figure it out, just run:
I know that Knet can be a bit tricky to install sometimes, but I have mostly heard of problems coming from Windows users. Did you ever manage to make some Knet example work on your machine? |
This is my first experience with Julia and Knet.
brian@1920x-Ubuntu:
I tried again, but still get the error:
brian@1920x-Ubuntu:~/AlphaZero.jl$ julia
julia> using Knet
Stacktrace:
julia> using Pkg; Pkg.add("Knet")
julia> using Knet
julia>
Initializing a new AlphaZero environment
Initial report
Running benchmark: AlphaZero against MCTS (1000 rollouts)
UndefVarError: lib not defined
|
Also ran this test:
julia> using Knet; include(Knet.dir("test/gpu.jl"))
julia>
brian@1920x-Ubuntu:~/AlphaZero.jl$ cat /usr/local/cuda/version.txt
|
From your last post, it seems to be a problem installing Knet indeed. In theory, Knet is supposed to be installed automatically by the
But installing all the CUDA-related dependencies that are necessary to make Knet work for all possible configurations is not a trivial problem and bugs like the one you observed are still happening (especially for Windows users). It seems to me that Flux (another Julia ML framework) has been more successful in this regard.
My advice is to follow the steps proposed in the manual to install Knet:
If this still results in a problem, you may want to open a Knet issue. Also, please note that I am planning to add an option to switch to a Flux implementation of the networks library (see #2). Therefore, if you don't have time to debug Knet, you may want to wait a bit for this.
Finally, welcome to Julia! I am glad that AlphaZero.jl gave you an occasion to try out this great language and I hope you don't get discouraged by this initial bump. :-) |
Got a bit further, and it looks like a CUDA/compiler issue after doing this:
julia> using Pkg
julia> Pkg.add("CUDAapi")
julia> using CUDAapi
julia> CXX,CXXVER = CUDAapi.find_host_compiler()
julia>
|
This looks like a bug in CUDAapi indeed. I recommend filing an issue. |
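When filing an issue like this, it usually helps to say which compilers and CUDA tools are visible on the machine. The following is a minimal sketch of such a report, not what CUDAapi itself inspects: the helper name `report_tool` and the list of tools are assumptions made for illustration, and the version file path is the one used earlier in this thread.

```shell
#!/bin/sh
# Report which compiler and CUDA tools are visible on PATH.
# report_tool is a hypothetical helper; the tool list is an assumption.
report_tool() {
    # Print "<name>: <path>" if the tool is found, "<name>: not found" otherwise.
    if tool_path=$(command -v "$1" 2>/dev/null); then
        printf '%s: %s\n' "$1" "$tool_path"
    else
        printf '%s: not found\n' "$1"
    fi
}

for tool in gcc g++ nvcc nvidia-smi; do
    report_tool "$tool"
done

# Toolkit version file checked earlier in this thread; it may be absent.
version_file=/usr/local/cuda/version.txt
if [ -r "$version_file" ]; then
    cat "$version_file"
else
    echo "$version_file: not readable"
fi
```

The output can be pasted into the issue as-is; a "not found" line for `nvcc` or `gcc` is already a useful hint for the maintainers.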
Worked on it for several hours. I have to say it is not really unexpected. The entire machine learning field is moving very fast, and compatibility between the various library stacks and frameworks is simply a moving target. My primary interest is Lc0 (Leela Chess), where I need to be able to train nets and compile the engine. Most of that is TensorFlow. I also got torch to work for some other things (A0Lite). As I mentioned, I am new to Julia and Knet. Perhaps TF will be supported at some point. I am reluctant to break things for Lc0 at this point but may take another stab at it. Thank you for sharing your work. |
Thanks for reporting back! In any case, you may want to file an issue because the maintainers of Knet and CuArrays may be interested in this. I'll ping you after I fix the Flux backend in case you want to try again. |
I encountered a similar problem, and after I rm ~/.julia and redo the |
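The comment above mentions removing ~/.julia and reinstalling. A more cautious variant, sketched here under the assumption that the Julia depot lives at the default ~/.julia, moves the directory aside so it can be restored later. The helper name `backup_depot` is hypothetical; the `Pkg.instantiate()` command in the usage comment is the one from the original report.

```shell
#!/bin/sh
# Move a Julia depot aside instead of deleting it, so it can be restored.
# backup_depot is a hypothetical helper; the depot defaults to ~/.julia.
backup_depot() {
    depot=${1:-"$HOME/.julia"}
    if [ ! -d "$depot" ]; then
        echo "no depot at $depot" >&2
        return 1
    fi
    # Timestamped name so that repeated backups do not collide.
    backup="$depot.bak.$(date +%Y%m%d%H%M%S)"
    mv "$depot" "$backup" && echo "$backup"
}

# Usage (commented out so that sourcing this file changes nothing):
#   backup_depot
#   cd ~/AlphaZero.jl
#   julia --project -e "import Pkg; Pkg.instantiate()"
```

Julia recreates the depot on the next package operation, so all packages are downloaded and built again. Whether this clears the error depends on the cause.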
Trying to train per the instructions in the "Training a Connect Four Agent" section.
Ubuntu 18.04 with RTX 2080 Ti
At first I thought it might be a Julia version issue.
Tried with 1.4.0 and 1.3.1, but both give an error (1.4.0 outputs more warnings).
Perhaps I'm doing something wrong:
brian@1920x-Ubuntu:~$ julia
               _
   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.3.1 (2019-12-30)
 _/ |\__'_|_|_|\__'_|  |  Official https://julialang.org/ release
|__/                   |
julia>
brian@1920x-Ubuntu:~$ git clone https://github.com/jonathan-laurent/AlphaZero.jl.git
Cloning into 'AlphaZero.jl'...
remote: Enumerating objects: 47, done.
remote: Counting objects: 100% (47/47), done.
remote: Compressing objects: 100% (12/12), done.
remote: Total 5859 (delta 15), reused 47 (delta 15), pack-reused 5812
Receiving objects: 100% (5859/5859), 8.56 MiB | 12.84 MiB/s, done.
Resolving deltas: 100% (3141/3141), done.
brian@1920x-Ubuntu:~$ cd AlphaZero.jl/
brian@1920x-Ubuntu:~/AlphaZero.jl$ julia --project -e "import Pkg; Pkg.instantiate()"
Updating registry at ~/.julia/registries/General
Updating git-repo https://github.com/JuliaRegistries/General.git
brian@1920x-Ubuntu:~/AlphaZero.jl$ julia --project --color=yes scripts/alphazero.jl --game connect-four train
CuArrays.jl SplittingPool statistics:
CuArrays.jl SplittingPool statistics:
Initializing a new AlphaZero environment
Initial report
Running benchmark: AlphaZero against MCTS (1000 rollouts)
UndefVarError: lib not defined
Stacktrace:
[1] broadcasted(::typeof(NNlib.relu), ::Knet.KnetArray{Float32,4}) at /home/brian/.julia/packages/Knet/vxHRi/src/unary.jl:17
[2] (::AlphaZero.KNets.BatchNorm)(::Knet.KnetArray{Float32,4}) at /home/brian/AlphaZero.jl/src/networks/knet/layers.jl:85
[3] (::AlphaZero.KNets.Chain)(::Knet.KnetArray{Float32,4}) at /home/brian/AlphaZero.jl/src/networks/knet/layers.jl:19
[4] forward(::ResNet{Game}, ::Knet.KnetArray{Float32,4}) at /home/brian/AlphaZero.jl/src/networks/knet.jl:148
[5] evaluate(::ResNet{Game}, ::Knet.KnetArray{Float32,4}, ::Knet.KnetArray{Float32,2}) at /home/brian/AlphaZero.jl/src/networks/network.jl:288
[6] evaluate_batch(::ResNet{Game}, ::Array{StaticArrays.SArray{Tuple{7,6},UInt8,2,42},1}) at /home/brian/AlphaZero.jl/src/networks/network.jl:313
[7] inference_server(::AlphaZero.MCTS.Env{Game,StaticArrays.SArray{Tuple{7,6},UInt8,2,42},ResNet{Game}}) at ./util.jl:288
[8] macro expansion at /home/brian/AlphaZero.jl/src/util.jl:64 [inlined]
[9] (::AlphaZero.MCTS.var"#21#23"{AlphaZero.MCTS.Env{Game,StaticArrays.SArray{Tuple{7,6},UInt8,2,42},ResNet{Game}}})() at ./task.jl:333
***************** Hangs here, so what follows is after Ctrl-C
^C
signal (2): Interrupt
in expression starting at /home/brian/AlphaZero.jl/scripts/alphazero.jl:70
epoll_pwait at /build/glibc-OTsEL5/glibc-2.27/misc/../sysdeps/unix/sysv/linux/epoll_pwait.c:42
uv__io_poll at /workspace/srcdir/libuv/src/unix/linux-core.c:270
uv_run at /workspace/srcdir/libuv/src/unix/core.c:359
jl_task_get_next at /buildworker/worker/package_linux64/build/src/partr.c:448
poptaskref at ./task.jl:660
wait at ./task.jl:667
wait at ./condition.jl:106
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2135 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2305
_wait at ./task.jl:238
sync_end at ./task.jl:278
macro expansion at ./task.jl:319 [inlined]
macro expansion at /home/brian/AlphaZero.jl/src/mcts.jl:427 [inlined]
macro expansion at ./util.jl:212 [inlined]
explore_async! at /home/brian/AlphaZero.jl/src/mcts.jl:426
explore! at /home/brian/AlphaZero.jl/src/mcts.jl:452 [inlined]
think at /home/brian/AlphaZero.jl/src/play.jl:176 [inlined]
#play_game#90 at /home/brian/AlphaZero.jl/src/play.jl:246
#play_game at ./none:0 [inlined]
#pit#93 at /home/brian/AlphaZero.jl/src/play.jl:296
unknown function (ip: 0x7efca1f99dd9)
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2141 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2305
#pit at ./none:0
unknown function (ip: 0x7efca1f99a4a)
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2141 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2305
macro expansion at /home/brian/AlphaZero.jl/src/benchmark.jl:111 [inlined]
macro expansion at ./util.jl:288 [inlined]
run at /home/brian/AlphaZero.jl/src/benchmark.jl:110
run_duel at /home/brian/AlphaZero.jl/src/ui/session.jl:252
run_benchmark at /home/brian/AlphaZero.jl/src/ui/session.jl:275
zeroth_iteration! at /home/brian/AlphaZero.jl/src/ui/session.jl:285
#Session#126 at /home/brian/AlphaZero.jl/src/ui/session.jl:356
Type at ./none:0
unknown function (ip: 0x7efca1f42f79)
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2141 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2305
jl_apply at /buildworker/worker/package_linux64/build/src/julia.h:1631 [inlined]
do_call at /buildworker/worker/package_linux64/build/src/interpreter.c:328
eval_value at /buildworker/worker/package_linux64/build/src/interpreter.c:417
eval_stmt_value at /buildworker/worker/package_linux64/build/src/interpreter.c:368 [inlined]
eval_body at /buildworker/worker/package_linux64/build/src/interpreter.c:778
jl_interpret_toplevel_thunk_callback at /buildworker/worker/package_linux64/build/src/interpreter.c:888
unknown function (ip: 0xfffffffffffffffe)
unknown function (ip: 0x7efcbc3d6c0f)
unknown function (ip: 0x7)
jl_interpret_toplevel_thunk at /buildworker/worker/package_linux64/build/src/interpreter.c:897
jl_toplevel_eval_flex at /buildworker/worker/package_linux64/build/src/toplevel.c:814
jl_parse_eval_all at /buildworker/worker/package_linux64/build/src/ast.c:873
jl_load at /buildworker/worker/package_linux64/build/src/toplevel.c:878
include at ./boot.jl:328 [inlined]
include_relative at ./loading.jl:1105
include at ./Base.jl:31
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2135 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2305
exec_options at ./client.jl:287
_start at ./client.jl:460
jfptr__start_2084.clone_1 at /opt/julia-1.3.1/lib/julia/sys.so (unknown line)
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2135 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2305
unknown function (ip: 0x401931)
unknown function (ip: 0x401533)
__libc_start_main at /build/glibc-OTsEL5/glibc-2.27/csu/../csu/libc-start.c:310
unknown function (ip: 0x4015d4)
unknown function (ip: 0xffffffffffffffff)
Allocations: 159067857 (Pool: 159028147; Big: 39710); GC: 99
CuArrays.jl SplittingPool statistics:
brian@1920x-Ubuntu:~/AlphaZero.jl$
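In a session like the one above, the actual error is a single line buried in a long trace, most of which comes from the interrupt. One way to isolate it is to save the output to a file and print only the first error with the frames that follow. This is a minimal sketch: the helper name `first_error` and the file name `train.log` are assumptions, and it relies on the `-m` and `-A` options of `grep`, which GNU and BSD grep provide but POSIX does not require.

```shell
#!/bin/sh
# Print the first UndefVarError line and the stack frames that follow it.
# first_error is a hypothetical helper; train.log is an assumed file name.
first_error() {
    # -m1 stops at the first match; -A N prints N lines of context after it.
    grep -m1 -A "${2:-3}" 'UndefVarError' "$1"
}

# To capture a log while still seeing the output on screen:
#   julia --project scripts/alphazero.jl --game connect-four train 2>&1 | tee train.log
#   first_error train.log 4
```

Here `--color=yes` is left out of the training command on purpose, so that the saved log does not contain terminal color codes.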