
Inference time "PTX compile error: Entry function uses too much parameter space" #171

Open
smart-fr opened this issue Feb 5, 2023 · 2 comments

smart-fr commented Feb 5, 2023

I successfully trained a NN on my game for 8x8 and 12x12 boards. I am aiming at 16x16, which is the original board dimension.
Inference on an 8x8 board works perfectly; the NN seems to win against any human player, which is fascinating!
Thank you again Jonathan for this generic implementation of AlphaZero.

Now, during inference on a 12x12 board, I run into what looks like a CUDA problem. It is probably not a bug; I likely need to allow an "Entry function" to use more parameter space. NB: neither the GPU memory nor the RAM is fully used when this occurs.

Has someone encountered this limitation, and tried to resolve it?

ERROR: Failed to compile PTX code (ptxas exited with code 4294967295)
Invocation arguments: --generate-line-info --verbose --gpu-name sm_86 --output-file C:\Users\smart\AppData\Local\Temp\jl_pK7SBkXwjg.cubin C:\Users\smart\AppData\Local\Temp\jl_rEbwjaXiXu.ptx
ptxas C:\Users\smart\AppData\Local\Temp\jl_rEbwjaXiXu.ptx, line 2027; error   : Entry function '_Z27julia_broadcast_kernel_818015CuKernelContext13CuDeviceArrayI7Float32Li2ELi1EE11BroadcastedI12CuArrayStyleILi2EE5TupleI5OneToI5Int64ES5_IS6_EE2__S4_I8ExtrudedIS0_IS1_Li2ELi1EES4_I4BoolS9_ES4_IS6_S6_EES8_I13ReshapedArrayIS1_Li2E6SArrayIS4_ILi1152EES1_Li1ELi1152EES4_ES4_IS9_S9_ES4_IS6_S6_EEEES6_' uses too much parameter space (0x12b0 bytes, 0x1100 max).
ptxas fatal   : Ptx assembly aborted due to errors
If you think this is a bug, please file an issue and attach C:\Users\smart\AppData\Local\Temp\jl_rEbwjaXiXu.ptx
Stacktrace:
  [1] error(s::String)
    @ Base .\error.jl:35
  [2] cufunction_compile(job::GPUCompiler.CompilerJob, ctx::LLVM.Context)
    @ CUDA C:\Users\smart\.julia\packages\CUDA\Ey3w2\src\compiler\execution.jl:428
  [3] #224
    @ C:\Users\smart\.julia\packages\CUDA\Ey3w2\src\compiler\execution.jl:347 [inlined]       
  [4] JuliaContext(f::CUDA.var"#224#225"{GPUCompiler.CompilerJob{GPUCompiler.PTXCompilerTarget, CUDA.CUDACompilerParams, GPUCompiler.FunctionSpec{GPUArrays.var"#broadcast_kernel#17", Tuple{CUDA.CuKernelContext, CUDA.CuDeviceMatrix{Float32, 1}, Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{2}, Tuple{Base.OneTo{Int64}, Base.OneTo{Int64}}, typeof(*), Tuple{Base.Broadcast.Extruded{CUDA.CuDeviceMatrix{Float32, 1}, Tuple{Bool, Bool}, Tuple{Int64, Int64}}, Base.Broadcast.Extruded{Base.ReshapedArray{Float32, 2, StaticArraysCore.SVector{1152, Float32}, Tuple{}}, Tuple{Bool, Bool}, Tuple{Int64, Int64}}}}, Int64}}}})
    @ GPUCompiler C:\Users\smart\.julia\packages\GPUCompiler\qdoh1\src\driver.jl:76
  [5] cufunction_compile(job::GPUCompiler.CompilerJob)
    @ CUDA C:\Users\smart\.julia\packages\CUDA\Ey3w2\src\compiler\execution.jl:346
  [6] cached_compilation(cache::Dict{UInt64, Any}, job::GPUCompiler.CompilerJob, compiler::typeof(CUDA.cufunction_compile), linker::typeof(CUDA.cufunction_link))
    @ GPUCompiler C:\Users\smart\.julia\packages\GPUCompiler\qdoh1\src\cache.jl:90
  [7] cufunction(f::GPUArrays.var"#broadcast_kernel#17", tt::Type{Tuple{CUDA.CuKernelContext, CUDA.CuDeviceMatrix{Float32, 1}, Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{2}, Tuple{Base.OneTo{Int64}, Base.OneTo{Int64}}, typeof(*), Tuple{Base.Broadcast.Extruded{CUDA.CuDeviceMatrix{Float32, 1}, Tuple{Bool, Bool}, Tuple{Int64, Int64}}, Base.Broadcast.Extruded{Base.ReshapedArray{Float32, 2, StaticArraysCore.SVector{1152, Float32}, Tuple{}}, Tuple{Bool, Bool}, Tuple{Int64, Int64}}}}, Int64}}; name::Nothing, kwargs::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
    @ CUDA C:\Users\smart\.julia\packages\CUDA\Ey3w2\src\compiler\execution.jl:299
  [8] cufunction
    @ C:\Users\smart\.julia\packages\CUDA\Ey3w2\src\compiler\execution.jl:292 [inlined]       
  [9] macro expansion
    @ C:\Users\smart\.julia\packages\CUDA\Ey3w2\src\compiler\execution.jl:102 [inlined]       
 [10] #launch_heuristic#248
    @ C:\Users\smart\.julia\packages\CUDA\Ey3w2\src\gpuarrays.jl:17 [inlined]
 [11] _copyto!
    @ C:\Users\smart\.julia\packages\GPUArrays\fqD8z\src\host\broadcast.jl:63 [inlined]       
 [12] copyto!
    @ C:\Users\smart\.julia\packages\GPUArrays\fqD8z\src\host\broadcast.jl:46 [inlined]       
 [13] copy
    @ C:\Users\smart\.julia\packages\GPUArrays\fqD8z\src\host\broadcast.jl:37 [inlined]       
 [14] materialize(bc::Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{2}, Nothing, typeof(*), Tuple{CUDA.CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, Base.ReshapedArray{Float32, 2, StaticArraysCore.SVector{1152, Float32}, Tuple{}}}})
    @ Base.Broadcast .\broadcast.jl:860
 [15] forward_normalized(nn::ResNet, state::CUDA.CuArray{Float32, 4, CUDA.Mem.DeviceBuffer}, actions_mask::Base.ReshapedArray{Float32, 2, StaticArraysCore.SVector{1152, Float32}, Tuple{}})
    @ AlphaZero.Network C:\Projets\BonbonRectangle\IA\dev\AlphaZero.jl\src\networks\network.jl:265
 [16] evaluate(nn::ResNet, state::NamedTuple{(:board, :impact, :actions_hook, :curplayer), Tuple{StaticArraysCore.SMatrix{12, 12, UInt8, 144}, StaticArraysCore.SMatrix{12, 12, UInt8, 144}, StaticArraysCore.SMatrix{12, 12, Tuple{Int64, Int64}, 144}, UInt8}})
    @ AlphaZero.Network C:\Projets\BonbonRectangle\IA\dev\AlphaZero.jl\src\networks\network.jl:292
 [17] AbstractNetwork
    @ C:\Projets\BonbonRectangle\IA\dev\AlphaZero.jl\src\networks\network.jl:297 [inlined]    
 [18] state_info(env::AlphaZero.MCTS.Env{NamedTuple{(:board, :impact, :actions_hook, :curplayer), Tuple{StaticArraysCore.SMatrix{12, 12, UInt8, 144}, StaticArraysCore.SMatrix{12, 12, UInt8, 144}, StaticArraysCore.SMatrix{12, 12, Tuple{Int64, Int64}, 144}, UInt8}}, ResNet}, state::NamedTuple{(:board, :impact, :actions_hook, :curplayer), Tuple{StaticArraysCore.SMatrix{12, 12, UInt8, 144}, StaticArraysCore.SMatrix{12, 12, UInt8, 144}, StaticArraysCore.SMatrix{12, 12, Tuple{Int64, Int64}, 144}, UInt8}})
    @ AlphaZero.MCTS C:\Projets\BonbonRectangle\IA\dev\AlphaZero.jl\src\mcts.jl:170
 [19] run_simulation!(env::AlphaZero.MCTS.Env{NamedTuple{(:board, :impact, :actions_hook, :curplayer), Tuple{StaticArraysCore.SMatrix{12, 12, UInt8, 144}, StaticArraysCore.SMatrix{12, 12, UInt8, 144}, StaticArraysCore.SMatrix{12, 12, Tuple{Int64, Int64}, 144}, UInt8}}, ResNet}, game::AlphaZero.Examples.BonbonRectangle.GameEnv; η::Vector{Float64}, root::Bool)
    @ AlphaZero.MCTS C:\Projets\BonbonRectangle\IA\dev\AlphaZero.jl\src\mcts.jl:206
 [20] explore!(env::AlphaZero.MCTS.Env{NamedTuple{(:board, :impact, :actions_hook, :curplayer), Tuple{StaticArraysCore.SMatrix{12, 12, UInt8, 144}, StaticArraysCore.SMatrix{12, 12, UInt8, 144}, StaticArraysCore.SMatrix{12, 12, Tuple{Int64, Int64}, 144}, UInt8}}, ResNet}, game::AlphaZero.Examples.BonbonRectangle.GameEnv, nsims::Int64)
    @ AlphaZero.MCTS C:\Projets\BonbonRectangle\IA\dev\AlphaZero.jl\src\mcts.jl:244
 [21] think(p::MctsPlayer{AlphaZero.MCTS.Env{NamedTuple{(:board, :impact, :actions_hook, :curplayer), Tuple{StaticArraysCore.SMatrix{12, 12, UInt8, 144}, StaticArraysCore.SMatrix{12, 12, UInt8, 144}, StaticArraysCore.SMatrix{12, 12, Tuple{Int64, Int64}, 144}, UInt8}}, ResNet}}, game::AlphaZero.Examples.BonbonRectangle.GameEnv)
    @ AlphaZero C:\Projets\BonbonRectangle\IA\dev\AlphaZero.jl\src\play.jl:202
 [22] select_move
    @ C:\Projets\BonbonRectangle\IA\dev\AlphaZero.jl\src\play.jl:49 [inlined]
 [23] select_move(p::TwoPlayers{MctsPlayer{AlphaZero.MCTS.Env{NamedTuple{(:board, :impact, :actions_hook, :curplayer), Tuple{StaticArraysCore.SMatrix{12, 12, UInt8, 144}, StaticArraysCore.SMatrix{12, 12, UInt8, 144}, StaticArraysCore.SMatrix{12, 12, Tuple{Int64, Int64}, 144}, UInt8}}, ResNet}}, Human}, game::AlphaZero.Examples.BonbonRectangle.GameEnv, turn::Int64)
    @ AlphaZero C:\Projets\BonbonRectangle\IA\dev\AlphaZero.jl\src\play.jl:265
 [24] interactive!(game::AlphaZero.Examples.BonbonRectangle.GameEnv, player::TwoPlayers{MctsPlayer{AlphaZero.MCTS.Env{NamedTuple{(:board, :impact, :actions_hook, :curplayer), Tuple{StaticArraysCore.SMatrix{12, 12, UInt8, 144}, StaticArraysCore.SMatrix{12, 12, UInt8, 144}, StaticArraysCore.SMatrix{12, 12, Tuple{Int64, Int64}, 144}, UInt8}}, ResNet}}, Human})
    @ AlphaZero C:\Projets\BonbonRectangle\IA\dev\AlphaZero.jl\src\play.jl:378
 [25] interactive!
    @ C:\Projets\BonbonRectangle\IA\dev\AlphaZero.jl\src\play.jl:400 [inlined]
 [26] interactive!(game::AlphaZero.Examples.BonbonRectangle.GameSpec, white::MctsPlayer{AlphaZero.MCTS.Env{NamedTuple{(:board, :impact, :actions_hook, :curplayer), Tuple{StaticArraysCore.SMatrix{12, 12, UInt8, 144}, StaticArraysCore.SMatrix{12, 12, UInt8, 144}, StaticArraysCore.SMatrix{12, 12, Tuple{Int64, Int64}, 144}, UInt8}}, ResNet}}, black::Human)
    @ AlphaZero C:\Projets\BonbonRectangle\IA\dev\AlphaZero.jl\src\play.jl:402
 [27] play(e::Experiment; args::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
    @ AlphaZero.Scripts C:\Projets\BonbonRectangle\IA\dev\AlphaZero.jl\src\scripts\scripts.jl:59
 [28] play
    @ C:\Projets\BonbonRectangle\IA\dev\AlphaZero.jl\src\scripts\scripts.jl:39 [inlined]      
 [29] #play#19
    @ C:\Projets\BonbonRectangle\IA\dev\AlphaZero.jl\src\scripts\scripts.jl:71 [inlined]      
 [30] play(s::String)
    @ AlphaZero.Scripts C:\Projets\BonbonRectangle\IA\dev\AlphaZero.jl\src\scripts\scripts.jl:71
 [31] top-level scope
    @ none:1 
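From the signature in the error, the broadcast kernel appears to receive the `SVector{1152, Float32}` behind the `ReshapedArray` mask by value: 0x12b0 (4784) bytes against the 0x1100 (4352) byte limit. One possible workaround, sketched below with illustrative names (this is not the actual AlphaZero.jl code, and it is untested without a GPU), is to convert the mask to a `CuArray` before broadcasting, so the kernel only receives a device pointer:

```julia
# Sketch only, assuming CUDA.jl and StaticArrays; names are illustrative.
using CUDA, StaticArrays

# Stand-in for the mask from the stack trace: a ReshapedArray backed by an SVector.
mask = reshape(SVector{1152, Float32}(ones(Float32, 1152)), 1152, 1)
p = CUDA.rand(Float32, 1152, 1)

mask_gpu = CuArray(mask)  # one-time device copy; broadcasting then passes a pointer
out = p .* mask_gpu       # the SVector no longer enters the kernel parameter list
```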
@smart-fr smart-fr changed the title PTX compile error: Entry function uses too much parameter space Inference time "PTX compile error: Entry function uses too much parameter space" Feb 7, 2023
smart-fr commented Feb 8, 2023

I could overcome this CUDA inference issue by forcing CPU use during play sessions, in the Arena parameters:

arena = ArenaParams(
  sim=SimParams(
    use_gpu=false, # was: true
    ...))

This is more of a workaround than a satisfactory solution, so I am not closing the issue yet.

Maybe my hardware is too limited? I have an RTX 3080 Laptop GPU with 16 GB of memory.
However, I got the same issue using a cloud V100.

(By the way, in order to reuse a session with playing parameters different from the ones originally used for training, I used the trick suggested in "To continue a training" #118.)

jonathan-laurent (Owner) commented

I am really glad to hear that you are starting to see good results on your game!
Your hardware is fine. The RTX3080 is a good GPU and more than I had when I originally developed AlphaZero.jl.

I have never encountered the error you reported, although it is probably not specific to AlphaZero.jl.
I would encourage you to put together a minimal non-working example and submit an issue to CUDA.jl.
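For reference, a minimal non-working example matching the stack trace might look like the following (a sketch, untested; it assumes the failure comes from broadcasting a `CuArray` against the `SVector`-backed `ReshapedArray` seen in the kernel signature):

```julia
# Hypothetical reproducer for a CUDA.jl issue report; requires a CUDA-capable GPU.
using CUDA, StaticArrays

a = CUDA.rand(Float32, 1152, 1)
m = reshape(SVector{1152, Float32}(ones(Float32, 1152)), 1152, 1)
a .* m  # expected to fail with "Entry function uses too much parameter space" on sm_86
```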
