Custom model cannot be trained #2187
The immediate problem is that it's trying to tell you to define `trainable` differently:

```julia
Flux.trainable(a_net::AttentionNet) = (embedding = a_net.embedding, attention = a_net.attention, fc_output = a_net.fc_output)
```

It needs to return a NamedTuple whose keys are a subset of the struct's field names. But in fact it need not be defined at all: integers, or other scalars in the struct, will be ignored anyway -- Flux cannot treat these as trainable parameters. Simply deleting that method definition removes the problem.

Sadly, after fixing that, I run into another error:

```julia
julia> Flux.train!(a_net, data, opt_state) do m, x, y
           Flux.mse(m(x), y)
       end
┌ Warning: Layer with Float32 parameters got Float64 input.
│ The input will be converted, but any earlier layers may be very slow.
│ layer = Dense(8 => 64, relu) # 576 parameters
│ summary(x) = "8×32 Matrix{Float64}"
└ @ Flux ~/.julia/dev/Flux/src/layers/stateless.jl:77
ERROR: MethodError: no method matching +(::Base.RefValue{Any}, ::NamedTuple{(:contents,), Tuple{Matrix{Float64}}})
Closest candidates are:
+(::Any, ::Any, ::Any, ::Any...)
@ Base operators.jl:578
+(::Union{InitialValues.NonspecificInitialValue, InitialValues.SpecificInitialValue{typeof(+)}}, ::Any)
@ InitialValues ~/.julia/packages/InitialValues/OWP8V/src/InitialValues.jl:154
+(::ChainRulesCore.AbstractThunk, ::Any)
@ ChainRulesCore ~/.julia/packages/ChainRulesCore/a4mIA/src/tangent_arithmetic.jl:122
...
Stacktrace:
[1] accum(x::Base.RefValue{Any}, y::NamedTuple{(:contents,), Tuple{Matrix{Float64}}})
  @ Zygote ~/.julia/packages/Zygote/g2w9o/src/lib/lib.jl:17
```

This one is much deeper in the weeds. Somehow a type instability is causing something to be boxed, and Zygote is trying to add its gradient to another one in a different format, and... why do I know these things? You might be able to hack around it with something like this:

```julia
julia> using Zygote
julia> Zygote.accum(x::Base.RefValue{Any}, y::NamedTuple{(:contents,)}) = Zygote.accum(x[], y)
julia> Base.:+(x::NamedTuple{(:contents,)}, y::Base.RefValue{Any}) = Zygote.accum(x, y[])
julia> Zygote.refresh()
julia> Flux.train!(a_net, data, opt_state) do m, x, y
           Flux.mse(m(x), y)
       end
```

I have not looked closely, but I suspect that your code can be made more Zygote-friendly. Maybe ...
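To make the "scalars are ignored" point concrete, here is a tiny sketch (the struct and names are made up for illustration, not from this issue):

```julia
using Flux

struct Scaler              # hypothetical layer with one array and one scalar field
    w::Vector{Float32}     # trainable array
    k::Int                 # scalar hyperparameter -- Flux skips it automatically
end
Flux.@functor Scaler

s = Scaler(ones(Float32, 3), 4)
Flux.params(s)             # contains only `w`; no `trainable` override needed
```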
---

I believe the boxing comes from these two lines:

If you rename the second one, the error should go away.
---

I followed your suggestion by changing the variable name:

But I still get the same error.
---

@mcabbott Thanks for your answer. I am not very familiar with programming in a Zygote-friendly manner. What is meant by "Zygote-friendly"? Does it mean using array operations and avoiding all broadcasting behaviors or vector operations? But sometimes that could be unavoidable.
---

Broadcasting and "vector operations" are fine. Comprehensions over large (in terms of number of elements) arrays which reference multiple variables are often tricky.
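A small sketch of that contrast (the arrays and operations here are made up for illustration):

```julia
a = randn(Float32, 100)
b = randn(Float32, 100)

# A comprehension referencing two outer variables: fine on its own, but this
# pattern becomes a boxing hazard if either variable is later reassigned.
y1 = [a[i]^2 + b[i] for i in eachindex(a)]

# Broadcasting computes the same thing with no hidden closure, and is
# generally the Zygote-friendlier form.
y2 = a .^ 2 .+ b
```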
---

I don't fully understand this. Could you point out which part of my code could be tricky for Zygote, and the reason for that?
---

Ok, the error was what I thought it was, but I was looking in the wrong place! If you change these two lines:

```julia
x = a_net.fc_output(vcat(time_idx, attention_output))
return x
```

To:

```julia
y = a_net.fc_output(vcat(time_idx, attention_output))
return y
```

Or just:

```julia
return a_net.fc_output(vcat(time_idx, attention_output))
```

then the error should go away. To explain what's going on, let's use this simplified example:

```julia
function hasbox(x)
    nums = [x + i for i in 1:5]
    x = sum(nums)
    return x
end
```

```julia
julia> @code_warntype hasbox(1)
MethodInstance for hasbox(::Int64)
from hasbox(x) in Main at REPL[7]:1
Arguments
#self#::Core.Const(hasbox)
x@_2::Int64
Locals
#23::var"#23#24"
nums::Vector
x@_5::Union{}
x@_6::Union{Int64, Core.Box}
Body::Any
1 ─ (x@_6 = x@_2)
│ (x@_6 = Core.Box(x@_6::Int64))
│ (#23 = %new(Main.:(var"#23#24"), x@_6::Core.Box))
│ %4 = #23::var"#23#24"
│ %5 = (1:5)::Core.Const(1:5)
│ %6 = Base.Generator(%4, %5)::Core.PartialStruct(Base.Generator{UnitRange{Int64}, var"#23#24"}, Any[var"#23#24", Core.Const(1:5)])
│ (nums = Base.collect(%6))
│ %8 = Main.sum(nums)::Any
│ Core.setfield!(x@_6::Core.Box, :contents, %8)
│ %10 = Core.isdefined(x@_6::Core.Box, :contents)::Bool
└── goto #3 if not %10
2 ─ goto #4
3 ─ Core.NewvarNode(:(x@_5))
└── x@_5
4 ┄ %15 = Core.getfield(x@_6::Core.Box, :contents)::Any
└── return %15
```

That's a lot to take in, but the important part is the `Core.Box` around `x`: because `x` is captured by the comprehension and then reassigned, Julia stores it in a box whose `contents` field is untyped -- which is exactly the `(:contents,)` NamedTuple that shows up in the Zygote error above. But there are no anonymous functions or closures in `hasbox`, you might think; in fact the comprehension creates one behind the scenes (the `var"#23#24"` in the output), and that closure is what captures `x`.

Edit: one way to make the issue more obvious is to use `map` with an explicit anonymous function:

```julia
function hasbox(x)
    nums = map(i -> x + i, 1:5)
    x = sum(nums)
    return x
end
```

Here it's clear that `i -> x + i` is a closure capturing `x`, and that `x` is reassigned after being captured -- which is what forces the box.
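For completeness, a sketch of the boxing-free variant (the name `nobox` is invented for illustration):

```julia
function nobox(x)
    nums = [x + i for i in 1:5]
    total = sum(nums)   # fresh name: the captured `x` is never reassigned
    return total
end

# `@code_warntype nobox(1)` now shows no Core.Box in Locals and infers the
# body as Int64, so Zygote no longer sees a RefValue-style gradient.
```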
---

Thanks for the reply from @ToucheSir. The explanation is very clear and the problem is now solved. If no one else has anything to add, we can close this issue. (But I think Flux could be more specific when reporting bugs; such error messages make it pretty hard to track down where the problem is.)
---

Glad we could help. Part of the reason the errors aren't better is that they don't actually come from Flux. Flux handles automatic differentiation through Zygote.jl, which relies on Julia compiler internals to function. Unfortunately this can lead to the occasional obscure error, because what Zygote sees is much lower level than the code you write. Will have a think about whether there's anything we can do to improve there, but no promises :)
---

I had a very similar error when switching from implicit parameter definition to explicit, and it was resolved when I switched back.
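For readers unfamiliar with that distinction, a rough sketch of the two styles (the model, data, and optimiser are placeholders, not from this thread):

```julia
using Flux

model = Dense(2 => 1)
data  = [(randn(Float32, 2, 8), randn(Float32, 1, 8))]

# Implicit style (older Flux API): parameters are gathered into a Params set.
ps = Flux.params(model)
Flux.train!((x, y) -> Flux.mse(model(x), y), ps, data, Flux.Descent(0.01))

# Explicit style (newer Flux + Optimisers.jl): the model itself is passed in.
opt_state = Flux.setup(Flux.Descent(0.01), model)
Flux.train!((m, x, y) -> Flux.mse(m(x), y), model, data, opt_state)
```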
---

I am pretty new to Julia and Flux. I am trying to build a simple neural network, but using an attention layer. Such an attention model includes a self-defined attention layer and some non-trainable hyperparameters in the struct. I wrote the code below using Julia 1.8.2 and Flux v0.13.9, and it works fine in inference (feed-forward) mode:

But when I trained it, I got the following error message and a warning:

I followed the instructions from the official tutorial on custom layers, but they don't specify how to get custom layers properly trained. How should I properly define a custom model with some non-trainable hyperparameters?
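As a reference for future readers, here is a minimal end-to-end sketch combining the fixes from this thread: a custom layer with a non-trainable scalar field, no `Flux.trainable` override, a directly returned output (no reassigned variable, so no boxing), and Float32 data to avoid the conversion warning above. All names and sizes are illustrative, not the poster's actual code:

```julia
using Flux

struct MiniAttentionNet            # hypothetical stand-in for the real model
    fc_hidden::Dense
    fc_output::Dense
    n_heads::Int                   # non-trainable hyperparameter, ignored by Flux
end
Flux.@functor MiniAttentionNet

function (m::MiniAttentionNet)(x)
    h = m.fc_hidden(x) ./ m.n_heads
    return m.fc_output(h)          # returned directly: no `x = ...; return x`
end

model = MiniAttentionNet(Dense(8 => 64, relu), Dense(64 => 1), 2)
data  = [(randn(Float32, 8, 32), randn(Float32, 1, 32))]   # Float32, matching the parameters

opt_state = Flux.setup(Flux.Adam(), model)
Flux.train!(model, data, opt_state) do m, x, y
    Flux.mse(m(x), y)
end
```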