Error when using Float64: ERROR: UndefRefError: access to undefined reference #490

Closed
contradict opened this issue May 22, 2020 · 15 comments · Fixed by #592

Comments

@contradict

contradict commented May 22, 2020

Distilled from this discussion. I couldn't remove anything further and still reproduce the crash.

Julia 1.4.1, Flux 0.10.4

using Flux

function mwe(T)
    int1 = Dense(4, 280)
    resd(X) = reshape(int1(X), 10, 7, 4, :)
    tc1 = ConvTranspose((4, 3), 4 => 4, relu, stride = (2, 2), pad = 1)
    mdl = Chain(resd, tc1)
    z = [1, 2, 3, 4]
    X̂ = mdl(z)
    X = randn(T, size(X̂)...)
    loss(y) = -sum(Flux.binarycrossentropy.(mdl(z), y))
    ps = Flux.params(mdl)
    gs = gradient(ps) do
        loss(X)
    end
end

julia> mwe(Float32); # success

julia> mwe(Float64)

ERROR: UndefRefError: access to undefined reference
Stacktrace:
 [1] getindex at ./array.jl:789 [inlined]
 [2] conv_direct!(::Array{AbstractFloat,5}, ::Array{AbstractFloat,5}, ::Array{Float32,5}, ::NNlib.DenseConvDims{3,(4, 3, 1),4,4,(2, 2, 1),(1, 1, 1, 1, 0, 0),(1, 1, 1),false}; alpha::Float64, beta::Bool) at /home/russel/.julia/packages/NNlib/FAI3o/src/impl/conv_direct.jl:98
 [3] conv_direct! at /home/russel/.julia/packages/NNlib/FAI3o/src/impl/conv_direct.jl:51 [inlined]
 [4] conv!(::Array{AbstractFloat,5}, ::Array{AbstractFloat,5}, ::Array{Float32,5}, ::NNlib.DenseConvDims{3,(4, 3, 1),4,4,(2, 2, 1),(1, 1, 1, 1, 0, 0),(1, 1, 1),false}; kwargs::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}) at /home/russel/.julia/packages/NNlib/FAI3o/src/conv.jl:99
 [5] conv!(::Array{AbstractFloat,5}, ::Array{AbstractFloat,5}, ::Array{Float32,5}, ::NNlib.DenseConvDims{3,(4, 3, 1),4,4,(2, 2, 1),(1, 1, 1, 1, 0, 0),(1, 1, 1),false}) at /home/russel/.julia/packages/NNlib/FAI3o/src/conv.jl:97
 [6] conv!(::Array{AbstractFloat,4}, ::Array{AbstractFloat,4}, ::Array{Float32,4}, ::NNlib.DenseConvDims{2,(4, 3),4,4,(2, 2),(1, 1, 1, 1),(1, 1),false}; kwargs::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}) at /home/russel/.julia/packages/NNlib/FAI3o/src/conv.jl:70
 [7] conv! at /home/russel/.julia/packages/NNlib/FAI3o/src/conv.jl:70 [inlined]
 [8] conv(::Array{AbstractFloat,4}, ::Array{Float32,4}, ::NNlib.DenseConvDims{2,(4, 3),4,4,(2, 2),(1, 1, 1, 1),(1, 1),false}; kwargs::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}) at /home/russel/.julia/packages/NNlib/FAI3o/src/conv.jl:116
 [9] conv at /home/russel/.julia/packages/NNlib/FAI3o/src/conv.jl:114 [inlined]
 [10] #1837 at /home/russel/.julia/packages/Zygote/YeCEW/src/lib/nnlib.jl:41 [inlined]
 [11] (::Zygote.var"#4556#back#1839"{Zygote.var"#1837#1838"{Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}},Array{Float32,4},Array{Float32,4},NNlib.DenseConvDims{2,(4, 3),4,4,(2, 2),(1, 1, 1, 1),(1, 1),false}}})(::Array{AbstractFloat,4}) at /home/russel/.julia/packages/ZygoteRules/6nssF/src/adjoint.jl:49
 [12] ConvTranspose at /home/russel/.julia/packages/Flux/Fj3bt/src/layers/conv.jl:148 [inlined]
 [13] (::typeof((λ)))(::Array{Float64,4}) at /home/russel/.julia/packages/Zygote/YeCEW/src/compiler/interface2.jl:0
 [14] applychain at /home/russel/.julia/packages/Flux/Fj3bt/src/layers/basic.jl:36 [inlined]
 [15] (::typeof((applychain)))(::Array{Float64,4}) at /home/russel/.julia/packages/Zygote/YeCEW/src/compiler/interface2.jl:0
 [16] applychain at /home/russel/.julia/packages/Flux/Fj3bt/src/layers/basic.jl:36 [inlined]
 [17] (::typeof((applychain)))(::Array{Float64,4}) at /home/russel/.julia/packages/Zygote/YeCEW/src/compiler/interface2.jl:0
 [18] Chain at /home/russel/.julia/packages/Flux/Fj3bt/src/layers/basic.jl:38 [inlined]
 [19] (::typeof((λ)))(::Array{Float64,4}) at /home/russel/.julia/packages/Zygote/YeCEW/src/compiler/interface2.jl:0
 [20] loss at /home/russel/Desktop/MWE.jl:13 [inlined]
 [21] (::typeof((λ)))(::Float64) at /home/russel/.julia/packages/Zygote/YeCEW/src/compiler/interface2.jl:0
 [22] #9 at /home/russel/Desktop/MWE.jl:16 [inlined]
 [23] (::typeof((λ)))(::Float64) at /home/russel/.julia/packages/Zygote/YeCEW/src/compiler/interface2.jl:0
 [24] (::Zygote.var"#49#50"{Zygote.Params,Zygote.Context,typeof((λ))})(::Float64) at /home/russel/.julia/packages/Zygote/YeCEW/src/compiler/interface.jl:179
 [25] gradient(::Function, ::Zygote.Params) at /home/russel/.julia/packages/Zygote/YeCEW/src/compiler/interface.jl:55
 [26] mwe(::Type{T} where T) at /home/russel/Desktop/MWE.jl:15
 [27] top-level scope at REPL[5]:1
 [28] eval(::Module, ::Any) at ./boot.jl:331
 [29] eval_user_input(::Any, ::REPL.REPLBackend) at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.4/REPL/src/REPL.jl:86
 [30] run_backend(::REPL.REPLBackend) at /home/russel/.julia/packages/Revise/MgvIv/src/Revise.jl:1023
 [31] top-level scope at none:0
@DhairyaLGandhi
Member

Perhaps this needs a loosening of the signature in NNlib.

@alecokas

I've been experiencing a problem similar to the one raised here by @contradict (UndefRefError: access to undefined reference with a convolutional VAE). Is this issue still under consideration, @DhairyaLGandhi?

@CarloLucibello
Member

Although the error here is weird and should be fixed, I'm not sure we want to support mixed Float32/Float64 computations; we should at least throw a warning. For example, all the layers in the mwe function should be converted to Float64 using the f64 method when T == Float64.
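
A minimal sketch of that conversion, using a toy Dense layer rather than the full MWE (the layer and loss here are my own illustration; f64 and f32 are the Flux utilities in question):

using Flux

# Sketch: convert the model's parameters to match the data's precision
# before taking gradients. Flux.f64 recursively converts all parameters
# to Float64 (Flux.f32 is the Float32 counterpart).
m = Dense(4, 2)                                    # Float32 parameters by default
m64 = Flux.f64(m)                                  # Float64 parameters
x = randn(Float64, 4)
gs = gradient(() -> sum(m64(x)), Flux.params(m64)) # no mixed-precision conv path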

@alecokas

alecokas commented Jul 3, 2020

That sounds reasonable, @CarloLucibello. I've spent a couple of days trying to identify the root of the error so I can create a fix, but I have not been successful yet. Even when running the accepted solution in the linked discussion, I still encounter typing issues while trying to backprop through the pooling layer.

@zomborid

zomborid commented Oct 8, 2021

I have run into this problem in a different way.
The main culprit seems to be in NNlib, at conv.jl:89

y = similar(x, promote_type(xT, wT), output_size(cdims)...,
            channels_out(cdims), size(x, N))

and at conv_direct.jl:98 and 141

y[w_idx, h_idx, d_idx, c_out, batch] = alpha*dotprod + beta*y[w_idx, h_idx, d_idx, c_out, batch]

The problem arises when similar creates an array of undef values and conv_direct! tries to read them.
Even though beta is set to false by default, asking for the value of y[...] results in an exception.

My quick workaround is to change the behavior of beta so that it selects between the value of y[...] and 0.
Maybe the initialization for the given datatype should be fixed so it won't produce undef values.

y[w_idx, h_idx, d_idx, c_out, batch] = alpha*dotprod + (beta ? y[w_idx, h_idx, d_idx, c_out, batch] : 0)

(I am not an expert in Julia, so I am not sure which behavior should change.)

@DhairyaLGandhi
Member

That seems reasonable, but similar shouldn't produce undef values. We have fallbacks for mixed f32/f64 as well, which warn appropriately.

@zomborid

zomborid commented Oct 8, 2021

I think it is not about f32/f64 specifically, but about some internal type handling during the gradient call.
I don't know how gradient works internally, but I would guess it passes a special type through similar that causes undef initialization.
In my case, similar(zeros(3,3), Num, 2,2) produces #undef values.
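
For anyone wanting to reproduce this without a custom type, BigFloat works as a non-isbits stand-in (my example; not the Num type from the report above):

# For isbits element types, similar returns readable (if garbage) memory;
# for heap-allocated element types the entries are #undef references,
# and reading one throws exactly the error in the OP.
a = similar(zeros(3, 3), BigFloat, 2, 2)
isassigned(a, 1)   # false
a[1, 1]            # ERROR: UndefRefError: access to undefined reference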

@ToucheSir
Member

FWIW the original MWE works without issue for me on Flux 0.12.7 and Zygote 0.6.23. We'll need a new one to keep investigating this.

@ToucheSir
Member

@zomborid beta in conv_direct! is actually not a boolean parameter, but it is assigned a Bool value as a performance optimization. See the comment in the source:

By defaulting `beta` to `false`, we make use of the Bradbury promotion trick to override
`NaN`'s that may pre-exist within our output buffer, as `false*NaN == 0.0`, whereas
`0.0*NaN == NaN`. Only set `beta` if you are certain that none of the elements within
`y` are `NaN`.
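
The trick itself is easy to check at the REPL:

# false is a "strong zero" under multiplication, so it erases whatever
# uninitialized garbage (including NaN) sits in the output buffer:
false * NaN   # 0.0
0.0 * NaN     # NaN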

@zomborid

For my case, I finally settled on the workaround of defining similar for my type in the following way:

Base.similar(a::AbstractArray, ::Type{MySpecificType}, dims::Base.DimOrInd...) = fill(MySpecificType(), dims)

That said, I still think that reading a possibly undefined value is a bug in Flux.
Maybe specializing on primitive types using the existing performance trick, and using a properly initialized array in the general case, would be better.

@ToucheSir
Member

ToucheSir commented Oct 11, 2021

Wait, are you defining a completely custom numeric type? I'm certainly surprised that it works at all then, though it is a fair argument that the direct conv implementation (which is optimized for simplicity) should be able to handle unknown numeric types as long as they adhere to some interface.

@Seelengrab

Seelengrab commented Oct 16, 2021

The original MWE in the OP works for me on 1.7-rc1.

Since the extended issue seems to arise from similar, this can occur with any mutable user-defined type. similar is even documented to return an uninitialized array:

help?> similar
search: similar

  similar(array, [element_type=eltype(array)], [dims=size(array)])

  Create an uninitialized mutable array with the given element type and size, based upon the given
  source array.

If Flux requires this to be zeroed, I'd suggest using either zeros(T, dims...) or explicitly zeroing the resulting array (presumably the better choice, since that preserves the array type). Using fill(zero(T), dims...) would run into having each entry of the resulting array be ===, since fill doesn't copy for mutable types T (see JuliaLang/julia#41209).
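
The aliasing pitfall is easy to demonstrate, again with BigFloat as a mutable stand-in:

# fill reuses the one object it is given, so for a mutable element type
# every entry of the result is the very same object:
v = fill(BigFloat(0), 3)
v[1] === v[2] === v[3]   # true -- in-place mutation of one entry hits all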

@natema

natema commented Apr 25, 2023

The MWE runs fine on Julia 1.8.5 with Flux 0.13.15.

@ToucheSir
Member

So the fix is basically what @Seelengrab mentioned: add a loop at the start of the direct conv functions that zeros out the destination array. The original routine was written cleverly, but without regard for more exotic numeric types. Because the functions in question are fallback methods, I would not consider this super high priority, but I'm happy to guide someone through a PR if there's interest.
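
A sketch of what that could look like at the top of conv_direct! (my reconstruction of the idea, not the code that eventually landed in #592):

# Zero the output buffer up front so no #undef entry is ever read,
# regardless of how exotic the element type is. yT here stands for
# conv_direct!'s output element type parameter.
for I in eachindex(y)
    y[I] = zero(yT)
end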

@ToucheSir ToucheSir transferred this issue from FluxML/Flux.jl Apr 25, 2023
@adrhill
Contributor

adrhill commented Jun 17, 2024

@gdalle and I ran into this issue with SparseConnectivityTracer.jl's tracer types.

Because the functions in question are fallback methods, I would not consider this super high priority, but I'm happy to guide someone through a PR if there's interest.

I think I'll take you up on that. Is it enough to add the following line?

fill!(y, zero(yT))
