Inference failure with 0.6 + StaticArrays hcat #21244

Closed
tkoolen opened this issue Mar 31, 2017 · 16 comments · Fixed by #21933

@tkoolen (Contributor) commented Mar 31, 2017

With StaticArrays latest master,

using StaticArrays

struct Foo
    bar::SMatrix{3, 3, Float64}
    Foo() = new(eye(SMatrix{3, 3, Float64}))
end

function f(foo::Foo)
    hcat(foo.bar[:, SVector(1, 2)], zeros(SMatrix{3, 3, Float64}))
end

foo = Foo()
f(foo)

sometimes results in

WARNING: An error occurred during inference. Type inference is now partially disabled.

signal (11): Segmentation fault
while loading /home/twan/code/RigidBodyDynamics/v0.6/RigidBodyDynamics/segfault.jl, in expression starting on line 13
_ZN4llvm17ScheduleDAGMILive19updatePressureDiffsENS_8ArrayRefINS_16RegisterMaskPairEEE at /home/twan/code/julia/usr/bin/../lib/libLLVM-3.9.so (unknown line)
_ZN4llvm17ScheduleDAGMILive15initRegPressureEv at /home/twan/code/julia/usr/bin/../lib/libLLVM-3.9.so (unknown line)
_ZN4llvm17ScheduleDAGMILive8scheduleEv at /home/twan/code/julia/usr/bin/../lib/libLLVM-3.9.so (unknown line)
unknown function (ip: 0x7fd185333fd0)
unknown function (ip: 0x7fd18533c83f)
_ZN4llvm19MachineFunctionPass13runOnFunctionERNS_8FunctionE at /home/twan/code/julia/usr/bin/../lib/libLLVM-3.9.so (unknown line)
_ZN4llvm13FPPassManager13runOnFunctionERNS_8FunctionE at /home/twan/code/julia/usr/bin/../lib/libLLVM-3.9.so (unknown line)
_ZN4llvm13FPPassManager11runOnModuleERNS_6ModuleE at /home/twan/code/julia/usr/bin/../lib/libLLVM-3.9.so (unknown line)
_ZN4llvm6legacy15PassManagerImpl3runERNS_6ModuleE at /home/twan/code/julia/usr/bin/../lib/libLLVM-3.9.so (unknown line)
operator() at /home/twan/code/julia/src/jitlayers.cpp:442 [inlined]
_M_invoke at /usr/include/c++/5/functional:1857
operator() at /usr/include/c++/5/functional:2267 [inlined]
addModuleSet<llvm::SmallVector<std::unique_ptr<llvm::Module>, 1u>, llvm::RTDyldMemoryManager*, std::unique_ptr<llvm::orc::LambdaResolver<JuliaOJIT::addModule(std::unique_ptr<llvm::Module>)::<lambda(const string&)>, JuliaOJIT::addModule(std::unique_ptr<llvm::Module>)::<lambda(const string&)> >, std::default_delete<llvm::orc::LambdaResolver<JuliaOJIT::addModule(std::unique_ptr<llvm::Module>)::<lambda(const string&)>, JuliaOJIT::addModule(std::unique_ptr<llvm::Module>)::<lambda(const string&)> > > > > at /home/twan/code/julia/usr/include/llvm/ExecutionEngine/Orc/IRCompileLayer.h:73 [inlined]
addModule at /home/twan/code/julia/src/jitlayers.cpp:569
jl_add_to_ee at /home/twan/code/julia/src/jitlayers.cpp:792 [inlined]
jl_finalize_function at /home/twan/code/julia/src/jitlayers.cpp:803
getAddressForFunction at /home/twan/code/julia/src/codegen.cpp:1360
jl_generate_fptr at /home/twan/code/julia/src/codegen.cpp:1457
jl_compile_method_internal at /home/twan/code/julia/src/julia_internal.h:313 [inlined]
jl_call_method_internal at /home/twan/code/julia/src/julia_internal.h:341 [inlined]
jl_apply_generic at /home/twan/code/julia/src/gf.c:2234
println at ./boot.jl:378 [inlined]
println at ./boot.jl:382
jl_call_fptr_internal at /home/twan/code/julia/src/julia_internal.h:326 [inlined]
jl_call_method_internal at /home/twan/code/julia/src/julia_internal.h:345 [inlined]
jl_apply_generic at /home/twan/code/julia/src/gf.c:2234
typeinf_loop at ./inference.jl:2657
typeinf_frame at ./inference.jl:2467
typeinf_code at ./inference.jl:2537
unknown function (ip: 0x7fd18128c5dd)
jl_call_fptr_internal at /home/twan/code/julia/src/julia_internal.h:326 [inlined]
jl_call_method_internal at /home/twan/code/julia/src/julia_internal.h:345 [inlined]
jl_apply_generic at /home/twan/code/julia/src/gf.c:2234
typeinf_ext at ./inference.jl:2576
unknown function (ip: 0x7fd18126bfe2)
jl_call_fptr_internal at /home/twan/code/julia/src/julia_internal.h:326 [inlined]
jl_call_method_internal at /home/twan/code/julia/src/julia_internal.h:345 [inlined]
jl_apply_generic at /home/twan/code/julia/src/gf.c:2234
jl_apply at /home/twan/code/julia/src/julia.h:1416 [inlined]
jl_type_infer at /home/twan/code/julia/src/gf.c:269
jl_compile_for_dispatch at /home/twan/code/julia/src/gf.c:1629
jl_compile_method_internal at /home/twan/code/julia/src/julia_internal.h:294 [inlined]
jl_call_method_internal at /home/twan/code/julia/src/julia_internal.h:341 [inlined]
jl_apply_generic at /home/twan/code/julia/src/gf.c:2234
f at /home/twan/code/RigidBodyDynamics/v0.6/RigidBodyDynamics/segfault.jl:9
unknown function (ip: 0x7fcf53f60792)
jl_call_fptr_internal at /home/twan/code/julia/src/julia_internal.h:326 [inlined]
jl_call_method_internal at /home/twan/code/julia/src/julia_internal.h:345 [inlined]
jl_apply_generic at /home/twan/code/julia/src/gf.c:2234
do_call at /home/twan/code/julia/src/interpreter.c:75
eval at /home/twan/code/julia/src/interpreter.c:242
jl_interpret_toplevel_expr at /home/twan/code/julia/src/interpreter.c:34
jl_toplevel_eval_flex at /home/twan/code/julia/src/toplevel.c:577
jl_parse_eval_all at /home/twan/code/julia/src/ast.c:873
jl_load at /home/twan/code/julia/src/toplevel.c:616
include_from_node1 at ./loading.jl:539
unknown function (ip: 0x7fd1813edf3b)
jl_call_fptr_internal at /home/twan/code/julia/src/julia_internal.h:326 [inlined]
jl_call_method_internal at /home/twan/code/julia/src/julia_internal.h:345 [inlined]
jl_apply_generic at /home/twan/code/julia/src/gf.c:2234
include at ./sysimg.jl:14
unknown function (ip: 0x7fd18129167b)
jl_call_fptr_internal at /home/twan/code/julia/src/julia_internal.h:326 [inlined]
jl_call_method_internal at /home/twan/code/julia/src/julia_internal.h:345 [inlined]
jl_apply_generic at /home/twan/code/julia/src/gf.c:2234
process_options at ./client.jl:305
_start at ./client.jl:371
unknown function (ip: 0x7fd18141b578)
jl_call_fptr_internal at /home/twan/code/julia/src/julia_internal.h:326 [inlined]
jl_call_method_internal at /home/twan/code/julia/src/julia_internal.h:345 [inlined]
jl_apply_generic at /home/twan/code/julia/src/gf.c:2234
jl_apply at /home/twan/code/julia/ui/../src/julia.h:1416 [inlined]
true_main at /home/twan/code/julia/ui/repl.c:127
main at /home/twan/code/julia/ui/repl.c:264
__libc_start_main at /build/glibc-9tT8Do/glibc-2.23/csu/../csu/libc-start.c:291
unknown function (ip: 0x401668)
Allocations: 3910719 (Pool: 3909327; Big: 1392); GC: 5
Segmentation fault (core dumped)

other times in

WARNING: An error occurred during inference. Type inference is now partially disabled.
StackOverflowError()
forall_exists_subtype at /home/twan/code/julia/src/subtype.c:881
jl_subtype_env at /home/twan/code/julia/src/subtype.c:948
abstract_eval at ./inference.jl:1902
unknown function (ip: 0x7efefe0b0df6)
jl_call_fptr_internal at /home/twan/code/julia/src/julia_internal.h:326 [inlined]
jl_call_method_internal at /home/twan/code/julia/src/julia_internal.h:345 [inlined]
jl_apply_generic at /home/twan/code/julia/src/gf.c:2234
copy! at ./abstractarray.jl:565
abstract_eval_call at ./inference.jl:1866
abstract_eval at ./inference.jl:1915
unknown function (ip: 0x7efefe0b0df6)
jl_call_fptr_internal at /home/twan/code/julia/src/julia_internal.h:326 [inlined]
jl_call_method_internal at /home/twan/code/julia/src/julia_internal.h:345 [inlined]
jl_apply_generic at /home/twan/code/julia/src/gf.c:2234
copy! at ./abstractarray.jl:565
abstract_eval_call at ./inference.jl:1866
abstract_eval at ./inference.jl:1915
unknown function (ip: 0x7efefe0b0df6)
jl_call_fptr_internal at /home/twan/code/julia/src/julia_internal.h:326 [inlined]
jl_call_method_internal at /home/twan/code/julia/src/julia_internal.h:345 [inlined]
jl_apply_generic at /home/twan/code/julia/src/gf.c:2234

<repeating>

versioninfo():

Julia Version 0.6.0-pre.alpha.329
Commit cded1db (2017-03-31 00:28 UTC)
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: Intel(R) Core(TM) i7-6950X CPU @ 3.00GHz
  WORD_SIZE: 64
  BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Haswell)
  LAPACK: libopenblas64_
  LIBM: libopenlibm
  LLVM: libLLVM-3.9.1 (ORCJIT, broadwell)

StaticArrays SHA: bd3236ac39372aa84891bea066c71d63559af29e

cc: @andyferris

@Keno (Member) commented Mar 31, 2017

StaticArrays is doing bad things with its _index_shape method. I had that fixed locally, but haven't put it up yet.

@tkoolen (Contributor, Author) commented Mar 31, 2017

I don't see an _index_shape method in StaticArrays master or 0.4.0, but I look forward to a PR!

@Keno (Member) commented Mar 31, 2017

I misremembered the name: https://github.com/JuliaArrays/StaticArrays.jl/blob/master/src/indexing.jl#L55-L66

@andyferris (Member):

I'm curious - what is the problem with these methods?

@Keno (Member) commented Apr 1, 2017

Your @pure increment bypasses inference's termination heuristics, and you have no terminating case. The methods can be written either using a decrement or by recursing over the tuple argument.
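
For illustration, a minimal sketch of the "recurse over the tuple argument" variant; the names and the plain-tuple size representation here are hypothetical, not StaticArrays' actual API. The only thing that changes between recursive calls is a shrinking varargs tuple, so no @pure counter is involved and inference's usual argument-widening heuristics apply.

# Sketch only: collect one entry per index without a Val{N} counter.
# `dims` stands in for the remaining static dimensions, `inds` for the
# remaining indices; both shrink by one element per recursive call.
collect_sizes(dims::Tuple, acc::Tuple) = acc
collect_sizes(dims::Tuple, acc::Tuple, ::Int, inds...) =
    collect_sizes(Base.tail(dims), (acc..., 1), inds...)
collect_sizes(dims::Tuple, acc::Tuple, ::Colon, inds...) =
    collect_sizes(Base.tail(dims), (acc..., first(dims)), inds...)

collect_sizes((3, 4), (), 2, :)   # -> (1, 4): the Colon picks up dimension 4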

@martinholters (Member):

To expand a bit: inference ensures termination by limiting the complexity of types in terms of nesting depth, tuple length, etc. But if the argument to index_sizes cannot be inferred, then there is no termination criterion for the recursion of _index_sizes: the only type that changes from one recursive call to the next is Val{1}, Val{2}, ..., which stays at very low complexity. Hence, inference tries to infer an effectively unbounded number of specializations of _index_sizes, and due to the recursive nature of inference, the result is a stack overflow. (I hope I got that right?)
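
As a contrast, a minimal self-contained sketch (illustrative only, not from StaticArrays) of a recursion that these existing limits do catch: the argument that grows is a tuple, so its inferred type is eventually widened and inference converges.

# Sketch: the accumulator tuple grows on every call, so inference widens its
# type (toward something like Tuple{Vararg{Int}}) instead of tracking ever
# longer tuple types, and the analysis terminates.
grow(acc::Tuple, n::Int) = n == 0 ? acc : grow((acc..., 1), n - 1)

grow((), 3)   # -> (1, 1, 1)
# code_warntype(grow, Tuple{Tuple{}, Int}) completes; the growing Val{N} counter
# above has no analogous widening, which is exactly the problem described here.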

But while for this particular case a fix to StaticArrays seems in order, we might want to keep this issue open as a reminder that the inference termination heuristics are far from perfect. (Or do we have a dedicated issue for that?)

@andyferris (Member):

While increment does indeed "continue forever" (it's a generator for the natural numbers), used in conjunction with _index_sizes it seems it could only ever iterate a small number of times, namely the number of input arguments to _index_sizes; there is a terminating case for _index_sizes once the varargs run out.

In the exceedingly rare case that there is a very large number of input arguments, the varargs would be so long that inference gives up and widens, and increment would then be called dynamically.

Sorry, I'm just a bit confused what the problem is here, and what the solution is. Do I have a bug that makes _index_sizes non-terminating? Is inference getting confused about something that it shouldn't be confused about?

@martinholters (Member):

While _index_sizes is indeed terminating, inference might fail to find the exact type of the inds parameter, because the call site splats something of unknown (or un-inferable) length (a call site of this kind is sketched below). Say inds is inferred as Tuple. Then, since its first element might be an Int, inference of _index_sizes(Val{1}, (), inds...) needs to consider the method _index_sizes(v::Type{Val{N}}, x::Tuple, ::Int, inds...), specialized as _index_sizes(v::Type{Val{1}}, x::Tuple{}, ::Int, inds...) with inds::Tuple again. That, in turn, needs to consider (among others) the same method once more, specialized as _index_sizes(v::Type{Val{2}}, x::Tuple{Size}, ::Int, inds...), again with inds::Tuple. This iterates, and while the type of x will at some point be limited to Tuple (or maybe Tuple{Vararg{Size}}, not 100% sure), the Val parameter just keeps increasing, producing a different specialization every time.

Oh, I should mention that I haven't checked whether this is the actual cause of the OP problem.
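
A self-contained sketch (with hypothetical names) of the kind of call site meant by "splats something of unknown length": splatting from untyped storage hides both the number and the types of the arguments from inference.

# Sketch: `boxed` is a Vector{Any}, so inference sees neither the length nor
# the element types of the splat; inside `describe`, inds is only known as
# Tuple{Vararg{Any}}.
describe(inds...) = length(inds)

const boxed = Any[1, :]
call_site() = describe(boxed...)

call_site()   # returns 2 at runtime; inference only sees inds::Tuple{Vararg{Any}}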

@vtjnash (Member) commented Apr 3, 2017

Yes. At this time, it is assumed that an @pure function will never be called in a recursive context. While that is not invalid, inference currently lacks the logic to detect this situation and abort on the recursion before it can cause the crash observed here.

@martinholters (Member):

Ok, confirmed insofar as replacing all _index_sizes methods with the specializations needed for the OP case

@inline _index_sizes(s::Size, v::Type{Val{1}}, x::Tuple, a::Colon, inds...) = _index_sizes(s, increment(v), (x..., Size(s[1])), inds...)
@inline _index_sizes(s::Size, v::Type{Val{2}}, x::Tuple, a::StaticArray, inds...) = _index_sizes(s, increment(v), (x..., Size(a)), inds...)
@inline _index_sizes(s::Size, ::Type{Val{3}}, x::Tuple) = x

fixes it.

So let me ask again, should we keep this issue open as a reminder that the inference termination heuristics are far from perfect, or do we have a dedicated issue for that?

@JeffBezanson added the bug label Apr 3, 2017
@JeffBezanson added this to the 0.6.x milestone Apr 4, 2017
martinholters added a commit to martinholters/StaticArrays.jl that referenced this issue Apr 7, 2017
@martinholters (Member):

Anticipating this to be fixed in StaticArrays, here is a self-contained, somewhat reduced repro of the underlying problem for future reference:

immutable Size{s}; end

Base.@pure increment(::Type{Val{N}}) where {N} = Val{N+1}

@inline index_sizes(s::Size, inds...) = _index_sizes(s, Val{1}, (), inds...)
@inline _index_sizes(s::Size, ::Type{Val{N}}, x::Tuple) where {N} = x
@inline _index_sizes(s::Size, v::Type{Val{N}}, x::Tuple, ::Int, inds...) where {N} = _index_sizes(s, increment(v), (x..., Size()), inds...)
@inline _index_sizes(s::Size, v::Type{Val{N}}, x::Tuple, a::Colon, inds...) where {N} = _index_sizes(s, increment(v), (x..., Size(s[N])), inds...)

code_warntype(index_sizes, Tuple{Vararg{Any}})

@martinholters (Member) commented Apr 7, 2017

While working on fixing it in StaticArrays, I found another case that is also quite entertaining; a reduced, self-contained repro:

immutable Size{s}; end

# EDIT: The @pure here is not necessary to trigger the problem
#=Base.@pure=# tail(::Type{Size{S}}) where {S} = Size{Base.tail(S)} 
@inline tail(::S) where {S<:Size} = tail(S)()

@inline index_sizes(s::Size, inds...) = _index_sizes(s, (), inds...)
@inline _index_sizes(s::Size, x::Tuple) = x
@inline _index_sizes(s::Size, x::Tuple, ::Int, inds...) = _index_sizes(tail(s), (x..., Size()), inds...)
@inline _index_sizes(s::Size, x::Tuple, a::Colon, inds...) = _index_sizes(tail(s), (x..., Size(s[1])), inds...)

for n in 1:16
    t = @elapsed code_warntype(DevNull, index_sizes, NTuple{n, Any})
    println("$n: $t")
end
@elapsed code_warntype(DevNull, index_sizes, Tuple{Vararg{Any}})

Exponential growth in run-time, similar to #20848 (comment), but even more pronounced.

@vtjnash (Member) commented Apr 25, 2017

We will need to refactor the inference code somewhat substantially to be able to track this information correctly. For now, I simply recommend avoiding marking functions as @pure whenever possible (the inference fix for this example will necessarily be to ignore the @pure declaration anyway).

@martinholters (Member):

For the OP case, where the recursion is infinitely deep, active will hold the same method a great many times, which could be detected in order to bail out, e.g. like so:

--- a/base/inference.jl
+++ b/base/inference.jl
@@ -1282,6 +1282,18 @@ function abstract_call_gf_by_type(f::ANY, atype::ANY, sv::InferenceState)
             fullmatch = true
         end
 
+        method_active_count = 0
+        for infstate in active
+            infstate === nothing && continue
+            infstate = infstate::InferenceState
+            if isdefined(infstate.linfo, :def) && method === infstate.linfo.def
+                method_active_count += 1
+            end
+        end
+        if method_active_count > 200 # or whatever limit makes most sense
+            sig = method.sig
+        end
+
         # limit argument type tuple growth
         msig = unwrap_unionall(m[3].sig)
         lsig = length(msig.parameters)

This fixes the case from #21244 (comment) (which should be equivalent to the OP case) and does not seem to have any adverse effects at first sight. I don't know how kosher this is: for a recursion looping through multiple functions/methods, where this heuristic kicks in will depend on the entry function. Might be better than nothing, still?

@martinholters (Member):

A much tougher problem seems to be a bifurcating recursion, e.g. the following nonsensical code:

function foo(y, ::Any, xs...)
    foo((y..., Val{0}()), xs...)
    foo((y..., Val{1}()), xs...)
    foo((y..., Val{2}()), xs...)
    foo((y..., Val{3}()), xs...)
    foo((y..., Val{4}()), xs...)
    foo((y..., Val{5}()), xs...)
    foo((y..., Val{6}()), xs...)
    foo((y..., Val{7}()), xs...)
    # add more for even greater fun
end
foo(y) = y

Here, code_warntype(foo, Tuple{Tuple{}, Vararg{Any,3}}) takes 2.8 seconds on my system, while code_warntype(foo, Tuple{Tuple{}, Vararg{Any,4}}) needs 212 seconds, and I didn't dare try higher values.

I think for cases like this, we would indeed need to track how often a certain method has been inferred (in the current inference run) to figure out that something funny is going on.

@tkoolen (Contributor, Author) commented Sep 24, 2017

For reference, the lines in StaticArrays @Keno was referring to are
https://github.com/JuliaArrays/StaticArrays.jl/blob/bd3236ac39372aa84891bea066c71d63559af29e/src/indexing.jl#L55-L66

(Keno's link was referring to master, which has since changed).

tkoolen added a commit to tkoolen/StaticArrays.jl that referenced this issue May 29, 2018
This set was introduced to test an inference issue (JuliaLang/julia#21244) in an old version of Julia.

It currently fails on Julia master because of JuliaLang/julia#27276.

In my opinion, it's currently not really testing anything valuable, and the Julia PR that fixed the issue has associated test cases that should cover this.