Performance opportunities #206

Open · 2 of 4 tasks
timholy opened this issue Mar 22, 2019 · 13 comments
Comments

@timholy commented Mar 22, 2019

In my initial performance work, the big 3 improvements were local method tables (framecode.methodtables), reusing framedata (the junk mechanism), and getting rid of runtime dispatch internally within the interpreter (by manual union-splitting, i.e., if isa(x, ConcreteType) ...).
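
A minimal, self-contained sketch of that union-splitting pattern (toy types and functions for illustration; this is not the interpreter's actual code):

# When `x` can be one of a few concrete types, branching on `isa` lets the
# compiler emit a direct call in each branch; only the fallback branch pays
# for runtime dispatch.
process(x::Int)     = x + 1
process(x::Float64) = 2x
process(x)          = x                # generic fallback

function process_split(x)
    if isa(x, Int)
        return process(x::Int)         # statically dispatched
    elseif isa(x, Float64)
        return process(x::Float64)     # statically dispatched
    else
        return process(x)              # dynamic dispatch only here
    end
end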

Some possible future improvements:

  • Recent changes diving into Core._apply, Core._apply_latest, and invoke hurt performance. Perhaps some of that might be clawed back by moving more of this work into optimize!?
  • ccalls are slow. I just pushed a teh/compiled_ccall branch that contains an old attempt to create one compiled function per ccall. I was most interested in seeing whether it would circumvent MWE of char crash #28 (it didn't seem to), but I don't think I looked at its runtime implications (especially on a ccall-heavy workload). I suspect that branch is pretty close to usable, if anyone wants to pick it up and run with it.
  • specifically look for calls to iterate and do something special: (1) determine whether iterate could possibly hit a breakpoint or error (if not, then you're allowed to play tricks); (2) see if you can determine whether the loop will be long; and if so, (3) pass the iterator and frame to a special run_loop function (see the first sketch after this list). That would have to compile for each iterator type, so it is worth doing only if the loop is long. But you would save the time spent diving in and out of iterate, and that could be quite helpful.
  • apply similar treatment to arrayref. Even our builtin handling of arrayref could be improved by treating small dimensionalities separately (e.g., if nargs == 1 ... elseif nargs == 2 ... elseif nargs == 3 ... else <varargs version> end); see the second sketch after this list.
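
First sketch: a hedged illustration of the run_loop idea (run_loop is hypothetical, not an existing JuliaInterpreter function). Driving the iteration protocol in compiled code makes each iterate call direct and specialized; the interpreter would be re-entered only for the loop body, modeled here as a callback:

function run_loop(body!, iter)
    y = iterate(iter)          # compiled, specialized on typeof(iter)
    while y !== nothing
        item, state = y
        body!(item)            # stand-in for interpreting the loop body
        y = iterate(iter, state)
    end
    return nothing
end

# e.g. run_loop(x -> println(x), 1:3)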

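Second sketch: the small-dimension split for arrayref, again just an assumed shape (do_arrayref is hypothetical, and indexing stands in for the builtin arrayref handling):

function do_arrayref(A::Array, args...)
    nargs = length(args)
    if nargs == 1                      # linear indexing: the common case
        return A[args[1]::Int]
    elseif nargs == 2
        return A[args[1]::Int, args[2]::Int]
    elseif nargs == 3
        return A[args[1]::Int, args[2]::Int, args[3]::Int]
    else
        return A[args...]              # varargs fallback for higher dimensions
    end
end
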
Obviously spending some time with ProfileView will be useful. I did a ton of that at the beginning and fixed almost everything that I noticed. But I haven't done it in a while, and we may have regressed in places as we expanded our functionality.

@KristofferC commented

What is your opinion about pushing things to compiled_methods that are very commonly used but unlikely to be of interest for breakpoints?

For example, iterate(a::Array) (with Core._apply now being stepped through) makes us have to do

[1] promote_typeof(x) at promotion.jl:263
[2] promote_typeof(x, xs) at promotion.jl:264
[3] -(a, b) at int.jl:797
[4] iterate(A, i) at array.jl:705
[5] iterate(A) at array.jl:705

Is it a slippery slope to start doing this?
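
For concreteness, that would look something like this (the iterate signature here is just an example):

using JuliaInterpreter

m = which(iterate, Tuple{Vector{Float64}})   # the Method for iterate(A::Array)
push!(JuliaInterpreter.compiled_methods, m)  # run it compiled, not interpreted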

@pfitzseb commented

We could temporarily pop them from compiled_methods when someone tries to set a breakpoint in one of them. I think it does make sense to add these kinds of methods to compiled_methods by default.

@timholy commented Mar 28, 2019

I worry about that slope. Where do we stop? Will packages suddenly fill with @require JuliaInterpreter begin push!(...) end blocks?

@timholy commented Mar 28, 2019

I do propose special handling of iterate above, one that would effectively inline the calls to iterate for all containers, not just ::Array.

Also, if we start using compiled frames, one thought: I think there are only two classes of slow-to-interpret methods: ones that iterate and ones that recurse. If we can detect those specifically then we might save ourselves compile time.
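
A rough sketch of how the "ones that iterate" class might be detected (has_loop is a hypothetical helper; it assumes lowered control flow appears as Core.GotoNode / Expr(:gotoifnot, ...), as in the Julia versions current when this was written):

function has_loop(m::Method)
    ci = Base.uncompressed_ast(m)      # CodeInfo for the method
    for (i, stmt) in enumerate(ci.code)
        # a jump to an earlier statement is a back-edge, i.e. a loop
        if stmt isa Core.GotoNode && stmt.label <= i
            return true
        elseif stmt isa Expr && stmt.head === :gotoifnot && stmt.args[2] <= i
            return true
        end
    end
    return false
end

Detecting the recursing class would additionally require checking whether any call in ci.code can resolve back to m itself, which is harder because of dispatch.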

@KristofferC commented

Here is a fun one:

julia> using JuliaInterpreter, Plots

julia> @time @interpret plot(rand(5))
  5.703063 seconds (39.36 M allocations: 2.545 GiB, 6.56% gc time)

vs

julia> using JuliaInterpreter, CSV, Plots # note the CSV load

julia> @time @interpret plot(rand(5))
 16.160595 seconds (51.27 M allocations: 2.868 GiB, 3.46% gc time)

@timholy commented Mar 28, 2019

Is it all method-table searches? (Basically, whichtt or related?)

@KristofferC commented

I haven't profiled yet, just observed it.

@KristofferC commented Mar 29, 2019

I think this is the problem:

julia> using BenchmarkTools

julia> c = Some{Any}("foo")
Some("foo")

julia> @code_warntype convert(Some{Any}, c)
Body::Some{Any}
1 ─ %1 = (Base.getfield)(x, :value)::Any
│   %2 = %new(Some{Any}, %1)::Some{Any}
└──      return %2

julia> @btime convert(Some{Any}, $c)
  4.299 ns (1 allocation: 16 bytes)
Some("foo")

julia> using CSV

julia> @code_warntype convert(Some{Any}, c)
Body::Some{Any}
1 ─ %1 = $(Expr(:static_parameter, 1))::Core.Compiler.Const(Any, false)
│   %2 = (Base.getfield)(x, :value)::Any
│   %3 = (Base.convert)(%1, %2)::Any
│   %4 = (Some{Any})(%3)::Some{Any}
└──      return %4

julia> @btime convert(Some{Any}, $c)
  923.077 ns (1 allocation: 16 bytes)
Some("foo")

@KristofferC commented

With JuliaLang/julia#31536 we are at

julia>  @time @interpret plot(rand(5))
  2.930957 seconds (13.54 M allocations: 481.296 MiB, 2.31% gc time)

for the non-CSV case and

julia>  @time @interpret plot(rand(5))
  5.765768 seconds (15.13 M allocations: 507.931 MiB, 1.52% gc time)

with CSV. So much better, but still significantly slower.

@KristofferC commented Mar 30, 2019

Changing junk from a Set back to a Vector, I get

julia> @time (p = @interpret(plot(rand(5))); @interpret(display(p)))
  9.845403 seconds (64.34 M allocations: 2.011 GiB, 3.19% gc time)

vs master

julia> @time (p = @interpret(plot(rand(5))); @interpret(display(p)))
 12.480055 seconds (66.37 M allocations: 2.219 GiB, 2.56% gc time)

We changed this when we found the bug where executing the same snippet twice managed to get two of the same frame into the Vector. However, perhaps we can be smarter here than just using a full-blown IdSet. We don't end up with that many frames in junk, so we should be able to exploit that. Also, can the same Frame ever end up in junk twice if we empty it when we create the "first" frame from toplevel?

@timholy commented Mar 30, 2019

👍 In theory it seems possible to design the logic such that we never put the same frame in twice. In that case we can just use a Vector without any kind of checking. For development purposes, we might want to start with checking and an assertion, and then strip that out when we find it's no longer necessary.
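
A sketch of that development-phase check (recycle! and the Vector-typed junk here are assumptions for illustration, not the package's actual definitions):

const junk = Any[]                     # recycled frames awaiting reuse

function recycle!(frame)
    # development-time guard: catch double-recycling, delete once trusted
    @assert all(x -> x !== frame, junk) "frame recycled twice"
    push!(junk, frame)
    return nothing
end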
