
10x latency for remotecall+fetch since 0.4 #19838

Closed
andreasnoack opened this issue Jan 3, 2017 · 3 comments
Labels
parallelism (Parallel or distributed computation) · performance (Must go faster) · regression (Regression in behavior compared to a previous version)

Comments

@andreasnoack
Member

The small example below is run on my laptop.

Julia 0.4

julia> addprocs(4);

julia> @time map(fetch, [@spawnat p randn(1) for p in workers()]);
  0.919469 seconds (348.05 k allocations: 14.739 MB, 0.42% gc time)

julia> @time map(fetch, [@spawnat p randn(1) for p in workers()]);
  0.006327 seconds (1.40 k allocations: 86.973 KB)

julia> @time map(fetch, [@spawnat p randn(1) for p in workers()]);
  0.003600 seconds (1.15 k allocations: 74.859 KB)

Julia master

julia> addprocs(4);

julia> @time map(fetch, [@spawnat p randn(1) for p in workers()]);
  2.110132 seconds (402.37 k allocations: 18.438 MB, 0.35% gc time)

julia> @time map(fetch, [@spawnat p randn(1) for p in workers()]);
  0.050673 seconds (40.19 k allocations: 1.937 MB)

julia> @time map(fetch, [@spawnat p randn(1) for p in workers()]);
  0.047017 seconds (39.41 k allocations: 1.837 MB)
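
For reference, the same round trip can be written with remotecall and fetch directly; @spawnat p expr is essentially remotecall(() -> expr, p), so a line like the following (using the master argument order) exercises the same path:

julia> @time map(fetch, [remotecall(() -> randn(1), p) for p in workers()]);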

This might be what is causing #17301

cc: @amitmurthy

andreasnoack added the parallelism and regression labels on Jan 3, 2017
JeffBezanson added the performance label on Jan 3, 2017
@amitmurthy
Contributor

Initial debugging suggests an underlying issue, probably with code compilation and/or map.

If you wrap the call in a function:

foo() = map(fetch, [@spawnat p randn(1) for p in workers()])

On 0.4

julia> @time foo();
  0.001577 seconds (1.05 k allocations: 69.813 KB)

julia> @time foo();
  0.001652 seconds (1.04 k allocations: 69.250 KB)

On master:

julia> @time foo();
  0.001834 seconds (645 allocations: 29.656 KB)

julia> @time foo();
  0.001883 seconds (638 allocations: 29.078 KB)

Surprisingly, even just replacing map with explicit loops (still at top level) gives similar numbers on 0.4 and master:

With

@time begin
    futs = []
    for p in workers()
        push!(futs, @spawnat p randn(1))
    end
    results = []
    for x in futs
        push!(results, fetch(x))
    end
    results
end;

On 0.4

  0.003513 seconds (1.07 k allocations: 70.266 KB)

On master

  0.004656 seconds (1.34 k allocations: 55.219 KB)

Is the huge increase in allocations between 0.4 and master, when map is used unwrapped at top level, due to code compilation?
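
One rough way to check (a sketch, assuming the master-era parse/eval API): re-lowering the expression creates fresh anonymous functions for the comprehension and the @spawnat body on every run, so if compilation is the cost, the allocations should stay high each time, unlike repeated calls to foo() above.

expr_str = "map(fetch, [@spawnat p randn(1) for p in workers()])"
# Each eval(parse(...)) lowers new anonymous functions, so any
# compilation cost should recur on every run:
@time eval(parse(expr_str));
@time eval(parse(expr_str));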

@amitmurthy
Contributor

@vtjnash any thoughts?

@JeffBezanson
Member

Yes, this could be compilation due to new closures introduced by each top-level expression.
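
Reading that against the examples above (a rough sketch of the mechanism): every fresh top-level evaluation of the map expression introduces new closures for the comprehension and the @spawnat thunk, each of which has to be compiled, whereas a named wrapper defines them once, so only its first call pays that cost.

# New closures per top-level evaluation: compiled anew every time
@time map(fetch, [@spawnat p randn(1) for p in workers()]);

# Closures defined once inside a named function: only the first call compiles
foo() = map(fetch, [@spawnat p randn(1) for p in workers()])
@time foo();
@time foo();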
