
10x latency for remotecall+fetch since 0.4 #19838

Closed
andreasnoack opened this issue Jan 3, 2017 · 3 comments
Labels
parallelism (Parallel or distributed computation) · performance (Must go faster) · regression (Regression in behavior compared to a previous version)

Comments

@andreasnoack
Member

The small example below is run on my laptop.

Julia 0.4

julia> addprocs(4);

julia> @time map(fetch, [@spawnat p randn(1) for p in workers()]);
  0.919469 seconds (348.05 k allocations: 14.739 MB, 0.42% gc time)

julia> @time map(fetch, [@spawnat p randn(1) for p in workers()]);
  0.006327 seconds (1.40 k allocations: 86.973 KB)

julia> @time map(fetch, [@spawnat p randn(1) for p in workers()]);
  0.003600 seconds (1.15 k allocations: 74.859 KB)

Julia master

julia> addprocs(4);

julia> @time map(fetch, [@spawnat p randn(1) for p in workers()]);
  2.110132 seconds (402.37 k allocations: 18.438 MB, 0.35% gc time)

julia> @time map(fetch, [@spawnat p randn(1) for p in workers()]);
  0.050673 seconds (40.19 k allocations: 1.937 MB)

julia> @time map(fetch, [@spawnat p randn(1) for p in workers()]);
  0.047017 seconds (39.41 k allocations: 1.837 MB)
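
For reference, the same round trip can be written with remotecall and fetch directly; @spawnat p expr is essentially remotecall(() -> expr, p), so a line like the following (using the master argument order) exercises the same path:

julia> @time map(fetch, [remotecall(() -> randn(1), p) for p in workers()]);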

This might be what is causing #17301

cc: @amitmurthy

andreasnoack added the parallelism and regression labels on Jan 3, 2017
JeffBezanson added the performance label on Jan 3, 2017
@amitmurthy
Contributor

Initial debugging suggests an underlying issue, probably with code compilation and/or map.

If you wrap the call in a function:

foo() = map(fetch, [@spawnat p randn(1) for p in workers()])

On 0.4

julia> @time foo();
  0.001577 seconds (1.05 k allocations: 69.813 KB)

julia> @time foo();
  0.001652 seconds (1.04 k allocations: 69.250 KB)

On master:

julia> @time foo();
  0.001834 seconds (645 allocations: 29.656 KB)

julia> @time foo();
  0.001883 seconds (638 allocations: 29.078 KB)

Surprisingly, even just replacing map with explicit loops (still at top level) gives similar numbers on 0.4 and master:

With

@time begin
    futs = []
    for p in workers()
        push!(futs, @spawnat p randn(1))
    end
    results = []
    for x in futs
        push!(results, fetch(x))
    end
    results
end;

On 0.4

  0.003513 seconds (1.07 k allocations: 70.266 KB)

On master

  0.004656 seconds (1.34 k allocations: 55.219 KB)

Is the huge increase in allocations between 0.4 and master, when map is used unwrapped at top level, due to code compilation?
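
One rough way to check (a sketch, assuming the master-era parse/eval API): re-lowering the expression creates fresh anonymous functions for the comprehension and the @spawnat body on every run, so if compilation is the cost, the allocations should stay high each time, unlike repeated calls to foo() above.

expr_str = "map(fetch, [@spawnat p randn(1) for p in workers()])"
# Each eval(parse(...)) lowers new anonymous functions, so any
# compilation cost should recur on every run:
@time eval(parse(expr_str));
@time eval(parse(expr_str));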

@amitmurthy
Contributor

@vtjnash any thoughts?

@JeffBezanson
Member

Yes, this could be compilation due to new closures introduced by each top-level expression.
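
Reading that against the examples above (a rough sketch of the mechanism): every fresh top-level evaluation of the map expression introduces new closures for the comprehension and the @spawnat thunk, each of which has to be compiled, whereas a named wrapper defines them once, so only its first call pays that cost.

# New closures per top-level evaluation: compiled anew every time
@time map(fetch, [@spawnat p randn(1) for p in workers()]);

# Closures defined once inside a named function: only the first call compiles
foo() = map(fetch, [@spawnat p randn(1) for p in workers()])
@time foo();
@time foo();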
