Specialization fixes for mapreducedim. #316

maleadt · 2020-07-24T16:40:32Z

Fixes #302

julia> k = KnetArray{Float32}(rand(10,100));

julia> c = CuArray{Float32}(rand(10,100));

julia> @benchmark sum(k)
BenchmarkTools.Trial: 
  memory estimate:  32 bytes
  allocs estimate:  2
  --------------
  minimum time:     8.516 μs (0.00% GC)
  median time:      8.788 μs (0.00% GC)
  mean time:        8.817 μs (0.00% GC)
  maximum time:     23.335 μs (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     3

julia> @benchmark sum(c)
BenchmarkTools.Trial: 
  memory estimate:  1.08 KiB
  allocs estimate:  37
  --------------
  minimum time:     12.593 μs (0.00% GC)
  median time:      19.867 μs (0.00% GC)
  mean time:        19.804 μs (0.00% GC)
  maximum time:     325.308 μs (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1

julia> @benchmark sum(k,dims=1)
BenchmarkTools.Trial: 
  memory estimate:  288 bytes
  allocs estimate:  11
  --------------
  minimum time:     2.899 μs (0.00% GC)
  median time:      3.016 μs (0.00% GC)
  mean time:        3.072 μs (0.00% GC)
  maximum time:     163.371 μs (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     9

julia> @benchmark sum(c,dims=1)
BenchmarkTools.Trial: 
  memory estimate:  960 bytes
  allocs estimate:  33
  --------------
  minimum time:     3.236 μs (0.00% GC)
  median time:      3.493 μs (0.00% GC)
  mean time:        4.332 μs (4.12% GC)
  maximum time:     5.562 ms (32.07% GC)
  --------------
  samples:          10000
  evals/sample:     8

julia> @benchmark sum(abs2, k)
BenchmarkTools.Trial: 
  memory estimate:  32 bytes
  allocs estimate:  2
  --------------
  minimum time:     8.926 μs (0.00% GC)
  median time:      9.180 μs (0.00% GC)
  mean time:        9.288 μs (0.00% GC)
  maximum time:     128.632 μs (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     3

julia> @benchmark sum(abs2, c)
BenchmarkTools.Trial: 
  memory estimate:  1.08 KiB
  allocs estimate:  37
  --------------
  minimum time:     13.336 μs (0.00% GC)
  median time:      20.882 μs (0.00% GC)
  mean time:        20.743 μs (0.00% GC)
  maximum time:     66.558 μs (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1

From 10x to less than 50% overhead relative to KnetArray (comparing minimum times).
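As a rough check of that figure, the overhead can be recomputed from the minimum times in the benchmark output above. This is only a sketch: the names `knet_min`, `cuda_min` and `overhead_pct` are illustrative and not part of either package.

```julia
# Minimum times in μs, copied from the @benchmark output above.
knet_min = (sum = 8.516, sum_dims1 = 2.899, sum_abs2 = 8.926)    # KnetArray
cuda_min = (sum = 12.593, sum_dims1 = 3.236, sum_abs2 = 13.336)  # CuArray

# Relative overhead of CuArray over KnetArray, in percent.
overhead_pct(cu, kn) = (cu / kn - 1) * 100

# map over two NamedTuples with identical keys returns a NamedTuple.
overheads = map(overhead_pct, cuda_min, knet_min)

for (name, pct) in pairs(overheads)
    println(rpad(name, 10), round(pct; digits = 1), " %")
end
```

All three ratios come out below 50% on minimum times. The median times for the scalar reductions are further apart (19.867 μs vs 8.788 μs for `sum`), so the figure should be read as applying to the best case.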

@denizyuret at this point the largest issues are gone, and it would be good to port over some of the tricks that Knet uses. For example, I think the scalar reductions here avoid allocating an output container (I couldn't see a cudaMalloc in the profiler), which might account for the remaining overhead.

denizyuret · 2020-07-24T17:20:29Z

Looks good. The scalar reduction code is in Knet/deps/cuda20.jl, how can I help with the port?

codecov · 2020-07-24T19:33:20Z

Codecov Report

Merging #316 into master will increase coverage by 0.01%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master     #316      +/-   ##
==========================================
+ Coverage   79.33%   79.34%   +0.01%     
==========================================
  Files         155      155              
  Lines        8902     8900       -2     
==========================================
  Hits         7062     7062              
+ Misses       1840     1838       -2

Impacted Files	Coverage Δ
src/mapreduce.jl	`100.00% <100.00%> (+4.25%)`	⬆️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 5dc771d...3107df7.

maleadt added the "cuda array" and "performance" labels Jul 24, 2020

Specialization fixes for mapreducedim.

3107df7

maleadt force-pushed the tb/mapreduce branch from 91eea96 to 3107df7 July 24, 2020 17:01

maleadt merged commit afaec8e into master Jul 24, 2020

maleadt deleted the tb/mapreduce branch July 24, 2020 20:50
