This repository has been archived by the owner on Mar 12, 2021. It is now read-only.

Performance issue with v2.1.0 compared with v1.7.3 #701

Closed
findmyway opened this issue May 3, 2020 · 4 comments · Fixed by #704

Comments

@findmyway
Contributor

Describe the bug
For small models, performance with CuArrays v2.1.0 is worse than with v1.7.3. In the benchmark below, the minimum forward-pass time roughly doubles (93.864 μs vs 45.627 μs).

To Reproduce
The Minimal Working Example (MWE) for this bug:

(@v1.4) pkg> st
  [587475ba] Flux v0.10.4
  [3a865a2d] CuArrays v2.1.0 #master (https://github.com/JuliaGPU/CuArray
  [be33ccc6] CUDAnative v3.0.4

julia> using Flux, CuArrays, BenchmarkTools

julia> model = Chain(
           Dense(4, 128, relu),
           Dense(128, 128, relu),
           Dense(128, 2),
       ) |> gpu
Chain(Dense(4, 128, relu), Dense(128, 128, relu), Dense(128, 2))

julia> @benchmark  CuArrays.@sync model($(cu(rand(4))))
BenchmarkTools.Trial: 
  memory estimate:  8.80 KiB
  allocs estimate:  276
  --------------
  minimum time:     93.864 μs (0.00% GC)
  median time:      115.179 μs (0.00% GC)
  mean time:        125.542 μs (1.97% GC)
  maximum time:     50.622 ms (48.86% GC)
  --------------
  samples:          10000
  evals/sample:     1

julia> CuArrays.version()
v"10.1.243"

For comparison:

(@v1.4) pkg> st
  [be33ccc6] CUDAnative v2.10.2
  [3a865a2d] CuArrays v1.7.3
  [587475ba] Flux v0.10.3

julia> @benchmark  CuArrays.@sync model($(cu(rand(4))))
BenchmarkTools.Trial: 
  memory estimate:  8.16 KiB
  allocs estimate:  223
  --------------
  minimum time:     45.627 μs (0.00% GC)
  median time:      74.875 μs (0.00% GC)
  mean time:        85.175 μs (2.61% GC)
  maximum time:     32.836 ms (33.09% GC)
  --------------
  samples:          10000
  evals/sample:     1

julia> CUDAdrv.version()
v"10.1.0"
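The size of the regression can be computed directly from the two trials above. The following is a minimal Julia sketch. The variable names are illustrative, and the timings (in μs) are copied by hand from the BenchmarkTools output rather than read from saved trials:

```julia
# Timings in microseconds, copied from the two BenchmarkTools trials above.
v2_1_0 = (minimum = 93.864, median = 115.179, mean = 125.542)
v1_7_3 = (minimum = 45.627, median = 74.875, mean = 85.175)

# Slowdown factor of v2.1.0 relative to v1.7.3 for each statistic
# (map over two NamedTuples with the same field names returns a NamedTuple).
slowdown = map((new, old) -> new / old, v2_1_0, v1_7_3)

# Print one line per statistic, e.g. "minimum 2.06x".
for (stat, r) in pairs(slowdown)
    println(rpad(string(stat), 8), round(r; digits = 2), "x")
end
```

This gives about 2.06x on the minimum time and about 1.5x on the median and mean. In absolute terms, the minimum latency grows by roughly 48 μs per forward pass.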

Environment details
Details on Julia:

julia> versioninfo()
Julia Version 1.4.1
Commit 381693d3df* (2020-04-14 17:20 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: Intel(R) Xeon(R) W-2123 CPU @ 3.60GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-8.0.1 (ORCJIT, skylake)

Additional context
Tested on an RTX 2080 Ti.

Note that the model above is quite small. For larger models, performance is similar between v2.1.0 and v1.7.3. Even so, I'm interested in why there is such a significant difference for small models.

@findmyway findmyway added the bug label May 3, 2020
@maleadt maleadt added performance and removed bug labels May 4, 2020
@maleadt
Member

maleadt commented May 4, 2020

Bisected to bd38b15.

@maleadt
Member

maleadt commented May 4, 2020

Could you verify that #704 fixes this?

@findmyway
Contributor Author

Yes, I can confirm it works. 🎉

Thanks!

@maleadt
Member

maleadt commented May 4, 2020

Great. Thanks for the report!
