
Upgrade to Julia 1.6 #1514

Merged (69 commits merged on May 19, 2021)

Conversation

@ali-ramadhan (Member)

Just updating the Manifest.toml and the Buildkite Julia version number to see if everything passes. I've been using 1.6 without issues on my laptop, so this should test 1.6 + GPUs.

@ali-ramadhan (Member Author)

Everything looks good on the CPU, but the GPU unit tests segfault when testing field setting (I tried debugging but can't figure out why). All the other GPU tests pass, although CI seems much slower for the GPU tests (~3x slower?).

Could this be related to the segfault in CliMA/ClimateMachine.jl#2146? @charleskawczynski @jakebolewski, were you able to figure out why it was segfaulting?

@ali-ramadhan (Member Author)

Okay, so the segfault was because we were trying to set a field using an Int128 or UInt128.

It's not an important test, so I removed it, but it's a little weird that it just started failing since the test has been around since Oceananigans.jl v0.1.0...

@charleskawczynski (Member)

> All the other GPU tests pass, although CI seems much slower for the GPU tests (~3x slower?).

😞

> Could this be related to the segfault in CliMA/ClimateMachine.jl#2146? @charleskawczynski @jakebolewski, were you able to figure out why it was segfaulting?

Not yet, but I'm going to try looking into it. @jakebolewski suggested upgrading some packages first, so I'm doing that now.

@jakebolewski

Hmm, good to know about possible performance issues; we should be on the lookout. I suspect that (one of) the segfaults is an out-of-bounds error that got masked by @inbounds. I'm running that locally now.
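
(As a minimal sketch, not code from this PR, of how @inbounds can mask an out-of-bounds access: the loop below reads one element past the end of the array. With bounds checks elided this is undefined behavior and may segfault; running with julia --check-bounds=yes turns it back into a BoundsError.)

# Illustrative only: an off-by-one read hidden by @inbounds.
function masked_oob(x)
    s = zero(eltype(x))
    @inbounds for i in 1:length(x) + 1   # off-by-one: i runs past the last index
        s += x[i]
    end
    return s
end

masked_oob(rand(4))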

@ali-ramadhan (Member Author) commented Mar 29, 2021

I opened an issue about the Int128/UInt128 segfault (JuliaGPU/CUDA.jl#793) but will revisit this PR later to look into the performance regression.

@ali-ramadhan (Member Author)

Hmm, a lot of failures due to CUDA scalar getindex operations, even though we explicitly set CUDA.allowscalar(true) in runtests.jl...

We could take this opportunity to get rid of all scalar operations in the tests and just use CUDA.@allowscalar where it's needed.

Maybe new CUDA scalar operations are hurting performance and that's why GPU CI has slowed down?
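
(A minimal sketch of that approach, with nothing Oceananigans-specific: disallow scalar indexing globally in the test setup and opt in only where scalar access is genuinely needed.)

using CUDA

CUDA.allowscalar(false)            # make accidental scalar indexing an error

a = CUDA.zeros(Float64, 4)

# a[2] = 1.0                       # this would now throw a scalar-indexing error

CUDA.@allowscalar a[2] = 1.0       # explicitly opt in for this one operation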

@navidcy (Collaborator) commented Apr 9, 2021

@ali-ramadhan, I found the same issue in GeophysicalFlows.jl
I think (but I'm not sure) that there was some change in how CUDA.@allowscalar works. For example, previously you could define a new array like:

CUDA.@allowscalar newarray = [i*j for i=1:3, j=1:4]

but now that's not possible! Instead you need to do

newarray = CUDA.@allowscalar [i*j for i=1:3, j=1:4]

Perhaps the GPU connoisseurs have some more insight on this? cc @maleadt, @vchuravy

@maleadt (Collaborator) commented Apr 10, 2021

That probably shouldn't have changed; can you file an issue on CUDA.jl/GPUArrays.jl? I'll have a look next week. The only change to @allowscalar that comes to mind is task/thread safety, which does come at a certain performance cost (it now does a TLS lookup instead of a simple pointer check, but that should be negligible compared to the subsequent memory transfer).

@navidcy (Collaborator) commented Apr 11, 2021

> That probably shouldn't have changed; can you file an issue on CUDA.jl/GPUArrays.jl? I'll have a look next week. The only change to @allowscalar that comes to mind is task/thread safety, which does come at a certain performance cost (it now does a TLS lookup instead of a simple pointer check, but that should be negligible compared to the subsequent memory transfer).

It seems I can't reproduce the supposed error, so, sorry, my bad... Something else must have been the issue. 😔

@navidcy (Collaborator) commented Apr 14, 2021

I fixed the doctests.

However, there is an issue with Base.typename(), which seems to have changed behavior in Julia v1.6; see JuliaLang/julia#40474.

We should modify our show() overloads so that the output reads, e.g., CPU rather than typename(CPU).

@navidcy (Collaborator) commented Apr 14, 2021

I suggest changing

print(io, "IncompressibleModel{$(Base.typename(A)), $(eltype(model.grid))}",

to

print(io, "IncompressibleModel{"*string(Base.nameof(A))*", $(eltype(model.grid))}",

@ali-ramadhan (Member Author)

WENO is failing on the GPU because CUDAKernels.jl is trying to overdub ^ with CUDA.pow, which doesn't exist anymore: JuliaGPU/KernelAbstractions.jl#249

Otherwise we should be pretty close to having all tests passing!

@ali-ramadhan (Member Author) commented May 18, 2021

Finally all tests pass 🎉 Thanks @navidcy and @vchuravy for all your help!

@glwagner Let me know when it would be a good time to merge this PR and tag a new release.

I ran the incompressible model benchmarks; in general it seems that with Julia 1.6 Oceananigans allocates more memory and is a bit slower on the CPU, but a bit faster on the GPU.

Quick benchmark

Julia 1.6

Oceananigans v0.57.2
Julia Version 1.6.1
Commit 6aaedecc44 (2021-04-23 05:59 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: Intel(R) Xeon(R) Silver 4214 CPU @ 2.20GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-11.0.1 (ORCJIT, cascadelake)
  GPU: TITAN V
                                            Incompressible model benchmarks
┌───────────────┬─────────────┬─────┬────────────┬────────────┬────────────┬────────────┬──────────┬────────┬─────────┐
│ Architectures │ Float_types │  Ns │        min │     median │       mean │        max │   memory │ allocs │ samples │
├───────────────┼─────────────┼─────┼────────────┼────────────┼────────────┼────────────┼──────────┼────────┼─────────┤
│           CPU │     Float64 │  32 │   4.996 ms │   5.047 ms │   5.113 ms │   5.770 ms │ 1.77 MiB │   2301 │      10 │
│           CPU │     Float64 │  64 │  34.951 ms │  35.967 ms │  36.414 ms │  41.417 ms │ 1.77 MiB │   2301 │      10 │
│           CPU │     Float64 │ 128 │ 301.074 ms │ 301.964 ms │ 302.498 ms │ 307.989 ms │ 1.77 MiB │   2301 │      10 │
│           CPU │     Float64 │ 256 │    2.894 s │    2.895 s │    2.895 s │    2.896 s │ 1.77 MiB │   2301 │       2 │
│           GPU │     Float64 │  32 │   2.859 ms │   2.923 ms │   3.025 ms │   3.987 ms │ 2.80 MiB │   6914 │      10 │
│           GPU │     Float64 │  64 │   2.912 ms │   3.101 ms │   3.308 ms │   5.368 ms │ 2.78 MiB │   6993 │      10 │
│           GPU │     Float64 │ 128 │   4.894 ms │   5.019 ms │   5.360 ms │   8.565 ms │ 2.80 MiB │   8667 │      10 │
│           GPU │     Float64 │ 256 │  33.569 ms │  36.266 ms │  36.029 ms │  36.883 ms │ 3.24 MiB │  37307 │      10 │
└───────────────┴─────────────┴─────┴────────────┴────────────┴────────────┴────────────┴──────────┴────────┴─────────┘

Julia 1.5

Oceananigans v0.57.2
Julia Version 1.5.2
Commit 539f3ce943 (2020-09-23 23:17 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: Intel(R) Xeon(R) Silver 4214 CPU @ 2.20GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-9.0.1 (ORCJIT, cascadelake)
  GPU: TITAN V
                                             Incompressible model benchmarks
┌───────────────┬─────────────┬─────┬────────────┬────────────┬────────────┬────────────┬─────────────┬────────┬─────────┐
│ Architectures │ Float_types │  Ns │        min │     median │       mean │        max │      memory │ allocs │ samples │
├───────────────┼─────────────┼─────┼────────────┼────────────┼────────────┼────────────┼─────────────┼────────┼─────────┤
│           CPU │     Float64 │  32 │   4.905 ms │   5.116 ms │   5.237 ms │   6.262 ms │  360.44 KiB │   2142 │      10 │
│           CPU │     Float64 │  64 │  34.327 ms │  36.419 ms │  36.188 ms │  37.507 ms │  360.44 KiB │   2142 │      10 │
│           CPU │     Float64 │ 128 │ 297.567 ms │ 299.404 ms │ 299.856 ms │ 303.081 ms │  360.44 KiB │   2142 │      10 │
│           CPU │     Float64 │ 256 │    2.786 s │    2.786 s │    2.786 s │    2.787 s │  360.44 KiB │   2142 │       2 │
│           GPU │     Float64 │  32 │   2.729 ms │   2.781 ms │   2.885 ms │   3.717 ms │  839.83 KiB │   7132 │      10 │
│           GPU │     Float64 │  64 │   2.894 ms │   3.055 ms │   3.136 ms │   4.046 ms │  879.45 KiB │   7124 │      10 │
│           GPU │     Float64 │ 128 │   4.107 ms │   4.836 ms │   4.890 ms │   6.086 ms │  964.73 KiB │   7142 │      10 │
│           GPU │     Float64 │ 256 │  21.667 ms │  35.642 ms │  34.245 ms │  35.746 ms │    1.11 MiB │   7134 │      10 │
└───────────────┴─────────────┴─────┴────────────┴────────────┴────────────┴────────────┴─────────────┴────────┴─────────┘

@navidcy (Collaborator) commented May 19, 2021

OMG

@navidcy self-requested a review May 19, 2021 00:12

@navidcy (Collaborator) left a comment

I approve so much

@tomchor (Collaborator) commented May 19, 2021

Thanks @ali-ramadhan! That's amazing work.

Quick question: should we be concerned about the GPU memory allocations? They're roughly 3x larger for 1.6, which is a pretty big difference, especially considering the memory limitations on GPUs.

@navidcy (Collaborator) commented May 19, 2021

> Thanks @ali-ramadhan! That's amazing work.
>
> Quick question: should we be concerned about the GPU memory allocations? They're roughly 3x larger for 1.6, which is a pretty big difference, especially considering the memory limitations on GPUs.

Actually, that's a good point; I didn't notice this. I was so overwhelmed by "All checks have passed" that I couldn't see straight.

Review comment on .buildkite/pipeline.yml (outdated, resolved)
@ali-ramadhan (Member Author)

> Quick question: should we be concerned about the GPU memory allocations? They're roughly 3x larger for 1.6, which is a pretty big difference, especially considering the memory limitations on GPUs.

Ah, in the benchmarks those are just CPU memory allocations, since BenchmarkTools.jl doesn't measure GPU allocations. CUDA.@time can measure GPU allocations, but I haven't used it much on Oceananigans.
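
(For reference, a minimal sketch of checking GPU allocations with CUDA.@time; the array and operation here are just placeholders, not an Oceananigans benchmark.)

using CUDA

a = CUDA.rand(256, 256)

# CUDA.@time reports both CPU and GPU allocations (and GC time) for the expression.
CUDA.@time sum(a .^ 2)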

I don't think it's a cause for worry but it might be good to do some profiling at some point to figure out where the extra memory allocations are coming from.

Interestingly the benchmarks suggest that GPU models are actually a bit faster now 👀

@glwagner (Member)

I'm not sure, but it's possible that some GPU utilities not under Oceananigans.jl's control incur memory allocations. I don't think we have much GPU-specific code in our codebase (except for the pressure solvers...). Definitely a good thing to keep tabs on, and to open issues in the relevant packages if it affects the performance of our code.

Labels: package 📦
8 participants