
Upgrade to Julia 1.6 #1514

Merged (69 commits merged on May 19, 2021)

Conversation

@ali-ramadhan (Member)

Just updating the Manifest.toml and the Buildkite Julia version number to see if everything passes. I've been using 1.6 without issues on my laptop, so this should test 1.6 + GPUs.

@ali-ramadhan (Member Author)

Everything looks good on the CPU, but the GPU unit tests segfault when testing field setting (I tried debugging but can't figure out why). All the other GPU tests pass, although CI seems much slower for the GPU tests (~3x slower?).

Could this be related to the segfault in CliMA/ClimateMachine.jl#2146? @charleskawczynski @jakebolewski, were you able to figure out why it was segfaulting?

@ali-ramadhan (Member Author)

Okay, so the segfault was because we were trying to set a field using an Int128 or UInt128.

It's not an important test, so I removed it, but it's a little weird that it just started failing since the test has been around since Oceananigans.jl v0.1.0...

@charleskawczynski (Member)

> All the other GPU tests pass, although CI seems much slower for the GPU tests (~3x slower?).

😞

> Could this be related to the segfault in CliMA/ClimateMachine.jl#2146? @charleskawczynski @jakebolewski, were you able to figure out why it was segfaulting?

Not yet, but I'm going to try looking into it. @jakebolewski suggested upgrading some packages first, so I'm doing that now.

@jakebolewski

Hmm, good to know about possible performance issues; we should be on the lookout. I suspect that (one of) the segfaults is an out-of-bounds error that got masked by @inbounds. I'm running that locally now.
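
(As a minimal sketch, not code from this PR, of how @inbounds can mask an out-of-bounds access: the loop below reads one element past the end of the array. With bounds checks elided this is undefined behavior and may segfault; running with julia --check-bounds=yes turns it back into a BoundsError.)

# Illustrative only: an off-by-one read hidden by @inbounds.
function masked_oob(x)
    s = zero(eltype(x))
    @inbounds for i in 1:length(x) + 1   # off-by-one: i runs past the last index
        s += x[i]
    end
    return s
end

masked_oob(rand(4))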

@ali-ramadhan (Member Author) commented Mar 29, 2021

I opened an issue about the Int128/UInt128 segfault (JuliaGPU/CUDA.jl#793) but will revisit this PR later to look into the performance regression.

@ali-ramadhan (Member Author)

Hmm, a lot of failures due to CUDA scalar getindex operations, even though we explicitly set CUDA.allowscalar(true) in runtests.jl...

We could take this opportunity to get rid of all scalar operations in the tests and just use CUDA.@allowscalar where it's needed.

Maybe new CUDA scalar operations are hurting performance and that's why GPU CI has slowed down?
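
(A minimal sketch of that approach, with nothing Oceananigans-specific: disallow scalar indexing globally in the test setup and opt in only where scalar access is genuinely needed.)

using CUDA

CUDA.allowscalar(false)            # make accidental scalar indexing an error

a = CUDA.zeros(Float64, 4)

# a[2] = 1.0                       # this would now throw a scalar-indexing error

CUDA.@allowscalar a[2] = 1.0       # explicitly opt in for this one operation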

@navidcy (Collaborator) commented Apr 9, 2021

@ali-ramadhan, I found the same issue in GeophysicalFlows.jl
I think (but I'm not sure) that there was some change in how CUDA.@allowscalar works. For example, previously you could define a new array like:

CUDA.@allowscalar newarray = [i*j for i=1:3, j=1:4]

but now that's not possible! Instead you need to do

newarray = CUDA.@allowscalar [i*j for i=1:3, j=1:4]

Perhaps the GPU connoisseurs have some more insight on this? cc @maleadt, @vchuravy

@maleadt (Collaborator) commented Apr 10, 2021

That probably shouldn't have changed; can you file an issue on CUDA.jl/GPUArrays.jl? I'll have a look next week. The only change to @allowscalar that comes to mind is task/thread safety, which does come at a certain performance cost (it now does a TLS lookup instead of a simple pointer check, but that should be negligible compared to the subsequent memory transfer).

@navidcy (Collaborator) commented Apr 11, 2021

> That probably shouldn't have changed; can you file an issue on CUDA.jl/GPUArrays.jl? I'll have a look next week. The only change to @allowscalar that comes to mind is task/thread safety, which does come at a certain performance cost (it now does a TLS lookup instead of a simple pointer check, but that should be negligible compared to the subsequent memory transfer).

It seems I can't reproduce the supposed error, so, sorry, my bad... Something else must have been the issue. 😔

@navidcy (Collaborator) commented Apr 14, 2021

I fixed the doctests.

However, there is an issue with Base.typename(), which seems to have changed behavior in Julia v1.6; see JuliaLang/julia#40474.

We should modify our show() overloads so that the output reads, e.g., CPU rather than typename(CPU).

@navidcy (Collaborator) commented Apr 14, 2021

I suggest changing

print(io, "IncompressibleModel{$(Base.typename(A)), $(eltype(model.grid))}",

to

print(io, "IncompressibleModel{"*string(Base.nameof(A))*", $(eltype(model.grid))}",

@ali-ramadhan (Member Author)

WENO is failing on the GPU because CUDAKernels.jl is trying to overdub ^ with CUDA.pow, which doesn't exist anymore: JuliaGPU/KernelAbstractions.jl#249

Otherwise we should be pretty close to having all tests passing!

@ali-ramadhan (Member Author) commented May 18, 2021

Finally all tests pass 🎉 Thanks @navidcy and @vchuravy for all your help!

@glwagner Let me know when it would be a good time to merge this PR and tag a new release.

I ran the incompressible model benchmarks; in general it seems that with Julia 1.6 Oceananigans allocates more memory and is a bit slower on the CPU, but a bit faster on the GPU.

Quick benchmark

Julia 1.6

Oceananigans v0.57.2
Julia Version 1.6.1
Commit 6aaedecc44 (2021-04-23 05:59 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: Intel(R) Xeon(R) Silver 4214 CPU @ 2.20GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-11.0.1 (ORCJIT, cascadelake)
  GPU: TITAN V
                                            Incompressible model benchmarks
┌───────────────┬─────────────┬─────┬────────────┬────────────┬────────────┬────────────┬──────────┬────────┬─────────┐
│ Architectures │ Float_types │  Ns │        min │     median │       mean │        max │   memory │ allocs │ samples │
├───────────────┼─────────────┼─────┼────────────┼────────────┼────────────┼────────────┼──────────┼────────┼─────────┤
│           CPU │     Float64 │  32 │   4.996 ms │   5.047 ms │   5.113 ms │   5.770 ms │ 1.77 MiB │   2301 │      10 │
│           CPU │     Float64 │  64 │  34.951 ms │  35.967 ms │  36.414 ms │  41.417 ms │ 1.77 MiB │   2301 │      10 │
│           CPU │     Float64 │ 128 │ 301.074 ms │ 301.964 ms │ 302.498 ms │ 307.989 ms │ 1.77 MiB │   2301 │      10 │
│           CPU │     Float64 │ 256 │    2.894 s │    2.895 s │    2.895 s │    2.896 s │ 1.77 MiB │   2301 │       2 │
│           GPU │     Float64 │  32 │   2.859 ms │   2.923 ms │   3.025 ms │   3.987 ms │ 2.80 MiB │   6914 │      10 │
│           GPU │     Float64 │  64 │   2.912 ms │   3.101 ms │   3.308 ms │   5.368 ms │ 2.78 MiB │   6993 │      10 │
│           GPU │     Float64 │ 128 │   4.894 ms │   5.019 ms │   5.360 ms │   8.565 ms │ 2.80 MiB │   8667 │      10 │
│           GPU │     Float64 │ 256 │  33.569 ms │  36.266 ms │  36.029 ms │  36.883 ms │ 3.24 MiB │  37307 │      10 │
└───────────────┴─────────────┴─────┴────────────┴────────────┴────────────┴────────────┴──────────┴────────┴─────────┘

Julia 1.5

Oceananigans v0.57.2
Julia Version 1.5.2
Commit 539f3ce943 (2020-09-23 23:17 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: Intel(R) Xeon(R) Silver 4214 CPU @ 2.20GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-9.0.1 (ORCJIT, cascadelake)
  GPU: TITAN V
                                             Incompressible model benchmarks
┌───────────────┬─────────────┬─────┬────────────┬────────────┬────────────┬────────────┬─────────────┬────────┬─────────┐
│ Architectures │ Float_types │  Ns │        min │     median │       mean │        max │      memory │ allocs │ samples │
├───────────────┼─────────────┼─────┼────────────┼────────────┼────────────┼────────────┼─────────────┼────────┼─────────┤
│           CPU │     Float64 │  32 │   4.905 ms │   5.116 ms │   5.237 ms │   6.262 ms │  360.44 KiB │   2142 │      10 │
│           CPU │     Float64 │  64 │  34.327 ms │  36.419 ms │  36.188 ms │  37.507 ms │  360.44 KiB │   2142 │      10 │
│           CPU │     Float64 │ 128 │ 297.567 ms │ 299.404 ms │ 299.856 ms │ 303.081 ms │  360.44 KiB │   2142 │      10 │
│           CPU │     Float64 │ 256 │    2.786 s │    2.786 s │    2.786 s │    2.787 s │  360.44 KiB │   2142 │       2 │
│           GPU │     Float64 │  32 │   2.729 ms │   2.781 ms │   2.885 ms │   3.717 ms │  839.83 KiB │   7132 │      10 │
│           GPU │     Float64 │  64 │   2.894 ms │   3.055 ms │   3.136 ms │   4.046 ms │  879.45 KiB │   7124 │      10 │
│           GPU │     Float64 │ 128 │   4.107 ms │   4.836 ms │   4.890 ms │   6.086 ms │  964.73 KiB │   7142 │      10 │
│           GPU │     Float64 │ 256 │  21.667 ms │  35.642 ms │  34.245 ms │  35.746 ms │    1.11 MiB │   7134 │      10 │
└───────────────┴─────────────┴─────┴────────────┴────────────┴────────────┴────────────┴─────────────┴────────┴─────────┘

@navidcy (Collaborator) commented May 19, 2021

OMG

@navidcy self-requested a review May 19, 2021 00:12

@navidcy (Collaborator) left a comment

I approve so much

@tomchor (Collaborator) commented May 19, 2021

Thanks @ali-ramadhan! That's amazing work.

Quick question: should we be concerned about the GPU memory allocations? They're roughly 3x larger for 1.6, which is a pretty big difference, especially considering the memory limitations on GPUs.

@navidcy (Collaborator) commented May 19, 2021

> Thanks @ali-ramadhan! That's amazing work.
>
> Quick question: should we be concerned about the GPU memory allocations? They're roughly 3x larger for 1.6, which is a pretty big difference, especially considering the memory limitations on GPUs.

Actually, that's a good point; I didn't notice this. I was so overwhelmed by "All checks have passed" that I couldn't see straight.

Review comment on .buildkite/pipeline.yml (outdated, resolved)
@ali-ramadhan (Member Author)

> Quick question: should we be concerned about the GPU memory allocations? They're roughly 3x larger for 1.6, which is a pretty big difference, especially considering the memory limitations on GPUs.

Ah, in the benchmarks those are just CPU memory allocations, since BenchmarkTools.jl doesn't measure GPU allocations. CUDA.@time can measure GPU allocations, but I haven't used it much on Oceananigans.
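
(For reference, a minimal sketch of checking GPU allocations with CUDA.@time; the array and operation here are just placeholders, not an Oceananigans benchmark.)

using CUDA

a = CUDA.rand(256, 256)

# CUDA.@time reports both CPU and GPU allocations (and GC time) for the expression.
CUDA.@time sum(a .^ 2)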

I don't think it's a cause for worry but it might be good to do some profiling at some point to figure out where the extra memory allocations are coming from.

Interestingly the benchmarks suggest that GPU models are actually a bit faster now 👀

@glwagner (Member)

I'm not sure, but it's possible that some GPU utilities not under Oceananigans.jl's control incur memory allocations. I don't think we have much GPU-specific code in our codebase (except for the pressure solvers...). Definitely a good thing to keep tabs on, and to open issues in the relevant packages if it affects the performance of our code.

Labels: package 📦
8 participants