Upgrade to Julia 1.6 #1514
Conversation
Everything looks good on the CPU, but the GPU unit tests segfault when testing field setting (I tried debugging but can't figure out why). All the other GPU tests pass, although CI seems much slower for the GPU tests (~3x slower?). Could this be related to the segfault in CliMA/ClimateMachine.jl#2146? @charleskawczynski @jakebolewski were you able to figure out why it was segfaulting?
Okay, so the segfault came from how we were setting a field in that test. It's not an important test, so I removed it, but it's a little weird that it just started failing, since the test has been around since Oceananigans.jl v0.1.0...
😞
Not yet, I'm going to try looking into it. @jakebolewski suggested upgrading some packages first, so I'm doing that now.
Hmm, good to know about possible performance issues; we should be on the lookout. I suspect that (one of) the segfaults is an out-of-bounds error that was previously masked.
I opened an issue about it.
(force-pushed from 1dfaa98 to e956c8e)
Hmmm, a lot of failures due to CUDA scalar operations. We could take this opportunity to get rid of all scalar operations in the tests. Maybe the new CUDA scalar-operation handling is hurting performance, and that's why GPU CI has slowed down?
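One way to purge scalar operations from the tests (a sketch; the variable names are illustrative, not actual Oceananigans test code, and it assumes a CUDA-capable machine) is to disallow scalar indexing globally and copy fields to the host before element-wise checks:

```julia
using CUDA

# Disallow scalar indexing so any remaining scalar operation in the
# tests fails loudly instead of silently slowing down GPU CI.
CUDA.allowscalar(false)

a = CUDA.zeros(4, 4)

# Bad: `a[2, 3]` is a scalar GPU read and now throws an error.
# Good: copy the whole field to the host once, then index freely.
a_host = Array(a)
@assert a_host[2, 3] == 0
```

Whole-array operations (broadcasts, reductions, `Array(...)` copies) stay on the fast path; only element-at-a-time access triggers the scalar-indexing error.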
@ali-ramadhan, I found the same issue in GeophysicalFlows.jl. Previously you could do

```julia
CUDA.@allowscalar newarray = [i*j for i=1:3, j=1:4]
```

but now that's not possible! Instead you need to do

```julia
newarray = CUDA.@allowscalar [i*j for i=1:3, j=1:4]
```

Perhaps the GPU connoisseurs might have some more insight on this? cc @maleadt, @vchuravy
That probably shouldn't have changed. Can you file an issue on CUDA.jl/GPUArrays.jl? I'll have a look next week.
Seems I can't reproduce the supposed error, so, sorry, my bad... Something else must have been the issue. 😔
I fixed the doctests. However, there is an issue with how the model is displayed; we should modify our `show` methods.
I suggest changing it to

```julia
print(io, "IncompressibleModel{"*string(Base.nameof(A))*", $(eltype(model.grid))}",
```
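For context, a line like that would live inside a `Base.show` method. Here is a minimal standalone sketch of the pattern, using toy `ToyModel` and `ToyGrid` types (hypothetical stand-ins, not the actual `IncompressibleModel`):

```julia
struct ToyGrid{FT} end
struct CPU end

struct ToyModel{A, G}
    architecture :: A
    grid         :: G
end

Base.eltype(::ToyGrid{FT}) where FT = FT

# Print the architecture's type name and the grid's element type in the
# header, mirroring the suggested print(io, "IncompressibleModel{...}") line.
function Base.show(io::IO, model::ToyModel{A}) where A
    print(io, "ToyModel{", string(Base.nameof(A)), ", ", eltype(model.grid), "}")
end

model = ToyModel(CPU(), ToyGrid{Float64}())
@assert sprint(show, model) == "ToyModel{CPU, Float64}"
```

`Base.nameof(A)` strips type parameters from the architecture type, so the header stays short regardless of how the architecture is parameterized.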
WENO is failing on the GPU because CUDAKernels.jl is trying to overdub code it shouldn't. Otherwise we should be pretty close to having all tests passing!
Finally all tests pass 🎉 Thanks @navidcy and @vchuravy for all your help! @glwagner let me know when it would be a good time to merge this PR and tag a new release. I ran the incompressible model benchmarks, and in general it seems that with Julia 1.6 Oceananigans allocates more memory and is a bit slower on the CPU, but a bit faster on the GPU.

Quick benchmark: (Julia 1.6 vs. Julia 1.5 result tables not recovered)
OMG |
I approve so much
Thanks @ali-ramadhan! That's amazing work. Quick question: should we be concerned about the GPU memory allocations? They're roughly 3x larger on 1.6, which is a pretty big difference, especially considering the memory limitations on GPUs.
Actually, that's a good point. I didn't notice this.
Ah, so in the benchmarks those are just CPU memory allocations, since BenchmarkTools.jl doesn't measure GPU allocations. I don't think it's a cause for worry, but it might be good to do some profiling at some point to figure out where the extra memory allocations are coming from. Interestingly, the benchmarks suggest that GPU models are actually a bit faster now 👀
I'm not sure, but it's possible that some GPU utilities not under Oceananigans.jl's control incur memory allocations. I don't think we have much GPU-specific code in our codebase (except for the pressure solvers...). Definitely a good thing to keep tabs on, and we should open issues in the relevant packages if it affects the performance of our code.
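If we do want to keep tabs on device-side allocations, `CUDA.@time` reports GPU memory allocated by an expression, which `BenchmarkTools.@btime` does not. A sketch, assuming a CUDA-capable machine:

```julia
using CUDA

a = CUDA.rand(256, 256)
b = CUDA.rand(256, 256)

# CUDA.@time prints timing plus both CPU and GPU allocation counts
# for the expression, so device-side allocations show up explicitly.
CUDA.@time c = a .* b
```

Running this around a model time step would show whether the extra allocations the benchmarks report are host-side or device-side.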
Just updating the Manifest.toml and the Buildkite Julia version number to see if everything passes. I've been using 1.6 fine on my laptop, so this should test 1.6 + GPUs.
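Updating the Manifest.toml is done through the package manager; a sketch of the usual workflow, run from the project root:

```julia
using Pkg

Pkg.activate(".")   # activate the project whose Manifest.toml we want to update
Pkg.update()        # re-resolve dependencies and write new versions to Manifest.toml
Pkg.instantiate()   # install everything the updated manifest pins, for the new Julia version
```

Since the Manifest format changed in Julia 1.6.2+ releases, regenerating it under the CI Julia version keeps local and Buildkite environments in sync.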