-
Notifications
You must be signed in to change notification settings - Fork 193
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Arbitrary tracers #452
Arbitrary tracers #452
Conversation
…f tendencies for consistency
…onditions; cleans up forcing API
…r of tracers; making AMD closures accept arbitrary tracers
…cleanup of docstrings and overlapping AMD functionality
…e/Oceananigans.jl into arbitrary-tracers
…rs are consistent with the specified buoyancy model. Changes non-dimensional model to use a buoyancy tracer.
Unrolling inside loops may be the way to go then. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is awesome! Not much to disagree with so mostly just had minor comments.
-
Good to make sure model is consistent with
validate_buoyancy
. I'm okay with being able to change the name of temperature and salinity, although probably not something I used. On Slack you suggested changing:T
to:θ
which I'd be in favor of. -
Nice!
-
Seems reasonable. Side note: Would be nice to have something like
validate_boundary_conditions
in the future to make inferences like this safe by e.g. checking that if onex
boundary condition isPeriodic
then all otherx
BCs are also periodic. -
Out of curiousity, would a user ever want to run some tracers with LES and other tracers with constant diffusivity? Seems like it'll be a weird/inconsistent model.
Things to do before merging:
- The
TurbulenceClosures
module has been quite significantly refactored, but we don't have an LES regression test (kind of my fault, I'll open an issue). I wonder if it's worth rerunning the stratified Couette flow verification as a manual regression test? Or we can add an LES regression test in master and then make sure this branch produces the same numbers. - Need to figure out why the model has slowed down by a factor of ~2-3x. Will try benchmarking the
arbitrary-tracers-inner-loops
branch again.
Other comments:
- Running with multiple passive tracers seems like the case where using one array instead of two for the tendency (issue Can we inline source term calculation to save memory? #98) will be important.
…terior_source_terms
… Val type for tracer index
That's not out of the question. We could make the |
I've pushed some commits that fix the performance issue. This is done by using the The flux divergence for constant isotropic diffusivity, for example, is now Benchmarks on cyclops are now MasterOceananigans package status:
Status `~/.julia/environments/v1.1/Project.toml`
[9e8cae18] Oceananigans v0.11.1 #master (https://github.com/climate-machine/Oceananigans.jl.git)
┌ Warning: Performing scalar operations on GPU arrays: This is very slow, consider disallowing these operations with `allowscalar(false)`
└ @ GPUArrays ~/.julia/packages/GPUArrays/fLiQ1/src/indexing.jl:16
Running static ocean benchmark: 32× 32× 32 (GPU, Float64)...
Running static ocean benchmark: 64× 64× 64 (GPU, Float64)...
Running static ocean benchmark: 128×128×128 (GPU, Float64)...
Running static ocean benchmark: 256×256×256 (GPU, Float64)...
──────────────────────────────────────────────────────────────────────────────────────
Static ocean benchmarks Time Allocations
────────────────────── ───────────────────────
Tot / % measured: 82.9s / 0.58% 8.26GiB / 0.37%
Section ncalls time %tot avg alloc %tot avg
──────────────────────────────────────────────────────────────────────────────────────
256×256×256 (GPU, Float64) 10 287ms 59.8% 28.7ms 7.73MiB 24.5% 791KiB
32× 32× 32 (GPU, Float64) 10 75.9ms 15.8% 7.59ms 8.28MiB 26.3% 848KiB
128×128×128 (GPU, Float64) 10 66.7ms 13.9% 6.67ms 7.79MiB 24.7% 798KiB
64× 64× 64 (GPU, Float64) 10 50.6ms 10.5% 5.06ms 7.73MiB 24.5% 791KiB
──────────────────────────────────────────────────────────────────────────────────────
|
Here are the results of Outer loops is faster in all cases. PS:
|
Nice! Can post 32^3 results just to make sure we don't kill small models? I think its probable they will be comparable. |
This is great news btw! |
Hmmm, interestingly in your benchmarks the 128^3 models were faster than the 32^3 models! Benchmark results from 32^3 models: For 1-2 tracers inner and outer loops seem the same, but for many tracers, inner loops seems better now on the GPU. For the CPU, outer loops is still much better. Not sure what to make of this. We're definitely not saturating the GPU so maybe it's not worth considering the case of small GPU models too much? Outer loops seems like the way to go? Outer loops (this PR)
Inner loops
|
The stratified Couette flow verficiation test was failing (after merging this PR into master) because I've fixed this. Hopefully tests will pass now. |
…--- cannot directly use checkpointing because underlying model type signatures have changed; must manually load fields as in other regression tests
@ali-ramadhan I've addressed your concerns: we now have a LES regression test, and some changes to inner kernels means that performance is now improved by this PR over master (for large models). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome!
Arbitrary tracers Former-commit-id: e60a723
This PR introduces 'arbitrary tracer' functionality to Oceananigans.
With this PR, the user is now able to specify which tracers they would like to include in their simulation via the
Model
constructor:User-facing considerations
The specified tracers must be consistent with the specified buoyancy model. This consistency is ensured by
validate_buoyancy
in theModel
constructor. ForBuoyancyTracer
, the user must specify a tracer named:b
. ForSeawaterBuoyancy
, the user must specify tracers named:T
and:S
. (In the future, we could also allow the names of these buoyancy, temperature, and salinity tracers to be specified in the constructor for the respective buoyancy models, egSeawaterBuoyancy(temperature=:T, salinity=:S)
.)The constructor for
ModelForcing
(which will have to be merged with Forcing API refactor #444 when it is approved) now allows the user to specify forcing for any of the tracer fields. If the user specifies a forcing for a tracer field that does not exist, an error is thrown. If a user does not specify a forcing for some tracer field, a default zero forcing is applied.The same logic for
ModelForcing
applies toBoundaryConditions
/SolutionBoundaryConditions
. When boundary conditions for a tracer field are unspecified, a default tracer boundary condition is inferred from the x-velocity field: tracers either inheritPeriodic
boundary conditions, or are given aFlux, Nothing
boundary condition for all other cases.For closures, all tracer-diffusivity-related fields permit users to specify either:
NamedTuple
with fields for every tracer.In the future, we could possibly implement some mixed behavior where the user specifies both a default and particular values for tracer diffusivity. This would be useful in the (possibly rare) use-case of a large number of tracers with the same diffusivity, but where one or two of them require a special, different diffusivity. I am not sure this API is necessary so I left it for future work.
Diffusivity-like fields include:
κ
(constant component of tracer diffusivity) forConstantIsotropicDiffusivity
,AnisotropicMinimumDissipation
, andConstantSmagorinsky
Pr
(turbulent Prandtl number) forConstantSmagorinsky
κh
andκv
forConstantAnisotropicDiffusivity
.Internal algorithmic considerations
This implementation includes a major refactor of the time-stepping algorithm. In particular, kernels are launched for each tracer for all operations that involve tracers. This differs from the previous pattern, in which a single kernel was called in some cases (for example, to update the velocity and tracer fields, or to store previous source terms). The reason for this change is because I ran into some issues (dynamic function invocations) using
ntuple
to unroll a loop over tracers inside the kernel. In addition, I think that with a large number of tracers the kernels may become too large and their performance could degrade (but I'm not sure).This refactoring of the algorithm means we need to
@ali-ramadhan, can you help with this?
If there are changes in model performance, we can work on unrolling loops over the tracer fields inside our kernels. This is probably possible; it just requires some debugging. We would probably also want to make sure that this doesn't lead to poor performance for up to O(10) tracers.
If any of this PR is not satisfactory, I'm happy to work on it and iterate until the PR is in mergeable shape.
Resolves #25
Resolves #430
Partially addresses #448