Release v1.0.0 · WaterLily-jl/WaterLily.jl

WaterLily v1.0.0

Upgraded the solver for backend-agnostic execution:

A new version of the @loop macro integrates KernelAbstractions.jl (KA) to run multi-threaded on CPUs and GPUs. It replaces the @simd version of @loop as well as the previous multi-threading code. Each @loop <expr> over <I in R> is expanded into a @kernel function, which is then run on the backend of the first variable in <expr>.
BREAKING CHANGE: Many high-level functions, even ones as simple as sum or LinearAlgebra.norm2, fail to compile, run incorrectly, or run much slower than expected on GPUs. These have been replaced in the code base with lower-level functions, but unfortunately users will need to take extra care when defining functions such as the sdf and map passed to AutoBody(sdf, map).
PERFORMANCE NOTE: KA allocates CPU memory on every loop call. Reverting @loop to use @simd restores a perfectly non-allocating sim_step!. We tried other tools such as Polyester.jl, which had better multi-threading performance for small simulations, but large simulations are where we need the speed-up, so we chose KA.
PERFORMANCE NOTE: @loop is not fully optimized. For example, each @loop call carries an execution overhead on GPUs. A few of the loops have been combined to reduce this overhead, but combining more would require major refactoring or modification of the @loop macro. Despite this, we benchmarked speed-ups of up to 182x with GPU execution.
BREAKING CHANGE: The Simulation constructor arguments have changed. dims is now the internal field dimension, (L,2L) rather than (L+2,2L+2), and U must now be an NTuple.
The Simulation constructor also takes a new mem=Array argument, which can be set to CUDA.CuArray or AMDGPU.ROCArray to set up simulations on GPUs. The Flow and Poisson structs now use AbstractArrays for all fields to accommodate these array types.
DEFAULT CHANGE: sim_step!(remeasure=true) is now the default, as it is the safer (but slower) option.
Poisson now shares memory with Flow for its L, x, and z fields to reduce the memory footprint. The z field holds the RHS vector and is mutated by solve!.
The SOR! and GS! smoothers are not thread-safe and have been replaced by a Jacobi-preconditioned conjugate-gradient smoother, implemented in the new routines Jacobi! and pcg!.
PERFORMANCE NOTE: Because of poor scaling on small fields, the number of multi-grid levels now defaults to maxlevels=4. The optimal number of levels is likely to depend on the simulation and backend.
PERFORMANCE NOTE: pcg! requires many inner products, each of which is a global reduction and relatively slow. Switching to the data-driven approximate inverse smoother may be beneficial in the future.
Because of poor scaling on small fields, the multi-grid-style recursive apply_sdf! has been replaced with measure_sdf!, which simply evaluates body.sdf() in a single @loop.
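To illustrate the new dims convention for the Simulation constructor, here is a small Python sketch. It assumes the solver pads each internal dimension with one ghost cell on each side, which is consistent with the old (L+2,2L+2) sizes. The helpers padded_dims and check_velocity are hypothetical and are not part of WaterLily's API.

```python
def padded_dims(dims):
    """Allocated array size for the given internal dims, assuming one
    ghost cell on each side of every dimension (hypothetical helper)."""
    return tuple(n + 2 for n in dims)


def check_velocity(U, dims):
    """Hypothetical check mirroring the NTuple rule: U must be a tuple
    with one velocity component per spatial dimension."""
    if not isinstance(U, tuple) or len(U) != len(dims):
        raise ValueError("U must be a tuple with one entry per dimension")
    return U


L = 32
dims = (L, 2 * L)          # new convention: internal field size only
print(padded_dims(dims))   # (34, 66), i.e. the old (L+2, 2L+2) size
check_velocity((1.0, 0.0), dims)
```

Passing a scalar such as U=1.0 would be rejected by this sketch, which is the kind of call that the NTuple requirement rules out.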
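The memory sharing between Poisson and Flow means that the two structs hold references to the same arrays rather than copies. The Python sketch below shows the practical consequence: whatever solve! writes into z is visible through Flow. The classes, the field names p and sigma, and the toy solve are hypothetical stand-ins, not WaterLily code.

```python
class Flow:
    """Hypothetical stand-in for a flow state that owns its arrays."""

    def __init__(self, n):
        self.p = [0.0] * n      # solution field (e.g. pressure)
        self.sigma = [1.0] * n  # field reused to hold the RHS


class Poisson:
    """Hypothetical stand-in that aliases Flow's arrays instead of copying."""

    def __init__(self, flow):
        self.x = flow.p      # same list object as flow.p
        self.z = flow.sigma  # same list object as flow.sigma

    def solve(self):
        # Toy "solve" for the trivial system x = z that then reuses z as
        # workspace (here it simply zeroes it), mutating the RHS in place.
        for i, zi in enumerate(self.z):
            self.x[i] = zi
            self.z[i] = 0.0


flow = Flow(4)
pois = Poisson(flow)
pois.solve()
print(flow.p)      # [1.0, 1.0, 1.0, 1.0]: the solution lands in flow.p
print(flow.sigma)  # [0.0, 0.0, 0.0, 0.0]: the RHS has been overwritten
```

If the RHS is needed after the solve, it has to be copied beforehand.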
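For readers unfamiliar with the new smoother, the following is a minimal textbook sketch of Jacobi-preconditioned conjugate gradient in pure Python on a small dense matrix. It is not WaterLily's pcg!, which works in place on the solver's arrays and is used as a multi-grid smoother, presumably with a small iteration budget (maxiter plays that role here). The function names are hypothetical. The sketch also makes the reduction cost visible, because every iteration needs several inner products.

```python
def dot(a, b):
    """Inner product; on a parallel backend this is a global reduction."""
    return sum(ai * bi for ai, bi in zip(a, b))


def matvec(A, x):
    """Dense matrix-vector product A x."""
    return [dot(row, x) for row in A]


def jacobi_pcg(A, b, x=None, tol=1e-10, maxiter=100):
    """Jacobi-preconditioned conjugate gradient for symmetric positive-definite A.

    Each iteration performs one matrix-vector product and two inner products
    (p.Ap and r.z), plus one more (r.r) for the stopping test."""
    n = len(b)
    x = [0.0] * n if x is None else list(x)
    inv_diag = [1.0 / A[i][i] for i in range(n)]        # Jacobi preconditioner M^-1 = D^-1
    r = [bi - axi for bi, axi in zip(b, matvec(A, x))]  # initial residual r = b - A x
    z = [d * ri for d, ri in zip(inv_diag, r)]          # preconditioned residual z = D^-1 r
    p = list(z)                                         # first search direction
    rz = dot(r, z)
    for _ in range(maxiter):
        if dot(r, r) ** 0.5 < tol:
            break
        Ap = matvec(A, p)
        alpha = rz / dot(p, Ap)                         # step length along p
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        z = [d * ri for d, ri in zip(inv_diag, r)]
        rz_new = dot(r, z)
        beta = rz_new / rz                              # conjugation coefficient
        p = [zi + beta * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x


# 1D Laplacian-style test system whose exact solution is [1, 1, 1]
A = [[2.0, -1.0, 0.0], [-1.0, 2.0, -1.0], [0.0, -1.0, 2.0]]
b = [1.0, 0.0, 1.0]
x = jacobi_pcg(A, b)
print([round(v, 6) for v in x])  # [1.0, 1.0, 1.0]
```

The Jacobi preconditioner only needs the matrix diagonal and an element-wise multiply, so it parallelizes trivially. This is one plausible reason it suits a thread-safe smoother better than the sequential sweeps of SOR or Gauss-Seidel.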
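The default maxlevels=4 caps how far the multi-grid hierarchy coarsens, so the solver never works on very small grids where parallel execution scales poorly. The Python sketch below assumes simple factor-2 coarsening in every dimension that stops at maxlevels or when a dimension can no longer be halved. The stopping rule and the helper name coarse_levels are assumptions for illustration, not WaterLily's exact logic.

```python
def coarse_levels(dims, maxlevels=4):
    """List grid sizes from fine to coarse, halving every dimension per level.

    Assumes factor-2 coarsening that stops at maxlevels, or earlier when a
    dimension is odd or smaller than 4 (hypothetical rule)."""
    levels = [tuple(dims)]
    while len(levels) < maxlevels:
        cur = levels[-1]
        if any(n % 2 or n < 4 for n in cur):
            break
        levels.append(tuple(n // 2 for n in cur))
    return levels


print(coarse_levels((256, 512)))
# [(256, 512), (128, 256), (64, 128), (32, 64)]
```

Under these assumptions, a 256x512 domain keeps its coarsest grid at 32x64 rather than shrinking toward a handful of cells, which fits the trade-off described for maxlevels.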
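The switch from the recursive apply_sdf! to measure_sdf! amounts to evaluating the signed distance function once per cell in one flat pass. The Python sketch below shows that pattern with a circle SDF. The function names and the assumption that cell centres sit at (i + 0.5, j + 0.5) are illustrative only. Because each point is computed independently with scalar arithmetic, the pattern maps naturally onto a single parallel loop.

```python
import math


def circle_sdf(x, y, cx=4.0, cy=4.0, r=2.0):
    """Signed distance to a circle: negative inside, zero on the surface, positive outside."""
    return math.hypot(x - cx, y - cy) - r


def measure_sdf(nx, ny, sdf):
    """Fill an nx-by-ny field by evaluating sdf once per cell in one flat pass.

    Cell centres are assumed to sit at (i + 0.5, j + 0.5); no coarse-grid
    recursion is involved (hypothetical helper)."""
    return [[sdf(i + 0.5, j + 0.5) for j in range(ny)] for i in range(nx)]


d = measure_sdf(8, 8, circle_sdf)
print(d[3][3] < 0, d[0][0] > 0)  # True True: (3.5, 3.5) is inside, (0.5, 0.5) outside
```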

There have also been many changes to the code outside of src to support the upgrade:

The testing cases have been massively expanded. In particular, there are tests for every major function on CPU, CUDA, and AMDGPU backends.
The benchmarks have been massively expanded. In particular, benchmarks for each function within mom_step!, as well as for the 3D TGV and donut cases, can be compared against previous commits, including pre-1.0 versions.
The examples have been brought up to date, including GPU execution for the 3D examples and a new jellyfish example demonstrating a deforming geometry.

The only (intentional) modelling change was the addition of correct_div!(σ) to Body.jl to enable the deformable jellyfish example. This has nothing to do with the backend upgrade and should have been added to master and then merged in, but it wasn't.

Diff since v0.2.4

Closed issues:

Use KernelAbstractions for loops? (#18)
Diverging pressure for rotational motion (#36)

Merged pull requests:

add function addBody (#35) (@Blagneaux)
Update for new Makie (#37) (@asinghvi17)
Boundary conditions kernel and dependencies (#38) (@b-fg)
Flow.jl MWE (#39) (@b-fg)
Moved creation of boundary conditions array out of Flow (#40) (@b-fg)
Cleaned up CUDAEnv/Flow.jl and fixed allowscalar in tests. Fixed BCs too. (#41) (@b-fg)
Started porting Flow.jl using KernelAbstractions.jl [WIP] (#43) (@b-fg)
Changed from cu to CuArray the way to create arrays in GPU memory. (#45) (@b-fg)
Added CUDAEnv/benchmark.jl where it breaks down mom_step. (#46) (@b-fg)
mom_step benchmark (#47) (@b-fg)
Added AMDGPU package (#48) (@b-fg)
Update to 1.0 (#49) (@weymouth)
