v1.0.0

@github-actions github-actions released this 26 Apr 11:10
· 372 commits to master since this release
34a09df

WaterLily v1.0.0

Upgraded the solver for backend-agnostic execution:

  • New version of @loop macro which integrates KernelAbstractions.jl (KA) to run multi-threaded on CPUs and GPUs. This replaces the @simd version of @loop as well as previous multi-threading code. Each @loop <expr> over <I in R> is expanded into a @kernel function and then run on the backend of the first variable in <expr>.
  • BREAKING CHANGE: Many high-level functions, even ones as simple as sum or LinearAlgebra.norm2, don't compile, run incorrectly, or run much slower than expected on GPUs. These have been replaced in the code-base with lower-level functions, but unfortunately, users will need to take extra care when defining things like AutoBody(sdf, map) functions.
  • PERFORMANCE NOTE: KA allocates to the CPU on every loop. Reverting @loop to use @simd restores a perfectly non-allocating sim_step!. We tried other tools like Polyester.jl, which had better multi-threading performance for small simulations, but large simulations are where we need the speed-up, so we chose KA.
  • PERFORMANCE NOTE: @loop is not fully optimized. For example, there is an execution overhead for each @loop call on GPUs. A few of the loops have been combined to help reduce this overhead, but many more would require major refactoring or modification of the @loop macro. Despite this we benchmarked up to 182x speed-up with GPU execution.
  • BREAKING CHANGE: The Simulation constructor arguments have changed. dims is now the internal field dimension (L,2L) not (L+2,2L+2), and U must now be an NTuple.
  • The Simulation constructor also takes a new mem=Array argument which can be set to CUDA.CuArray or AMDGPU.ROCArray to set up simulations on GPUs. The Flow and Poisson structs now use AbstractArrays for all fields to accommodate those array types.
  • DEFAULT CHANGE: sim_step!(remeasure=true) is now the default as that is the safer (but slower) option.
  • Poisson now shares memory for the L, x, and z fields with Flow to reduce the memory footprint. The z field holds the RHS vector and is mutated by solve!.
  • The SOR! and GS! smoothers are not thread-safe, and have been replaced with a Jacobi preconditioned conjugate-gradient smoother held in new routines Jacobi! and pcg!.
  • PERFORMANCE NOTE: Because of the poor scaling on small fields, the number of multi-grid levels has been set to a default maxlevels=4. The optimal number of levels is likely to be simulation and backend dependent.
  • PERFORMANCE NOTE: pcg! requires a lot of inner products, which are somewhat slow. Switching to the data-driven approximate inverse smoother may be beneficial in the future.
  • Because of the poor scaling on small fields, the multi-grid-style recursive apply_sdf! has been replaced with measure_sdf! which simply @loops body.sdf().
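
To make the smoother change above concrete, here is a minimal sketch of a Jacobi-preconditioned conjugate-gradient solve in plain Julia. It is not WaterLily's Jacobi! or pcg! code: the function name jacobi_pcg, its keyword arguments, and the dense test matrix are all assumptions chosen for illustration. It does show why the method needs several inner products (the dot and norm calls) on every iteration.

```julia
using LinearAlgebra

# Jacobi-preconditioned conjugate gradient for a symmetric positive-definite A.
# Hypothetical helper for illustration only; not WaterLily's pcg! routine.
function jacobi_pcg(A, b; tol=1e-10, maxiter=10length(b))
    x = zeros(eltype(b), length(b))
    invD = 1 ./ diag(A)          # Jacobi preconditioner: inverse of the diagonal
    r = b - A * x                # initial residual
    z = invD .* r                # preconditioned residual
    p = copy(z)                  # first search direction
    rz = dot(r, z)               # inner product reused in the updates below
    for _ in 1:maxiter
        norm(r) ≤ tol && break   # stop once the residual is small enough
        Ap = A * p
        α = rz / dot(p, Ap)      # step length along p
        x .+= α .* p
        r .-= α .* Ap
        z .= invD .* r
        rz_new = dot(r, z)
        p .= z .+ (rz_new / rz) .* p   # next conjugate search direction
        rz = rz_new
    end
    return x
end

# Small 1D Poisson-like test system (tridiagonal 2, -1), assumed for the demo
n = 8
A = Matrix(SymTridiagonal(fill(2.0, n), fill(-1.0, n - 1)))
b = ones(n)
x = jacobi_pcg(A, b)
println("residual norm: ", norm(A * x - b))
```

Every update in the loop is either an element-wise operation or a reduction, which is what makes this kind of smoother safe to run in parallel, unlike a sweep that reads values updated earlier in the same pass.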

There have also been many changes to the code outside of src to support the upgrade:

  • The testing cases have been massively expanded. In particular, there are tests for every major function on CPU, CUDA, and AMDGPU backends.
  • The benchmarks have been massively expanded. In particular, benchmarks for each function within mom_step! as well as the 3D TGV and donut cases can be compared against previous commits, including pre 1.0 versions.
  • The examples have been brought up-to-date, including GPU execution for the 3D examples and a new jelly fish example demonstrating a deforming geometry.
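
As a rough illustration of a deforming geometry, the sketch below defines a signed-distance function and a time-dependent map of the kind that could be passed to AutoBody(sdf, map). It is not the jelly fish example: the names pulse_map, R and center, the pulsing ellipse shape, and all parameter values are hypothetical. Both functions use only elementary arithmetic, one conservative way to stay clear of the high-level functions that cause trouble on GPUs. Plain tuples are used so the sketch runs without any packages; a real body definition may need a different return type.

```julia
# Hypothetical pulsing ellipse; every name and value here is illustrative.
const R = 16.0                  # nominal radius in grid cells
const center = (64.0, 64.0)     # centre of the body in grid coordinates

# Distance to a circle of radius R about the origin, elementary arithmetic only
sdf(x, t) = √(x[1]^2 + x[2]^2) - R

# Time-dependent map: shift to the centre, then stretch one axis and squeeze the other
function pulse_map(x, t)
    s = 1 + 0.25 * sin(t)       # periodic scale factor
    return ((x[1] - center[1]) * s, (x[2] - center[2]) / s)
end

# Composed implicit function: zero on the deformed surface at time t
deformed(x, t) = sdf(pulse_map(x, t), t)

# With WaterLily loaded, these would be passed along as AutoBody(sdf, pulse_map)
println(deformed((80.0, 64.0), 0.0))
```

Because the scaling is not uniform, the composed function is zero on the deformed surface but is only approximately a distance away from it.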

The only (intentional) modelling change was to add correct_div!(σ) to Body.jl to enable the deformable jelly fish example. This has nothing to do with the backend upgrade and should have been added to master and then merged in - but it wasn't.

Diff since v0.2.4

Closed issues:

  • Use KernelAbstractions for loops? (#18)
  • Diverging pressure for rotational motion (#36)

Merged pull requests:

  • add function addBody (#35) (@Blagneaux)
  • Update for new Makie (#37) (@asinghvi17)
  • Boundary conditions kernel and dependencies (#38) (@b-fg)
  • Flow.jl MWE (#39) (@b-fg)
  • Moved creation of boundary conditions array out of Flow (#40) (@b-fg)
  • Cleaned up CUDAEnv/Flow.jl and fixed allowscalar in tests. Fixed BCs too. (#41) (@b-fg)
  • Started porting Flow.jl using KernelAbstractions.jl [WIP] (#43) (@b-fg)
  • Changed from cu to CuArray the way to create arrays in GPU memory. (#45) (@b-fg)
  • Added CUDAEnv/benchmark.jl where it breaks down mom_step. (#46) (@b-fg)
  • mom_step benchmark (#47) (@b-fg)
  • Added AMDGPU package (#48) (@b-fg)
  • Update to 1.0 (#49) (@weymouth)