Design

Aims

separate data layout from interface
make it easier to use smaller pieces of code
make the code / physics easier to understand
separate vertical and horizontal where possible:
- horizontal:
  - will be spectral element
  - use quad elements with LGL quadrature
  - duplicate values at colocated nodes (supporting both DG and CG)
  - either regular grid (LES box), cubed sphere (GCM) or potentially arbitrary conformal mesh (e.g. for ocean and land)
    - support for loading an arbitrary 2D grid from standard mesh formats (GMSH)
  - can be distributed
- vertical
  - nodes will be vertical (i.e. radial on a sphere) aligned
  - should support a variety of discretizations (e.g. spectral element, FV, staggered grids)
  - support both duplicated or shared nodes for CG
  - implicit solves in the vertical
Support for 2D horizontal surfaces (e.g. boundary conditions, barotropic modes in ocean models, etc.)
Make interface more functional:
- i.e. users should write functions that receive and return immutable objects, rather than in-place modification

First steps plan

Data backends for 3D/2D/column/pancake

StructArray or roll our own?
Unit tests + GPU

Horizontal grid structure

topology
metrics
DSS

Horizontal operators

horiz gradient
horiz divergence
broadcasting

Basic examples

Poisson equation / Heat equation
- regular 2D mesh
- irregular 2D mesh
Wave equation

Column grid structure

1D tests of column
3D tests of both combined

Basic considerations

We only need to consider 2D topologies
- i.e. connectivity is the same for all elements in a stack
Aim for deep atmosphere
- optional support for shallow atmosphere shouldn't be too difficult: modify metric terms and Coriolis terms
Remainder model
- becomes a splitting of the vertical
- needs to only be vertical-only

Interfaces

column(values, i, j, h)
- view that can be indexed by [k, v], returning a struct
- used for everything vertical (divergence, gradient, integration, implicit solves, search up/down/bisect, interpolated view)
- for 3d fields, this will act like a function
pancake(values, k, v, h)
view that can be indexed by [i,j], returning a struct
face(pancake, face#) => view indexed by [i] (for horz num fluxes)
opposing(face) will give the face opposing the element
elements (read only)
- vizualization, VTK export
key assumptions:
- columns never access horizontal data other than gradients, etc, which can be computed before being passed in
- horizontal terms only act via derivatives (divergence / gradients / curl)
broadcasting (incl. 2D horz -> 3D fields), lazy variant
- reduction (integrate 3D -> 2D)

potential layouts

(some we will want, flexibility to add more)

GPU: thread model 1 thread per (i,j) within an element
CPU: ?
- 1 thread per pancake
- 1 thread per column
i,j horizontal node indices
k vertical node index
field: field index (to be figured out)
v: vertical element index
h: horizontal element index (i.e. uniquely identifies an element stack) (a) integer for arbitrary mesh - how to handle real vs ghost elements? (b) (hx,hy) for regular grid (LES) (c) (hx,hy,hface) for cubed sphere

TODO

How to handle (expensive) column physics on duplicated columns?
primary -> secondary sharing mechanism:
duplicate work
duplicate across processes
ultimately depends on threading model
How to handle columns that use different vertical staggering?
Interpolation:
1. what is h?
- regular grid/cubed sphere we can compute a direct lookup
- arbitrary mesh build some quad tree
1. what is v?
- linear lookup (ideally)
1. compute coordinate in reference element
2. barycentric interpolation
  - weighted average of all nodes in an element
  - if horizontal-only, can do it via a pancake

Communication

only certain fields require communication (e.g. tendencies, grads in DG)
can reuse communication buffers for efficiency
load data directly from recv buffers
data buffer

DG

opposing face 4 possible faces * 2 orientations

for a ghost element has only one face store whole faces contiguously (duplicate shared nodes if necessary)

elemtoelem[f,h] => if > 0, real element, if <0 then its the ghost face #
elemtoface[g,h] => if real element, 1:4 positive orienataion, -4:-1 flipped orientation; ghost face 1/-1

CG

store columns contiguous, based on the logical order for the sender

don't duplicate nodes unique numbering of columns on a given rank
if positive, real column, use linear indexing c = NqNq(h-1) + Nq*(j-1) + i
if < 0, ghost column, look up in the receive buffer

Stuff

Data: a value stored at each node
Grid: geometric information
- physical location
- metric terms
- face normals (need for DG/boundaries); can be computed from metrics if required
- stores its information using a Data object
Topology (move out of Grid)
- connection between horizontal
  - DG opposing horizontal faces and orientation
  - CG which horizontal nodes are collacated (0 for internal, 1 for non-vertex faces, 2+ for vertices)
- communication / rank partitioning
Field: Data + Grid
- map
- vertical reduce (integral)
- broadcast + lazy broadcast (e.g. combine 3D field, 2D field, scalar)
- grad, div, curl (split into horizontal/vertical)
- use a StructArray (or something that acts like an array of structs)
  - main limitation: can only be something take spatial derivatives of
  - requires scalar multiplication + addition

Fields

idea field element type can be one of:

real scalars (floating point numbers)
vector coefficient types (basically coeffients of a vector space)
- CartesianVector(x1,x2,x3)
- SphericalCartesian(u,v,w)
- Contravariant/Covariant wrt the reference element
- Contravariant1/Contravariant2/Contravariant3 projections onto contravariant
- to convert between these, you need geometry information SphericalCartesian(CartesianVector(1,2,3), geom)
tensors?
composite object (tuples or named tuples of the above)
- support for Tuples/NamedTuples
- add support for custom structs?
- should not support vector coefficients to avoid e.g. adding two SphericalCartesian objects at different locations

Composable operations

Can we compose kernels by combining multiple operations, such as broadcasting and tensor product operations.

gradient:
1. extract/ compute X to shared memory
2. synchronize
3. tensor product
```
A1 = (D ⊗ I) * X
A2 = (I ⊗ D) * X
```
4. combine
```
∂ξ∂x .* Covariant(A1,A2)
```

strong divergence

map to shared memory

C1 = J .* contravariant1(V)
C2 = J .* contravariant1(V)

synchronize

tensor on each matrix

D1 = (D ⊗ I) * C1
D2 = (I ⊗ D) * C2

combine
```
(D1 .+ D2) ./ J
```

filtering / interpolation
1. compute/load to shared memory
2. synchronize
3. tensor dimension 1 to shared meory
```
A = (F ⊗ I) * X
```
4. synchronize
5. tensor dimension 2
```
Y = (I ⊗ F) * A
```

Next steps

add (optional) surface geometry data structure to mesh
- IFH data structure: separate internal/boundary
- DG we need at all faces
- CG we only need at boundary faces
horizontal boundary conditions for CG/DG
- effectively we need a way to prescribe boundary fluxes
Additional operators

Curl
Hyperdiffusion
- scalars, vectors, tensors

high-level CG/DG divergence function: would specify
- input fields
- flux function
- numerical flux function for DG
- boundary fluxes
- additional options
  - integration mesh for overintegration
  - hyperdiffusion
Mesh warping (dependent on 1)

flat horizontal to check metric terms
- e.g. apply a sinusoid to x1
Cubed-sphere mesh
- equiangular
- S2 geometry

Vertical column standalone (dependent 1) (Charlie, Dennis)
- add Column data structure VF to DataLayouts
- finite difference mesh
  - heights of edges
  - heights of centers (can be computed)
  - number of ghost points top & bottom
  - handle flipped indexing ocean/land top down, atmos bottom up
- fields stored at centers or faces
- stencils for interpolation / gradients
- diffusion example
- aim for Ekman layer, stable boundary layer (Ignacio, Yair)
Boundary conditions for the vertical
- Identify people (Charlie, ?)
- Higher-level interface
- Identify required functionality
Integrated vertical/horizontal model
- hybrid mesh
- implicit solver for multiple columns
Performance and Parallelism

Profiling/track allocations
CPU threading
- may need to disable GC
- volume: can @threads for h = 1:Nh
- faces: will need to avoid race conditions
  - partition into non-overlapping face groups
  - duplicate work (matching current ClimateMachine model)
GPU support
- CG Bicleky jet (Sriharsha)
Distributed Memory / MPI
- what is the object that determines the distribution mechanism
  - topology at the lowest level
- only need to worry about horizontal
- Ghost exchange
  - DG ghost faces
  - CG ghost columns
  - (optional) Reusing shared buffers (e.g. for multistage RK)
  - (optional) Index directly into receive buffer
- Allreduce to compute norms
- Broadcast

Test cases to add

vector invariant bickley jet
wave / advection-diffusion
isentropic vortex
non Cartesian vectors

Initial GPU kernel development:

- Get bickley jet working with CG
    - Need specalized operators for DSS, divergence kernels
- Get bickley jet working with DG
    - Need specialized operators for DG weak divergence, numerical fluxes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Design

Design

Aims

First steps plan

Basic considerations

Interfaces

potential layouts

TODO

Communication

DG

CG

Stuff

Fields

Composable operations

Next steps

Clone this wiki locally