Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implementation of gridded on an Arakawa-C grid meteorological model output #57

Open
gewitterblitz opened this issue Aug 11, 2020 · 2 comments

Comments

@gewitterblitz
Copy link

Hi,

I am running a 3D non-hydrostatic atmospheric model (called ARPS) which uses an Arakawa C-grid domain and has terrain following coordinates in the vertical dimension. Here's an example of the grid's x-axis representation:

image

More info about model grid representation can be found on page 2 onwards in this pdf (http://www.caps.ou.edu/ARPS/download/code/pub/ARPS.docs/ARPS4DOC.PDF/arpsch4.pdf).

Looking at the schematic above, I would assume that the vector variable arrays would have a length 1 more than the state scalar variables which holds true even if the physical domain size is considered (e.g., vector variables have a size of (nx-1) - 2 = nx-3, therefore, scaler variables should have (nx-2) - 2 = nx-4).

However, when I read in the hdf4 format model output file using pynio and some legacy code to retrieve 2D/3D variables, scalar and vector variables all have the same shape, e.g.,

print(f'shape of u (vector variable) is {u.shape}')
print(f'shape of pressure variable (scalar) is {p.shape}')

#prints 
#shape of u (vector variable) is (53, 33, 53)
#shape of pressure variable (scalar) is (53, 33, 53)

How should I start converting this dataset to meet gridded's requirements? Should I choose a padding of "low" or "high" while creating a grid? (as shown in Fig. 1 at this link: https://sgrid.github.io/sgrid/)

The WRF example on S-Grid conventions webpage expects the dimension of staggered axes to be 1 short than the normal axes which doesn't seem to be the case in my example. Moreover, I only one variable each for coordinates in x,y,z dimensions which I think already take into account the staggering. For example, this is x variable (coordinates along x-axis):

image

I tried using xgcm too and here's an example plot of the u values interpolated at scalar points (let's store the interpolated values in the variable us_xgcm). Notice the shape of us_xgcm variable (same as the shape of u variable).

image

And here's the comparison of us_xgcm with us (which is the legacy code for interpolating u to scalar points):

image

Here's my attempt for interpolation using gridded:

image

Simple difference comparison between xgcm and gridded implementation throws error because gridded produces an array with length 1 less than the original length.

image

And, if calculate the difference by ignoring the last row/column, even then there exists some non-negligible resultant:

image

I would really appreciate if someone can help with diagnosing the problem or guiding with the correct implementation of the package.

Thanks and please let me know if you need additional info!

@jay-hennen
Copy link
Contributor

Judging by the x variable you showcased, it looks like you want 'low' padding. That means the 'low' (index 0) value is considered padding.

If you have write control over the data files, adding in a grid topology variable as demonstrated in the SGrid spec is probably the best way to make the file easily digestible by gridded. If this isn't an option, then you'll probably need to pass padding parameters to the constructor function.

In gridded.grids.Grid_S (staggered grid object) the dimensions of the 'node' grid are considered the reference dimension, and are never padded. In the ARPS case, this would correspond to the u/v vector grid. Therefore when you specify the scalar grid ('center' grid) you need to specify padding. Since the scalar variable shape is the same as the vector variable shape, and based on your print of the x variable, it looks like you want 'low' padding.

Here's a first-try code snippet of how I would introduce your file to gridded, based on what I can glean from your post. You may need to set up a gridded.time.Time object depending on how well it auto-detects that dimension. If you have a depth dimension, I highly recommend looking directly at the code in depth.py to determine which object best represents your data, and how to set it up:

import netCDF4  
from gridded.grids import Grid_S  
from gridded.variable import Variable
ds = netCDF4.Dataset("Your File.nc") 

// Example of setting up a Grid_S object, with 'center' or 'scalar' padding. 
arps_grid = Grid_S.from_netCDF(  
    dataset = ds,
    node_lon = ds['vector_lon_var'],
    node_lat = ds['vector_lat_var'],
    center_lon = ds['scalar_lon_var'],
    center_lat = ds['scalar_lat_var'],
    center_padding='low'
)

// Example setting up a Variable. In this case, we have a prepared Grid_S, so provide it. The time attribute
// in theory would be auto-detected and imported properly.
arps_uvel_variable = Variable.from_netCDF(
    dataset=ds,
    varname = 'u',
    grid = arps_grid
)

Hope that helps as a start. If you've got further questions feel free to ask.

@gewitterblitz
Copy link
Author

Hi Jay, thanks for the response. I realized I should have sent you the display output of both x and xs (x coordinates at scalar points) as calculated from legacy code.

The model output has a slightly different representation of coordinates for grid centers compared to the schematic shown in my top comment (weird?). Here's the comparison of x and xs (since x[0] < xs[0], therefore, x should be the coordinates for cell faces).

image

I think the padding should be high instead. In any case, this would cause problems when we perform some vector calculus operations using the variable, right?

If you look at the schematic of Arakawa-C grid used in ARPS model (figure attached in the top comment), you will notice that the length of scalar points is nx-1 while vector points have a length of nx. Moreover, the last coordinate (with the highest value) should belong to the vector grid points. However, x and xs share the same highest value (153000). Maybe this is a standard technique adopted for scientific computations and I am just not aware of it?

Lastly, since my file is a hdf4 formatted, I am reading in data using pynio and susbequently creating an xarray dataset to begin with gridded. So, this is how I am doing it (please note that I don't have lat/lon coords in my model output, so I am working with x (vector_lon_var) and xs (scalar_lon_var) ):

image

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants