Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

integration with xgcm.Grid #13

Open
rabernat opened this issue Jan 19, 2018 · 12 comments
Open

integration with xgcm.Grid #13

rabernat opened this issue Jan 19, 2018 · 12 comments

Comments

@rabernat
Copy link

xESMF is awesome!

I have spent quite a bit of effort in creating an xarray-friendly data model for finite volume GCM data in xgcm:
http://xgcm.readthedocs.io/en/latest/grids.html

It seems like this data model could help resolve some other issues in xESMF, in particular, the generalization of coordinate names (#5) and the specification of cell boundaries vs centers.

I wanted to open this issue just to discuss how we can make these two packages more mutually compatible and potentially avoid solving the same problems twice.

@JiaweiZhuang
Copy link
Owner

JiaweiZhuang commented Jan 19, 2018

Thanks for bringing this up! A good way to track cell boundaries will be very useful for xESMF. In xgcm's examples I only notice rectilinear grids. Do you have examples for curvilinear grids? More specifically, I am very interested in those functionalities (all assume general curvilinear grids):
(1) Infer cell corners from cell centers, and vice versa. I mainly need center->bound, because most NetCDF files do not contain boundary coordinate values. Bound->center is also useful (for double checking), and much easer to implement.
(2) A self-consistent grid object built upon (1), containing both center and bound information. xESMF currently doesn't check if lon and lon_b are consistent. If users calculate lon_b incorrectly, conservative regridding will be wrong while other methods will still be correct. This doesn't feel very robust.
(3) The ability to slice a grid object, built upon (2). Say I have a 100x200 grid (with 101x201 bounds), I hope to get a 10x20 subgrid, plus the 11x21 bound, by simply indexing grid[0:10, 0:20]

All those are only for conservative regridding. But conservation is so important in Earth science and deserves special care. A robust algorithm for (1) might be done in the Cartesian space instead of spherical coordinate, to avoid the polar singularity. This is also how ESMF calculates regridding weights. At some point I might code up this center-bound switching on my own, but if xgcm supports that it would be a great time saving!

@JiaweiZhuang
Copy link
Owner

Related issue: pydata/xarray#1475. This is really my major, perhaps only, complaint about xarray...

@rabernat
Copy link
Author

rabernat commented Jan 19, 2018

Not all of the features you want are already part of xgcm right now. But they are all things I also want in xgcm. So I am trying to convince you to collaborate with me!

In xgcm's examples I only notice rectilinear grids.

It works with logically rectangular grids. The underlying geometry can be curvilinear. And once xgcm/xgcm#88 is merged, it will support exchanging data in cubed-sphere-type geometries.

(1) Infer cell corners from cell centers,

Yes, this is currently supported in the grid generation module.

(2) A self-consistent grid object built upon (1)

Right now, the coordinates are generated, but they are not kept in the xgcm.Grid object. They live in the parent xarray dataset. We could make this coupling more explicit and better serve the needs of both projects.

(3) The ability to slice a grid object

What you can do now is slice an xarray dataset first and then generate an xgcm.Grid from the sliced dataset. Do you think that would meet your needs?

In general, there is a lot of room to improve the autogeneration of finite volume grids within xgcm. But I think this problem is distinct from, and more general than, regridding. We have a need for many of the same things, so let's work together!

@JiaweiZhuang
Copy link
Owner

It works with logically rectangular grids. The underlying geometry can be curvilinear.

This is all I need. And I don't think xarray's data model can go beyond logically rectangular (such as MPAS's hexagonal mesh).

Yes, this is currently supported in the grid generation module.

I would appreciate a simple demo for inferring the bounds of curvilinear grids. Can be as simple as this artificial grid in xESMF doc. More challenging grids would be better, like the rasm demo data which span over the pole.

Right now, the coordinates are generated, but they are not kept in the xgcm.Grid object. They live in the parent xarray dataset. We could make this coupling more explicit and better serve the needs of both projects.

What you can do now is slice an xarray dataset first and then generate an xgcm.Grid from the sliced dataset. Do you think that would meet your needs?

This is exactly what I like. I am interested in xgcm's boundary utility, but the input arguments for xesmf.Regridder() should still be the basic xarray.Dataset, not xgcm.Grid. I want to make the API absolutely simple&intuitive and avoid adding one more level of complexity. So I would consider using xgcm internally or in auxiliary functions.

In any case, I definitely like a better boundary handling. I'll only need a stable&robust center-bound switching functionality from xgcm. There should be no dependency on the vector calculation part.

@rabernat
Copy link
Author

rabernat commented Aug 22, 2018

Just an update: we now have documentation in xgcm about the grid autogeneration. This is mostly @jbusecke's work:
https://xgcm.readthedocs.io/en/latest/autogenerate_examples.html

Very much a work in progress, but it is a starting point.

@jbusecke
Copy link

Hi @JiaweiZhuang, thanks for the interest in the autogenerate module.
I will look into the examples you suggested above soon. I could add them to the docs aswell, but first I want to see if the results are satisfactory.

I will report back when I get to this (might be a few days, since I am traveling).

@jbusecke
Copy link

Ok I did a quick test here.

In short, there seem to be problems at the pole when using the RASM example from xarray.
I was not able to fix it. Nor do I really know if this is a problem with the created bounds or the way cartopy plots them.

@JiaweiZhuang do you have a similar dataset that actually has boundary coordinates? That way I could compare the generated output (like I did for the simple example).

On a related note: @rabernat this exercise exposed a clunkyness of the autogenerate module. We do not need to create any coordinates within that function. Once the dimensions are generated, everything can be done 'as normal' with xgcm. Will refactor this soon.

@JiaweiZhuang
Copy link
Owner

@jbusecke Thanks very much! Great to see the first example on a 2D curvilinear mesh.

do you have a similar dataset that actually has boundary coordinates?

The cubedsphere package is able to compute both cell centers and boundaries. See this gist as a testing framework for you to plug-in xgcm code. Two polar panels of the cubed-sphere grid are very similar to the RASM grid.

Inferring boundaries, or any general interpolation/extrapolation operations, can often be buggy near the poles. A more robust way is to convert (lat, lon) to cartesian (x, y, z) coordinates (subject to the constraint x**2+y**2+z**2=1), so there is no discontinuity in the coordinate variables. This is also how ESMF represents grids internally.

@JiaweiZhuang
Copy link
Owner

JiaweiZhuang commented Aug 29, 2018

There is also a subtle issue with non-uniform 1D grid.

Say the cell centers are:
[-85., -70., -50., -30., -10., 10., 30., 50., 70., 85.]

The actual boundaries that generate the above center values are:
[-90., -80., -60., -40., -20., 0., 20., 40., 60., 80., 90.]
where the first/last cell is half in size compared to others.

xgcm currently gives:
[-92.5, -77.5, -60. , -40. , -20. , 0. , 20. , 40. , 60. , 77.5, 92.5])

Values near two end points are slightly wrong, due to the simple extrapolation algorithm at end points.

See this gist for the full code.

A mathematically rigorous way would be solving a set of linear equations:

b[0] + b[1] = 2*c[0]
b[1] + b[2] = 2*c[1]
...
b[n] + b[n+1] = 2*c[n]

where b[0] or b[n+1] needs to provided by users. This guaranteesc == 0.5*(b[1:] + b[:-1]). The equation is trivial to solve as you can iteratively get b[0]->b[1]->b[2]->... .

The situation becomes quite tricky in 2D cases, but we probably don't need to be too rigorous for 2D curvilinear grids, where the definition of "the center of a cell" is not very clear.

@JiaweiZhuang
Copy link
Owner

Another minor question: Does the key of axes_dims_dict have real effect? Looks like I can use any keys not just 'X', 'Y'.

@jbusecke
Copy link

No these are just the names of the logical axes, they could be i, j, k, or something else.

@jbusecke
Copy link

Also, thanks a lot for the examples.

Inferring boundaries, or any general interpolation/extrapolation operations, can often be buggy near the poles. A more robust way is to convert (lat, lon) to cartesian (x, y, z) coordinates (subject to the constraint x2+y2+z**2=1), so there is no discontinuity in the coordinate variables. This is also how ESMF represents grids internally.

Could we use that internal machinery to get a better result here? I am not sure how much of this logic we want to be adding to xgcm.

A mathematically rigorous way would be solving a set of linear equations:

@rabernat Could we integrate this in xgcm/duck_array_ops/_apply_boundary_condition as an additional boundary condition, e.g. extrapolate_conservative. But I guess this is not ready to accept the two required boundary conditions...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants