Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Using ESMF unstructured grids #18

Open
jhamman opened this issue Mar 13, 2018 · 11 comments
Open

Using ESMF unstructured grids #18

jhamman opened this issue Mar 13, 2018 · 11 comments

Comments

@jhamman
Copy link

jhamman commented Mar 13, 2018

I'm wondering what it would take to support using ESMF's unstructured grids with xESMF. I had a conversation with @bekozi today where we discussed this briefly but we weren't sure where xESMF stood on the issue.

Ultimately, it would be great if xESM could support bi-directional remapping between structured/unstructured grids/meshes.

@JiaweiZhuang
Copy link
Owner

My major concern is how can xarray represent unstructured grids (as discussed in docs)?

I have little experience with unstructured grids like in MPAS and don't know how MPAS folks analyze their data. I believe that it is possible to design a data model that works fairly well with one specific type of unstructured grid (say the one used by MPAS), but an intuitive, easy-to-use frontend for arbitrary unstructured grid sounds too fancy for me.

For my personal research I use GFDL-FV3 which fortunately uses a (multi-tile) quadrilateral grid and can be handled gracefully by current xESMF. So I don't have personal needs to support unstructured grids. That said, if there's any good idea about unstructured meshes I am willing to try. It will be pretty cool.

@jhamman
Copy link
Author

jhamman commented Mar 15, 2018

My major concern is how can xarray represent unstructured grids (as discussed in docs)?

My thought here is that we can follow existing conventions for this. There are existing specs/conventions for how these grids can be stored in the common data model (e.g. UGRID).

While these formats are not as intuitive as a logical grid may be, xarray should not have a problem storing a UGRID dataset. ESMF includes UGRID as a source/target grid spec so this should not be a problem.

if there's any good idea about unstructured meshes I am willing to try. It will be pretty cool.

To be clear, I'm not asking you to do any work here but just evaluating what it would take to make this work. I do have a use case that I would like to see an ESMF unstructured grid work for.

Perhaps we can get @bekozi to comment on what the workflow might look like here.

@JiaweiZhuang
Copy link
Owner

JiaweiZhuang commented Mar 16, 2018

xarray should not have a problem storing a UGRID dataset.

xarray can indeed store all UGRID information in a Dataset but I wonder if that gives any benefit than using dictionaries or something simpler to store the same data. A whole extension is needed to utilize the connectivity information, such as allowing dr.sel(), dr.plot() on unstructured mesh.

If xarray doesn't give additional advantage for analyzing unstructured mesh, it doesn't seem to worth the effort to use it as the frontend for ESMPy regridding. Directly writing ESMPy code might be easier.

A similar issue is the current treatment of cell boundaries (pydata/xarray#1475, referred so many times...). N+1 sized boundary variables can indeed live in a Dataset but they cannot be utilized by other xarray functionalities such as plotting or slicing. And this is a much simpler problem than unstructured mesh...

@jhamman
Copy link
Author

jhamman commented Mar 21, 2018

If xarray doesn't give additional advantage for analyzing unstructured mesh, it doesn't seem to worth the effort to use it as the frontend for ESMPy regridding.

I'm guessing my use case is sufficiently different from your typical use case that what doesn't seem useful to you is indeed useful to me.

As background, my use case is mapping gridded datasets to hydrologic response units (HRUs, e.g. river basin polygons). This can be done with ESMF using unstructured mesh regridding. Currently, I'm looking to build an xarray oriented tool kit for working with HRU meshes.

So my basic question for this issue is whether or not remapping between xarray datasets that use unstructured grids is within the scope of this package?

@JiaweiZhuang
Copy link
Owner

whether or not remapping between xarray datasets that use unstructured grids is within the scope of this package?

This is out of the current scope of my package. Right now I don't have a clear plan to deal with unstructured meshes, and wonder if directly interfacing xarray with ESMPy for unstructured mesh is the right way to go. The gridded API looks like a more suitable frontend for unstructured mesh, but it is not mature yet. There are some discussions on interfacing gridded with xarray (NOAA-ORR-ERD/gridded#20). Maybe xarray <-> gridded <-> ESMPy is a better way to use ESMPy's unstructured mesh regridding from xarray?

@bekozi
Copy link

bekozi commented Jun 13, 2018

Perhaps we can get @bekozi to comment on what the workflow might look like here.

I'm awful (terrible really) with catching these tagged issues. I'll do better. So, sorry again for the embarrassingly slow response.

Creating a generic unstructured grid interface can be a headache since there is no defacto standard in the climate/netCDF world. There are tools in ocgis that could help with interpreting unstructured formats and getting them into xarray if you are not interested in coding from scratch. Depending on the workflow, you can convert from shapely geometries or go directly from coordinates in a netCDF. The latter is still being ironed out. In terms of unstructured grid metadata conventions, SCRIP Unstructured, ESMF Unstructured, and UGRID are supported in ocgis.

I'd be happy to help you get set up with these interfaces as they can be a bit tricky to use. If the workflow is xarray-->ocgis-->xarray, we'd need to add a to_ocgis somewhere. Given my experience writing one going the other way, I don't expect that to be too difficult...and it's good to have a use case!

Unfortunately I don't have any experience with gridded, so I cannot comment on how interoperable it is with ESMPy. My sense is that it is probably sufficient? 😕 The way ESMF represents meshes internally is similar to UGRID, so if you decided to go with supporting a single convention and wanted to roll your own code, the UGRID to ESMPy code is not a deal breaker.

Anyway, let me know if you have any questions!

@jhamman
Copy link
Author

jhamman commented Jun 20, 2018

@bekozi - thanks for revising this issue.

This is out of the current scope of my package.

I'm curious what it would take to scope this issue such that it fits in xESMF. If we just stick to the UGRID spec and ESMpy, is there a development path for adding unstructured mesh regridding? Does ocgis need to get involved or can ESMF handle the regridding procedure by itself? Does ESMF need to get involved (maybe we just use ocgis but maintain the existing regridding API)?

@JiaweiZhuang
Copy link
Owner

I'm curious what it would take to scope this issue such that it fits in xESMF. If we just stick to the UGRID spec and ESMpy, is there a development path for adding unstructured mesh regridding?

There are multiple things need to be done:

  1. The "front-end design": How to encode UGRID spec in xarray Dataset? As @bekozi said, there is no no de-facto standard for general unstructured grids. The specification we choose should be at least compatible with popular unstructured-grid models (e.g. MPAS), to minimize conversion effort.
  2. The "bridge": Convert xarray Dataset to ESMPy Mesh object. OCGIS might help (see the next reply)?
  3. The computation: Retrieve the regridding weights as explicit numpy arrays, and apply weights using the Dask/Scipy(Numba/Cython) stack, just like xESMF does. This is necessary for scaling-up the regridding computation to the Pangeo platform. ESMPy's built-in, MPI-based parallelization doesn't integrate well with dask/Pangeo.

For (1) and (2), I have little idea since I seldom work with unstructured meshes. I am only looking into (3), which is a general problem for both structured and unstructured grids.

Does ocgis need to get involved or can ESMF handle the regridding procedure by itself?

According to @bekozi 's reply in this comment, ocgis is used to:

  • Read a shapefile representing an unstructured grid
  • Interface with ESMPy for regridding

Seems that it can save some coding compared to pure ESMPy. Also, ocgis has an to_xarray() method. How would it handle unstructured meshes?

Does ESMF need to get involved

Do you mean the Fortran library? No, ESMPy supports unstructured mesh, so everything can be written in Python.

@bekozi
Copy link

bekozi commented Jun 22, 2018

The "front-end design": How to encode UGRID spec in xarray Dataset? As @bekozi said, there is no no de-facto standard for general unstructured grids. The specification we choose should be at least compatible with popular unstructured-grid models (e.g. MPAS), to minimize conversion effort.

Looking through the MPAS Mesh Specification Document, I don't see any mention of a larger convention. I suspect a converter exists somewhere out there to UGRID - maybe you even know of one? In sum, we may need to ask some specialists to clarify this.

The "bridge": Convert xarray Dataset to ESMPy Mesh object. OCGIS might help (see the next reply)?

Yes, OCGIS can help here. I can also reasonably say that supporting the MPAS convention directly in OCGIS would not be out of the question.

ESMPy's built-in, MPI-based parallelization doesn't integrate well with dask/Pangeo.

Tangent incoming...

I have some technical inquiries regarding this, and I'm not sure where it would be best to discuss them. @jhamman, maybe you could direct me.

There are these OcgVM and OcgDist objects that handle MPI communication, decompositions, etc. Colleagues at @bird-house bird-house/bird-house.github.io#17 and myself are interested in dask for memory scaling, but would like to leverage MPI as well for asynchronous IO, some algorithms like spatial subsetting, and interfacing with things like ESMPy. I think dask and MPI would be somewhat interoperable provided the decomposed/chunked dimensions don't change size during processing. I really hope I can reuse these vm objects to enable this (In part, I'm also looking around for shortcuts...err...elegance).

Furthermore, I think it may be possible to take another step and enable spatial decompositions in dask allowing a greater chunking variety. For example, the memory requirements for high spatial resolution regridding could be amenable to dask, and potentially xESMF, as opposed to the intermediate file approach we are using for chunked weight generation. In sum, we need a spatial halo for some operations and this is one potential approach for getting it. Maybe someone has already done this?

We're probably going to move forward with this pretty soon, and it would be great to get advice / buy in from dask experts as opposed to charging blindly into the night!

Read a shapefile representing an unstructured grid

Just to clarify, it can work with a shapefile, shapely, or directly from coordinates (like stored in a netcdf).

Interface with ESMPy for regridding

Depending on the workflow you can get a grid, mesh, field, or the whole kit and kaboodle (never actually typed that before).

Seems that it can save some coding compared to pure ESMPy. Also, ocgis has an to_xarray() method. How would it handle unstructured meshes?

I'm not sure how much to_xarray would come into this unless you offloaded all regridding to OCGIS/ESMPy. You would probably just want the ESMPy field or mesh object. We'd need a to_ocgis function that I could definitely help with. So, in the ideal case:

  1. Read data with xESMF.
  2. Convert to ocgis with metadata and tell it the convention.
  3. Have ocgis convert to appropriate ESMPy object.
  4. Continue with xESMF.

@jhamman
Copy link
Author

jhamman commented Jun 22, 2018

and it would be great to get advice / buy in from dask experts as opposed to charging blindly into the night!

Probably best to open an issue on the pangeo issue tracker and ping mrocklin.

@bekozi
Copy link

bekozi commented Jun 22, 2018

Probably best to open an issue on the pangeo issue tracker and ping mrocklin.

Thanks. I'll try and make something more cohesive. I know the mpi-dask thing is kind of fleshed out on the dask issue tracker.

@JiaweiZhuang JiaweiZhuang pinned this issue Aug 6, 2019
@JiaweiZhuang JiaweiZhuang unpinned this issue Aug 8, 2019
raphaeldussin pushed a commit to raphaeldussin/xESMF that referenced this issue Sep 4, 2020
Update CHANGES.rst file with previous versions
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants