Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Integrating Oceananigans with Python-xgcm #1334

Open
tomchor opened this issue Feb 3, 2021 · 23 comments
Open

Integrating Oceananigans with Python-xgcm #1334

tomchor opened this issue Feb 3, 2021 · 23 comments

Comments

@tomchor
Copy link
Collaborator

tomchor commented Feb 3, 2021

I was thinking that it would be a good idea to make an integration with Python's xgcm package easier. The project is relatively new but I expect it to grow given that it has Pangeo support.

In my mind, this is related to #1313, since the primary (only?) communication with Python is done through NetCDF files. So including some grid metrics in the NetCDF output would be helpful. For now I'm not exactly sure what needs to be there since their docs aren't very explanatory in that sense, but it includes distances between Faces and Centers of the grids and other measures that should be straightforward.

Any thoughts?

@glwagner
Copy link
Member

glwagner commented Feb 3, 2021

This issue is timely since we are on the verge of adding support for curvilinear grids and cubed spheres....

@ali-ramadhan
Copy link
Member

I've used xarray to analyze and plot Oceananigans NetCDF files quite a lot in the past with little difficulty (still really like xarray). Being on a regular Cartesian grid helps a lot obviously.

I guess the main limitation of using xarray with Oceananigans NetCDF output is being on the staggered grid means you can't easily manipulate fields as some interpolation is needed. My understanding is that xgcm can handle all the interpolation for you so you can just compute new fields, integrals, etc. (so some overlap with Oceananigans.AbstractOperations).

We might be able to add all the grid metrics to Oceananigans NetCDF files (optional probably) to allow xgcm to open Oceananigans NetCDF files as if they were MITgcm output. This way xgcm wouldn't need to support anything new.

@tomchor
Copy link
Collaborator Author

tomchor commented Feb 3, 2021

I've used xarray to analyze and plot Oceananigans NetCDF files quite a lot in the past with little difficulty (still really like xarray). Being on a regular Cartesian grid helps a lot obviously.

You can use xarray on Oceananigans data blindly and be happy with it (I do it for the most part!), but their routines for interpolation and especially integration and differentiation don't match up with Oceananigans' finite volume scheme. So if you need to be precise with your calculations, you definitely shouldn't use the xarray default routines. And yes, this definitely overlaps with AbstractOperations, but I think that's okay, no?

@ali-ramadhan
Copy link
Member

Yes I think it's very okay to overlap with AbstractOperations. Seems like it should be pretty easy to output NetCDF files that will work with xgcm so why not do it (grid metrics might be generally useful to include too).

@glwagner
Copy link
Member

glwagner commented Feb 3, 2021

And yes, this definitely overlaps with AbstractOperations, but I think that's okay, no?

Yes! The more ways to analyze your data, the better.

We might be able to add all the grid metrics to Oceananigans NetCDF files (optional probably) to allow xgcm to open Oceananigans NetCDF files as if they were MITgcm output.

My suggestion is a self-contained helper function, something like add_grid_metrics!(writer::NetCDFOutputWriter, grid). Then we can invoke it inside the constructor for NetCDFOutputWriter if desired.

@francispoulin
Copy link
Collaborator

Can anyone share an example of how to use xarray with Oceananigans output? I have never tried to before and am curious. But I can also wait to see what people do when they create PR's.

@ali-ramadhan
Copy link
Member

You should be able to load any NetCDF file using xarray and it gives you a nice and powerful interface for analyzing and manipulating the data.

Probably not the best example as it's a little old and messy (I've started using Makie more often) but here's one example: https://github.com/CliMA/LESbrary.jl/blob/3595ff2e1db6d5e6898b6ea84335fdb9dbd23b15/src/make_lesbrary_plots.py

@tomchor
Copy link
Collaborator Author

tomchor commented Feb 6, 2021

Can anyone share an example of how to use xarray with Oceananigans output? I have never tried to before and am curious. But I can also wait to see what people do when they create PR's.

xarray actually has a pretty good example gallery. Basically you can replace any time you create/read a dataset in that gallery with one command to open an Oceananigans NetCDF output. xarray is truly one of the most useful packages I've come across in my life so far (in any programming language that I've tried).

My suggestion is a self-contained helper function, something like add_grid_metrics!(writer::NetCDFOutputWriter, grid). Then we can invoke it inside the constructor for NetCDFOutputWriter if desired.

I agree. A possibly silly question: is it worth doing the same thing to JLD2OutputWriter? I've never really used it, so I don't know what kind of information it already has and what other libraries can take that format natively.

@glwagner
Copy link
Member

glwagner commented Feb 8, 2021

I agree. A possibly silly question: is it worth doing the same thing to JLD2OutputWriter? I've never really used it, so I don't know what kind of information it already has and what other libraries can take that format natively.

We probably don't need special functions for JLD2 because we can serialize the grid to disk. The harder problem is writing an extension to xgcm that opens JLD2 data. Probably not worth it... instead I think we will be developing a framework for analyzing output in julia (using the existing AbstractOperations framework, for example) and visualizing it with Makie, which hopefully will add value and complement xgcm's capabilities.

@rabernat
Copy link

Hi folks. I just saw this issue. We are very glad that you're working to support interoperability btw oceananigans and xgcm! 🎉 We'd love to help however we can.

Ideally you would not have to really do much here other than use CF conventions in your netCDF output and things would "just work." That's the beauty of standards.

Unfortunately, CF conventions don't quite provide the right vocabulary to describe the curvilinear geometry of staggered grid models compactly (see cf-convention/discuss#5). In the meantime, every modeling center seems to have their own preference for how to encode this (e.g. comodo conventions [now offline] used by ROMS and NEMO, S-grid, mosaics from GFDL).

With xgcm, we decided to use the Comodo conventions (rather than invent yet another new convention). In retrospect, this was maybe the wrong choice, since the pycomodo project seems to have totally disappeared. 🤦 However, if you put the right metadata in your attributes, xgcm should be able to figure out your grid.

Whatever you do, please try your best to squeeze your data into existing standard file formats and metadata conventions.
Don't invent something new. MITgcm did this with the mds data format and it has been endless headaches for our community. I don't know what JLD2 is, but it sounds like you could be going down that route...

If you have any questions, please ask! I'll try to respond.

@jbusecke
Copy link

Ha, @rabernat beat me to it. I just wanted to mention that I am also available for questions and maybe clarify one more point of confusion that seems to stem from our current docs (working on improving that with @tomchor s input).

There are two parts in the NetCDF output that would make the integration with xgcm smooth:

  • Proper metadata for the dimensions (as mentioned by @rabernat above)
  • Output of grid metrics, which are distances, areas and cell volume of the grid. An important detail there is that each of the metrics should describe properties of the surrounding cell. So if you have a tracer value on the point xT and the cell bounds are given by a staggered coordinate xC, the distance (lets call it dx) for each xT point should describe the distance between the two surrounding xC points (and have xC as coordinate, so it can be matched correctly by xgcm). Currently the user has to input these manually but it might be convenient to add the cf-attribute cell_measures, which we might support in the future with a more complete support for cf-metadata (see e.g. parse CF-style "bounds" coordinates into xgcm axes xgcm/xgcm#127)

@glwagner
Copy link
Member

This is great. I hope we can replicate the grid metric naming conventions in the source code as well as in output. It might also be nice to come up with an example/ that illustrates how to analyze Oceananigans output with xgcm (perhaps this can be done in a single julia script via PyCall?)

Our current plan is to support output in NetCDF and JLD2 file formats. JLD2 is a native julia output format capable of serializing julia objects that we use for checkpointing but is unlikely to be widely used in the geosciences.

@johncmarshall54
Copy link

johncmarshall54 commented Feb 11, 2021 via email

@rabernat
Copy link

I am very impressed by the continuous flow of amazing technical and scientific accomplishments from the CliMA group. But by far the most impressive feat is that you have managed to drag @johncmarshall54 onto GitHub. 🤣

@glwagner
Copy link
Member

@tomchor what's the status of this?

@dcherian
Copy link

FYI there is a WIP PR to support the SGRID conventions. That seems to be the "standard" way to proceed

@tomchor
Copy link
Collaborator Author

tomchor commented Mar 22, 2023

@tomchor what's the status of this?

I was waiting for #2842 to get merged. Now that it's merged, I'll start working on #2652 again, which is a first step towards this.

@dcherian
Copy link

The conversation around metrics is still unsettled. AFAICT there's no way to put those in the SGRID variable. There are CF conventions for cell_measures: specifically area and volume but we would also need length. There is a cell_thickness standard_name that could be used for this.

cc @jbusecke

@jbusecke
Copy link

Getting at least area and volume in there would be helpful, but kind of stopping short of 'doing this right'.

Each time I look at the SGRID specs I think "Its right there" when I see the specifications of 'edges=length', 'faces=area', and 'volume'
image
But AFAIK there is no way to actually assign a coordinate/data variable to each of these positions? If we could somehow add that into the spec I think that would be much better than abusing cell_thickness.

In my mind, all that is needed to correctly parse a metric into xgcm is a flag (e.g. this is a metric) and information on the 'kind' of metric and its 'orientation', both of which I believe can be encoded in SGRID already? So really all that is needed is some metadata that implements the flag. I hope this makes sense to folks.

Maybe we get some of the sgrid folks into this discussion? Just to see if I misinterpret what is already possible and whether we could add something to SGRID that indicates that a variable/coordinate represents some physical metric (length, area, volume).

cc @hrajagers @rsignell-usgs

@jatkinson1000
Copy link

For those following this thread the SGRID PR mentioned above has now been merged to add SGRID functionalities into xgcm. Please do try out if useful.
There are some docs for it here describing to to have it automatically detect and extract an SGRID grid, and I'm happy for feedback/questions.

@tomchor
Copy link
Collaborator Author

tomchor commented Apr 20, 2023

@jatkinson1000 thanks for your effort in xgcm/xgcm#559. I'm glad there's now a clear standard way to proceed!

I'm gonna take a crack at adding those grid measures to Oceananigans NetCDF output in #2652 now that it seems the pieces are all in place. However, I read the docs and I'm still a bit confused on how to implement them. Do you have an example NetCDF file (preferably 3D) that follows SGRID conventions that you can share? It would help me to use that as a basis of comparison.

Thanks!

@jatkinson1000
Copy link

Hi @tomchor There are a few examples of SGRID datasets listed on the original xgcm sgrid issue. I can also dig these out on my computer and share via email if you need - let me know.

When you say documentation, is that the general SGRID documentation, or the specific xgcm documentation relating to SGRID?

@tomchor
Copy link
Collaborator Author

tomchor commented May 3, 2023

@jatkinson1000 sorry for the delay. I'm trying to download the datasets mentioned on the original xgcm sgrid issue (namely the ones mentioned here and here) but I'm getting errors.

Could you please dig some files out and either attach them here (I think you may need to zip them first) or email them to me? You can find my email through my personal website.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

9 participants