This repository has been archived by the owner on Oct 24, 2024. It is now read-only.

Idea/use case: cfgrib #195

Closed
blaylockbk opened this issue Jan 10, 2023 · 7 comments
Labels
enhancement New feature or request IO Representation of particular file formats as trees

Comments

@blaylockbk

Hi Tom,

I missed your AMS talk this week because of a conflict, but I looked through the slides (thanks for posting those). Maybe I'll run into you later at AMS.

Just about all numerical weather model data is distributed in the GRIB format. Xarray can read GRIB and GRIB2 files through the cfgrib engine, which works great. One limitation of cfgrib is that when a file has variables on multiple types of levels (e.g., temperature at 2 meters, at 500 mb, and at cloud top height), cfgrib can't read the data into a single dataset. Instead, cfgrib.open_datasets(gribfileName) reads the data and returns a list of datasets.
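To make the limitation concrete: a GRIB file is a sequence of independent messages, each tagged with its own level type. Messages on different level types don't share a vertical coordinate, so they can't be stacked into one array. The sketch below groups some hypothetical message metadata (plain dicts standing in for per-message GRIB keys, not real cfgrib objects) by `typeOfLevel`. It is only a simplified illustration of why one file yields several datasets; cfgrib's actual splitting logic may consider further keys.

```python
from collections import defaultdict

# Hypothetical GRIB message metadata: plain dicts standing in for the
# per-message keys (shortName, typeOfLevel, level) a GRIB decoder exposes.
messages = [
    {"shortName": "t", "typeOfLevel": "heightAboveGround", "level": 2},
    {"shortName": "t", "typeOfLevel": "isobaricInhPa", "level": 500},
    {"shortName": "t", "typeOfLevel": "isobaricInhPa", "level": 850},
    {"shortName": "t", "typeOfLevel": "cloudTop", "level": 0},
]


def split_by_level_type(msgs):
    """Group messages so each group shares one kind of vertical coordinate.

    Simplified illustration only: real cfgrib may split on other keys too.
    """
    groups = defaultdict(list)
    for msg in msgs:
        groups[msg["typeOfLevel"]].append(msg)
    # One "dataset" per level type, each with its own sorted level coordinate
    return {
        level_type: sorted(m["level"] for m in group)
        for level_type, group in groups.items()
    }


print(split_by_level_type(messages))
# {'heightAboveGround': [2], 'isobaricInhPa': [500, 850], 'cloudTop': [0]}
```

Three level types produce three separate groups, which is the same shape of result as the list returned by cfgrib.open_datasets.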

If I understand the basics of datatree correctly, it sounds like datatree would be the better way for cfgrib to handle reading this data.

Have you looked at cfgrib and grib data before?

@TomNicholas TomNicholas added enhancement New feature or request IO Representation of particular file formats as trees labels Jan 10, 2023
@TomNicholas
Member

Hi Brian!

> I missed your AMS talk this week because of a conflict, but I looked through the slides (thanks for posting those). Maybe I'll run into you later at AMS.

No worries! Are you coming to the Pangeo workshop on Friday?


> Have you looked at cfgrib and grib data before?

I have never personally used grib data, but I would be happy to help you make it work in xarray!

> One limitation with cfgrib is that when a file has variables on multiple types of levels (i.e., temperature at 2 meters, at 500 mb, and at cloud top height) cfgrib can't read the data into a single dataset, so instead it reads the data and returns a list of datasets when you do cfgrib.open_datasets(gribfileName).

Do you know how you might organise this data in terms of nested groups / nodes? If those group names can be derived from your file then this should be pretty simple. You can see how datatree handles netCDF and Zarr here.
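One possible organisation, sketched under the assumption that the group names come straight from GRIB keys: name each group after its level type and, only when two datasets share a level type, nest them one level deeper by step type (e.g. instantaneous vs. accumulated surface fields). The function name and its input format are hypothetical; the output is a list of paths that could be handed to a tree constructor.

```python
from collections import Counter


def group_paths(dataset_keys):
    """Map each dataset's GRIB keys to a unique, DataTree-style group path.

    `dataset_keys` holds one dict per dataset returned by cfgrib, with the
    GRIB keys "typeOfLevel" and "stepType" (an assumed input format).
    """
    level_counts = Counter(keys["typeOfLevel"] for keys in dataset_keys)
    paths = []
    for keys in dataset_keys:
        level_type = keys["typeOfLevel"]
        if level_counts[level_type] == 1:
            # Unique level type: one node directly under the root
            paths.append(f"/{level_type}")
        else:
            # Shared level type: nest one child node per step type
            paths.append(f"/{level_type}/{keys['stepType']}")
    if len(set(paths)) != len(paths):
        raise ValueError("typeOfLevel/stepType do not uniquely identify datasets")
    return paths


print(group_paths([
    {"typeOfLevel": "isobaricInhPa", "stepType": "instant"},
    {"typeOfLevel": "surface", "stepType": "instant"},
    {"typeOfLevel": "surface", "stepType": "accum"},
]))
# ['/isobaricInhPa', '/surface/instant', '/surface/accum']
```

Raising on duplicates is a deliberate choice: silently overwriting a group would drop data, so if these two keys aren't enough for a given file, more keys would need to go into the path.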

@jthielen

Here's a brief snippet of code that could act as a starting point, given that cfgrib organizes its output only one level deep (a flat list of datasets). It could likely be cleaned up if it were integrated directly into cfgrib, where it could use private functions:

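```python
import cfgrib
from datatree import DataTree


def cfgrib_open_datatree(file, **kwargs):
    # cfgrib returns a list of Datasets, one per compatible group of GRIB messages
    ds_list = cfgrib.open_datasets(file, **kwargs)
    ds_dict = {}
    for i, ds in enumerate(ds_list):
        # Name the node after the level type of the first data variable.
        # dict views are not iterators, so wrap them in iter() before next().
        first_var = next(iter(ds.data_vars.values()), None)
        attrs = first_var.attrs if first_var is not None else {}
        name = attrs.get("GRIB_typeOfLevel", "undef")
        # Several datasets can share a level type (e.g. instant vs. accumulated
        # surface fields); disambiguate so none is silently overwritten.
        if name in ds_dict:
            name = f"{name}_{i}"
        ds_dict[name] = ds
    return DataTree.from_dict(ds_dict)
```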

@TomNicholas
Member

That looks pretty neat already, @jthielen! Could we just add something like that to cfgrib?

Ideally we want this to work:

```python
dt = open_datatree("data.grib", engine="cfgrib")
```

but I'm not familiar enough with xarray's backend code to know whether that can be done purely with changes to cfgrib or whether it requires changes to xarray (or integration of datatree into xarray). cc @jhamman?

@jthielen

My hunch is that, based on what I had, we could easily add a cfgrib.open_datatree() function (to https://github.com/ecmwf/cfgrib/blob/master/cfgrib/xarray_plugin.py) to supplement or replace the existing cfgrib.open_datasets(). Supporting the backend engine would take more work, though perhaps it would only entail adding the appropriate method to BackendEntrypoint?
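As a rough mental model only (not xarray's actual plugin code), engine-based opening amounts to looking up a backend object by name and calling the method that matches the requested return type. The hypothetical `ToyBackendBase` and `ToyGribBackend` classes below show the shape of "adding the appropriate method": the base class refuses to open trees, and a GRIB backend overrides `open_datatree`. Real backends subclass xarray's `BackendEntrypoint` and return xarray objects rather than dicts.

```python
class ToyBackendBase:
    """Hypothetical stand-in for a backend entrypoint base class."""

    def open_dataset(self, filename):
        raise NotImplementedError

    def open_datatree(self, filename):
        # Backends that cannot represent groups leave this un-overridden
        raise NotImplementedError


class ToyGribBackend(ToyBackendBase):
    """Hypothetical GRIB backend that adds the tree-returning method."""

    def open_dataset(self, filename):
        # A real backend would return one xarray.Dataset here
        return {"source": filename}

    def open_datatree(self, filename):
        # The proposed addition: one entry per group, keyed by path
        # (a real backend would return a DataTree of Datasets)
        return {
            "/": {"source": filename},
            "/isobaricInhPa": {"source": filename},
            "/surface": {"source": filename},
        }


# Hypothetical engine registry; xarray discovers real engines via entry points
ENGINES = {"cfgrib": ToyGribBackend(), "plain": ToyBackendBase()}


def toy_open_datatree(filename, engine):
    # Look up the backend by engine name, then delegate to its method
    return ENGINES[engine].open_datatree(filename)


tree = toy_open_datatree("data.grib", engine="cfgrib")
print(sorted(tree))  # ['/', '/isobaricInhPa', '/surface']
```

Under this model, a user-facing `open_datatree(..., engine=...)` needs nothing engine-specific; all GRIB knowledge stays inside the backend's own method.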

@blaylockbk
Author

Looks great @jthielen! And so quick.

@TomNicholas, unfortunately I won't be at AMS on Friday for the Pangeo workshop.

@jhamman

jhamman commented Jan 11, 2023

I really like the idea of supporting open_datatree(engine=...) for certain backends. We already have open_dataarray and open_dataset, so this will be a natural extension to include in xarray. We will need to do some dedicated design planning to figure out how to integrate it with Xarray's backends. I'm thinking that sketching this out at the Pangeo meeting on Friday may be a good use of time.

@TomNicholas
Member

The integration of datatree into xarray's backend entrypoint system has now been done, so if anyone wants to try making their grib reader return xarray.core.datatree.DataTree objects, they can! You might also be interested in the new open_groups function (pydata/xarray#9137).
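The step from a flat mapping of group paths to datasets (the kind of result `open_groups` is meant to return) to a tree can be sketched with plain dicts. This assumes group paths are `/`-separated strings and is an illustration, not xarray's implementation; in xarray itself, `DataTree.from_dict` performs this step on real `Dataset` objects. The `nest_groups` helper is hypothetical.

```python
def nest_groups(flat):
    """Turn a flat {"/a/b": value} mapping into nested nodes.

    Each node is {"data": value or None, "children": {name: node}};
    intermediate groups with no entry of their own keep data=None.
    """
    root = {"data": None, "children": {}}
    for path, value in flat.items():
        node = root
        for part in path.strip("/").split("/"):
            if part:  # the root path "/" yields only an empty part
                node = node["children"].setdefault(
                    part, {"data": None, "children": {}}
                )
        node["data"] = value
    return root


tree = nest_groups({"/": "root attrs", "/surface/accum": "precip", "/isobaricInhPa": "t"})
print(list(tree["children"]))  # ['surface', 'isobaricInhPa']
```

Note that `/surface` is created implicitly as an empty parent of `/surface/accum`, which is why a GRIB reader only needs to produce the leaf paths.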

As xarray doesn't ship a grib reader, and this should now be possible in xarray upstream, I'm going to close this in favour of cfgrib tracking this enhancement to their package.

4 participants