Idea/use case: cfgrib #195
Comments
Hi Brian!
No worries - are you coming to the pangeo workshop on Friday?
I have never personally used grib data, but I would be happy to help you make it work in xarray!
Do you know how you might organise this data in terms of nested groups / nodes? If those group names can be derived from your file then this should be pretty simple. You can see how datatree handles netCDF and Zarr here. |
Here's a brief snippet of code that could act as a starting point, given the one-level depth of organization of the datasets output by cfgrib:

    import cfgrib
    from datatree import DataTree

    def cfgrib_open_datatree(file, **kwargs):
        # cfgrib returns one dataset per group of compatible GRIB messages
        ds_list = cfgrib.open_datasets(file, **kwargs)
        ds_dict = {}
        for ds in ds_list:
            # Name each node after the level type of the dataset's first variable
            first_var = next(iter(ds.data_vars.values()))
            type_of_level = first_var.attrs.get("GRIB_typeOfLevel", "undef")
            ds_dict[type_of_level] = ds
        return DataTree.from_dict(ds_dict)
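One caveat with keying the dictionary on GRIB_typeOfLevel: if two datasets in the list share a level type (this can apparently happen, for example, with fields at 2 m and 10 m that are both heightAboveGround), the later one would silently overwrite the earlier one. A minimal sketch of one way to avoid that is below. The helper name unique_group_names and the numeric suffix scheme are hypothetical, not part of cfgrib or datatree, and the helper works on plain strings so it can be tried without a GRIB file.

```python
from collections import Counter


def unique_group_names(level_types):
    """Return one distinct group name per entry.

    Level types that occur once are kept unchanged; repeated ones get a
    1-based numeric suffix (hypothetical naming scheme, for illustration).
    """
    totals = Counter(level_types)  # how often each level type occurs overall
    seen = Counter()               # how often each one has been seen so far
    names = []
    for level in level_types:
        if totals[level] == 1:
            names.append(level)
        else:
            seen[level] += 1
            names.append(f"{level}_{seen[level]}")
    return names


names = unique_group_names(
    ["surface", "heightAboveGround", "heightAboveGround", "isobaricInhPa"]
)
print(names)
```

The resulting names could then be zipped with the dataset list to build the dictionary passed to DataTree.from_dict, so that every dataset keeps its own node.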
That looks pretty neat already @jthielen ! Could we just add something like that to cfgrib? Ideally we want this to work: dt = open_datatree("data.grib", engine="cfgrib") but I'm not familiar enough with xarray's backend code to know if that can be done purely with changes to cfgrib or whether it requires changes to xarray (/integration of datatree in xarray). cc @jhamman ? |
My hunch is that we could easily add a |
Looks great @jthielen! And so quick. @TomNicholas, unfortunately I won't be at AMS Friday for the pangeo workshop. |
I really like the idea of supporting |
The integration of datatree into xarray's backend entrypoint system has now been done, so if anyone wants to try making their grib reader return As xarray doesn't ship a grib reader, and this should now be possible in xarray upstream, I'm going to close this in favour of cfgrib tracking this enhancement to their package. |
Hi Tom,
I missed your AMS talk this week because of a conflict, but I looked through the slides (thanks for posting those). Maybe I'll run into you later at AMS.
Just about all numerical weather model data is distributed in the grib format. Xarray has an engine for reading grib and grib2 files (cfgrib) that works great. One limitation with cfgrib is that when a file has variables on multiple types of levels (e.g., temperature at 2 meters, at 500 mb, and at cloud top height), cfgrib can't read the data into a single dataset. Instead, it returns a list of datasets when you call cfgrib.open_datasets(gribfileName).
If I understand the basics of datatree correctly, it sounds like datatree would be the better way for cfgrib to handle reading this data.
Have you looked at cfgrib and grib data before?