Programmatically load all messages #73

djhoese · 2019-04-08T17:49:14Z

I'm not a GRIB expert, so I could be missing something about the available cfgrib functionality. My employee @katherinekolman and I have been working on converting some code that used pygrib to use cfgrib. The end result for our software is an xarray DataArray, so cfgrib seemed like a better fit than eccodes-python. However, we're having trouble reading some GRIB files from NCEP, which are the main GRIB files we want to support.

We often run into cases where some variables conflict with previously loaded versions of that same variable. We've even tried using the experimental open_datasets, but that seems to fail in some cases where open_dataset with a manual filter_by_keys succeeds. I think this is similar to #66 and #63.

So my question is, is there an interface (either in cfgrib or eccodes) that would allow us to programmatically list the metadata of a file's messages to see what filter_by_keys could be set to without first failing to load the file? I'm trying to find a way for our software to be given a grib file, analyze what can be loaded, and then provide that information to the user so they can request to load specific pieces of data.

We're willing to help if this is a time issue. If this is a "cfgrib doesn't plan on supporting these types of files or this type of reading" then do you have other ideas? Perhaps we could customize the existing open_datasets?


alexamici · 2019-04-09T09:42:29Z

@djhoese you are right: at the moment the most powerful interface that we have for handling heterogeneous GRIB files is the filter_by_keys mechanism, which requires users to perform one or more manual steps before they can figure out a set of working filters. Furthermore, for some GRIB files not all variables can be extracted with simple filters, no matter what. This interface needs to be improved, but I don't have a vision of how to do it yet.
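For readers who have not used the mechanism, here is a minimal sketch of passing filter_by_keys through xarray's cfgrib engine. The helper names, the file name, and the example key values are illustrative assumptions, not part of cfgrib; which keys actually isolate a consistent subset depends on the file.

```python
def build_backend_kwargs(**filter_by_keys):
    """Wrap GRIB key/value pairs in the backend_kwargs layout used below."""
    return {"filter_by_keys": dict(filter_by_keys)}


def open_filtered(path, **filter_by_keys):
    """Open the subset of a GRIB file whose messages match the given keys.

    Hypothetical helper: xarray and cfgrib are imported lazily and must be
    installed for this call to work.
    """
    import xarray as xr

    return xr.open_dataset(
        path,
        engine="cfgrib",
        backend_kwargs=build_backend_kwargs(**filter_by_keys),
    )


# Example call (assumes 'myfile.grib' exists and contains surface fields):
# ds = open_filtered("myfile.grib", typeOfLevel="surface")
```

The manual step mentioned above is finding key values such as typeOfLevel that select fields sharing one consistent set of coordinates.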

We don't have an interface to list all messages (and all fields in the case of multi-field messages) in cfgrib for the simple reason that I personally use ecCodes grib_ls and grib_dump to explore the files, and I assume most people will be familiar with those powerful tools. This is lower priority for cfgrib because there's a very good workaround.

I'll revisit the state of open_datasets because it is supposed to return all variables accessible via filter_by_keys, but I don't use it much, so it may very well be broken.

djhoese · 2019-04-10T14:05:41Z

I have never used the command line tools, but am also hoping to do everything from python if possible. Also, I don't need to necessarily list all the messages, but would like a way to open/load all the messages; even if that means creating a separate DataArray/Dataset for each message. If you're open to changes perhaps @katherinekolman and I could help come up with something. I'm a little worried we may have to become more familiar with the ECCodes C API than we'd like, but that's ok too.

alexamici · 2019-04-11T19:46:59Z

@djhoese you caught me in the middle of a refactor of how cfgrib accesses GRIB messages at a low level, so things are not as obvious as they should be, sorry.

In order to programmatically access all fields in all GRIB messages (this is not a one-to-one relation in the event of multi-field messages, which are used by NCEP in some cases) you may use our undocumented low-level API, for example:

>>> from cfgrib import dataset
>>> grib = dataset.messages.FileStream('myfile.grib')
>>> for message in grib:
...     print(message['shortName'])

The FileStream is an iterable that extracts and decodes all fields (not just messages!) in the GRIB file into Message objects, and you can access all GRIB keys by using a Message like a dictionary. Valid keys are all ecCodes-defined keys, both coded and computed.

Note that this low-level API is not of much use on its own, because in rare cases there is still no way to extract some of the variables as an xarray.Dataset, but at least you can explore the file.

BTW, I'm open to collaboration, but I don't yet see the correct strategy for handling totally generic GRIB files.
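Building on the FileStream loop above, the following is a rough sketch of how a file could be inventoried to suggest candidate filter_by_keys values. The function names and the choice of keys are assumptions made for illustration. The FileStream access path is copied from the example in this comment; it is undocumented and may change between cfgrib versions. The grouping logic only relies on each message behaving like a mapping.

```python
from collections import Counter

# Illustrative choice of ecCodes keys; other keys may separate a given file better.
KEYS = ("shortName", "typeOfLevel", "stepType")


def inventory(messages, keys=KEYS):
    """Count fields for each distinct combination of the given keys."""
    counts = Counter()
    for message in messages:
        # .get() yields None for a key that a message does not define.
        counts[tuple(message.get(key) for key in keys)] += 1
    return counts


def candidate_filters(counts, keys=KEYS):
    """Turn each distinct key combination into a filter_by_keys-style dict."""
    return [dict(zip(keys, combo)) for combo in counts]


def inventory_file(path, keys=KEYS):
    """Inventory a GRIB file via the undocumented FileStream shown above."""
    from cfgrib import dataset  # lazy import; requires cfgrib and ecCodes

    return inventory(dataset.messages.FileStream(path), keys)


# Example call (assumes 'myfile.grib' exists):
# for combo, n in inventory_file("myfile.grib").items():
#     print(combo, n)
```

Each resulting dict is only a candidate: as noted above, some variables may still not be extractable with simple filters.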

alexamici · 2019-05-27T17:20:38Z

@djhoese using the new heuristic for cfgrib.open_datasets introduced with version 0.9.7, you should get all, or almost all, variables correctly.

Please open a new issue if you still find that some variable is not usable.
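As a minimal sketch of that workflow, the snippet below loads every dataset that cfgrib.open_datasets returns and lists the variables found in each. The helper names and the file name are hypothetical; only cfgrib.open_datasets itself comes from the comment above.

```python
def variables_by_dataset(datasets):
    """Return the sorted variable names of each dataset, in order."""
    return [sorted(ds.data_vars) for ds in datasets]


def load_everything(path):
    """Return the list of xarray.Dataset objects cfgrib can build from a file."""
    import cfgrib  # lazy import; requires cfgrib >= 0.9.7 and ecCodes

    return cfgrib.open_datasets(path)


# Example call (assumes 'myfile.grib' exists):
# datasets = load_everything("myfile.grib")
# for names in variables_by_dataset(datasets):
#     print(names)
```

Comparing this listing against a grib_ls inventory of the same file is one way to spot variables that were not picked up.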

katherinekolman mentioned this issue Apr 29, 2019

Change from pygrib to cfgrib pytroll/satpy#743


alexamici closed this as completed May 27, 2019
