Programmatically load all messages #73

Closed
djhoese opened this issue Apr 8, 2019 · 4 comments

Comments

@djhoese

djhoese commented Apr 8, 2019

I'm not a grib expert, so I could be missing something about the available cfgrib functionality. My employee @katherinekolman and I have been working on converting some code that used pygrib to use cfgrib. The end result for our software is an xarray DataArray, so cfgrib seemed like a better fit than eccodes-python. However, we're having trouble reading some grib files from NCEP, which are the main grib files we want to support.

We often run into the case where some variables conflict with previously loaded versions of that same variable. We've even tried using the experimental open_datasets, but that seems to fail in some cases where open_dataset with a manual filter_by_keys succeeds. I think this is similar to #66 and #63.

So my question is, is there an interface (either in cfgrib or eccodes) that would allow us to programmatically list the metadata of a file's messages to see what filter_by_keys could be set to without first failing to load the file? I'm trying to find a way for our software to be given a grib file, analyze what can be loaded, and then provide that information to the user so they can request to load specific pieces of data.

We're willing to help if this is a time issue. If this is a "cfgrib doesn't plan on supporting these types of files or this type of reading" then do you have other ideas? Perhaps we could customize the existing open_datasets?

@alexamici
Contributor

@djhoese you are right: at the moment the most powerful interface we have for handling heterogeneous GRIB files is the filter_by_keys mechanism, which requires users to perform one or more manual steps before they can figure out a set of working filters. Furthermore, for some GRIB files not all variables can be extracted with simple filters, no matter what. This interface needs to be improved, but I don't yet have a vision of how to do it.
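
The manual trial and error that filter_by_keys requires could be automated roughly as follows. This is only a sketch: `probe_filters` and its `opener` parameter are hypothetical names, not part of cfgrib, and the default opener assumes the file is opened through xarray's cfgrib engine with `backend_kwargs={"filter_by_keys": ...}`. The candidate filters still have to be supplied by the caller.

```python
def probe_filters(path, candidate_filters, opener=None):
    """Try each candidate filter_by_keys dict and report which ones load.

    Returns (working, failing): `working` lists the filters that opened,
    `failing` lists (filter, error message) pairs.
    """
    if opener is None:
        # Assumed default: open through xarray's cfgrib engine.
        import xarray as xr

        def opener(path, filter_by_keys):
            return xr.open_dataset(
                path,
                engine="cfgrib",
                backend_kwargs={"filter_by_keys": filter_by_keys},
            )

    working, failing = [], []
    for filter_by_keys in candidate_filters:
        try:
            dataset = opener(path, filter_by_keys)
        except Exception as error:  # broad on purpose: this only probes
            failing.append((filter_by_keys, str(error)))
        else:
            # Release the file if the returned object supports it.
            close = getattr(dataset, "close", None)
            if callable(close):
                close()
            working.append(filter_by_keys)
    return working, failing
```

Passing the opener as a parameter keeps the probing logic independent of how the file is actually read, so it can be exercised without a GRIB file at hand.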

We don't have an interface to list all messages (and all fields in the case of multi-field messages) in cfgrib for the simple reason that I personally use ecCodes grib_ls and grib_dump to explore the files, and I assume most people are familiar with those powerful tools. This is lower priority for cfgrib because there's a very good workaround.

I'll revisit the state of open_datasets because it is supposed to return all variables accessible via filter_by_keys, but I don't use it much, so it may very well be broken.

@djhoese
Author

djhoese commented Apr 10, 2019

I have never used the command line tools, but am also hoping to do everything from python if possible. Also, I don't need to necessarily list all the messages, but would like a way to open/load all the messages; even if that means creating a separate DataArray/Dataset for each message. If you're open to changes perhaps @katherinekolman and I could help come up with something. I'm a little worried we may have to become more familiar with the ECCodes C API than we'd like, but that's ok too.

@alexamici
Contributor

alexamici commented Apr 11, 2019

@djhoese you caught me in the middle of a refactor of how cfgrib accesses GRIB messages at a low level, so things are not as obvious as they should be, sorry.

In order to programmatically access all fields in all GRIB messages (this is not a one-to-one relation in the event of multi-field messages, which NCEP uses in some cases), you may use our undocumented low-level API, for example:

>>> from cfgrib import dataset
>>> grib = dataset.messages.FileStream('myfile.grib')
>>> for message in grib:
...     print(message['shortName'])

FileStream is an iterator that extracts and decodes all fields (not just messages!) in the GRIB file into Message objects, and you can access all GRIB keys by using a Message like a dictionary. Valid keys are all ecCodes-defined keys, both coded and computed.

Note that this is of limited use because, in rare cases, there is still no way to extract some of the variables as an xarray.Dataset, but at least you can explore the file.
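
To explore a file this way, the fields can be grouped by a few keys to see which combinations occur and might serve as filter_by_keys values. A minimal sketch follows, assuming each Message raises KeyError for a key it does not define (as a dictionary would). `summarize_messages` and `open_stream` are hypothetical helper names, and the default key list is only an illustrative choice.

```python
from collections import Counter


def summarize_messages(messages, keys=("shortName", "typeOfLevel", "stepType")):
    """Count fields per distinct combination of the given GRIB keys.

    `messages` is any iterable of dictionary-like objects, for example
    the FileStream described above.
    """
    counts = Counter()
    for message in messages:
        combination = []
        for key in keys:
            try:
                value = message[key]
            except KeyError:
                # Assumption: a missing key surfaces as KeyError.
                value = None
            combination.append(value)
        counts[tuple(combination)] += 1
    return counts


def open_stream(path):
    # Relies on the undocumented low-level API shown above.
    from cfgrib import dataset

    return dataset.messages.FileStream(path)
```

With a real file, `summarize_messages(open_stream('myfile.grib'))` would give one entry per combination, each of which is a candidate filter to try.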

BTW, I'm open to collaboration, but I don't yet see the correct strategy for handling totally generic GRIB files.

@alexamici
Contributor

@djhoese using the new heuristic for cfgrib.open_datasets introduced in version 0.9.7, you should get all, or almost all, variables correctly.

Please open a new issue if you still find that some variable is not usable.
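
For the original use case (telling a user what can be loaded from a file), the list returned by cfgrib.open_datasets can be indexed by variable name. This is a sketch under the assumption that each returned item is an xarray.Dataset exposing `data_vars`; `index_variables` and `load_all` are hypothetical helper names.

```python
def index_variables(datasets):
    """Map each variable name to the positions of the datasets holding it."""
    index = {}
    for position, dataset in enumerate(datasets):
        # Iterating data_vars yields the variable names.
        for name in dataset.data_vars:
            index.setdefault(str(name), []).append(position)
    return index


def load_all(path):
    # cfgrib.open_datasets returns a list of datasets for one GRIB file.
    import cfgrib

    datasets = cfgrib.open_datasets(path)
    return datasets, index_variables(datasets)
```

A variable that appears at more than one position is one that could not be merged into a single dataset, for example because it is defined on different level types.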
