Add dynamic_zarr_store module #57

Merged
merged 8 commits into from
Apr 5, 2024

Conversation

@emfdavid emfdavid commented Feb 2, 2024

Grib Index Aggregations

The functions in this module make it fast to build kerchunk aggregations of NODD grib2 weather forecasts.

The module supports a three-step process:

  1. Extract and persist metadata directly from a few arbitrary grib files for a given product such as HRRR SUBH
  2. Use the metadata mapping to build an index table of every grib message from the idx text files
  3. Combine the index data with the metadata to build any FMRC slice (Horizon, RunTime, ValidTime, BestAvailable)
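As a rough illustration of step 2, the sketch below parses the text of an idx file into one record per grib message. It is not the module's code: `IdxEntry`, `parse_idx`, and the sample values are hypothetical, and it assumes the usual colon-delimited idx layout (message number, byte offset, reference date, variable, level, forecast) with no sub-messages. The byte length of each message is inferred from the next message's offset, so the last message's length stays unknown without the file size.

```python
from dataclasses import dataclass
from typing import List, Optional

# Made-up sample in the colon-delimited layout assumed above:
# message number : byte offset : reference date : variable : level : forecast
SAMPLE_IDX = """\
1:0:d=2023010100:REFC:entire atmosphere:anl:
2:375838:d=2023010100:RETOP:cloud top:anl:
3:517163:d=2023010100:TMP:2 m above ground:anl:
"""


@dataclass
class IdxEntry:
    """One row of the index table (hypothetical structure)."""
    message: str            # message number as written in the idx file
    offset: int             # byte offset of the message in the grib2 file
    length: Optional[int]   # byte length; None when it cannot be inferred
    date: str               # reference date field, e.g. "d=2023010100"
    variable: str           # e.g. "TMP"
    level: str              # e.g. "2 m above ground"
    forecast: str           # e.g. "anl" or "1 hour fcst"


def parse_idx(text: str) -> List[IdxEntry]:
    """Turn idx text into index rows, skipping blank or malformed lines."""
    fields = []
    for line in text.splitlines():
        parts = line.strip().split(":")
        if len(parts) < 6:
            continue
        fields.append(parts[:6])

    entries = []
    for i, (msg, offset, date, var, level, fcst) in enumerate(fields):
        start = int(offset)
        # A message ends where the next one starts; the last is open-ended.
        nxt = int(fields[i + 1][1]) if i + 1 < len(fields) else None
        length = nxt - start if nxt is not None else None
        entries.append(IdxEntry(msg, start, length, date, var, level, fcst))
    return entries


entries = parse_idx(SAMPLE_IDX)
for e in entries:
    print(e.message, e.variable, e.level, e.offset, e.length)
```

Only the small idx text files are read here, never the grib2 files themselves, which is why indexing can be cheap.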

Once the metadata is created for the grib files from one complete forecast run (for instance, 48 hourly files from the 00Z HRRR SFC product), it takes less than a minute to index a whole year of forecasts in a single Python process, with no parallelism required. This speeds up building the aggregations. It does not speed up reading the data (that is next).
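Once an index table exists, each FMRC slice named in step 3 can be thought of as a row selection over run time and forecast horizon. The sketch below is a toy model of that idea, not the module's implementation: `IndexRow` and the four functions are hypothetical, and the slice semantics (valid time equals run time plus horizon; BestAvailable keeps the most recent run for each valid time) are assumed from common FMRC usage.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, List


@dataclass(frozen=True)
class IndexRow:
    """Hypothetical index row: one forecast step of one model run."""
    run_time: datetime
    horizon: timedelta

    @property
    def valid_time(self) -> datetime:
        return self.run_time + self.horizon


def horizon_slice(rows: List[IndexRow], horizon: timedelta) -> List[IndexRow]:
    """Same lead time taken from every run, ordered by valid time."""
    picked = [r for r in rows if r.horizon == horizon]
    return sorted(picked, key=lambda r: r.valid_time)


def run_time_slice(rows: List[IndexRow], run_time: datetime) -> List[IndexRow]:
    """Every step of a single run, ordered by horizon."""
    picked = [r for r in rows if r.run_time == run_time]
    return sorted(picked, key=lambda r: r.horizon)


def valid_time_slice(rows: List[IndexRow], valid_time: datetime) -> List[IndexRow]:
    """Every run's prediction for one valid time, ordered by run time."""
    picked = [r for r in rows if r.valid_time == valid_time]
    return sorted(picked, key=lambda r: r.run_time)


def best_available(rows: List[IndexRow]) -> List[IndexRow]:
    """For each valid time keep the row from the latest run."""
    best: Dict[datetime, IndexRow] = {}
    for r in rows:
        current = best.get(r.valid_time)
        if current is None or r.run_time > current.run_time:
            best[r.valid_time] = r
    return [best[t] for t in sorted(best)]


# Two hourly runs, each with horizons of 0, 1 and 2 hours.
hour = timedelta(hours=1)
runs = [datetime(2024, 1, 1, 0), datetime(2024, 1, 1, 1)]
rows = [IndexRow(run, h * hour) for run in runs for h in range(3)]

for r in best_available(rows):
    print(r.valid_time, "from run", r.run_time, "horizon", r.horizon)
```

In this toy data the 01Z run supersedes the 00Z run wherever their valid times overlap, so the best-available series has one row per distinct valid time.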

A Jupyter notebook provides a brief demonstration of the capability.

Camus Energy is using this operationally with GEFS, GFS and HRRR grib2 files, available on NODD-hosted cloud storage buckets. There is no requirements file or Dockerfile included in this PR. There are extensive tests that can be shared later. To run the code you must install kerchunk from GitHub, as the grib_tree code is not in the version 2.2 release.

This excerpt of our production code is a prototype for the community discussion that we hope can move into Kerchunk.

@emfdavid emfdavid mentioned this pull request Feb 2, 2024
@mpiannucci

Can't wait to go through this!!!

emfdavid commented Mar 6, 2024

PR got a little bigger with the test fixtures 😂
[Screenshot, 2024-03-06 10:53:44 AM]

I need to clean up any lingering references to private Camus Energy buckets in the test fixtures.

@mpiannucci mpiannucci merged commit 6b32866 into asascience-open:main Apr 5, 2024
@emfdavid emfdavid deleted the grib_index_aggregation branch June 18, 2024 19:55