Add dynamic_zarr_store module #57

emfdavid · 2024-02-02T03:17:27Z

Grib Index Aggregations

The functions in this module build kerchunk aggregations of NODD grib2 weather forecasts quickly.

The module supports a three-step process:

1. Extract and persist metadata directly from a few arbitrary grib files for a given product such as HRRR SUBH
2. Use the metadata mapping to build an index table of every grib message from the idx text files
3. Combine the index data with the metadata to build any FMRC slice (Horizon, RunTime, ValidTime, BestAvailable)
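To make the idx step concrete, here is a minimal sketch of turning idx text into a table of message byte ranges. It is not the code in this PR: `IdxEntry` and `parse_idx` are hypothetical names, the line layout assumed is the wgrib2-style inventory (`number:offset:d=YYYYMMDDHH:...`) that the NODD idx files appear to follow, and the sample offsets are made up for illustration.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class IdxEntry:
    """One grib message as described by a line of the idx file (hypothetical type)."""
    message: int            # 1-based message number within the grib file
    offset: int             # byte offset where the message starts
    length: Optional[int]   # byte length; None for the last message (runs to end of file)
    run_time: datetime      # forecast reference time parsed from the d= field
    attrs: str              # remaining colon-separated fields (variable, level, step)


def parse_idx(text: str) -> list:
    """Parse idx text into IdxEntry rows, deriving each length from the next offset."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue
        # Assumed layout: number:offset:d=YYYYMMDDHH:variable:level:step:
        number, offset, date, attrs = line.split(":", 3)
        run_time = datetime.strptime(date, "d=%Y%m%d%H")
        rows.append((int(number), int(offset), run_time, attrs.rstrip(":")))

    entries = []
    for i, (number, offset, run_time, attrs) in enumerate(rows):
        # The idx file only lists start offsets, so a message ends where the next begins.
        length = rows[i + 1][1] - offset if i + 1 < len(rows) else None
        entries.append(IdxEntry(number, offset, length, run_time, attrs))
    return entries


# Illustrative sample only; the offsets are invented, not taken from a real file.
SAMPLE_IDX = """\
1:0:d=2024020200:REFC:entire atmosphere:anl:
2:375477:d=2024020200:RETOP:cloud top:anl:
3:517795:d=2024020200:VIS:surface:anl:
"""

entries = parse_idx(SAMPLE_IDX)
```

Each entry carries the byte range a reference would need, which is why the index can be built from the small text files without opening the grib files themselves.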

Once the metadata is created for the grib files from one complete forecast run (for instance, 48 hourly files from the 00Z HRRR SFC product), it takes less than a minute to index a whole year of forecasts in a single python process - no parallelism required. This speeds up building the aggregations. It does not speed up reading the data (that is next).
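Once an index table exists, the FMRC slice types named above reduce to simple filters over (run time, valid time) pairs. The sketch below uses the usual FMRC definitions and is an assumption about the general technique, not the implementation in this PR; `Row`, `horizon`, `best_available` and the `uri` values are hypothetical.

```python
from datetime import datetime, timedelta
from typing import NamedTuple


class Row(NamedTuple):
    """One index row (hypothetical shape): which run produced data for which valid time."""
    run_time: datetime
    valid_time: datetime
    uri: str


def horizon(rows, lead):
    """Horizon slice: every row whose lead time (valid_time - run_time) equals `lead`."""
    return [r for r in rows if r.valid_time - r.run_time == lead]


def best_available(rows):
    """BestAvailable slice: for each valid time, keep the row from the most recent run."""
    best = {}
    for r in rows:
        current = best.get(r.valid_time)
        if current is None or r.run_time > current.run_time:
            best[r.valid_time] = r
    return [best[v] for v in sorted(best)]


# Two illustrative hourly runs (00Z and 01Z), each with forecast steps 0, 1 and 2 hours.
t0 = datetime(2024, 2, 2, 0)
ROWS = [
    Row(t0 + timedelta(hours=run), t0 + timedelta(hours=run + step), f"run{run:02d}/f{step:02d}")
    for run in (0, 1)
    for step in (0, 1, 2)
]

best_uris = [r.uri for r in best_available(ROWS)]
one_hour_uris = [r.uri for r in horizon(ROWS, timedelta(hours=1))]
```

A RunTime slice would filter on a single `run_time` and a ValidTime slice on a single `valid_time` in the same way.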

A Jupyter notebook provides a brief demonstration of the capability.

Camus Energy is using this operationally with GEFS, GFS and HRRR grib2 files, available on NODD hosted cloud storage buckets. There is no requirements file or docker file included in this PR. There are extensive tests that can be shared later. To run the code you must install kerchunk from github as the grib_tree code is not in the version 2.2 release.

This excerpt of our production code is a prototype for the community discussion that we hope can move into Kerchunk.

mpiannucci · 2024-02-02T03:45:57Z

Can't wait to go through this!!!


emfdavid · 2024-03-06T15:55:49Z

PR got a little bigger with the test fixtures 😂

I need to clean up any lingering references to private Camus Energy buckets in the test fixtures.

Add dynamic_zarr_store module

051bd5a

emfdavid mentioned this pull request Feb 2, 2024

add prototype code #56

Closed

emfdavid mentioned this pull request Feb 2, 2024

Kerchunk enhancements for Fast NODD Grib Aggregations ioos/gsoc#42

Open

emfdavid added 5 commits March 6, 2024 15:43

Minor fixes for dzs

55ee9b7

add a readme

6b5f7fa

Add the python notebook demo

7c8da92

add the reqs file

af43602

Add the tests and fixtures. Some of the parquet fixtures still point …

c7293ef

…to private GCS buckets.

emfdavid force-pushed the grib_index_aggregation branch from b8d12e2 to c7293ef Compare March 6, 2024 15:51

Add license to test file

a529e9a

dynamicgribchunking.ipynb

a2f5e01

mpiannucci merged commit 6b32866 into asascience-open:main Apr 5, 2024

emfdavid deleted the grib_index_aggregation branch June 18, 2024 19:55
