A way to manage forecast Xarray datasets using datatrees

abkfenris/xarray_fmrc

xarray-fmrc

Xarray-FMRC uses Xarray datatrees to provide a standard in-memory and storage representation of Forecast Model Run Collections that can then be accessed via the various forecast views (best estimate, constant offset, constant forecast time, model run).

In [1]: import xarray as xr

In [2]: import xarray_fmrc

In [3]: ds0 = xr.open_dataset("fvcom_gom3_met_2022120118.nc")

In [4]: ds1 = xr.open_dataset("fvcom_gom3_met_2022121218.nc")

In [5]: dt = xarray_fmrc.from_model_runs([ds0, ds1])

In [6]: dt
Out[6]:
DataTree('None', parent=None)
│   Dimensions:                  (forecast_reference_time: 2,
│                                 constant_forecast: 242, constant_offset: 121)
│   Coordinates:
│     * forecast_reference_time  (forecast_reference_time) datetime64[ns] 2022-12...
│     * constant_forecast        (constant_forecast) datetime64[ns] 2022-12-02 .....
│     * constant_offset          (constant_offset) timedelta64[ns] 06:00:00 ... 5...
│   Data variables:
│       model_run_path           (forecast_reference_time) <U29 'model_run/2022-1...
└── DataTree('model_run')
    ├── DataTree('2022-12-01T18:00:00')
    │       Dimensions:                  (forecast_reference_time: 1, time: 121,
    │                                     latitude: 220, longitude: 215)
    │       Coordinates:
    │         * longitude                (longitude) float64 -79.95 -79.86 ... -60.13 -60.04
    │         * latitude                 (latitude) float64 27.03 27.12 ... 47.32 47.41
    │         * time                     (time) datetime64[ns] 2022-12-02 ... 2022-12-07
    │         * forecast_reference_time  (forecast_reference_time) datetime64[ns] 2022-12...
    │           forecast_offset          (time) timedelta64[ns] 06:00:00 ... 5 days 06:00:00
    │       Data variables:
    │           wind_speed               (forecast_reference_time, time, latitude, longitude) float32 ...
    │           wind_from_direction      (forecast_reference_time, time, latitude, longitude) float32 ...
    │       Attributes: (12/178)
    │           ...
    └── DataTree('2022-12-12T18:00:00')
            Dimensions:                  (forecast_reference_time: 1, time: 121,
                                          latitude: 220, longitude: 215)
            Coordinates:
              * longitude                (longitude) float64 -79.95 -79.86 ... -60.13 -60.04
              * latitude                 (latitude) float64 27.03 27.12 ... 47.32 47.41
              * time                     (time) datetime64[ns] 2022-12-13 ... 2022-12-18
              * forecast_reference_time  (forecast_reference_time) datetime64[ns] 2022-12...
                forecast_offset          (time) timedelta64[ns] 06:00:00 ... 5 days 06:00:00
            Data variables:
                wind_speed               (forecast_reference_time, time, latitude, longitude) float32 ...
                wind_from_direction      (forecast_reference_time, time, latitude, longitude) float32 ...
            Attributes: (12/178)
                ...

In [7]: dt.fmrc.constant_offset("12h")
Out[7]:
<xarray.Dataset>
Dimensions:                  (longitude: 215, latitude: 220,
                              forecast_reference_time: 2, time: 2)
Coordinates:
  * longitude                (longitude) float64 -79.95 -79.86 ... -60.13 -60.04
  * latitude                 (latitude) float64 27.03 27.12 ... 47.32 47.41
  * forecast_reference_time  (forecast_reference_time) datetime64[ns] 2022-12...
  * time                     (time) datetime64[ns] 2022-12-02T06:00:00 2022-1...
    forecast_offset          (time) timedelta64[ns] 12:00:00 12:00:00
Data variables:
    wind_speed               (forecast_reference_time, time, latitude, longitude) float32 ...
    wind_from_direction      (forecast_reference_time, time, latitude, longitude) float32 ...
Attributes: (12/178)
    ...

Forecast views

Forecast Model Run Collections

The various views are explained in more detail below, but each has a method on the .fmrc accessor that returns a dataset.

  • dt.fmrc.model_run(dt: str | datetime.datetime | pd.Timestamp) -> xr.Dataset
  • dt.fmrc.constant_offset(offset: str | datetime.timedelta | pd.TimeOffset?) -> xr.Dataset
  • dt.fmrc.constant_forecast(dt: str | datetime.datetime | pd.Timestamp) -> xr.Dataset
  • dt.fmrc.best() -> xr.Dataset

A few ideas

Here are some things that aren't implemented, but where this library could go.

Kerchunk

Kerchunk has the ability to break down chunks into smaller chunks. Xarray-FMRC could provide utilities to take a collection of kerchunk files, break them apart, and rebuild them in the various FMRC views.

Xpublish-FMRC

Xpublish-FMRC would provide new endpoints for xpublish servers to serve forecast model run collections.

It would use the plugin interface to create a new top-level path, with other dataset plugins registered below it to serve the various forecast views. For each dataset plugin registered below it, it would override the get_dataset function.

  • forecasts/gfs/best/edr/position
  • forecasts/gfs/model_run/20230101/edr/position
  • forecasts/gfs/constant_forecast/20230101/edr/position
  • forecasts/gfs/constant_offset/6h/edr/position
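
As a rough illustration of how the example paths above break down, here is a minimal sketch that splits a path into the collection, the forecast view, the view's argument, and the remainder that would be handed to a dataset plugin. The function name `parse_forecast_path` and the returned keys are hypothetical and are not part of xpublish or of this library; the path layout is assumed from the four examples only.

```python
# Views that take one argument in the path (a run time, valid time, or offset).
VIEWS_WITH_ARGUMENT = {"model_run", "constant_forecast", "constant_offset"}


def parse_forecast_path(path):
    """Split 'forecasts/<collection>/<view>[/<argument>]/<subpath>' into parts.

    Hypothetical helper: the layout is inferred from the example paths only.
    """
    parts = path.strip("/").split("/")
    if len(parts) < 3 or parts[0] != "forecasts":
        raise ValueError(f"not a forecast path: {path!r}")

    collection, view = parts[1], parts[2]
    if view == "best":
        # The best estimate view takes no argument.
        argument, rest = None, parts[3:]
    elif view in VIEWS_WITH_ARGUMENT:
        if len(parts) < 4:
            raise ValueError(f"view {view!r} needs an argument: {path!r}")
        argument, rest = parts[3], parts[4:]
    else:
        raise ValueError(f"unknown forecast view: {view!r}")

    return {
        "collection": collection,
        "view": view,
        "argument": argument,
        "subpath": "/".join(rest),
    }


print(parse_forecast_path("forecasts/gfs/constant_offset/6h/edr/position"))
```

The view names match the methods on the .fmrc accessor, so a real plugin could plausibly dispatch on the "view" value, but that wiring is not shown here.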

FMRC Dataset View definitions

There may be a better name for these, but my brain is currently comparing them to database views.

Definitions pulled from http://www.unidata.ucar.edu/staff/caron/presentations/FmrcPoster.pdf

Model Run Datasets

The RUC model is run hourly, and 12 runs are shown in this collection; note that different runs contain different forecast hours. The complete results for a single run make up a model run dataset. The selected example here is the run made on 2006-12-11 06:00 Z, having forecasts at 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, and 12 hours.

Constant forecast/valid time dataset

A constant forecast dataset is created from all data that have the same forecast/valid time. Using the 0 hour analysis as the best state estimate, one can use this dataset to evaluate how accurate forecasts are.

The selected example here is for the forecast time 2006-12-11 12:00 Z, using forecasts from the runs made at 0, 3, 6, 9, 10, 11, and 12 Z. There are a total of 24 such datasets in this collection.
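
The selection rule for a constant forecast dataset can be sketched in plain Python, without xarray. The toy collection below is hypothetical (three runs, six hours apart, not the RUC collection from the poster), and `constant_forecast` here is an illustrative function, not the library's implementation. It gathers every (run time, offset) pair whose valid time equals the requested time.

```python
from datetime import datetime, timedelta

# Hypothetical toy collection: run time -> forecast offsets (in hours) in that run.
runs = {
    datetime(2006, 12, 11, 0): [0, 3, 6, 12],
    datetime(2006, 12, 11, 6): [0, 3, 6, 12],
    datetime(2006, 12, 11, 12): [0, 3, 6],
}


def constant_forecast(runs, valid_time):
    """Return (run_time, offset) pairs whose valid time equals valid_time."""
    matches = []
    for run_time in sorted(runs):
        for hours in runs[run_time]:
            offset = timedelta(hours=hours)
            # Valid time is the run time plus the forecast offset.
            if run_time + offset == valid_time:
                matches.append((run_time, offset))
    return matches


view = constant_forecast(runs, datetime(2006, 12, 11, 12))
for run_time, offset in view:
    print(f"run {run_time:%Y-%m-%d %H}Z, offset {offset}")
```

Each run contributes at most one step to the view, so comparing the entries against the 0 hour analysis shows how the forecast for that time changed from run to run.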

Constant forecast offset datasets

A constant offset dataset is created from all the data that have the same offset time. This collection has 11 such datasets: the 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, and 12 hour offsets.

The selected example here is for the 6 hour offset, using forecasts from the runs made at 0, 3, 6, 9, and 12 Z.
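
A constant offset selection can be sketched the same way, in plain Python and without xarray. The toy collection and the `constant_offset` function below are hypothetical illustrations, not the library's implementation. Every run that contains the requested offset contributes exactly one step, at its run time plus that offset.

```python
from datetime import datetime, timedelta

# Hypothetical toy collection: run time -> forecast offsets (in hours) in that run.
runs = {
    datetime(2006, 12, 11, 0): [0, 3, 6, 12],
    datetime(2006, 12, 11, 6): [0, 3, 6, 12],
    datetime(2006, 12, 11, 12): [0, 3, 6],
}


def constant_offset(runs, offset):
    """Return (run_time, valid_time) pairs for runs that contain this offset."""
    return [
        (run_time, run_time + offset)
        for run_time in sorted(runs)
        if any(timedelta(hours=h) == offset for h in runs[run_time])
    ]


view = constant_offset(runs, timedelta(hours=12))
for run_time, valid_time in view:
    print(f"run {run_time:%H}Z -> valid {valid_time:%Y-%m-%d %H}Z")
```

Runs that lack the requested offset (here the 12 Z run has no 12 hour forecast) simply drop out of the view.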

Best estimate dataset

For each forecast time in the collection, the best estimate for that hour is used to create the best estimate dataset, which covers the entire time range of the collection.

For this example, the best estimate is the 0 hour analysis from each run, plus all the forecasts from the latest run.
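
One way to read "best estimate" is: for each valid time, keep the data from the most recent run that covers it, which is also the smallest offset. The sketch below assumes that rule; it is plain Python over a hypothetical toy collection, not the library's implementation.

```python
from datetime import datetime, timedelta

# Hypothetical toy collection: run time -> forecast offsets (in hours) in that run.
runs = {
    datetime(2006, 12, 11, 0): [0, 3, 6, 12],
    datetime(2006, 12, 11, 6): [0, 3, 6, 12],
    datetime(2006, 12, 11, 12): [0, 3, 6],
}


def best_estimate(runs):
    """Map each valid time to the (run_time, offset) of the latest run covering it."""
    best = {}
    # Walk runs oldest first so that later runs overwrite earlier ones.
    for run_time in sorted(runs):
        for hours in runs[run_time]:
            offset = timedelta(hours=hours)
            best[run_time + offset] = (run_time, offset)
    # Return the entries ordered by valid time.
    return dict(sorted(best.items()))


best = best_estimate(runs)
for valid_time, (run_time, offset) in best.items():
    print(f"valid {valid_time:%d %H}Z from run {run_time:%H}Z (offset {offset})")
```

Because the toy runs are six hours apart, some valid times between runs are filled by short forecasts from an earlier run; with hourly runs, as in the RUC example, every hour up to the latest run would come from a 0 hour analysis.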

Development

Using your favorite Python environment, check out and install xarray_fmrc:

git clone git@github.com:abkfenris/xarray_fmrc.git
cd xarray_fmrc
pip install -e .
pytest .

TODO: Add more extensive development instructions
