Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Working with the s5 forecast data #131

Open
tommylees112 opened this issue Oct 23, 2019 · 3 comments
Open

Working with the s5 forecast data #131

tommylees112 opened this issue Oct 23, 2019 · 3 comments

Comments

@tommylees112
Copy link
Contributor

the s5 data has lots of nuances

so in order to be a complete xarray object there are 4 different forecast_horizons for each month so that the only array with non-nan values is when the valid_time is the first of the next month.

the forecast_horizon is set to a size of 9 (28days, 30days, 31days + 30 + 60 ...) which is the way to code n month forecasts (1month = 28/30/31 depending on the length of the month).

Screenshot 2019-10-23 at 15 37 04
Screenshot 2019-10-23 at 15 36 53
Screenshot 2019-10-23 at 15 36 40

So these forecasts are all initialised on 2014-01-01 and January is 31 days long. So the only valid data comes from the 3rd indexed forecast_horizon (here coded as step) which corresponds to 31 days. This takes the valid_time to 2014-02-01 which is the month ahead. Whereas February/March are 59 days long total. So the 4th indexed forecast_horizon[3] corresponds to 59 days which is the valid step here (valid_time == 2014-03-01)

Screenshot 2019-10-23 at 15 56 36

We need a way to get the valid timesteps so that we have a complete xarray object with data at each timestep. I can think of a few ways but we could document them here.

@tommylees112
Copy link
Contributor Author

Just to avoid confusion and looking at the preprocessed data:

Screenshot 2019-10-23 at 16 03 54
Screenshot 2019-10-23 at 16 03 37
Screenshot 2019-10-23 at 16 04 14

@tommylees112
Copy link
Contributor Author

@tommylees112
Copy link
Contributor Author

tommylees112 commented Oct 24, 2019

To select by forecast horizon

        # map forecast horizons to months ahead
        map_ = {
            pd.Timedelta('28 days 00:00:00'): 1,
            pd.Timedelta('29 days 00:00:00'): 1,
            pd.Timedelta('30 days 00:00:00'): 1,
            pd.Timedelta('31 days 00:00:00'): 1,
            pd.Timedelta('59 days 00:00:00'): 2,
            pd.Timedelta('60 days 00:00:00'): 2,
            pd.Timedelta('61 days 00:00:00'): 2,
            pd.Timedelta('62 days 00:00:00'): 2,
            pd.Timedelta('89 days 00:00:00'): 3,
            pd.Timedelta('90 days 00:00:00'): 3,
            pd.Timedelta('91 days 00:00:00'): 3,
            pd.Timedelta('92 days 00:00:00'): 3,
        }

        fhs = [pd.Timedelta(fh) for fh in stacked.forecast_horizon.values]
        months = [map_[fh] for fh in fhs]
        stacked = stacked.assign_coords(months_ahead=('time', months))

Which we can then use to subset the data as follows:

stacked.loc[dict(time=stacked.months_ahead == 1)].rename({var: var + '_1'})

<xarray.Dataset>
Dimensions:              (lat: 45, lon: 35, time: 312)
Coordinates:
  * lon                  (lon) float32 33.75 34.0 34.25 ... 41.75 42.0 42.25
  * lat                  (lat) float32 6.0 5.75 5.5 5.25 ... -4.5 -4.75 -5.0
  * time                 (time) datetime64[ns] 1993-03-03 ... 2019-01-31
    valid_time           (time) datetime64[ns] dask.array<shape=(312,), chunksize=(312,)>
    initialisation_date  (time) datetime64[ns] 1993-01-31 ... 2018-12-31
    forecast_horizon     (time) timedelta64[ns] 31 days 28 days ... 31 days
    months_ahead         (time) int64 1 1 1 1 1 1 1 1 1 1 ... 1 1 1 1 1 1 1 1 1
Data variables:
    tprate_1             (lat, lon, time) float64 dask.array<shape=(45, 35, 312), chunksize=(45, 35, 312)>

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant