Add support for custom seasons spanning calendar years #423
Example result:
# Before dropping
# -----------------
# 2000-1, 2000-2, and 2001-12 are months in incomplete "DJF" seasons, so they are dropped
ds.time
<xarray.DataArray 'time' (time: 15)>
array([cftime.DatetimeGregorian(2000, 1, 16, 12, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(2000, 2, 15, 12, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(2000, 3, 16, 12, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(2000, 4, 16, 0, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(2000, 5, 16, 12, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(2000, 6, 16, 0, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(2000, 7, 16, 12, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(2000, 8, 16, 12, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(2000, 9, 16, 0, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(2000, 10, 16, 12, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(2000, 11, 16, 0, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(2000, 12, 16, 12, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(2001, 1, 16, 12, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(2001, 2, 15, 0, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(2001, 12, 16, 12, 0, 0, 0, has_year_zero=False)],
dtype=object)
Coordinates:
* time (time) object 2000-01-16 12:00:00 ... 2001-12-16 12:00:00
Attributes:
axis: T
long_name: time
standard_name: time
bounds: time_bnds
# After dropping
# -----------------
ds_new.time
<xarray.DataArray 'time' (time: 12)>
array([cftime.DatetimeGregorian(2000, 3, 16, 12, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(2000, 4, 16, 0, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(2000, 5, 16, 12, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(2000, 6, 16, 0, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(2000, 7, 16, 12, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(2000, 8, 16, 12, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(2000, 9, 16, 0, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(2000, 10, 16, 12, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(2000, 11, 16, 0, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(2000, 12, 16, 12, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(2001, 1, 16, 12, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(2001, 2, 15, 0, 0, 0, 0, has_year_zero=False)],
dtype=object)
Coordinates:
* time (time) object 2000-03-16 12:00:00 ... 2001-02-15 00:00:00
Attributes:
axis: T
long_name: time
standard_name: time
bounds: time_bnds
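The before/after output above can be reproduced with a small sketch. This is only an illustration on plain `(year, month)` tuples, not the code in `xcdat/temporal.py`; the function name `drop_incomplete_seasons` and its inputs are hypothetical, and it assumes one entry per month and that months listed before January in a season belong to the season ending in the following year.

```python
from collections import defaultdict


def drop_incomplete_seasons(timestamps, seasons):
    """Keep only (year, month) pairs that belong to a complete season.

    `seasons` is a list of month-number lists in season order,
    e.g. [[12, 1, 2], [3, 4, 5], [6, 7, 8], [9, 10, 11]].
    Hypothetical helper for illustration only.
    """
    groups = defaultdict(list)
    for year, month in timestamps:
        for idx, season in enumerate(seasons):
            if month not in season:
                continue
            # Months listed before January count toward the season that
            # ends in the following year (e.g. Dec 2000 -> DJF 2001).
            spans_year = 1 in season and season.index(month) < season.index(1)
            season_year = year + 1 if spans_year else year
            groups[(season_year, idx)].append((year, month))
            break

    kept = []
    for (_, idx), members in groups.items():
        # A season is complete only when every one of its months is present.
        if len(members) == len(seasons[idx]):
            kept.extend(members)
    return sorted(kept)


# Same time axis as the example: all of 2000, then 2001-01, 2001-02, 2001-12.
times = [(2000, m) for m in range(1, 13)] + [(2001, 1), (2001, 2), (2001, 12)]
djf_first = [[12, 1, 2], [3, 4, 5], [6, 7, 8], [9, 10, 11]]
kept = drop_incomplete_seasons(times, djf_first)
```

With this input, `kept` runs from `(2000, 3)` to `(2001, 2)`, which matches the 12 time coordinates shown after dropping.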
Hey @lee1043, this PR seemed to be mostly done when I stopped working on it last year. I just had to fix a few things and update the tests. Would you like to check out this branch to test it out on real data? Also a code review would be appreciated.
Codecov Report
All modified and coverable lines are covered by tests ✅
Additional details and impacted files
@@ Coverage Diff @@
## main #423 +/- ##
=========================================
Coverage 100.00% 100.00%
=========================================
Files 15 15
Lines 1555 1614 +59
=========================================
+ Hits 1555 1614 +59
☔ View full report in Codecov by Sentry.
@tomvothecoder sure, I will test it out and review. Thank you for the update!
@tomvothecoder Can this be considered for v0.7.0?
My PR self-review
xcdat/temporal.py
warnings.warn(
    "The `season_config` argument 'drop_incomplete_djf' is being "
    "deprecated. Please use 'drop_incomplete_seasons' instead.",
    DeprecationWarning,
)
TODO: Need to specify the version in which we will deprecate drop_incomplete_djf. Probably v0.8.0 or v0.9.0.
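One possible way to handle the deprecated key is to translate it into the new one while emitting the warning. The sketch below is an assumption about how that could look, not the PR's implementation: the helper name `resolve_drop_incomplete` is hypothetical, and it assumes the new key wins when both keys are passed.

```python
import warnings


def resolve_drop_incomplete(season_config):
    """Return the effective 'drop_incomplete_seasons' value.

    Hypothetical helper: maps the deprecated 'drop_incomplete_djf' key
    onto 'drop_incomplete_seasons' and warns when the old key is used.
    """
    config = dict(season_config)  # avoid mutating the caller's dict
    if "drop_incomplete_djf" in config:
        warnings.warn(
            "The `season_config` argument 'drop_incomplete_djf' is being "
            "deprecated. Please use 'drop_incomplete_seasons' instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        legacy_value = config.pop("drop_incomplete_djf")
        # Assumption: an explicitly passed new key takes precedence.
        config.setdefault("drop_incomplete_seasons", legacy_value)
    return config.get("drop_incomplete_seasons", False)
```

Keeping the translation in one place means the rest of the code only ever has to read `drop_incomplete_seasons`.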
if len(input_months) != len(predefined_months):
    raise ValueError(
        "Exactly 12 months were not passed in the list of custom seasons."
    )
Removed requirements for all 12 months to be included in a custom season
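With the 12-month requirement gone, some validation is presumably still useful. The following is a minimal sketch of checks that do not depend on the total month count; the function name `validate_custom_seasons`, the specific checks (unknown and duplicate months), and the first-letter season labels are assumptions for illustration, loosely based on labels like "JJAS" and "NDJFM" used in this thread.

```python
MONTHS = [
    "Jan", "Feb", "Mar", "Apr", "May", "Jun",
    "Jul", "Aug", "Sep", "Oct", "Nov", "Dec",
]


def validate_custom_seasons(custom_seasons):
    """Validate custom seasons without requiring all 12 months.

    Hypothetical helper: returns a dict mapping a season label built from
    the first letter of each month (e.g. "JJAS") to its list of months.
    """
    flat = [month for season in custom_seasons for month in season]

    unknown = [month for month in flat if month not in MONTHS]
    if unknown:
        raise ValueError(f"Unsupported month names: {unknown}")

    # Assumption: a month may belong to at most one custom season.
    if len(flat) != len(set(flat)):
        raise ValueError("A month appears in more than one custom season.")

    return {"".join(m[0] for m in season): season for season in custom_seasons}
```

For example, `validate_custom_seasons([["Jun", "Jul", "Aug", "Sep"]])` accepts a single four-month season even though the other eight months are not listed.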
This PR still needs thorough review before I'm confident in merging it. I'll probably tag Steve at some point. The release after v0.7.0 is more realistic and reasonable. We can always initiate a new release for this feature whenever it is merged.
@tomvothecoder no problem. Thank you for consideration.
@tomvothecoder it looks like when a custom season goes beyond the calendar year (Nov, Dec, Jan), there is an error, as follows.
import os
import xcdat as xc
input_data = os.path.join(
    "/p/css03/esgf_publish/CMIP6/CMIP/AWI/AWI-CM-1-1-MR/historical/r1i1p1f1/Amon/psl/gn/v20181218/",
    "psl_Amon_AWI-CM-1-1-MR_historical_r1i1p1f1_gn_201301-201312.nc")
ds = xc.open_mfdataset(input_data)
# Example of custom seasons in a three month format:
custom_seasons = [
    ['Dec', 'Jan'],
]
season_config = {'custom_seasons': custom_seasons, 'dec_mode': 'DJF', 'drop_incomplete_djf': True}
ds.temporal.group_average("psl", "season", season_config=season_config)
@lee1043 Thanks for trying this out and providing an example script! I'll debug the stack trace.
I want to prioritize this PR for the next release (v0.8.0), which I'm aiming to have out by the next E3SM Unified release, scheduled for 2/15/24. We can consider refactoring the implementation with the new feature from Xarray PR #9524 if it makes sense to (e.g., simplifies API and/or underlying implementation, same expected behavior, performance gains, etc.).
@lee1043 Have you tried testing this PR out recently?
@tomvothecoder I will run another round of tests for this PR this week. Sorry, it slipped off my list, and thank you for the reminder!
I used 2 years of monthly time series data to test this PR. From the test results below, I believe the code is working as expected.
1. First sanity check: compare xcdat default vs custom season
import os
import xarray as xr
import xcdat as xc
import matplotlib.pyplot as plt
# input data is from
# "/p/css03/esgf_publish/CMIP6/CMIP/AWI/AWI-CM-1-1-MR/historical/r1i1p1f1/Amon/psl/gn/v20181218/"
input_data = [
"psl_Amon_AWI-CM-1-1-MR_historical_r1i1p1f1_gn_201201-201212.nc",
"psl_Amon_AWI-CM-1-1-MR_historical_r1i1p1f1_gn_201301-201312.nc"
]
ds = xc.open_mfdataset(input_data)
# Sanity check
def compare_xcdat_default_vs_custom(custom_season, custom_drop_incomplete_seasons=True, ax=None, title=None):
    if custom_season == "DJF":
        custom_seasons = [['Dec', 'Jan', 'Feb']]
        index = 0
    elif custom_season == "MAM":
        custom_seasons = [['Mar', 'Apr', 'May']]
        index = 1
    elif custom_season == "JJA":
        custom_seasons = [['Jun', 'Jul', 'Aug']]
        index = 2
    elif custom_season == "SON":
        custom_seasons = [['Sep', 'Oct', 'Nov']]
        index = 3
    # season from default method
    ds_default_season = ds.temporal.climatology(
        "psl",
        freq="season",
        weighted=True,
        season_config={"dec_mode": "DJF", "drop_incomplete_djf": True},
    )
    da_default_season = ds_default_season['psl'][index]
    #print(custom_season, 'default', ds_default_season.sizes, da_default_season.shape)
    # season from custom season
    season_config = {'custom_seasons': custom_seasons, 'dec_mode': 'DJF', 'drop_incomplete_seasons': custom_drop_incomplete_seasons}
    ds_custom_season = ds.temporal.group_average("psl", "season", season_config=season_config)
    # get climatology
    if len(ds_custom_season.time) > 1:
        ds_custom_season = ds_custom_season.bounds.add_missing_bounds()
        ds_custom_season = ds_custom_season.temporal.average("psl")
    da_custom_season = ds_custom_season["psl"].squeeze()
    #print(custom_season, 'custom', ds_custom_season.sizes, da_custom_season.shape)
    # plot
    (da_default_season - da_custom_season).plot(ax=ax)
    if ax is not None and title is not None:
        ax.set_title(title)
fig, ax = plt.subplots(2, 4, figsize=(12, 5))
seasons = ["DJF", "MAM", "JJA", "SON"]
for i, season in enumerate(seasons):
    compare_xcdat_default_vs_custom(season, custom_drop_incomplete_seasons=True, ax=ax[0,i], title=season)
    compare_xcdat_default_vs_custom(season, custom_drop_incomplete_seasons=False, ax=ax[1,i], title=season)
fig.suptitle("xcdat default - custom \n upper: custom_drop_incomplete_seasons=True \n lower: custom_drop_incomplete_seasons=False")
fig.tight_layout()
The result is consistent with what is expected. DJF is one time step with
2. Compare to other tools
custom_season = "JJAS"
custom_seasons = [['Jun', 'Jul', 'Aug', 'Sep']]
#
# xcdat
#
ds = xc.open_mfdataset(input_data)
season_config = {'custom_seasons': custom_seasons, 'dec_mode': 'DJF', 'drop_incomplete_seasons': True}
ds_cs_xcdat = ds.temporal.group_average("psl", "season", season_config=season_config)
# get climatology
if len(ds_cs_xcdat.time) > 1:
    ds_cs_xcdat = ds_cs_xcdat.bounds.add_missing_bounds()
    ds_cs_xcdat = ds_cs_xcdat.temporal.average("psl")
da_cs_xcdat = ds_cs_xcdat["psl"].squeeze()
#
# CDAT
#
# "cdscan -x test.xml *.nc" conducted as pre-process
import cdms2
import cdutil
f = cdms2.open('test.xml')
d = f('psl')
custom_season_class = cdutil.times.Seasons(custom_season)
da_cs_cdat = custom_season_class.climatology(d)
#
# PMP (xcdat and xarray method)
#
from pcmdi_metrics.utils import custom_season_average
# xcdat method
ds_cs_pmp_xcdat_yearly = custom_season_average(ds, "psl", season=custom_season, method="xcdat")
ds_cs_pmp_xcdat_yearly = ds_cs_pmp_xcdat_yearly.bounds.add_missing_bounds()
ds_cs_pmp_xcdat = ds_cs_pmp_xcdat_yearly.temporal.average("psl")
da_cs_pmp_xcdat = ds_cs_pmp_xcdat['psl']
# xarray method
ds_cs_pmp_xarray_yearly = custom_season_average(ds, "psl", season=custom_season, method="xarray")
ds_cs_pmp_xarray = ds_cs_pmp_xarray_yearly.mean(dim=["year"])
da_cs_pmp_xarray = ds_cs_pmp_xarray['psl']
#
# Compare results
#
fig, ax = plt.subplots(2, 4, figsize=(12, 5))
da_cs_xcdat.plot(ax=ax[0,0])
da_cs_cdat.plot(ax=ax[0,1])
da_cs_pmp_xcdat.plot(ax=ax[0,2])
da_cs_pmp_xarray.plot(ax=ax[0,3])
(da_cs_xcdat.to_numpy() - da_cs_cdat).plot(ax=ax[1,0])
(da_cs_cdat - da_cs_pmp_xcdat.to_numpy()).plot(ax=ax[1,1])
(da_cs_xcdat - da_cs_pmp_xcdat).plot(ax=ax[1,2])
(da_cs_pmp_xcdat - da_cs_pmp_xarray).plot(ax=ax[1,3])
ax[0,0].set_title("xcdat")
ax[0,1].set_title("cdat")
ax[0,2].set_title("pmp (xc)")
ax[0,3].set_title("pmp (xr)")
ax[1,0].set_title("xcdat - cdat")
ax[1,1].set_title("cdat - pmp (xc)")
ax[1,2].set_title("xcdat - pmp (xc)")
ax[1,3].set_title("pmp (xc) - pmp (xr)")
fig.suptitle(custom_season)
fig.tight_layout()
No difference between the xcdat, cdat, and pmp (xcdat) results, while pmp (xarray) is inconsistent, maybe because it does not apply proper temporal weighting.
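To illustrate why temporal weighting could explain a small discrepancy, here is a minimal sketch comparing a day-weighted seasonal mean with a simple mean of monthly values. It is an assumption-laden illustration, not a diagnosis of the PMP result: the function `season_mean` is hypothetical, the monthly values are made up, and it uses the standard Gregorian calendar from Python's `calendar` module, whereas model output may use a different calendar.

```python
import calendar


def season_mean(values_by_month, year, weighted=True):
    """Average monthly values over a season.

    `values_by_month` maps a month number to that month's mean value.
    When `weighted` is True, each month is weighted by its number of days.
    Hypothetical helper for illustration only.
    """
    months = sorted(values_by_month)
    if weighted:
        weights = [calendar.monthrange(year, m)[1] for m in months]
    else:
        weights = [1] * len(months)
    total = sum(w * values_by_month[m] for w, m in zip(weights, months))
    return total / sum(weights)


# Made-up JJAS values: Jun and Sep have 30 days, Jul and Aug have 31.
jjas = {6: 0.0, 7: 0.0, 8: 0.0, 9: 122.0}
weighted_mean = season_mean(jjas, 2013, weighted=True)
unweighted_mean = season_mean(jjas, 2013, weighted=False)
```

Here the weighted mean is 122 * 30 / 122 = 30.0 and the unweighted mean is 122 / 4 = 30.5, so the two approaches differ whenever months of different lengths carry different values.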
@tomvothecoder if you plan to switch |
The
I commented in this thread above that users will who use In this PR right now, I will update this PR to restore the original |
- Remove logic for requiring all 12 months to be used
- Add conditional that determines whether subsetting time coordinates is necessary with custom seasons
- Update docstrings for `season_config`
- Add tests
From our meeting on 10/23:
@xCDAT/core-developers For the final time coordinate output, the current logic maps the custom season to its middle month. For example, year 2000 and "NDJFM" will be represented by
How useful would it be to keep the custom seasons as auxiliary coords? For example, in Xarray, grouping by season will result in "season" coords. I'm not sure if these auxiliary coords are CF-compliant, although that is probably not a big deal. For example,
<xarray.DataArray 'season' (season: 2)> Size: 32B
array(['DJFM', 'AMJ'], dtype='<U4')
Coordinates:
* season (season) <U4 32B 'DJFM' 'AMJ'
Alternatively, the user can just reference the attributes of the data variable to see the custom seasons.
Let me know your thoughts. This can also be a future enhancement or something to include if we decide to refactor later on. |
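As a rough sketch of the two pieces of information discussed here, the snippet below derives both a middle month and a season label from a custom season. The helper `describe_season` is hypothetical, and picking index `len(season) // 2` for even-length seasons is an assumption, not necessarily what the PR does.

```python
def describe_season(season):
    """Return the season label and its middle month.

    Hypothetical helper: the label joins the first letter of each month,
    and the middle month is taken at index len(season) // 2 (an assumption
    for seasons with an even number of months).
    """
    label = "".join(month[0] for month in season)
    middle = season[len(season) // 2]
    return {"label": label, "middle_month": middle}


ndjfm = describe_season(["Nov", "Dec", "Jan", "Feb", "Mar"])
```

For "NDJFM" this gives the label `"NDJFM"` and the middle month `"Jan"`; keeping the label next to the time coordinate is the kind of auxiliary information the question above is about.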
- Months are also shifted in the `_preprocess_dataset()` method now. Before months were being shifted twice, once when dropping incomplete seasons or DJF, and a second time when labeling time coordinates.
Description
TODO:
(_shift_spanning_months())
custom_season = ["Nov", "Dec", "Jan", "Feb", "Mar"]:
- ["Nov", "Dec"] are from the previous year since they are listed before "Jan"
- ["Jan", "Feb", "Mar"] are from the current year
- ["Nov", "Dec"] need to be shifted a year forward for correct grouping.
(_drop_incomplete_seasons())
_drop_incomplete_djf()
drop_incomplete_djf with drop_incomplete_season
cftime time coordinates. Does it make sense to also keep the custom seasons with the time coordinates, similar to what Xarray does?
Checklist
If applicable:
Additional Context
Google Slides explaining logic
Refactoring this PR to use "Add SeasonGrouper, SeasonResampler" (pydata/xarray#9524) will most likely require addressing "[Refactor]: Consider using flox and xr.resample() to improve temporal averaging grouping logic" (#217), which involves extensive refactoring in how the time coordinates are pre-processed based on the averaging mode and frequency. As of xarray >=2024.09.0, Xarray supports grouping by multiple variables too.
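The month-shifting rule from the description (months listed before "Jan" belong to the previous year and are moved a year forward for grouping) can be sketched on plain `(year, month)` tuples. This is a simplified illustration and not the PR's `_shift_spanning_months()` code; the function name `shift_months_before_january` and the tuple-based input are hypothetical.

```python
MONTH_NUMBERS = {
    "Jan": 1, "Feb": 2, "Mar": 3, "Apr": 4, "May": 5, "Jun": 6,
    "Jul": 7, "Aug": 8, "Sep": 9, "Oct": 10, "Nov": 11, "Dec": 12,
}


def shift_months_before_january(timestamps, custom_season):
    """Shift months listed before "Jan" one year forward.

    Hypothetical helper: `timestamps` is a list of (year, month) tuples.
    Seasons that do not contain "Jan" are assumed not to span calendar
    years and are returned unchanged.
    """
    numbers = [MONTH_NUMBERS[name] for name in custom_season]
    if 1 not in numbers:
        return list(timestamps)

    # Months that appear before January in the season come from the
    # previous calendar year.
    previous_year_months = set(numbers[: numbers.index(1)])
    return [
        (year + 1, month) if month in previous_year_months else (year, month)
        for year, month in timestamps
    ]


season = ["Nov", "Dec", "Jan", "Feb", "Mar"]
times = [(1999, 11), (1999, 12), (2000, 1), (2000, 2), (2000, 3)]
shifted = shift_months_before_january(times, season)
```

After the shift, every entry in `shifted` carries the year 2000, so a plain group-by on the year keeps all five months of the season together.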