"Timestamp subtraction must have the same timezones or no timezones" when saving a NetCDF #2649

Closed
matteodefelice opened this issue Jan 4, 2019 · 6 comments · Fixed by #2651

@matteodefelice

I have an issue when saving a Dataset to NetCDF. This is the example NetCDF I am using.

import xarray as xr
d = xr.open_dataset('example.nc')
d.to_netcdf('out.nc')

Then I get:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-4-baf698f1bf45> in <module>
----> 1 d.to_netcdf('out.nc')

~/miniconda2/envs/cds/lib/python3.6/site-packages/xarray/core/dataset.py in to_netcdf(self, path, mode, format, group, engine, encoding, unlimited_dims, compute)
   1241                          engine=engine, encoding=encoding,
   1242                          unlimited_dims=unlimited_dims,
-> 1243                          compute=compute)
   1244 
   1245     def to_zarr(self, store=None, mode='w-', synchronizer=None, group=None,

~/miniconda2/envs/cds/lib/python3.6/site-packages/xarray/backends/api.py in to_netcdf(dataset, path_or_file, mode, format, group, engine, encoding, unlimited_dims, compute, multifile)
    747         # to be parallelized with dask
    748         dump_to_store(dataset, store, writer, encoding=encoding,
--> 749                       unlimited_dims=unlimited_dims)
    750         if autoclose:
    751             store.close()

~/miniconda2/envs/cds/lib/python3.6/site-packages/xarray/backends/api.py in dump_to_store(dataset, store, writer, encoder, encoding, unlimited_dims)
    790 
    791     store.store(variables, attrs, check_encoding, writer,
--> 792                 unlimited_dims=unlimited_dims)
    793 
    794 

~/miniconda2/envs/cds/lib/python3.6/site-packages/xarray/backends/common.py in store(self, variables, attributes, check_encoding_set, writer, unlimited_dims)
    259             writer = ArrayWriter()
    260 
--> 261         variables, attributes = self.encode(variables, attributes)
    262 
    263         self.set_attributes(attributes)

~/miniconda2/envs/cds/lib/python3.6/site-packages/xarray/backends/common.py in encode(self, variables, attributes)
    345         # All NetCDF files get CF encoded by default, without this attempting
    346         # to write times, for example, would fail.
--> 347         variables, attributes = cf_encoder(variables, attributes)
    348         variables = OrderedDict([(k, self.encode_variable(v))
    349                                  for k, v in variables.items()])

~/miniconda2/envs/cds/lib/python3.6/site-packages/xarray/conventions.py in cf_encoder(variables, attributes)
    603     """
    604     new_vars = OrderedDict((k, encode_cf_variable(v, name=k))
--> 605                            for k, v in iteritems(variables))
    606     return new_vars, attributes

~/miniconda2/envs/cds/lib/python3.6/site-packages/xarray/conventions.py in <genexpr>(.0)
    603     """
    604     new_vars = OrderedDict((k, encode_cf_variable(v, name=k))
--> 605                            for k, v in iteritems(variables))
    606     return new_vars, attributes

~/miniconda2/envs/cds/lib/python3.6/site-packages/xarray/conventions.py in encode_cf_variable(var, needs_copy, name)
    233                   variables.CFMaskCoder(),
    234                   variables.UnsignedIntegerCoder()]:
--> 235         var = coder.encode(var, name=name)
    236 
    237     # TODO(shoyer): convert all of these to use coders, too:

~/miniconda2/envs/cds/lib/python3.6/site-packages/xarray/coding/times.py in encode(self, variable, name)
    393                 data,
    394                 encoding.pop('units', None),
--> 395                 encoding.pop('calendar', None))
    396             safe_setitem(attrs, 'units', units, name=name)
    397             safe_setitem(attrs, 'calendar', calendar, name=name)

~/miniconda2/envs/cds/lib/python3.6/site-packages/xarray/coding/times.py in encode_cf_datetime(dates, units, calendar)
    363         # an OverflowError is raised if the ref_date is too far away from
    364         # dates to be encoded (GH 2272).
--> 365         num = (pd.DatetimeIndex(dates.ravel()) - ref_date) / time_delta
    366         num = num.values.reshape(dates.shape)
    367 

~/miniconda2/envs/cds/lib/python3.6/site-packages/pandas/core/indexes/datetimelike.py in __sub__(self, other)
    898                 result = self._add_offset(-other)
    899             elif isinstance(other, (datetime, np.datetime64)):
--> 900                 result = self._sub_datelike(other)
    901             elif is_integer(other):
    902                 # This check must come after the check for np.timedelta64

~/miniconda2/envs/cds/lib/python3.6/site-packages/pandas/core/indexes/datetimes.py in _sub_datelike(self, other)
    876             # require tz compat
    877             elif not self._has_same_tz(other):
--> 878                 raise TypeError("Timestamp subtraction must have the same "
    879                                 "timezones or no timezones")
    880             else:

TypeError: Timestamp subtraction must have the same timezones or no timezones

I have tried with Python 3.7 and 3.6. I have also installed the latest version of xarray, hoping that this issue was linked with #2630. Other, similar NetCDF files apparently don't trigger the error, but this should not happen at all, given that the exact same code was working a couple of months ago.

INSTALLED VERSIONS
------------------
commit: None
python: 3.6.7 | packaged by conda-forge | (default, Nov 20 2018, 18:20:05) [GCC 4.2.1 Compatible Apple LLVM 9.0.0 (clang-900.0.37)]
python-bits: 64
OS: Darwin
OS-release: 18.2.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: en_US.UTF-8
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
libhdf5: 1.10.3
libnetcdf: 4.6.1

xarray: 0.11.1+9.g06244df
pandas: 0.23.4
numpy: 1.15.4
scipy: 1.1.0
netCDF4: 1.4.2
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: 1.0.3.4
PseudonetCDF: None
rasterio: None
cfgrib: 0.9.5.1
iris: None
bottleneck: None
cyordereddict: None
dask: None
distributed: None
matplotlib: 3.0.2
cartopy: 0.17.0
seaborn: None
setuptools: 40.6.3
pip: 18.1
conda: None
pytest: None
IPython: 7.2.0
sphinx: None

@spencerkclark
Member

@matteodefelice thanks for the clear report. This is definitely a regression. See #2651 for a possible fix.

@matteodefelice
Author

Thanks, I have applied the fix and now it works.

@shoyer
Member

shoyer commented Jan 5, 2019

For reference, the issue here appears to be that the time variable's units attribute specifies a timezone:

	int64 time(time) ;
		time:standard_name = forecast_reference_time ;
		time:long_name = initial time of forecast ;
		time:units = seconds since 1970-01-01T00:00:00+00:00 ;
		time:calendar = proleptic_gregorian ;

And here is the relevant description from CF conventions:

The reference time string (appearing after the identifier since) may include date alone; date and time; or date, time, and time zone. ...

Note: if the time zone is omitted the default is UTC, and if both time and time zone are omitted the default is 00:00:00 UTC.

Given that xarray uses timezone-agnostic np.datetime64 and cftime.datetime objects, the best course of action here does seem to be to decode dates to UTC and then remove the timezone.
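The failure and the proposed normalization can be illustrated with pandas alone, without the NetCDF file. This is only a sketch of the idea, not necessarily the code in #2651: `to_naive_utc` is a hypothetical helper name, and the two dates are made-up sample values.

```python
import pandas as pd

# Reference date as parsed from "seconds since 1970-01-01T00:00:00+00:00".
ref_date = pd.Timestamp("1970-01-01T00:00:00+00:00")  # timezone-aware
# Sample dates (made up for illustration), timezone-naive like np.datetime64.
dates = pd.DatetimeIndex(["2019-01-04", "2019-01-05"])

# Subtracting an aware reference date from naive dates is what pandas rejects.
try:
    dates - ref_date
    mixed_subtraction_failed = False
except TypeError:
    mixed_subtraction_failed = True


def to_naive_utc(timestamp):
    """Convert an aware Timestamp to UTC and drop its timezone (hypothetical helper)."""
    if timestamp.tzinfo is not None:
        # tz_convert(None) converts to UTC and removes the timezone information.
        timestamp = timestamp.tz_convert(None)
    return timestamp


# With a naive reference date the subtraction is well defined again.
seconds = (dates - to_naive_utc(ref_date)) / pd.Timedelta(seconds=1)
```

Because the conversion goes through UTC before the timezone is dropped, a reference date with a non-zero offset would still map to the correct instant rather than being silently shifted.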

@spencerkclark
Member

Agreed, this was my diagnosis too. I'm a bit late, but upon reflection I think I could have written a more comprehensive test in #2651. See #2654 for what I think should be an improvement.

@cyhsu

cyhsu commented Jan 15, 2019

I tried to follow the discussion above but I still have no clue how to fix this issue.

Could you give an example?

@spencerkclark
Member

This was an internal bug in xarray introduced in version 0.11.2, which through #2651 should be fixed in the next release. Probably the cleanest way to work around it for now would be to temporarily downgrade to xarray 0.11.0.

If there is something in xarray > 0.11.0 that you need, another option would be to install the master (unreleased) version of xarray, which includes the bug fix in #2651.

Finally, if you want to stick with xarray 0.11.2, a cruder workaround is to drop the time zone from the units encoding. This is only valid when time is encoded with the UTC time zone (and not some other time zone). For example, with the file shared above:

In [1]: import xarray

In [2]: ds = xarray.open_dataset('example.nc')

In [3]: ds.time.encoding['units']
Out[3]: 'seconds since 1970-01-01T00:00:00+00:00'

In [4]: ds.time.encoding['units'] = 'seconds since 1970-01-01T00:00:00'

In [5]: ds.to_netcdf('out.nc')
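If several variables carry a UTC designator in their units, the same workaround could be applied to all of them with a small helper along these lines. This is only a sketch: `strip_utc_offset` and `drop_utc_from_encodings` are hypothetical names, not part of xarray, and the regular expression only recognizes a few common UTC spellings (`+00:00`, `+0000`, `Z`, `UTC`). Any non-zero offset is left untouched, since dropping it would shift the times.

```python
import re

# A time of day followed by a UTC designator at the end of the units string.
# Non-zero offsets such as "+02:00" are intentionally not matched.
_UTC_SUFFIX = re.compile(
    r"(\d{1,2}:\d{2}(?::\d{2}(?:\.\d+)?)?)\s*(?:[+-]00:?00|Z|UTC)$"
)


def strip_utc_offset(units):
    """Remove a trailing UTC designator from a CF units string (hypothetical helper)."""
    return _UTC_SUFFIX.sub(r"\1", units)


def drop_utc_from_encodings(ds):
    """Rewrite encoding['units'] in place for every variable of a Dataset-like object.

    Assumes `ds.variables` maps names to objects with an `encoding` dict,
    as the variables of an xarray Dataset do.
    """
    for var in ds.variables.values():
        units = var.encoding.get("units")
        if isinstance(units, str) and " since " in units:
            var.encoding["units"] = strip_utc_offset(units)
    return ds
```

Calling `drop_utc_from_encodings(ds)` before `ds.to_netcdf('out.nc')` should then have the same effect as editing `ds.time.encoding['units']` by hand in the session above.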
