Time bounds returned after an operation with resample-method #2231

Open
rpnaut opened this issue Jun 13, 2018 · 8 comments

rpnaut commented Jun 13, 2018

Problem description

When data mining with xarray, I keep running into the following issue with the resample method.
If I resample, e.g., a time series of hourly values to monthly values, the netCDF standards tell us to put the following information into the result file:

  1. the bounds of each time step over which the aggregation was taken (for each month, the beginning and the end of the month)
  2. the method used for the aggregation, encoded in the variable attribute 'cell_methods' (e.g. 'time: mean').

The current implementation could be improved, as the following data example shows.

Data example

I have a dataset with hourly values over a period of 5 months.

<xarray.Dataset>
Dimensions:       (bnds: 2, time: 3672)
Coordinates:
    rlon          float32 22.06
    rlat          float32 5.06
  * time          (time) datetime64[ns] 2006-05-01 2006-05-01T01:00:00 ...
Dimensions without coordinates: bnds
Data variables:
    rotated_pole  int32 1
    time_bnds     (time, bnds) float64 1.304e+07 1.305e+07 1.305e+07 ...
    TOT_PREC      (time) float64 nan nan nan nan nan nan nan nan nan nan nan ...
Attributes:

Resampling with the mean operator gives:

In [36]: frs
Out[36]: 
<xarray.Dataset>
Dimensions:       (bnds: 2, time: 5)
Coordinates:
  * time          (time) datetime64[ns] 2006-05-31 2006-06-30 2006-07-31 ...
Dimensions without coordinates: bnds
Data variables:
    rotated_pole  (time) float64 1.0 1.0 1.0 1.0 1.0
    time_bnds     (time, bnds) float64 1.438e+07 1.438e+07 1.702e+07 ...
    TOT_PREC      (time) float64 12.0 nan nan nan nan

Here time_bnds is still in the dataset, but its content is very strange:

In [37]: frs["time_bnds"]
Out[37]: 
<xarray.DataArray 'time_bnds' (time: 5, bnds: 2)>
array([[  1.438020e+07,   1.438380e+07],
       [  1.701540e+07,   1.701900e+07],
       [  1.965060e+07,   1.965420e+07],
       [  2.232900e+07,   2.233260e+07],
       [ -6.330338e+10,  -6.330338e+10]])
Coordinates:
  * time     (time) datetime64[ns] 2006-05-31 2006-06-30 2006-07-31 ...
Dimensions without coordinates: bnds

So xarray still knows that time_bnds is related to the time coordinate. However, the values are not correct. The first time_bnds entry should be [2006-05-01 00:00, 2006-05-31 23:00]. That is definitely not the case: the numbers use the encoding of the original file (seconds since 2005-12-01), but they do not match my expectation. 1.438020e+07 corresponds to "Tuesday, 16 May 2006, 10:30:00" and 1.438380e+07 to "Tuesday, 16 May 2006, 11:30:00".
Moreover, xarray does not convert the units of time_bnds to match the units of the variable 'time' when the data is written to netCDF. The output of ncdump shows that time was changed to "days since", but time_bnds still seems to be encoded in "seconds since".
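
The two values quoted above can be checked with a few lines of standard-library Python. The sketch below decodes them against the 2005-12-01 epoch given in the post. As an assumption (consistent with the truncated values in the first dataset printout, but not confirmed), it models the hourly input bounds as [t - 1h, t]; under that assumption the odd numbers are what comes out when the bounds columns themselves are averaged over May. The helper name `decode` is made up for illustration.

```python
from datetime import datetime, timedelta

# Reference epoch taken from the post ("seconds since 2005-12-01").
EPOCH = datetime(2005, 12, 1)


def decode(seconds):
    """Convert 'seconds since 2005-12-01' into a datetime (illustrative helper)."""
    return EPOCH + timedelta(seconds=seconds)


# First row of the resampled time_bnds from the post.
lower, upper = decode(14380200), decode(14383800)
print(lower, upper)

# Assumption: each hourly time stamp t had the bounds [t - 1h, t].
# Average those bounds over all 744 hourly steps of May 2006.
hours = [datetime(2006, 5, 1) + timedelta(hours=h) for h in range(31 * 24)]
mean_seconds = sum((t - EPOCH).total_seconds() for t in hours) / len(hours)
mean_upper = decode(mean_seconds)
mean_lower = mean_upper - timedelta(hours=1)
print(mean_lower, mean_upper)
```

If that reading is right, the mean was applied to time_bnds like to any other data variable, which would explain why each row is one hour wide and sits in the middle of the month.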

ncdump -v time_bnds try.nc
netcdf try {
dimensions:
	time = 5 ;
	bnds = 2 ;
variables:
	double rotated_pole(time) ;
		rotated_pole:_FillValue = NaN ;
	double time_bnds(time, bnds) ;
		time_bnds:_FillValue = NaN ;
	double TOT_PREC(time) ;
		TOT_PREC:_FillValue = NaN ;
	int64 time(time) ;
		time:units = "days since 2006-05-31 00:00:00" ;
		time:calendar = "proleptic_gregorian" ;
data:

 time_bnds =
  14380200, 14383800,
  17015400, 17019000,
  19650600, 19654200,
  22329000, 22332600,
  -63303379200, -63303379200 ;
}
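
To make the unit mismatch concrete, here is a small standard-library sketch that re-expresses the first two time_bnds rows in the units that ncdump reports for `time`. Both epochs are copied from the post; the function name `seconds_to_days` is only illustrative, and nothing here is an xarray call.

```python
from datetime import datetime, timedelta

src_epoch = datetime(2005, 12, 1)  # "seconds since 2005-12-01" (original file)
dst_epoch = datetime(2006, 5, 31)  # "days since 2006-05-31 00:00:00" (ncdump output)


def seconds_to_days(value):
    """Re-express a 'seconds since src_epoch' value as 'days since dst_epoch'."""
    moment = src_epoch + timedelta(seconds=value)
    return (moment - dst_epoch) / timedelta(days=1)


# First two rows of time_bnds as written to try.nc.
raw_bounds = [(14380200, 14383800), (17015400, 17019000)]
converted = [(seconds_to_days(lo), seconds_to_days(hi)) for lo, hi in raw_bounds]
for row in converted:
    print(row)
```

The converted values are a few days either side of zero, whereas the file stores numbers in the tens of millions, so any reader that applies the units of `time` to time_bnds would land far away from 2006.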

Is there a recommendation for what to do?

Author

rpnaut commented Jun 13, 2018

I want to add that sometimes the variable time_bnds is already gone after resampling.

Member

shoyer commented Jun 14, 2018

Sorry, xarray doesn’t handle time bounds directly, nor does it update metadata according to the CF conventions. These were intentional design choices to keep xarray simple, but in principle you could layer CF convention handling on top of xarray.

For this sort of analysis, using a tool like Iris, which is designed to follow the CF conventions, might make sense. You can convert directly between Iris cubes and xarray DataArray objects.

Member

shoyer commented Jun 14, 2018

One thing I’ll note is that you probably want to make the time bounds variables coordinates rather than data variables. But that means the time bounds will probably simply be dropped when you resample.

Contributor

aidanheerdegen commented Aug 7, 2018

> Sorry, xarray doesn’t handle time bounds directly, nor does it update metadata according to the CF conventions. These were intentional design choices to keep xarray simple, but in principle you could layer CF convention handling on top of xarray.

Nor does it bring along bounds variables when extracting variables from a dataset, e.g.

        double time(time) ;
                time:long_name = "time" ;
                time:cartesian_axis = "T" ;
                time:calendar_type = "NOLEAP" ;
                time:bounds = "time_bounds" ;
                time:units = "days since 0001-01-01" ;
                time:calendar = "NOLEAP" ;

When a variable using the time dimension is extracted from a Dataset, the time_bounds variable is missing.

Is this also an intentional choice, or something that xarray could/should support? Or does it already, and I've missed how to invoke it?

Edit: I've just realised, how is xarray supposed to "bring along" another variable in a DataArray object? I'll leave this query as maybe there is a solution? Have a bounds attribute similar to the coords attribute?

Is this just a dupe of #1475?
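
One possible workaround for the extraction case, sketched under assumptions: a DataArray only keeps coordinates whose dimensions are a subset of its own, so a (time, bnds) bounds variable cannot ride along with a (time) variable, but a small Dataset can hold both. `with_bounds` below is a hypothetical helper, not an xarray function; it follows the `bounds` attribute in the same way as the `time:bounds` entry in the header excerpt above. The dataset is synthetic.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic hourly data with bounds [t - 1h, t]; all values are made up.
time = pd.date_range("2006-05-01", periods=4, freq=pd.Timedelta(hours=1))
bnds = np.stack([time.values - np.timedelta64(1, "h"), time.values], axis=1)

ds = xr.Dataset(
    {
        "TOT_PREC": ("time", np.zeros(time.size)),
        "time_bnds": (("time", "bnds"), bnds),
    },
    coords={"time": ("time", time, {"bounds": "time_bnds"})},
)


def with_bounds(dataset, name):
    """Hypothetical helper: return `name` together with the bounds
    variables referenced by the `bounds` attribute of its coordinates."""
    keep = [name]
    for coord in dataset[name].coords.values():
        bounds_name = coord.attrs.get("bounds")
        if bounds_name in dataset.variables:
            keep.append(bounds_name)
    return dataset[keep]


subset = with_bounds(ds, "TOT_PREC")
print(subset)
```

The result is a Dataset rather than a DataArray, which is the price for keeping the extra `bnds` dimension around.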


stale bot commented Jul 11, 2020

In order to maintain a list of currently relevant issues, we mark issues as stale after a period of inactivity

If this issue remains relevant, please comment here or remove the stale label; otherwise it will be marked as closed automatically

@stale stale bot added the stale label Jul 11, 2020
Author

rpnaut commented Jul 14, 2020

Maybe I will look into creating a wrapper to handle the time_bounds issue for files that follow the CF conventions. Note that not only resample operations should modify the time_bounds; the reselect process should also take the time_bounds into account. As an example, assume that one file A contains instantaneous data (twice a day, at 00 UTC and 12 UTC) and another file B contains aggregated data (daily averages with time stamps defined at the end of the aggregation interval). Reselecting A in B should pick up only the 12 UTC time steps from file A (or, even better, no time steps at all, because the aggregation interval in file B is not compatible with instantaneous values).

@stale stale bot removed the stale label Jul 14, 2020
@dcherian
Contributor

@rpnaut see the discussion in xarray-contrib/cf-xarray#10 . That discussion focuses more on using the bounds to properly weight points when resampling.

But we could also make .cf.resample generate time bounds and set appropriate attributes on the returned object.
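
As a rough illustration of what generating time bounds after a resample could look like in plain xarray and pandas, here is a minimal sketch. It is not the cf-xarray API and not an existing xarray feature. It assumes synthetic hourly data over the same five months as in the issue, month-start labels ("MS"), and bounds that run from the start of each month to the start of the next; the attribute values follow the CF style mentioned at the top of the thread.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic hourly series, May to September 2006 (3672 steps as in the issue).
time = pd.date_range("2006-05-01", periods=3672, freq=pd.Timedelta(hours=1))
ds = xr.Dataset(
    {"TOT_PREC": ("time", np.ones(time.size))},
    coords={"time": time},
)

# Resample only the data; the bounds are rebuilt afterwards, not averaged.
monthly = ds.resample(time="MS").mean()

# Each label is a month start; the upper bound is the next month start.
start = pd.DatetimeIndex(monthly["time"].values)
end = start + pd.offsets.MonthBegin(1)
bounds = np.stack([start.values, end.values], axis=1)

# Store the bounds as a coordinate and record the CF-style metadata by hand.
monthly = monthly.assign_coords(time_bnds=(("time", "bnds"), bounds))
monthly["time"].attrs["bounds"] = "time_bnds"
monthly["TOT_PREC"].attrs["cell_methods"] = "time: mean"

print(monthly["time_bnds"].values)
```

Keeping time_bnds out of the resampled variables avoids averaging it, and storing it as a coordinate follows the suggestion made earlier in the thread. Whether the upper bound should be the next month start or the last time stamp of the month is a design choice this sketch does not settle.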


stale bot commented Apr 17, 2022

In order to maintain a list of currently relevant issues, we mark issues as stale after a period of inactivity

If this issue remains relevant, please comment here or remove the stale label; otherwise it will be marked as closed automatically
