
Change resample_cube_temporal, align with GDAL and improve resampling descriptions in general #244

Merged
merged 7 commits on Jun 25, 2021

Conversation

m-mohr
Member

@m-mohr m-mohr commented Apr 27, 2021

Changes:

  • resample_cube_spatial and resample_spatial: Aligned with recent changes in GDAL and added rms and sum options to methods. Also added descriptions for each option.
  • resample_cube_temporal: Replaced parameter process (callback-style) with a parameter method that aligns with the spatial resampling processes. See resample_cube_temporal behavior #194 for context. Do all these methods make sense for temporal resampling or should we add/remove some? Does this help with solving the issues at all?

I've also sorted the options for a hopefully better user experience.

I'm still looking into improving the documentation, but want to put the first draft up ASAP to get feedback.
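For illustration, a rough sketch of how the aligned spatial resampling options might look from the openeo Python client. The collection name, the extents, and whether a given back-end already accepts the new "rms" and "sum" methods are assumptions here, not part of this PR.

```python
import openeo

# Hypothetical back-end URL; any openEO back-end implementing the updated
# resample_spatial options would do.
connection = openeo.connect("https://openeo.example.org").authenticate_oidc()

cube = connection.load_collection(
    "SENTINEL2_L2A",
    spatial_extent={"west": 11.0, "south": 46.0, "east": 11.1, "north": 46.1},
    temporal_extent=["2020-01-01", "2020-02-01"],
    bands=["B04", "B08"],
)

# "rms" and "sum" are the options newly aligned with GDAL in this PR; support
# depends on the back-end implementation.
resampled = cube.resample_spatial(resolution=60, method="rms")
```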

@m-mohr m-mohr added this to the 1.1.0 milestone Apr 27, 2021
@m-mohr m-mohr linked an issue Apr 27, 2021 that may be closed by this pull request
@m-mohr m-mohr marked this pull request as draft April 27, 2021 16:17
Member

@clausmichele clausmichele left a comment

proposals/resample_cube_temporal.json
How should the back-end compute, for example, the average? If we have data for every first day of each month and I want to resample to the 15th of each month, what should be computed? The average between the closest valid samples before and after in time? In my opinion it is still not so clear from the definition, because computing the average over all valid samples doesn't make much sense in this case.
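To make the ambiguity concrete, a small pandas sketch (illustrative only, not part of the proposal) contrasting the two readings for data on the 1st of each month resampled to the 15th:

```python
import pandas as pd

# Source: one observation on the first day of each month.
src = pd.Series(
    [10.0, 20.0, 30.0],
    index=pd.to_datetime(["2020-01-01", "2020-02-01", "2020-03-01"]),
)
target = pd.to_datetime(["2020-01-15", "2020-02-15"])

# Reading 1: average of the closest valid samples before and after each target date.
for t in target:
    before = src[src.index <= t].iloc[-1]
    after = src[src.index > t].iloc[0]
    print(t.date(), (before + after) / 2)

# Reading 2: average over *all* valid samples, which arguably makes little
# sense for a single target date.
print(src.mean())
```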

@m-mohr
Member Author

m-mohr commented Apr 28, 2021

@clausmichele That's exactly the type of feedback I need to get (from back-ends/users), as I think it's better to define what software supports instead of just coming up with an arbitrary specification no one can support. I tried to figure out how other software handles temporal resampling but couldn't find much except in gdalcubes. So if you or anyone else has good pointers to implementations, please let me know.

@m-mohr
Member Author

m-mohr commented Apr 28, 2021

My naive approach would be that each target timestamp just spans a time range until "half-way" to its neighbours.

So if your source datacube has timestamps 2020-01-01, 2020-01-02, ..., 2020-01-31 and your target datacube has timestamps 2020-01-04, 2020-01-14, 2020-01-23, 2020-01-31, I'd expect the following mapping:

  • 2020-01-04 uses 2020-01-01 - 2020-01-09 midday
  • 2020-01-14 uses 2020-01-09 midday - 2020-01-18
  • 2020-01-23 uses 2020-01-19 - 2020-01-27 midday
  • 2020-01-31 uses 2020-01-27 midday - 2020-01-31

The example is totally made up (I hope I counted the days correctly), and we may not want to split days, to avoid potential ambiguities. Even more likely, though, this isn't the best approach or isn't backed by implementations.
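A rough sketch of this half-way-interval idea (day-level boundaries, so it won't exactly match the made-up list above; purely illustrative):

```python
import pandas as pd

source = pd.date_range("2020-01-01", "2020-01-31", freq="D")
target = pd.to_datetime(["2020-01-04", "2020-01-14", "2020-01-23", "2020-01-31"])

# Boundaries half-way between neighbouring target timestamps; the first and
# last interval extend to the start/end of the source range.
midpoints = target[:-1] + (target[1:] - target[:-1]) / 2
edges = [source[0]] + list(midpoints) + [source[-1] + pd.Timedelta(days=1)]

for t, lo, hi in zip(target, edges[:-1], edges[1:]):
    members = source[(source >= lo) & (source < hi)]
    print(t.date(), "<-", members[0].date(), "to", members[-1].date())
```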

@clausmichele
Member

clausmichele commented Apr 30, 2021

This seems really too complicated. I see a simpler use case:

  • Datacube A: high temporal density
  • Datacube B: lower temporal density

If I want to merge their data I see two possibilities:

  1. Resample A based on B: reduce the temporal density of A to align with the time steps of B. We can either use the nearest-neighbor approach or interpolation (which can be linear or something more complex).
  2. Resample B based on A: increase the temporal density of B to align with the time steps of A. We can again use the nearest-neighbor approach or interpolation.

In my opinion those two possibilities are already enough; for other use cases we still have aggregate_temporal and aggregate_temporal_period.
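To illustrate those two options concretely, a small pandas sketch (illustrative only, not the openEO API; values and dates are made up):

```python
import numpy as np
import pandas as pd

# Datacube A: high temporal density (daily); datacube B: lower density.
a = pd.Series(np.arange(31, dtype=float),
              index=pd.date_range("2020-01-01", "2020-01-31", freq="D"))
b = pd.Series([0.0, 10.0, 20.0, 30.0],
              index=pd.to_datetime(["2020-01-03", "2020-01-10",
                                    "2020-01-17", "2020-01-24"]))

# 1. Resample A based on B: pick the nearest daily value for each of B's time steps.
a_on_b = a.reindex(b.index, method="nearest")

# 2. Resample B based on A: spread B onto A's daily time steps, either by
#    nearest neighbor or by (linear) interpolation.
b_on_a_nearest = b.reindex(a.index, method="nearest")
b_on_a_linear = b.reindex(a.index).interpolate(method="time")
```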

@m-mohr
Member Author

m-mohr commented Apr 30, 2021

I'd be happy to just provide nearest neighbor support without any additional methods. That makes the process spec so much simpler.
I guess we could provide the interpolation support in a different way? #233 #173

@clausmichele
Member

It would also be good to hear an opinion from @aljacob @przell @jdries @lforesta

@soxofaan
Member

I like the simplicity of @clausmichele's proposal.

One note though: how do you handle NaNs? If your nearest neighbor is NaN: do you pick that, or do you look further?

@clausmichele
Member

I like the simplicity of @clausmichele's proposal.

One note though: how do you handle NaNs? If your nearest neighbor is NaN: do you pick that, or do you look further?

We need to reason about this.
If we work with raster data, let's say Sentinel-2, there could be an empty area filled with NaNs due to the acquisition geometry.
So, if I want to do the temporal resampling, I'll keep the empty area as it is, because looking further in time wouldn't give any additional info if the area is always empty.

A different scenario could be having cloud masked data filled with NaNs in the cloud "holes": here if you try to look further for valid data in the cloud masked areas, you would get a "composite", which is not desired in my opinion.

In conclusion, if someone wants only valid data and no NaNs, they should use an interpolation method to fill the gaps.
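As a tiny illustration of the "keep the NaNs as they are" behavior described above (array shapes and values are made up; just a sketch):

```python
import numpy as np

# Three daily time slices of a small raster; NaNs mark masked/empty pixels.
t_src = np.array(["2020-01-01", "2020-01-02", "2020-01-03"], dtype="datetime64[D]")
data = np.array([
    [[1.0, np.nan], [3.0, 4.0]],
    [[np.nan, np.nan], [7.0, 8.0]],
    [[9.0, 10.0], [11.0, 12.0]],
])

# For a target time step, the nearest slice is taken verbatim, NaNs included;
# we do not search further in time for valid pixels.
target = np.datetime64("2020-01-02")
nearest = int(np.argmin(np.abs(t_src - target)))
print(data[nearest])  # keeps the NaN "holes" of that acquisition
```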

@m-mohr
Member Author

m-mohr commented May 31, 2021

I've simplified the process. It only supports nearest neighbor as resampling method so that the process gets easier to implement.

I have not found many details about nearest neighbor for temporal resampling. Does anyone have good documentation that we can link to? I'm especially looking for an indication of how ties are resolved, which we should document in the process. The other open question is NaN handling, as mentioned above.

@m-mohr m-mohr marked this pull request as ready for review May 31, 2021 10:30
@clausmichele
Member

I've simplified the process. It only supports nearest neighbor as resampling method so that the process gets easier to implement.

I have not found many details about nearest neighbor for temporal resampling. Does anyone have good documentation that we can link to? I'm especially looking for an indication of how ties are resolved, which we should document in the process. The other open question is NaN handling, as mentioned above.

@przell maybe something about the wetsnow pipeline/use case could be useful to clarify the behavior of this process? Do we have some public documentation for it?

@przell
Member

przell commented Jun 8, 2021

The process resample_temporal was just foreseen as a helper in the wet snow use case to align the temporal dimension of the two collections to a common timeseries. So nothing official is available from this side, sorry.
The topic, available options and use cases are quite manifold, so using nearest neighbor with fixed parameters for now is a good starting point.

  • ties: I would choose the first (just arbitrary for now)
  • NaNs: As far as I understand from the discussion, this is the case when the dense time series has no observation in the period between two dates of the sparse time series. In this case I would vote for creating a time step where the raster is completely filled with NaNs.
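A minimal sketch of the tie-breaking choice (the "first wins" rule is one possible convention, not yet what the spec says):

```python
import numpy as np
import pandas as pd

src_times = pd.to_datetime(["2020-01-01", "2020-01-05", "2020-01-09"])
src_vals = np.array([1.0, 2.0, 3.0])
target_times = pd.to_datetime(["2020-01-03", "2020-01-07"])

for t in target_times:
    dist = np.abs(src_times - t)
    # 2020-01-03 is equally far from the 1st and the 5th; argmin returns the
    # first minimum, i.e. the earlier source time step wins the tie.
    i = int(np.argmin(dist))
    print(t.date(), "->", src_times[i].date(), src_vals[i])
```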

@m-mohr
Member Author

m-mohr commented Jun 16, 2021

@clausmichele @jdries @przell I've tried to clarify the behavior for ties and invalid values. As discussed in the dev telco, I've added a new parameter "valid_within". Does that all make sense for you?

@m-mohr
Member Author

m-mohr commented Jun 16, 2021

Will merge end of next week at the latest if nothing major comes in...

@clausmichele
Member

Fine for me; having the valid_within parameter is a good trade-off.

@przell
Member

przell commented Jun 17, 2021

Hi Matthias,
this seems like a straightforward solution so far.
There is one unclear point for me though:
What happens if the user specifies a valid_within value larger than the temporal resolution of the target data set? It is not clear to me what happens then. Theoretically a date could be assigned twice, in the case where there is no nearest neighbor within a time step but the valid_within parameter allows taking one from the next/previous time step. (I hope this is clear somehow.)

We could specify that valid_within must not be set to a value higher than the temporal resolution, or throw an error if it is.

@m-mohr
Member Author

m-mohr commented Jun 18, 2021

Yes, right now I'd assume that it looks beyond the surrounding timesteps so that values could be assigned twice. Should we leave this up to the users to make the right decision and choose the right range? Should we add a warning to the parameter? Such as

Choosing a range that is close to or larger than the temporal resolution may lead to values being assigned to two target dates.

Although assigning values twice may happen anyway, right?

Let's say you have data on the 1st, 5th and 9th and the other cube has data on the 3rd and 7th... What to do? It seems obvious to assign the values twice by default. If you then choose a 2-day range you'll get nodata for all values. If you choose 3 days, values are assigned twice...
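To spell that example out, a small sketch with one hypothetical reading of valid_within as the maximum absolute distance between a target date and its nearest source date (the exact semantics are exactly what is being discussed here, so treat this as an assumption; the cut-off values below are chosen for this reading and don't map one-to-one onto the day counts above):

```python
import numpy as np
import pandas as pd

src = pd.to_datetime(["2020-01-01", "2020-01-05", "2020-01-09"])
target = pd.to_datetime(["2020-01-03", "2020-01-07"])

def assign(valid_within_days):
    for t in target:
        dist = np.abs(src - t)
        i = int(np.argmin(dist))  # ties resolve to the earlier date
        if dist[i] <= pd.Timedelta(days=valid_within_days):
            print(t.date(), "->", src[i].date())
        else:
            print(t.date(), "-> nodata")

assign(1)  # cut-off smaller than the 2-day gap -> all nodata
assign(4)  # cut-off larger than the gap -> values assigned; with denser
           # target dates the same source date could serve several targets
```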

@m-mohr m-mohr requested review from sophieherrmann and removed request for lforesta June 18, 2021 10:06
@przell
Member

przell commented Jun 18, 2021

Ok, I see your point. I never thought about resampling a temporally sparse data cube to a temporally denser one. Then the same dates naturally have to be assigned multiple times.
Is this clearly understandable?:

The function searches for available time steps within the date range given in "valid_within". If there are multiple dates available, the closest is chosen. On a tie the first is chosen. If "valid_within" is chosen equal to or larger than the temporal resolution of the target temporal dimension, time steps from the source data cube may be assigned multiple times.

And this, but it's quite complicated to follow:

When resampling a data cube with a dense temporal dimension (e.g., daily) to a target one with a sparse temporal dimension (e.g., three daily), the duplicated assignment of a time step can be avoided by choosing "valid_within" smaller than the temporal resolution of the target temporal dimension.
When resampling a data cube with a sparse temporal dimension (e.g., three daily) to a target one with a dense temporal dimension (e.g., daily), duplicated assignment of time steps naturally cannot be avoided.

@m-mohr
Member Author

m-mohr commented Jun 18, 2021

The function searches for available time steps within the date range given in "valid_within".

Only applies if valid_within is given, of course.

If "valid_within" is chosen equal to or larger than the temporal resolution of the target temporal dimension time steps from the source data cube may be assigned multiple times.

For the example above, it's already true if the value for valid_within is half the temporal resolution. I'm not sure we can describe this in a concise way. Maybe we just need to say that it may happen that values are assigned multiple times in certain circumstances and give an example.

three daily

🤔 Is that three times per day or every three days?

Successfully merging this pull request may close these issues.

resample_cube_temporal behavior