-
Notifications
You must be signed in to change notification settings - Fork 16
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Clarify temporal intervals #331 #394
Conversation
…ance in time must be after the first instance in time. #331
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this process allows the value '24' for the hour of an end time in order to make it possible that left-closed time intervals can fully cover the day.
What's the difference in writing an interval like ["2017-01-01T00:00:00Z", "2017-01-02T00:00:00Z"] to define a full 24h day?
Including this particular case that uses T24:00:00Z seems too complicated and not clearer in my opinion.
@clausmichele Good question. I had to think quite a bit about it and I can't remember the exact reasons we had back then, but it clearly originates from ISO8601, but it seems they are also struggling with it:
So it was there in the beginning, got removed and then added back again. I assume their reasons for allowing Without knowing their reasoning (ISO standards must be bought 🤮), I found some edge cases where it may matter and surprise users:
Has someone access to the ISO documents? I'd like to read about the 24:00:00 changes because I agree that ideally we could get rid of this special handling. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
@m-mohr were you able to access the ISO documents? |
Yes! Thanks to @mkadunc for pointing me into the right direction and also providing his point of view. 24 as the hour
But: I just realized that JSON Schema for the format Incl./excl. upper boundaryThe definition of intervals is including the upper boundary, but other parts of the standard allow excluding the upper boundary, so here the standard is not very stringent and helpful. To avoid breaking everyones implementations, we should probably keep excluding it. Otherwise, users could get different results as now there's an additional day included. Missing parts of a date-time (e.g. a date)In ISO8601 it seems that omitted time components should simply be replaced by zero ( ConclusionWell, it's hard to come to a conclusion here as there's so many different aspects to consider, but we likely need to "break" someones implementation as VITO seems to be contraty to all other implementations. So to come up with a precise definition, someone needs to bite the bullet anyway. We could go the following route:
|
For me it would also be fine to change the API and make the upper boundary included, since it caused many headaches in the past and it would be easier to understand. However, we need to clarify it in the best way. If we change the definition I see mostly two cases: |
As said above, I don't think this is a good idea. Changing this would make data unreproducible as the now included day would intefere. Results would change without users noticing it. |
Well, all the users using the VITO back-end wouldn't notice any difference actually, since they already return the upper bound as well. Maybe it's something to discuss in the next dev telco? |
Is it possible to define a union type for the
I agree — either overload the process to allow a single parameter, or provide a helper function such as
Another option would be to speficy "the full range" with two identical instants - |
@clausmichele But there's not only the VITO implementation. We can change a lot here, but I think the exclusive upper bounds are somewhat set in stone for now due to the reproducibility issue...
@mkadunc We could, but it would change the schemas quite a bit and make them more complex so I'd try to avoid that. But we have the temporal-interval subtype which we can use and then it's "only" the comparison operators which may need a clarifying word in the parameter description, I think. Although, should
Hmm... something to look into, the aggregate_temporal spec is quite old and I think there's no actual user (and implementation?) for the time part right now. |
The more you go through the processes to make changes, the worse it gets. |
…terval subtype from climatological_normal (as it has inclusive upper boundaries)
If we want to stick to JSON as the only true context for openEO types, then we have to go with the fact that these are all JSON strings and therefore return If we want, we could define date and time types for openEO in more detail, in which case we could either go with "date- or time-like JSON strings represent intervals", like this:
... or we could specify that "all date- or time-like JSON strings represent single instants" (in line with ISO 8601 interpretation), in which case we have:
Intuitively, the first option would be more correct, but it does add some complexity to the whole system (all date- and time-like parameters would be intervals rather than single scalars) |
The base branch was changed.
# Conflicts: # proposals/load_result.json
Ready for review! |
@@ -112,31 +112,27 @@ | |||
}, | |||
{ | |||
"name": "temporal_extent", | |||
"description": "Limits the data to load from the collection to the specified left-closed temporal interval. Applies to all temporal dimensions. The interval has to be specified as an array with exactly two elements:\n\n1. The first element is the start of the temporal interval. The specified instance in time is **included** in the interval.\n2. The second element is the end of the temporal interval. The specified instance in time is **excluded** from the interval.\n\nThe specified temporal strings follow [RFC 3339](https://www.rfc-editor.org/rfc/rfc3339.html). Also supports open intervals by setting one of the boundaries to `null`, but never both.\n\nSet this parameter to `null` to set no limit for the temporal extent. Be careful with this when loading large datasets! It is recommended to use this parameter instead of using ``filter_temporal()`` directly after loading unbounded data.", | |||
"description": "Limits the data to load from the collection to the specified left-closed temporal interval. Applies to all temporal dimensions. The interval has to be specified as an array with exactly two elements:\n\n1. The first element is the start of the temporal interval. The specified time instant is **included** in the interval.\n2. The second element is the end of the temporal interval. The specified time instant is **excluded** from the interval.\n\nThe second element must always be greater/later than the first element. Otherwise, a `TemporalExtentEmpty` exception is thrown.\n\nAlso supports unbounded intervals by setting one of the boundaries to `null`, but never both.\n\nSet this parameter to `null` to set no limit for the temporal extent. Be careful with this when loading large datasets! It is recommended to use this parameter instead of using ``filter_temporal()`` directly after loading unbounded data.", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
supports unbounded intervals by setting one of the boundaries to
null
, but never both
Why forbidding to set both ends to null?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's the same as just setting the whole parameter to null. Why having two ways for the same thing?
{ | ||
"id": "date_between", | ||
"summary": "Between comparison for dates and times", | ||
"description": "By default, this process checks whether `x` is later than or equal to `min` and before or equal to `max`.\n\nIf `exclude_max` is set to `true` the upper bound is excluded so that the process checks whether `x` is later than or equal to `min` and before `max`.\n\nLower and upper bounds are not allowed to be swapped. So `min` MUST be before or equal to `max` or otherwise the process always returns `false`.", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what about time-only intervals crossing midnight?
between("01:00:00", min="22:00:00", max="04:00:00)
: invalid, false or true?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd say invalid due to
Lower and upper bounds are not allowed to be swapped. So
min
MUST be before or equal tomax
or otherwise the process always returnsfalse
.
Co-authored-by: Stefaan Lippens <[email protected]>
Has anyone the capacity to review this? @dthiex @soxofaan @clausmichele @mkadunc @LukeWeidenwalker |
It seems the implementations tend more towards the approach that the two elements in the intervals should not be the same, so I created a PR that proposes this breaking change for v2.0.0.
Fixes #331