[feature request] change default time boundaries #4461

Closed
beckettsean opened this issue Oct 15, 2015 · 15 comments

@beckettsean
Contributor

To address #2703, #3369, #3789, and to clear up confusion about https://influxdb.com/docs/v0.9/troubleshooting/frequently_encountered_issues.html#querying-after-now, it would make sense to change the default lower bound from epoch 0 to the smallest timestamp and the default upper bound from now() to the largest possible timestamp.

I believe the smallest is -9223372036854775808, although #3367 means that's not valid right now. The largest timestamp is 9223372036854775807. These were derived from testing, not from parsing code.

@gunnaraasen
Member

Some more context: we store timestamps only as int64 nanoseconds from the epoch. This means the earliest (minimum) time we can store is

time.Unix(0, -int64(1<<63-1))
-9223372036854775807
1677-09-21 00:12:43.145224193 +0000 UTC

and the latest (maximum) time is

time.Unix(0, int64(1<<63-1))
9223372036854775807
2262-04-11 23:47:16.854775807 +0000 UTC

@DanielMorsing
Contributor

The main reason we have now as a boundary is because we don't want to generate time buckets until the end of time for aggregate queries.

We could probably do something where we check whether it's an aggregate query or not and only bound by now if it is.

@beckettsean
Contributor Author

related to #5089

@beckettsean
Contributor Author

@pauldix @jwilder I assume we aren't changing the default time boundaries for 1.0, but wanted to get verification of that.

@jsternberg
Contributor

I don't think we're going to do this so I'm closing this issue.

@beckettsean
Contributor Author

@jsternberg I agree we aren't doing this for 1.0, but are we making a conscious choice that we will never do this? It definitely still leads to some confusion for users, so I'd love to have an actual discussion of the tradeoffs.

@jsternberg
Contributor

I can add this to TODO.md if we want to come back to it in the future.

@beckettsean
Contributor Author

I do think it's worth making a considered choice on it, rather than the evolved default we have now. Thanks, Jonathan.

@jsternberg
Contributor

@jwilder what was our final conclusion on implementing this? If we set the default maximum time to MaxTime, then the number of buckets for fill will get out of hand and the default query that currently works will stop working. If we clamp this down to the end time of the latest shard, we still have the default end time going past the now() time. What are the conditions for how we are going to clamp the time?

@e-dard
Contributor

e-dard commented Oct 21, 2016

@jsternberg The default maximum time should be as far into the future as is necessary to return all matching points in the system. The general consensus was that the end of the relevant shard groups for the query would be the appropriate maximum default time.

@jsternberg
Contributor

@e-dard but what should the end time for the fill iterator be? If you have the query:

SELECT mean(value) FROM cpu WHERE time >= now() - 10m GROUP BY time(1m)

If the default end time were the end of the shard group, this query would return a point for each 1-minute interval for potentially 7 days with the default shard duration.

@e-dard
Contributor

e-dard commented Oct 21, 2016

@jsternberg I see. So if we make this change it's going to change the behaviour of that query.

Is there a way to fill only while there are points that would contribute to the aggregate value? That is to say, as soon as there are no more points to aggregate we stop emitting values?

@jsternberg
Contributor

Yes, there is. I can try doing that. It might be more difficult, but I can try to get fill to fill in values only up to an explicit end time, not up to an implicit one.

@jsternberg
Contributor

I ran into an issue where I'm not sure what the answer should be. So if you don't specify an end time, we're supposed to implicitly not fill in ending values. But what happens when we have something like this? (assume that all times are in seconds, not nanoseconds)

cpu,host=server01 value=1 10
cpu,host=server02 value=2 10
cpu,host=server02 value=3 20

> SELECT mean(value) FROM cpu WHERE time >= 10 GROUP BY host, time(10s)
name: cpu
tags: host=server01
time    value
----    -----
10      1

name: cpu
tags: host=server02
time    value
----    -----
10      2
20      3

Should the first one return a row for 20?
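The two possible answers can be sketched in Go as follows. This is only an illustration of the question: `point`, `bucketMeans`, and `emittedBuckets` are hypothetical names, and each series is assumed to start at its own first bucket rather than at the query's lower bound.

```go
package main

import "fmt"

// point is a simplified sample: one host tag, one value, a timestamp in seconds.
type point struct {
	host  string
	value float64
	ts    int64
}

// bucketMeans groups points by host and by interval-aligned bucket start,
// then returns the mean value of each bucket.
func bucketMeans(points []point, interval int64) map[string]map[int64]float64 {
	sums := map[string]map[int64]float64{}
	counts := map[string]map[int64]int{}
	for _, p := range points {
		b := p.ts - p.ts%interval
		if sums[p.host] == nil {
			sums[p.host] = map[int64]float64{}
			counts[p.host] = map[int64]int{}
		}
		sums[p.host][b] += p.value
		counts[p.host][b]++
	}
	for h, m := range sums {
		for b := range m {
			m[b] /= float64(counts[h][b])
		}
	}
	return sums
}

// emittedBuckets lists the bucket start times that would be output for host.
// With fillToGlobalEnd false, a series stops at its own last point; with it
// true, every series is filled up to the last bucket seen in any series.
func emittedBuckets(means map[string]map[int64]float64, host string, interval int64, fillToGlobalEnd bool) []int64 {
	var first, last int64
	seen := false
	for b := range means[host] {
		if !seen || b < first {
			first = b
		}
		if !seen || b > last {
			last = b
		}
		seen = true
	}
	if !seen {
		return nil
	}
	if fillToGlobalEnd {
		for _, m := range means {
			for b := range m {
				if b > last {
					last = b
				}
			}
		}
	}
	var out []int64
	for b := first; b <= last; b += interval {
		out = append(out, b)
	}
	return out
}

func main() {
	pts := []point{{"server01", 1, 10}, {"server02", 2, 10}, {"server02", 3, 20}}
	means := bucketMeans(pts, 10)
	fmt.Println(emittedBuckets(means, "server01", 10, false)) // [10]
	fmt.Println(emittedBuckets(means, "server01", 10, true))  // [10 20]
}
```

With per-series ends, server01 gets only the row for 10, matching the output shown above; with a shared end it would also get an empty row for 20.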

@jsternberg
Contributor

Never mind. We're just going to do what Daniel said and make the end time now() when there is a GROUP BY time(...) and the maximum time otherwise. We're overthinking this.

7 participants