IDateTime could optionally track different time zones for different rows #3160

MichaelChirico · 2018-11-26T06:10:14Z

simple base R time objects (POSIXct) are ill-suited to handling regional datasets (tzone attribute is atomic); similarly, POSIXlt objects are overly broad (every component is a vector).

What's really needed is a structure that's just a little bit more general than that offered by IDateTime at the moment:

idate itime tz
17861 32415  1
17861 32415  2

Where I'm suppressing the print methods for each column to show the simple integer structure of each column.

idate and itime are as before, but the objects (optionally? NA by default?) gain a tz column stored as a factor whose labels name different OlsonNames() time zones, e.g.

tz = factor(c('Asia/Singapore', 'Asia/Jakarta'))

Related methods would then handle printing/formatting/etc appropriately accounting for the tz column.

Related: #2402, #1451

The text was updated successfully, but these errors were encountered:

jangorecki · 2018-11-26T06:27:46Z

Not sure if it make sense to focus on that. Complexity of handling different tz for different rows is quite significant, all operators would need to handle that. I found most reliable and straightforward to just dump all times to UTC at the beginning and apply tz at the end when producing report.

MichaelChirico · 2018-11-26T06:31:16Z

So far I've been doing things by = tz; "producing a report" is exactly the sort of functionality I have in mind, but also, filtering rows based on local time (x[hour_local == 9L], e.g.).

Storage will remain UTC-based integer representation throughout, only functions like wday and format would be affected.

Functionality I have in in mind handles this logic internally to be less of a fuss. I'll add Low label since it's a nice-to-have.

mattdowle · 2020-07-14T02:13:20Z

If there are mixed timezones in an input dataset, it's best practice to convert everything to consistent UTC up front. Leaving them mixed in the same column is going to lead to bugs really quickly, slow it down even if it works, and at best result in a new class that can't be passed to other packages.
If the use case is really to do something like x[hour_local == 9L] then a column can be added for that by the user using plain integer, or ITime or similar. The tz column can be added by users in their dataset just as you showed, too. I don't think it needs to be built into data.table, or would be advantageous at all to build it into data.table. I have seen datasets where the local datetime and the UTC datetime are two separate columns; both are in UTC but the column name of the local time one (e.g. timestamp_local) is clear that it really represents local time. That way queries are a bit clearer because you can see what the timestamp is from the column name without needing to look at its type.
If you agree, can we close this issue?

MichaelChirico · 2020-07-15T15:10:20Z

I agree that it's a bit niche and hard to maintain. I do think it's quite useful e.g. for building regression models, but I guess the main potential advantage -- skipping over POSIXlt conversion on a huge dataset -- wouldn't really be brought by the proposed change, at least without substantial effort & underlying code.

It's worth mentioning that both PostgreSQL and Presto support the TIMESTAMP WITH TIMEZONE column type, which can have timestamps vary by row, which I think provides good evidence of its utility. I believe the implementation is similar to R, where the core storage type is related to UTC, and the time zone really only comes into play for formatting & extractions.

I'll close this with a note that it would be useful for a different add-on package to support such a thing.

MichaelChirico changed the title ~~IDateTIme could optionally track different time zones for different rows~~ IDateTime could optionally track different time zones for different rows Nov 26, 2018

MichaelChirico added the Low label Nov 26, 2018

MichaelChirico mentioned this issue Jul 13, 2020

Add support for native parsing of iso8601 dates/timestamps in fread #4464

Merged

11 tasks

MichaelChirico closed this as completed Jul 15, 2020

MichaelChirico added the idate/itime label Sep 8, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

IDateTime could optionally track different time zones for different rows #3160

IDateTime could optionally track different time zones for different rows #3160

MichaelChirico commented Nov 26, 2018

jangorecki commented Nov 26, 2018

MichaelChirico commented Nov 26, 2018 •

edited

Loading

mattdowle commented Jul 14, 2020 •

edited

Loading

MichaelChirico commented Jul 15, 2020

IDateTime could optionally track different time zones for different rows #3160

IDateTime could optionally track different time zones for different rows #3160

Comments

MichaelChirico commented Nov 26, 2018

jangorecki commented Nov 26, 2018

MichaelChirico commented Nov 26, 2018 • edited Loading

mattdowle commented Jul 14, 2020 • edited Loading

MichaelChirico commented Jul 15, 2020

MichaelChirico commented Nov 26, 2018 •

edited

Loading

mattdowle commented Jul 14, 2020 •

edited

Loading