[Request] Support nanotime as well as integer64 #1982

eddelbuettel · 2017-01-10T16:39:55Z

With current CRAN versions of everything, the demo in nanotime tickles a warning:

library(nanotime)
suppressMessages(library(data.table))
set.seed(42)
N <- 300
rainyday <- ISOdatetime(2016,9,28,8,30,0) # made up
shinyday <- ISOdatetime(2016,9,21,8,30,0) # made up too
rdsent <- nanotime(rainyday) + cumsum(10*rpois(N, lambda=4))   # random sent time
sdsent <- nanotime(shinyday) + cumsum(10*rpois(N, lambda=4))   # identical sent process for both
rdrecv <- rdsent + 10*rlnorm(N, 0.30, 0.25)                    # postulate higher mean and sd
sdrecv <- sdsent + 10*rlnorm(N, 0.10, 0.20)                    # for rainy than shiny
raw <- data.table(rdsent, rdrecv, sdsent, sdrecv)
raw[, `:=`(rainy=as.numeric(rdrecv-rdsent),
                shiny=as.numeric(sdrecv-sdsent))]
tfile <- tempfile(pattern="raw", fileext=".csv")
fwrite(raw, file=tfile)
cooked <- fread(tfile)
Warning message:
In fread(tfile) :
  Some columns have been read as type 'integer64' but package bit64 isn't loaded. Those columns will display as strange looking floating point data. There is no need to reload the data. Just require(bit64) to obtain the integer64 print method and print the data again.
R>

However, the nanotime package brings its own print method. So once we restore the types (there must be a better way that I don't know yet), we can print:

R> ## csv files are not 'typed' so need to recover types explicitly
R> cooked[, `:=`(rdsent=nanotime(rdsent),
+               rdrecv=nanotime(rdrecv),
+               sdsent=nanotime(sdsent),
+               sdrecv=nanotime(sdrecv))]
R> all.equal(raw, cooked)
[1] TRUE
R> cooked 
                                 rdsent                              rdrecv                             sdsent                              sdrecv rainy shiny
  1: 2016-09-28T13:30:00.00000007+00:00 2016-09-28T13:30:00.000000083+00:00 2016-09-21T13:30:00.00000004+00:00  2016-09-21T13:30:00.00000005+00:00    13    10
  2: 2016-09-28T13:30:00.00000014+00:00 2016-09-28T13:30:00.000000156+00:00 2016-09-21T13:30:00.00000008+00:00 2016-09-21T13:30:00.000000092+00:00    16    12
  3: 2016-09-28T13:30:00.00000017+00:00 2016-09-28T13:30:00.000000183+00:00 2016-09-21T13:30:00.00000009+00:00 2016-09-21T13:30:00.000000103+00:00    13    13
  4: 2016-09-28T13:30:00.00000023+00:00 2016-09-28T13:30:00.000000246+00:00 2016-09-21T13:30:00.00000012+00:00 2016-09-21T13:30:00.000000133+00:00    16    13
  5: 2016-09-28T13:30:00.00000028+00:00 2016-09-28T13:30:00.000000293+00:00 2016-09-21T13:30:00.00000018+00:00 2016-09-21T13:30:00.000000189+00:00    13     9
 ---                                                                                                                                                          
296: 2016-09-28T13:30:00.00001164+00:00 2016-09-28T13:30:00.000011654+00:00 2016-09-21T13:30:00.00001172+00:00 2016-09-21T13:30:00.000011731+00:00    14    11
297: 2016-09-28T13:30:00.00001169+00:00 2016-09-28T13:30:00.000011702+00:00 2016-09-21T13:30:00.00001176+00:00 2016-09-21T13:30:00.000011774+00:00    12    14
298: 2016-09-28T13:30:00.00001173+00:00 2016-09-28T13:30:00.000011749+00:00 2016-09-21T13:30:00.00001179+00:00   2016-09-21T13:30:00.0000118+00:00    19    10
299: 2016-09-28T13:30:00.00001175+00:00 2016-09-28T13:30:00.000011757+00:00  2016-09-21T13:30:00.0000118+00:00 2016-09-21T13:30:00.000011814+00:00     7    14
300: 2016-09-28T13:30:00.00001176+00:00 2016-09-28T13:30:00.000011777+00:00 2016-09-21T13:30:00.00001184+00:00 2016-09-21T13:30:00.000011859+00:00    17    19
Warning message:
In print.data.table(x) :
  Some columns have been read as type 'integer64' but package bit64 isn't loaded. Those columns will display as strange looking floating point data. There is no need to reload the data. Just require(bit64) to obtain the integer64 print method and print the data again.
R>

but calling format() on a column directly gives no warning:

R> format(head(cooked[, 1]))
     rdsent                              
[1,] "2016-09-28T13:30:00.00000007+00:00"
[2,] "2016-09-28T13:30:00.00000014+00:00"
[3,] "2016-09-28T13:30:00.00000017+00:00"
[4,] "2016-09-28T13:30:00.00000023+00:00"
[5,] "2016-09-28T13:30:00.00000028+00:00"
[6,] "2016-09-28T13:30:00.00000032+00:00"
R>
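The per-column type restoration shown earlier might be written more compactly by converting a set of columns in one step. This is only a minimal sketch: the column names `sent` and `recv` are hypothetical stand-ins, and the integer64 values are built by hand (which needs bit64) rather than read back with fread, so it illustrates the idiom and is not a claim about how data.table or nanotime intend it to be done.

```r
library(data.table)
library(bit64)
library(nanotime)

# Stand-in for what fread() returned in the post: nanoseconds since the
# epoch stored as integer64 (values and column names are made up)
cooked <- data.table(
  sent  = as.integer64(c("1475069400000000070", "1475069400000000140")),
  recv  = as.integer64(c("1475069400000000083", "1475069400000000156")),
  rainy = c(13, 16)
)

# Convert every time column in one assignment instead of naming each one
tcols <- c("sent", "recv")
cooked[, (tcols) := lapply(.SD, nanotime), .SDcols = tcols]

# The time columns now carry the nanotime class; 'rainy' is left untouched
sapply(cooked, function(col) class(col)[1])
```

The same `tcols` vector could be reused for any other per-column step, which keeps the list of time columns in one place.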

Now, I tried to go into your R/fread.R and make the obvious change:

(Edited an hour later: I had a silly error.)

edd@max:~/git/data.table(master)$ git diff R/fread.R
diff --git a/R/fread.R b/R/fread.R
index 8efb633..971e6da 100644
--- a/R/fread.R
+++ b/R/fread.R
@@ -99,7 +99,7 @@ fread <- function(input="",sep="auto",sep2="auto",nrows=-1L,header="auto",na.str
     if (is.atomic(colClasses) && !is.null(names(colClasses))) colClasses = tapply(names(colClasses),colClasses,c,simplify=FALSE) # named vector handling
     ans = .Call(Creadfile,input,sep,as.integer(nrows),header,na.strings,verbose,as.integer(autostart),skip,select,drop,colClasses,integer64,dec,encoding,quote,strip.white,blank.lines.skip,fill,showProgress)
     nr = length(ans[[1]])
-    if ( integer64=="integer64" && !exists("print.integer64") && any(sapply(ans,inherits,"integer64")) )
+    if ( integer64=="integer64" && !(exists("print.integer64") || "package:nanotime" %in% search())  && any(sapply(ans,inherits,"integer64")) )
         warning("Some columns have been read as type 'integer64' but package bit64 isn't loaded. Those columns will display as strange looking floating point data. There is no need to reload the data. Just require(bit64) to obtain the integer64 print method and print the data again.")
     setattr(ans,"row.names",.set_row_names(nr))
 
edd@max:~/git/data.table(master)$

I get the desired behaviour:

edd@max:/tmp$ r -ldata.table -e'ck <- fread("raw.csv")'           # just load data.table, get warning
Warning message:
In fread("ck2.csv") :
  Some columns have been read as type 'integer64' but package bit64 isn't loaded. Those columns will display as strange looking floating point data. There is no need to reload the data. Just require(bit64) to obtain the integer64 print method and print the data again.
edd@max:/tmp$ r -ldata.table,nanotime -e'ck <- fread("raw.csv")'   ## load nanotime as well, silent
edd@max:/tmp$
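To make the logic of the patched condition easier to follow, here is a minimal sketch that pulls it out into a standalone function. The helper name `should_warn_integer64` and the hand-built test list are hypothetical and not part of data.table. The integer64 column is faked by setting the class attribute, so the sketch runs without bit64.

```r
# Hypothetical helper mirroring the patched condition in R/fread.R;
# it is an illustration only and does not exist in data.table
should_warn_integer64 <- function(ans, integer64 = "integer64") {
  # A usable print method is assumed when bit64's print.integer64 is
  # visible or when nanotime is attached to the search path
  has_printer <- exists("print.integer64") ||
    "package:nanotime" %in% search()
  integer64 == "integer64" && !has_printer &&
    any(vapply(ans, inherits, logical(1), what = "integer64"))
}

# Fake an integer64 column by hand so that bit64 is not required here
fake <- list(a = structure(c(1, 2), class = "integer64"), b = 1:2)

# The result depends on whether bit64 or nanotime is attached in this session
should_warn_integer64(fake)

# Without any integer64 column the check never warns
should_warn_integer64(list(b = 1:2))
```

Checking `search()` only detects an attached package; a nanotime namespace that is loaded but not attached would not suppress the warning under this condition.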

I could open a PR if you think that the (currently still fringe) nanotime package is worth a change.


eddelbuettel · 2017-01-10T17:37:56Z

(In case you consume issues by email rather than on the web: I had a silly mistake in my suggested change, which I have fixed, so please see the updated post on the website.)


mattdowle added this to the v1.10.2 milestone Jan 17, 2017

mattdowle added the fwrite label Jan 25, 2017

mattdowle added a commit that referenced this issue Jan 26, 2017

bit64's namespace now auto loaded when integer64 columns are present rather than warning asking user to. #1982 (b547e70)

mattdowle closed this as completed in 9905e9d Jan 28, 2017

mattdowle modified the milestones: Candidate, v1.10.2 Jan 31, 2017
