[Request] Support nanotime as well as integer64 #1982

Closed
eddelbuettel opened this issue Jan 10, 2017 · 1 comment

eddelbuettel commented Jan 10, 2017

With current CRAN versions of everything, the demo in nanotime tickles a warning:

library(nanotime)
suppressMessages(library(data.table))
set.seed(42)
N <- 300
rainyday <- ISOdatetime(2016,9,28,8,30,0) # made up
shinyday <- ISOdatetime(2016,9,21,8,30,0) # made up too
rdsent <- nanotime(rainyday) + cumsum(10*rpois(N, lambda=4))   # random sent time
sdsent <- nanotime(shinyday) + cumsum(10*rpois(N, lambda=4))   # identical sent process for both
rdrecv <- rdsent + 10*rlnorm(N, 0.30, 0.25)                    # postulate higher mean and sd
sdrecv <- sdsent + 10*rlnorm(N, 0.10, 0.20)                    # for rainy than shiny
raw <- data.table(rdsent, rdrecv, sdsent, sdrecv)
raw[, `:=`(rainy=as.numeric(rdrecv-rdsent),
           shiny=as.numeric(sdrecv-sdsent))]
tfile <- tempfile(pattern="raw", fileext=".csv")
fwrite(raw, file=tfile)
cooked <- fread(tfile)
Warning message:
In fread(tfile) :
  Some columns have been read as type 'integer64' but package bit64 isn't loaded. Those columns will display as strange looking floating point data. There is no need to reload the data. Just require(bit64) to obtain the integer64 print method and print the data again.
R> 

However, the nanotime package brings its own print method. So once we restore the types (there must be a better way that I don't know yet) the table prints correctly, although print.data.table still repeats the warning:

R> ## csv files are not 'typed' so need to recover types explicitly
R> cooked[, `:=`(rdsent=nanotime(rdsent),
+               rdrecv=nanotime(rdrecv),
+               sdsent=nanotime(sdsent),
+               sdrecv=nanotime(sdrecv))]
R> all.equal(raw, cooked)
[1] TRUE
R> cooked 
                                 rdsent                              rdrecv                             sdsent                              sdrecv rainy shiny
  1: 2016-09-28T13:30:00.00000007+00:00 2016-09-28T13:30:00.000000083+00:00 2016-09-21T13:30:00.00000004+00:00  2016-09-21T13:30:00.00000005+00:00    13    10
  2: 2016-09-28T13:30:00.00000014+00:00 2016-09-28T13:30:00.000000156+00:00 2016-09-21T13:30:00.00000008+00:00 2016-09-21T13:30:00.000000092+00:00    16    12
  3: 2016-09-28T13:30:00.00000017+00:00 2016-09-28T13:30:00.000000183+00:00 2016-09-21T13:30:00.00000009+00:00 2016-09-21T13:30:00.000000103+00:00    13    13
  4: 2016-09-28T13:30:00.00000023+00:00 2016-09-28T13:30:00.000000246+00:00 2016-09-21T13:30:00.00000012+00:00 2016-09-21T13:30:00.000000133+00:00    16    13
  5: 2016-09-28T13:30:00.00000028+00:00 2016-09-28T13:30:00.000000293+00:00 2016-09-21T13:30:00.00000018+00:00 2016-09-21T13:30:00.000000189+00:00    13     9
 ---                                                                                                                                                          
296: 2016-09-28T13:30:00.00001164+00:00 2016-09-28T13:30:00.000011654+00:00 2016-09-21T13:30:00.00001172+00:00 2016-09-21T13:30:00.000011731+00:00    14    11
297: 2016-09-28T13:30:00.00001169+00:00 2016-09-28T13:30:00.000011702+00:00 2016-09-21T13:30:00.00001176+00:00 2016-09-21T13:30:00.000011774+00:00    12    14
298: 2016-09-28T13:30:00.00001173+00:00 2016-09-28T13:30:00.000011749+00:00 2016-09-21T13:30:00.00001179+00:00   2016-09-21T13:30:00.0000118+00:00    19    10
299: 2016-09-28T13:30:00.00001175+00:00 2016-09-28T13:30:00.000011757+00:00  2016-09-21T13:30:00.0000118+00:00 2016-09-21T13:30:00.000011814+00:00     7    14
300: 2016-09-28T13:30:00.00001176+00:00 2016-09-28T13:30:00.000011777+00:00 2016-09-21T13:30:00.00001184+00:00 2016-09-21T13:30:00.000011859+00:00    17    19
Warning message:
In print.data.table(x) :
  Some columns have been read as type 'integer64' but package bit64 isn't loaded. Those columns will display as strange looking floating point data. There is no need to reload the data. Just require(bit64) to obtain the integer64 print method and print the data again.
R> 

but formatting a column directly works without any warning:

R> format(head(cooked[, 1]))
     rdsent                              
[1,] "2016-09-28T13:30:00.00000007+00:00"
[2,] "2016-09-28T13:30:00.00000014+00:00"
[3,] "2016-09-28T13:30:00.00000017+00:00"
[4,] "2016-09-28T13:30:00.00000023+00:00"
[5,] "2016-09-28T13:30:00.00000028+00:00"
[6,] "2016-09-28T13:30:00.00000032+00:00"
R> 
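On the "better way" to restore types mentioned above: one possible sketch, assuming you know which columns hold timestamps, is to loop over a vector of column names with .SDcols instead of naming each column by hand. The small table below is only a stand-in for `cooked` as it looks right after fread(); its values and the `timecols` vector are illustrative assumptions, not part of the demo.

```r
library(nanotime)
library(data.table)

# Stand-in for `cooked` right after fread(): nanosecond stamps held as plain integer64.
# The values are illustrative; strings are used so no precision is lost on the way in.
cooked <- data.table(
  rdsent = bit64::as.integer64(c("1475069400000000070", "1475069400000000140")),
  rdrecv = bit64::as.integer64(c("1475069400000000083", "1475069400000000156")),
  rainy  = c(13, 16)
)

timecols <- c("rdsent", "rdrecv")   # assumed: columns known to hold timestamps
cooked[, (timecols) := lapply(.SD, nanotime), .SDcols = timecols]

sapply(cooked, function(col) class(col)[1])   # inspect the restored column classes
```

This still relies on knowing the column names up front, since the csv file itself carries no type information.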

Now, I tried to go into your R/fread.R and make the obvious change:

(*Edited an hour later; I had a silly error.)

edd@max:~/git/data.table(master)$ git diff R/fread.R
diff --git a/R/fread.R b/R/fread.R
index 8efb633..971e6da 100644
--- a/R/fread.R
+++ b/R/fread.R
@@ -99,7 +99,7 @@ fread <- function(input="",sep="auto",sep2="auto",nrows=-1L,header="auto",na.str
     if (is.atomic(colClasses) && !is.null(names(colClasses))) colClasses = tapply(names(colClasses),colClasses,c,simplify=FALSE) # named vector handling
     ans = .Call(Creadfile,input,sep,as.integer(nrows),header,na.strings,verbose,as.integer(autostart),skip,select,drop,colClasses,integer64,dec,encoding,quote,strip.white,blank.lines.skip,fill,showProgress)
     nr = length(ans[[1]])
-    if ( integer64=="integer64" && !exists("print.integer64") && any(sapply(ans,inherits,"integer64")) )
+    if ( integer64=="integer64" && !(exists("print.integer64") || "package:nanotime" %in% search())  && any(sapply(ans,inherits,"integer64")) )
         warning("Some columns have been read as type 'integer64' but package bit64 isn't loaded. Those columns will display as strange looking floating point data. There is no need to reload the data. Just require(bit64) to obtain the integer64 print method and print the data again.")
     setattr(ans,"row.names",.set_row_names(nr))
 
edd@max:~/git/data.table(master)$ 

I get the desired behaviour:

edd@max:/tmp$ r -ldata.table -e'ck <- fread("raw.csv")'           # just load data.table, get warning
Warning message:
In fread("ck2.csv") :
  Some columns have been read as type 'integer64' but package bit64 isn't loaded. Those columns will display as strange looking floating point data. There is no need to reload the data. Just require(bit64) to obtain the integer64 print method and print the data again.
edd@max:/tmp$ r -ldata.table,nanotime -e'ck <- fread("raw.csv")'   ## load nanotime as well, silent
edd@max:/tmp$ 
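To make the changed condition easier to reason about, here is a rough base-R sketch of the same logic pulled out into a standalone function. The name `should_warn_integer64` and its extra arguments are hypothetical and exist only for illustration; nothing like it is in data.table, where the check sits inline in fread().

```r
# Hypothetical helper, not part of data.table: it only mirrors the condition in the diff.
# `has_print_method` and `attached` are exposed as arguments so the logic can be tried
# without loading bit64 or nanotime.
should_warn_integer64 <- function(ans, integer64 = "integer64",
                                  has_print_method = exists("print.integer64"),
                                  attached = search()) {
  can_print <- has_print_method || "package:nanotime" %in% attached
  integer64 == "integer64" && !can_print &&
    any(vapply(ans, inherits, logical(1), what = "integer64"))
}

# Stand-in for fread's result: one column tagged integer64, one plain integer column
ans <- list(rdsent = structure(c(0, 0), class = "integer64"), rainy = c(13L, 16L))

# Only data.table attached, no integer64 print method: warn
should_warn_integer64(ans, has_print_method = FALSE,
                      attached = "package:data.table")                        # TRUE
# nanotime attached as well: stay silent
should_warn_integer64(ans, has_print_method = FALSE,
                      attached = c("package:nanotime", "package:data.table")) # FALSE
```

The warning is suppressed as soon as either bit64's print method is visible or nanotime is on the search path, which matches the two command-line runs above.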

I could open a PR if you think the (currently still fringe) nanotime package is worth the change.

@eddelbuettel

(In case you consume issues by email rather than on the web: I had a silly mistake in my suggested change, which I have fixed, so please see the updated post on the website.)

mattdowle added this to the v1.10.2 milestone Jan 17, 2017
mattdowle added a commit that referenced this issue Jan 26, 2017
mattdowle modified the milestones: Candidate, v1.10.2 Jan 31, 2017