Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Just use base functionality for isoweek? #5111

Open
MichaelChirico opened this issue Aug 24, 2021 · 3 comments
Open

Just use base functionality for isoweek? #5111

MichaelChirico opened this issue Aug 24, 2021 · 3 comments

Comments

@MichaelChirico
Copy link
Member

MichaelChirico commented Aug 24, 2021

Happened to notice that format((Date), "%V") gives identical results to our isoweek, and it's a decent amount faster. Should we just migrate our backend to use base?

e.g.

x = rep(Sys.Date() - 0:5000, 1000)

system.time(out1 <- isoweek(x))
#    user  system elapsed 
# 12.428  21.093  34.376 
system.time(out2 <- as.integer(format(x, '%V')))
#    user  system elapsed 
#   4.346   0.221   4.685 

identical(out1, out2)
# [1] TRUE

Or should we try and optimize it better?

@mattdowle
Copy link
Member

mattdowle commented Aug 25, 2021

First thought is that investigating the root cause (which seems to be as.IDate) would fix more than isoweek, and then we could see where we are with isoweek after that.
Possible related: #2503

> x = rep(Sys.Date() - 0:5000, 1000)
> system.time(out2 <- as.integer(format(x, '%V')))
   user  system elapsed 
  1.316   0.000   1.316 
> system.time(out1 <- isoweek(x))
   user  system elapsed 
  4.962   2.903   7.866    # aside: surprised my timings appear 4x faster than yours
> Rprof()
> system.time(out1 <- isoweek(x))
   user  system elapsed 
  4.701   2.984   7.685 
> Rprof(NULL)
> summaryRprof()
$by.self
                  self.time self.pct total.time total.pct
"strptime"             6.08    78.96       6.08     78.96
"format.POSIXlt"       0.76     9.87       0.76      9.87
"as.Date.POSIXlt"      0.40     5.19       0.40      5.19
"as.POSIXlt.Date"      0.32     4.16       0.32      4.16
"%/%"                  0.04     0.52       0.04      0.52
"is.na<-.default"      0.04     0.52       0.04      0.52
"-.IDate"              0.02     0.26       0.02      0.26
"gc"                   0.02     0.26       0.02      0.26
"setattr"              0.02     0.26       0.02      0.26

$by.total
                    total.time total.pct self.time self.pct
"system.time"             7.70    100.00      0.00     0.00
"isoweek"                 7.68     99.74      0.00     0.00
"as.IDate"                7.64     99.22      0.00     0.00
"as.IDate.default"        6.52     84.68      0.00     0.00
"as.Date.character"       6.50     84.42      0.00     0.00
"as.Date"                 6.50     84.42      0.00     0.00
"charToDate"              6.10     79.22      0.00     0.00
"strptime"                6.08     78.96      6.08    78.96
"format.Date"             1.08     14.03      0.00     0.00
"format"                  1.08     14.03      0.00     0.00
"format.POSIXlt"          0.76      9.87      0.76     9.87
"as.Date.POSIXlt"         0.40      5.19      0.40     5.19
"as.POSIXlt.Date"         0.32      4.16      0.32     4.16
"as.POSIXlt"              0.32      4.16      0.00     0.00
"%/%"                     0.04      0.52      0.04     0.52
"is.na<-.default"         0.04      0.52      0.04     0.52
"is.na<-"                 0.04      0.52      0.00     0.00
"-.IDate"                 0.02      0.26      0.02     0.26
"gc"                      0.02      0.26      0.02     0.26
"setattr"                 0.02      0.26      0.02     0.26
"+.IDate"                 0.02      0.26      0.00     0.00

$sample.interval
[1] 0.02

$sampling.time
[1] 7.7

@zkamvar
Copy link

zkamvar commented Sep 4, 2021

Hello, I was lurking in the issues to see how y'all were fixing #5122 (I'm also one of the lucky ones to catch that bug) and I noticed this issue right below it and want to point out that "%V" may not be a recognized format on Windows (at least older systems anyways): https://stackoverflow.com/q/47509567

@bastistician
Copy link

"%V" may not be a recognized format on Windows (at least older systems anyways): https://stackoverflow.com/q/47509567

I think this information is outdated. The "%V" format string is widely supported since R 3.1.0, even on Windows. The corresponding NEWS item in R 3.1.0 was:

(On Windows, by default on OS X and optionally elsewhere.) The system C function strftime has been replaced by a more comprehensive version with closer conformance to the POSIX 2008 standard.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants