Rerun repeated uniqueN test #3438

Closed
mattdowle opened this issue Mar 1, 2019 · 2 comments · Fixed by #4484

Comments

@mattdowle
Member

This was reported in #3395 (comment), and I said I'd follow up in #3435 (comment).
Double-check all those results under the new default.

@jangorecki
Member

jangorecki commented Apr 7, 2020

Up-to-date timings are below. There is still a problem with uniqueN by group.
We basically need what @mattdowle described in #3743 (comment); #1120 would also be nice.

busy machine (20 cores)

#[1] 10
#   user  system elapsed 
#172.084   1.640  18.179 

#[1] 1
#   user  system elapsed 
# 10.608   0.596  11.206 

#[1] 20
#still computing

idle machine (32 cores)

#[1] 16
#   user  system elapsed 
#258.708   0.876  16.638  

#[1] 1
#   user  system elapsed 
#  8.405   0.132   8.539 

#[1] 32
#    user   system  elapsed 
#1066.565    1.180   35.085 

code

library(data.table)
N_X = 1e6
n_day = 60
n_clientid = 1e5
n_Platform = 7
X = data.table(
  day = sample(1:n_day, N_X, TRUE),
  clientid = as.character(sample(1:n_clientid, N_X, TRUE)),
  Platform = as.character(sample(1:n_Platform, N_X, TRUE))
)

setDTthreads(NULL) # default
getDTthreads()
system.time(
  X[, .(x = uniqueN(day) - 1L,
        first_active_day = min(day),
        last_active_day = max(day)),
    by = .(Platform, clientid)]
)

setDTthreads(1) # single-threaded
getDTthreads()
system.time(
  X[, .(x = uniqueN(day) - 1L,
        first_active_day = min(day),
        last_active_day = max(day)),
    by = .(Platform, clientid)]
)

setDTthreads(0) # all logical CPUs
getDTthreads()
system.time(
  X[, .(x = uniqueN(day) - 1L,
        first_active_day = min(day),
        last_active_day = max(day)),
    by = .(Platform, clientid)]
)
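
The three runs above repeat the same query under different thread settings. A minimal sketch of how that could be wrapped in a loop follows; `time_threads` is a hypothetical helper (not part of data.table), and the data is deliberately smaller than in the benchmark above so that it finishes quickly. Treat it as an illustration, not as the code that produced the timings above.

```r
library(data.table)

set.seed(1)  # assumption: fixed seed so the sample data is reproducible
N_X = 1e5    # smaller than the 1e6 rows used in the benchmark above
X_small = data.table(
  day = sample(1:60, N_X, TRUE),
  clientid = as.character(sample(1:1e4, N_X, TRUE)),
  Platform = as.character(sample(1:7, N_X, TRUE))
)

# Hypothetical helper: run the grouped query under one thread setting
# and report the thread count actually in effect plus elapsed seconds.
time_threads = function(DT, threads) {
  old = getDTthreads()
  on.exit(setDTthreads(old), add = TRUE)  # restore the caller's setting
  setDTthreads(threads)                   # NULL = default, 0 = all logical CPUs
  elapsed = system.time(
    DT[, .(x = uniqueN(day) - 1L,
           first_active_day = min(day),
           last_active_day = max(day)),
       by = .(Platform, clientid)]
  )[["elapsed"]]
  data.table(threads = getDTthreads(), elapsed = elapsed)
}

# Same order as the runs above: default, single thread, all logical CPUs.
res = rbindlist(lapply(list(NULL, 1, 0), function(th) time_threads(X_small, th)))
print(res)
```

Restoring the previous thread count with `on.exit` keeps one run from leaking its setting into the next, which matters when the results are compared side by side.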

@mattdowle
Member Author

Closed by #4484. See the benchmark near the top of the first comment there: #4484 (comment)
