Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Multi-threaded group by #2533

Closed
xiaodaigh opened this issue Dec 20, 2017 · 5 comments
Closed

Multi-threaded group by #2533

xiaodaigh opened this issue Dec 20, 2017 · 5 comments

Comments

@xiaodaigh
Copy link

xiaodaigh commented Dec 20, 2017

I wonder if it's technically possible to do multi-threaded group-by? I tested some multi-threading group by in Julia using a divide and conquer algorithm and I can make sum-by faster. So if data.table has multi-threaded group-by then things should speed up even more

@MichaelChirico
Copy link
Member

MichaelChirico commented Dec 21, 2017

I thought this was implemented already, but apparently not:

library(data.table)

NN = 1e9
set.seed(304093)
# poisson returns integer, 1 billion rows
#   really pushed the limit of my local RAM
DT = data.table(grp = sample(8L, NN, TRUE),
                V = rpois(NN, 10), key = 'grp')
(default_threads = getDTthreads())
# [1] 8

system.time(DT[ , mean(V), keyby = grp])
#    user  system elapsed 
#   9.151   7.717  18.624  

setDTthreads(1)

system.time(DT[ , mean(V), keyby = grp])
#    user  system elapsed 
#   8.796   3.153  13.275 

setDTthreads(default_threads)

@mattdowle is it not strange that the "unthreaded" version is in fact going faster (i found this pretty consistently)?

@jangorecki
Copy link
Member

jangorecki commented Sep 20, 2018

library(data.table)
# data.table 1.11.6  Latest news: r-datatable.com

NN = 1e9
set.seed(304093)
# poisson returns integer, 1 billion rows
#   really pushed the limit of my local RAM
DT = data.table(grp = sample(8L, NN, TRUE),
                V = rpois(NN, 10), key = 'grp')

(default_threads = getDTthreads())
# [1] 32

system.time(DT[ , mean(V), keyby = grp])
#    user  system elapsed 
#  6.098   1.007   7.115 

setDTthreads(1)
 
system.time(DT[ , mean(V), keyby = grp])
#    user  system elapsed 
#   5.674   1.067   6.748 

@MichaelChirico could you retry on your machine to see if you still experience slowdown (after swapping order of calls also)? Second call can eventually be a little bit faster due to caching (so my timings above are close to equal). Grouping is not parallelized but we still need to track slowdown in cases you presented. Recent improvements to gfunctions might have fixed that.

@MichaelChirico
Copy link
Member

MichaelChirico commented Sep 20, 2018

Hmm I got

Error: vector memory exhausted (limit reached?)

I may have run that from my other machine (more RAM)

@mattdowle
Copy link
Member

mattdowle commented Sep 22, 2018

Please can we have a new policy that when posting timings or benchmarks,. verbose=TRUE should be set and the output included in the issue. The verbose output includes some timings of some operations and it really helps to see it up front. I'm looking at the result 9 months ago above and wondering if uniqlist was significant in those. There is a timing in the verbose output for that. And uniqlist was recently improved.

@jangorecki
Copy link
Member

jangorecki commented Jan 29, 2019

According to timings below the issue is already resolved by rework of subset by Matt, already published in 1.12.0.

Code to reproduce

library(data.table)
NN = 1e9
set.seed(304093)
DT = data.table(grp = sample(8L, NN, TRUE),
                V = rpois(NN, 10), key = 'grp')
(default_threads = getDTthreads())
system.time(DT[ , mean(V), keyby = grp])
setDTthreads(1)
system.time(DT[ , mean(V), keyby = grp])

Results

machine A (20 cores)

current latest 1.12.1

> (default_threads = getDTthreads())
[1] 20
> 
> system.time(DT[ , mean(V), keyby = grp])
   user  system elapsed 
 49.004   4.952   4.818 
> 
> setDTthreads(1)
>  
> system.time(DT[ , mean(V), keyby = grp])
   user  system elapsed 
  9.702   3.120  12.822 

1.11.6

> (default_threads = getDTthreads())
[1] 20
> system.time(DT[ , mean(V), keyby = grp])
   user  system elapsed 
  6.533   1.244   7.777 
> setDTthreads(1)
> system.time(DT[ , mean(V), keyby = grp])
   user  system elapsed 
  6.543   1.224   7.767 

machine B (40 cores)

current latest 1.12.1

> (default_threads = getDTthreads())
[1] 40
> 
> system.time(DT[ , mean(V), keyby = grp])
   user  system elapsed 
 63.085   9.026   4.519 
> 
> setDTthreads(1)
>  
> system.time(DT[ , mean(V), keyby = grp])
   user  system elapsed 
 10.810   4.994  15.805 

1.11.6

> (default_threads = getDTthreads())
[1] 40
> system.time(DT[ , mean(V), keyby = grp])
   user  system elapsed 
  7.290   2.008   9.298 
> setDTthreads(1)
> system.time(DT[ , mean(V), keyby = grp])
   user  system elapsed 
  7.582   2.264   9.847 

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants