Multi-threaded group by #2533

xiaodaigh · 2017-12-20T20:12:49Z

I wonder if it's technically possible to do multi-threaded group-by? I tested some multi-threading group by in Julia using a divide and conquer algorithm and I can make sum-by faster. So if data.table has multi-threaded group-by then things should speed up even more

MichaelChirico · 2017-12-21T01:46:30Z

I thought this was implemented already, but apparently not:

library(data.table)

NN = 1e9
set.seed(304093)
# poisson returns integer, 1 billion rows
#   really pushed the limit of my local RAM
DT = data.table(grp = sample(8L, NN, TRUE),
                V = rpois(NN, 10), key = 'grp')
(default_threads = getDTthreads())
# [1] 8

system.time(DT[ , mean(V), keyby = grp])
#    user  system elapsed 
#   9.151   7.717  18.624  

setDTthreads(1)

system.time(DT[ , mean(V), keyby = grp])
#    user  system elapsed 
#   8.796   3.153  13.275 

setDTthreads(default_threads)

@mattdowle is it not strange that the "unthreaded" version is in fact going faster (i found this pretty consistently)?

jangorecki · 2018-09-20T05:56:31Z

library(data.table)
# data.table 1.11.6  Latest news: r-datatable.com

NN = 1e9
set.seed(304093)
# poisson returns integer, 1 billion rows
#   really pushed the limit of my local RAM
DT = data.table(grp = sample(8L, NN, TRUE),
                V = rpois(NN, 10), key = 'grp')

(default_threads = getDTthreads())
# [1] 32

system.time(DT[ , mean(V), keyby = grp])
#    user  system elapsed 
#  6.098   1.007   7.115 

setDTthreads(1)
 
system.time(DT[ , mean(V), keyby = grp])
#    user  system elapsed 
#   5.674   1.067   6.748

@MichaelChirico could you retry on your machine to see if you still experience slowdown (after swapping order of calls also)? Second call can eventually be a little bit faster due to caching (so my timings above are close to equal). Grouping is not parallelized but we still need to track slowdown in cases you presented. Recent improvements to gfunctions might have fixed that.

MichaelChirico · 2018-09-20T06:37:11Z

Hmm I got

Error: vector memory exhausted (limit reached?)

I may have run that from my other machine (more RAM)

mattdowle · 2018-09-22T05:54:58Z

Please can we have a new policy that when posting timings or benchmarks,. verbose=TRUE should be set and the output included in the issue. The verbose output includes some timings of some operations and it really helps to see it up front. I'm looking at the result 9 months ago above and wondering if uniqlist was significant in those. There is a timing in the verbose output for that. And uniqlist was recently improved.

jangorecki · 2019-01-29T04:03:00Z

According to timings below the issue is already resolved by rework of subset by Matt, already published in 1.12.0.

Code to reproduce

library(data.table)
NN = 1e9
set.seed(304093)
DT = data.table(grp = sample(8L, NN, TRUE),
                V = rpois(NN, 10), key = 'grp')
(default_threads = getDTthreads())
system.time(DT[ , mean(V), keyby = grp])
setDTthreads(1)
system.time(DT[ , mean(V), keyby = grp])

Results

machine A (20 cores)

current latest 1.12.1

> (default_threads = getDTthreads())
[1] 20
> 
> system.time(DT[ , mean(V), keyby = grp])
   user  system elapsed 
 49.004   4.952   4.818 
> 
> setDTthreads(1)
>  
> system.time(DT[ , mean(V), keyby = grp])
   user  system elapsed 
  9.702   3.120  12.822

1.11.6

> (default_threads = getDTthreads())
[1] 20
> system.time(DT[ , mean(V), keyby = grp])
   user  system elapsed 
  6.533   1.244   7.777 
> setDTthreads(1)
> system.time(DT[ , mean(V), keyby = grp])
   user  system elapsed 
  6.543   1.224   7.767

machine B (40 cores)

current latest 1.12.1

> (default_threads = getDTthreads())
[1] 40
> 
> system.time(DT[ , mean(V), keyby = grp])
   user  system elapsed 
 63.085   9.026   4.519 
> 
> setDTthreads(1)
>  
> system.time(DT[ , mean(V), keyby = grp])
   user  system elapsed 
 10.810   4.994  15.805

1.11.6

> (default_threads = getDTthreads())
[1] 40
> system.time(DT[ , mean(V), keyby = grp])
   user  system elapsed 
  7.290   2.008   9.298 
> setDTthreads(1)
> system.time(DT[ , mean(V), keyby = grp])
   user  system elapsed 
  7.582   2.264   9.847

st-pasha mentioned this issue Sep 18, 2018

[FR] gforce mean and sum in parallel #3042

Closed

jangorecki closed this as completed Jan 29, 2019

mattdowle mentioned this issue Jan 29, 2019

Single-threaded grouping performance lower #3330

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Multi-threaded group by #2533

Multi-threaded group by #2533

xiaodaigh commented Dec 20, 2017 •

edited

Loading

MichaelChirico commented Dec 21, 2017 •

edited

Loading

jangorecki commented Sep 20, 2018 •

edited by MichaelChirico

Loading

MichaelChirico commented Sep 20, 2018 •

edited

Loading

mattdowle commented Sep 22, 2018 •

edited

Loading

jangorecki commented Jan 29, 2019 •

edited

Loading

Multi-threaded group by #2533

Multi-threaded group by #2533

Comments

xiaodaigh commented Dec 20, 2017 • edited Loading

MichaelChirico commented Dec 21, 2017 • edited Loading

jangorecki commented Sep 20, 2018 • edited by MichaelChirico Loading

MichaelChirico commented Sep 20, 2018 • edited Loading

mattdowle commented Sep 22, 2018 • edited Loading

jangorecki commented Jan 29, 2019 • edited Loading

Code to reproduce

Results

machine A (20 cores)

machine B (40 cores)

xiaodaigh commented Dec 20, 2017 •

edited

Loading

MichaelChirico commented Dec 21, 2017 •

edited

Loading

jangorecki commented Sep 20, 2018 •

edited by MichaelChirico

Loading

MichaelChirico commented Sep 20, 2018 •

edited

Loading

mattdowle commented Sep 22, 2018 •

edited

Loading

jangorecki commented Jan 29, 2019 •

edited

Loading