Parallelization of fast_csr_mv #67

Open · GeoffNN wants to merge 2 commits into master

Conversation

GeoffNN (Collaborator) commented Jun 18, 2020

Parallelization for the sparse matrix multiplication, row-wise.
Cf. this thread, which reports nice speed-ups.
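
For reference, the change boils down to a prange over the rows; roughly along these lines (a minimal sketch, the actual fast_csr_mv in the repo may have a different signature):

```python
# Minimal sketch of a row-parallel CSR matrix-vector product with numba.
# Illustrative only: the real fast_csr_mv in the code base may differ.
import numpy as np
from numba import njit, prange


@njit(parallel=True)
def fast_csr_mv_sketch(data, indptr, indices, x, out):
    # Each row's dot product is independent, so rows can be split across threads.
    for i in prange(indptr.shape[0] - 1):
        acc = 0.0
        for jj in range(indptr[i], indptr[i + 1]):
            acc += data[jj] * x[indices[jj]]
        out[i] = acc


# Usage with a scipy.sparse CSR matrix A and a dense vector x (illustrative):
# out = np.empty(A.shape[0]); fast_csr_mv_sketch(A.data, A.indptr, A.indices, x, out)
```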

fabianp (Member) commented Jun 18, 2020

I'm running the example code with and without this patch; the 12 CPUs I have are all constantly at 100%, and I don't see any speed improvement from this patch. It seems that numba (or LLVM) is by default already deciding to parallelize some parts of the code.

fabianp (Member) commented Jun 18, 2020

I'm going to run a couple more benchmarks, but if the performance is the same I would be more inclined to let numba decide which parts to parallelize, as it will likely do a better job than us at that (thinking for example of nested parallelizable loops) ;-)

fabianp (Member) commented Jun 18, 2020

And the parallelization I observe definitely comes from numba: I can get it down to using just one CPU with the environment variable NUMBA_NUM_THREADS=1.
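
For reference, a quick way to check that the threads come from numba's pool (a sketch; NUMBA_NUM_THREADS has to be set before numba is first imported, and set_num_threads/get_num_threads require numba >= 0.49):

```python
# Sketch: inspect / shrink numba's thread pool to confirm where the CPU usage comes from.
# NUMBA_NUM_THREADS must be set in the environment before numba is first imported, e.g.:
#   NUMBA_NUM_THREADS=1 python run_example.py   # script name illustrative
import numba

print(numba.config.NUMBA_NUM_THREADS)  # thread budget numba started with
numba.set_num_threads(1)               # runtime override, numba >= 0.49
print(numba.get_num_threads())         # -> 1
```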

GeoffNN (Collaborator, Author) commented Jun 18, 2020

For now, the only parallelization that's done in this part of the code base is for sampling the batches, once per epoch. Is it possible that the 100% on the CPUs (that I also observe on my machines) is because there's no deallocation between two calls to that function?

What happens if we remove parallel=True for sampling the batches, and use it for the matrix multiplication instead?

fabianp (Member) commented Jun 18, 2020

OK, I think I found why I was seeing all CPUs being used. It's because the bottleneck is not in the algorithm itself but in computing the fw_gap used for reporting. This uses the full gradient, and hence NumPy's matrix-vector routines, which fire up all CPUs.
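
One way to confirm that is to cap the BLAS thread pool and see whether the CPU usage drops (a sketch using threadpoolctl, which is not a dependency of this repo, just a diagnostic here):

```python
# Sketch: check whether the extra CPUs are used by NumPy's BLAS (the dense mat-vec in
# the fw_gap / full-gradient computation) rather than by numba's parallel kernels.
# threadpoolctl is used purely as a diagnostic; it is not part of this project.
import numpy as np
from threadpoolctl import threadpool_limits

A = np.random.rand(5000, 5000)  # stand-in for the dense data used in the gradient
x = np.random.rand(5000)

with threadpool_limits(limits=1, user_api="blas"):
    y = A @ x  # inside the block BLAS is pinned to one core; outside, it uses them all
```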

GeoffNN (Collaborator, Author) commented Jul 20, 2020

Does that mean it would be beneficial in practice? For applications, you won't be computing the gap very often.
