Perform suggest operations in parallel using multiprocessing in nn_ensemble #568
Codecov Report
@@ Coverage Diff @@
## master #568 +/- ##
=======================================
Coverage 99.47% 99.47%
=======================================
Files 84 84
Lines 5556 5565 +9
=======================================
+ Hits 5527 5536 +9
Misses 29 29
Kudos, SonarCloud Quality Gate passed! 0 Bugs. No Coverage information.
I did some benchmarks using the
I trained the NN ensemble using 500 documents (--docs-limit 500) with different setups on an i7-8550U laptop (4 cores, 8 threads). In the end I also evaluated the ensemble project against the test set, to verify that the ensemble is still working. Results:
Observations:
Great improvement!
During training, the NN ensemble has to process the training documents through the source projects, collecting suggestions. Previously this was done sequentially and could take a lot of time, often dominating the time it takes to train a NN ensemble.
With this PR, the suggest operations are done in parallel on multiple CPUs, controlled by the --jobs parameter of the annif train command. The default is to use all CPUs.
Fixes #429
Opening a draft PR to get feedback from QA tools. More detailed testing and benchmarking still need to be done.
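The general idea of running suggest operations on several CPUs can be sketched with Python's standard multiprocessing module. This is only an illustrative sketch, not the code in this PR: fake_suggest and collect_suggestions are hypothetical names, and the word-frequency scoring is a stand-in for what a real source project would return. The jobs argument plays the role described for --jobs above, with None meaning "use all available CPUs".

```python
import multiprocessing


def fake_suggest(text):
    """Hypothetical stand-in for one source project's suggest operation."""
    words = text.lower().split()
    if not words:
        return {}
    # Score each distinct word by its relative frequency in the document.
    return {word: words.count(word) / len(words) for word in set(words)}


def collect_suggestions(docs, jobs=None):
    """Run fake_suggest over all docs and return results in input order.

    jobs=None lets multiprocessing.Pool pick one worker per available CPU;
    jobs=1 skips the pool entirely and processes the documents sequentially.
    """
    if jobs == 1:
        return [fake_suggest(doc) for doc in docs]
    with multiprocessing.Pool(jobs) as pool:
        # imap preserves the order of docs; list() drains all results
        # before the pool is shut down on leaving the with block.
        return list(pool.imap(fake_suggest, docs))
```

When the parallel path is used from a script, the call (for example collect_suggestions(docs, jobs=4)) should sit under an if __name__ == "__main__": guard so that worker processes can import the module safely. Handling jobs=1 without a pool avoids process start-up cost for small inputs and makes the sequential case easy to debug.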