
Perform suggest operations in parallel using multiprocessing in nn_ensemble #568

Merged
osma merged 3 commits into master from issue429-nn-ensemble-parallel-suggest on Feb 11, 2022

Conversation

@osma (Member) commented Feb 4, 2022

During training, the NN ensemble has to process the training documents through the source projects, collecting suggestions. Previously this was done sequentially and could take a lot of time, often dominating the time it takes to train a NN ensemble.

With this PR, the suggest operations are done in parallel on multiple CPUs, controlled by the --jobs parameter of the annif train command. The default is to use all CPUs.
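
Conceptually, the change applies the standard Python process-pool pattern to the suggest calls. Below is a minimal sketch of that general pattern, not the actual Annif code; suggest_for_document() and the document list are hypothetical stand-ins for running one training document through the source projects.

    # Minimal sketch of the parallelization pattern, not Annif's actual code.
    import multiprocessing

    def suggest_for_document(text):
        # In the real backend this step would collect suggestions for one
        # training document from each source project.
        return {"chars": len(text)}

    def collect_suggestions(documents, jobs=None):
        # jobs=None lets the pool use all available CPUs, mirroring the
        # default behaviour of the --jobs parameter described above.
        with multiprocessing.Pool(processes=jobs) as pool:
            return pool.map(suggest_for_document, documents)

    if __name__ == "__main__":
        docs = ["first training document", "second training document"]
        print(collect_suggestions(docs, jobs=2))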

Fixes #429

Opening a draft PR to get feedback from QA tools. More detailed testing and benchmarking still need to be done.

@osma self-assigned this Feb 4, 2022
codecov bot commented Feb 4, 2022

Codecov Report

Merging #568 (f2cf2ea) into master (736834b) will increase coverage by 0.00%.
The diff coverage is 100.00%.


@@           Coverage Diff           @@
##           master     #568   +/-   ##
=======================================
  Coverage   99.47%   99.47%           
=======================================
  Files          84       84           
  Lines        5556     5565    +9     
=======================================
+ Hits         5527     5536    +9     
  Misses         29       29           
Impacted Files                   Coverage Δ
annif/backend/nn_ensemble.py     99.30% <100.00%> (+0.04%) ⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@osma added this to the 0.57 milestone Feb 4, 2022
sonarcloud bot commented Feb 4, 2022

Kudos, SonarCloud Quality Gate passed!

0 Bugs (rating A)
0 Vulnerabilities (rating A)
0 Security Hotspots (rating A)
0 Code Smells (rating A)

No Coverage information
0.0% Duplication

@osma (Member, Author) commented Feb 4, 2022

I did some benchmarks using the yso-nlf data set from the Annif tutorial. I set up TFIDF, MLLM and Omikuji projects according to the tutorial exercises, then created a NN ensemble project including all three as sources, like this:

sources=yso-tfidf-en,yso-mllm-en:2,yso-omikuji-parabel-en
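
For context, that sources line goes inside the NN ensemble section of projects.cfg. A minimal sketch of such a section, following the tutorial's configuration style (the project name and vocab value are illustrative):

    [yso-nn-ensemble-en]
    name=YSO NN ensemble English
    language=en
    backend=nn_ensemble
    vocab=yso
    sources=yso-tfidf-en,yso-mllm-en:2,yso-omikuji-parabel-en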

I trained the NN ensemble using 500 documents (--docs-limit 500) with different setups on an i7-8550U laptop (4 cores, 8 threads). In the end I also evaluated the ensemble project against the test set, to verify that the ensemble is still working. Results:

                  wall time (mm:ss)   user time (s)   max RSS (kB)   F1@5
Before PR         22:46               1448            3021780        0.3632
This PR, jobs=8   10:31               4846            2643096        0.3666
This PR, jobs=1   22:31               1442            3312252        0.3727

Observations:

  • Training time decreases to less than half when using parallel processing, although parallelization adds considerable overhead: the total CPU time (user time) increases a lot.
  • With jobs=1, the situation is similar to what it was before.
  • Memory usage does not change dramatically.
  • Evaluation scores are similar to what they were before.
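
For reference, figures like these can be collected by wrapping the training command with GNU time (a sketch; the project name and corpus path are illustrative):

    /usr/bin/time -v annif train --jobs 8 --docs-limit 500 yso-nn-ensemble-en /path/to/yso-nlf-train/

The -v output includes elapsed wall clock time, user time in seconds and maximum resident set size in kilobytes, i.e. the same kinds of figures as in the table above.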

@osma marked this pull request as ready for review February 4, 2022 14:21
@osma requested a review from juhoinkinen February 4, 2022 14:21
@juhoinkinen (Member) left a comment


Great improvement!

@osma merged commit 5948dee into master Feb 11, 2022
@osma deleted the issue429-nn-ensemble-parallel-suggest branch February 11, 2022 07:39

Successfully merging this pull request may close these issues.

Parallelize suggest operations during nn_ensemble training