Batch suggest in SVC backend #670

Merged (1 commit) on Feb 6, 2023

@osma osma commented Feb 6, 2023

This PR implements _suggest_batch instead of single-document _suggest in the SVC backend, allowing the use of more efficient vector operations.
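
As a rough illustration of why batching allows more efficient vector operations, here is a minimal sketch using a hypothetical linear scorer in NumPy. It is not the Annif or scikit-learn code; the function names `score_one` and `score_batch` and the array sizes are made up for the example. Scoring documents one by one needs one matrix-vector product per document, while the batched version does the same work in a single matrix product.

```python
import numpy as np


def score_one(weights, bias, vector):
    """Score a single document vector: one matrix-vector product per call."""
    return weights @ vector + bias


def score_batch(weights, bias, matrix):
    """Score all document vectors at once with a single matrix product."""
    return matrix @ weights.T + bias


rng = np.random.default_rng(0)
weights = rng.normal(size=(3, 5))  # 3 classes, 5 features (made-up sizes)
bias = rng.normal(size=3)
docs = rng.normal(size=(4, 5))     # 4 documents, one row each

# Per-document loop, stacked into a (documents x classes) array.
one_by_one = np.vstack([score_one(weights, bias, d) for d in docs])
# The same scores from one batched call.
batched = score_batch(weights, bias, docs)
```

Both paths produce the same scores; the batched call just avoids the per-document Python overhead.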

I tested this with the 20 Newsgroups data set example, as shown on the wiki page of the SVC backend. Evaluation on the test set is 20-30% faster, with otherwise unchanged results.

One special case requires attention: what to do when the input to the SVC classifier is empty or near-empty. The old _suggest code treated this as a special case, short-circuiting the classifier and returning an empty result. A unit test covers this, using the special input text "j", which is an unknown token and thus equivalent to empty input. The SVC model itself has no problem returning some class (maybe the majority class?) in this case, but I reimplemented the empty-input check to retain the old behaviour, which refuses to classify such input, and to keep the unit test happy.
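
One way to keep such an empty-input check in a batched setting is sketched below. This is only an assumption about the general shape, not the code in this PR: `suggest_batch` and `fake_classifier` are hypothetical names, and the "classifier" is a stand-in that sums features. Empty vectors are filtered out, the rest are classified in one call, and the results are put back in input order.

```python
from typing import Callable, List, Sequence

Vector = Sequence[float]
Result = List[float]


def suggest_batch(
    vectors: Sequence[Vector],
    classify_batch: Callable[[List[Vector]], List[Result]],
) -> List[Result]:
    """Classify only non-empty vectors; give empty ones an empty result."""
    # Indices of vectors that contain at least one non-zero feature.
    nonempty = [i for i, vec in enumerate(vectors) if any(vec)]
    # One batched call for every vector worth classifying.
    scored = classify_batch([vectors[i] for i in nonempty]) if nonempty else []
    # Put results back in input order, leaving [] for the empty inputs.
    results: List[Result] = [[] for _ in vectors]
    for i, scores in zip(nonempty, scored):
        results[i] = scores
    return results


def fake_classifier(batch):
    # Hypothetical stand-in: "scores" are just the per-vector feature sums.
    return [[sum(vec)] for vec in batch]


out = suggest_batch([[0.0, 0.0], [1.0, 2.0], [], [0.5, 0.0]], fake_classifier)
```

Here the all-zero and zero-length vectors never reach the classifier, so they come back as empty results while the other two are scored normally.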

With 1 job

                 user time  wall time  max rss
before (master)      20.36    0:20.20   237848
after (PR)           14.35    0:14.17   237332

With 4 jobs

                 user time  wall time  max rss
before (master)      23.16    0:07.92   186972
after (PR)           15.41    0:06.02   187756
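
For reference, the relative reductions implied by these timings can be worked out with a few lines of Python. This is just arithmetic on the numbers in the tables above, assuming both time columns are in seconds; the helper name `reduction` is made up for the example.

```python
def reduction(before: float, after: float) -> float:
    """Relative reduction from before to after, as a percentage."""
    return (before - after) / before * 100


# (before, after) pairs copied from the tables above, assumed to be seconds.
timings = {
    "1 job, user time": (20.36, 14.35),
    "1 job, wall time": (20.20, 14.17),
    "4 jobs, user time": (23.16, 15.41),
    "4 jobs, wall time": (7.92, 6.02),
}

for label, (before, after) in timings.items():
    print(f"{label}: {reduction(before, after):.1f}% lower")
```

Wall time drops by roughly 30% with 1 job and roughly 24% with 4 jobs.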

Fixes #667

@osma osma added this to the 0.61 milestone Feb 6, 2023
@osma osma requested a review from juhoinkinen February 6, 2023 09:41
@osma osma changed the title from "Implement batched suggest in SVC backend. Fixes #667" to "Implement batched suggest in SVC backend" Feb 6, 2023

sonarcloud bot commented Feb 6, 2023

Kudos, SonarCloud Quality Gate passed!

0 Bugs (rating A)
0 Vulnerabilities (rating A)
0 Security Hotspots (rating A)
0 Code Smells (rating A)

No Coverage information
0.0% Duplication

codecov bot commented Feb 6, 2023

Codecov Report

Base: 99.56% // Head: 99.56% // Decreases project coverage by -0.01% ⚠️

Coverage data is based on head (1c587bd) compared to base (cc6dfcf).
Patch coverage: 100.00% of modified lines in pull request are covered.

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #670      +/-   ##
==========================================
- Coverage   99.56%   99.56%   -0.01%     
==========================================
  Files          87       87              
  Lines        6145     6142       -3     
==========================================
- Hits         6118     6115       -3     
  Misses         27       27              
Impacted Files         Coverage Δ
annif/backend/svc.py   100.00% <100.00%> (ø)

@osma osma merged commit 29e4cb0 into master Feb 6, 2023
@osma osma deleted the issue667-suggest-batch-svc branch February 6, 2023 14:04
@osma osma changed the title from "Implement batched suggest in SVC backend" to "Batch suggest in SVC backend" Feb 7, 2023
Development

Successfully merging this pull request may close these issues.

Support batch suggest in SVC backend