spaCy analyzer #374
Comments
spaCy also has some support for document (text) categorization, both multiclass and multilabel. This could be supported as an Annif backend: https://spacy.io/api/textcategorizer
Starting to look at this more seriously... One immediate issue is that spaCy (3.1.2) depends on typer (0.3.2), which at the moment depends on click==7.1.2, while Annif has depended on click==8.0.1 since PR #499 was merged just before the 0.53 release. This appears to be fixed in typer 0.4.0, which was released yesterday(!) and supports Click 8, but that won't help until a newer version of spaCy upgrades its typer dependency. For now we may have to downgrade back to Click 7.1.2, which probably isn't a problem for Annif since I don't think we've started using any Click 8 features yet.
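To see which Click constraints are actually declared in a given environment, something like the following could be used. It is only a rough sketch built on the standard library's `importlib.metadata`. The helper names `mentions_click` and `click_requirements` are made up for illustration, and packages that are not installed are simply skipped.

```python
import re
from importlib.metadata import PackageNotFoundError, requires


def mentions_click(requirement):
    """True if a requirement string is for click itself (not e.g. click-plugins)."""
    return re.match(r"click(?![A-Za-z0-9._-])", requirement, re.IGNORECASE) is not None


def click_requirements(packages=("spacy", "typer", "annif")):
    """Map each installed package to the click requirements it declares."""
    found = {}
    for name in packages:
        try:
            declared = requires(name) or []
        except PackageNotFoundError:
            continue  # package is not installed in this environment
        found[name] = [req for req in declared if mentions_click(req)]
    return found


if __name__ == "__main__":
    for package, reqs in click_requirements().items():
        print(package, "->", reqs or "no direct click requirement")
```

If two of the listed packages pin incompatible Click versions, the printed requirement strings should make the clash visible before pip's resolver complains.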
We could add an analyzer based on spaCy. It would enable support for some new languages and possibly also give better results for languages such as English and German than the current Snowball analyzer.
Getting the full benefit of spaCy may require some internal API changes because it is more object-oriented than NLTK and processes whole sentences instead of just individual words, taking some of the context into account.
This would be an optional feature as spaCy is implemented as a native code extension, not just pure Python.
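A rough sketch of what such an analyzer might look like is below. The class name `SpacyAnalyzer`, the method `tokenize_words`, and the `nlp` constructor parameter are assumptions for illustration and not Annif's actual analyzer API. The model name `en_core_web_sm` is just an example and would have to be installed separately. spaCy is imported lazily so that it stays optional, and the whole text is passed through the pipeline so that lemmatization can use the surrounding context.

```python
class SpacyAnalyzer:
    """Hypothetical spaCy-based analyzer; names are illustrative, not Annif's API."""

    def __init__(self, model_name="en_core_web_sm", nlp=None):
        if nlp is None:
            # Import lazily so spaCy is only needed when this analyzer is used.
            import spacy
            nlp = spacy.load(model_name)
        self.nlp = nlp

    def tokenize_words(self, text, min_length=3):
        # Process the whole text at once rather than word by word,
        # so the pipeline can take sentence context into account.
        doc = self.nlp(text)
        return [
            token.lemma_.lower()
            for token in doc
            if token.is_alpha and len(token.text) >= min_length
        ]
```

Assuming the model is installed, `SpacyAnalyzer().tokenize_words("The cats were chasing mice.")` would return lowercased lemmas for the alphabetic tokens of at least three characters. The exact lemmas depend on the model version.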