Replace fuzzywuzzy and some textdistance evals with rapidfuzz #708

prvenk · 2024-09-13T03:15:00Z

As described in the RapidFuzz GitHub repository (https://github.com/rapidfuzz/RapidFuzz):

RapidFuzz is a fast string matching library for Python and C++, which is using the string similarity calculations from FuzzyWuzzy. However there are a couple of aspects that set RapidFuzz apart from FuzzyWuzzy:

- It is MIT licensed, so it can be used with whichever license you choose for your project, whereas using FuzzyWuzzy forces you to adopt the GPL license.
- It provides many string metrics, such as hamming or jaro_winkler, that are not included in FuzzyWuzzy (these can be used instead of the textdistance versions).
- It is mostly written in C++ and, on top of this, comes with many algorithmic improvements that make string matching even faster while still providing the same results. For detailed benchmarks, see this image: https://raw.githubusercontent.com/rapidfuzz/RapidFuzz/main/docs/img/scorer.svg?sanitize=true
  - Given this, even some textdistance distance metrics, such as Jaro-Winkler and Hamming, can be replaced by rapidfuzz.
- It fixes multiple bugs in the partial_ratio implementation.
- It can largely be used as a drop-in replacement for fuzzywuzzy.

prvenk · 2024-09-13T14:00:41Z

@ritesh-modi @guybartal

prvenk self-assigned this Sep 13, 2024

prvenk added the enhancement New feature or request label Sep 13, 2024

prvenk linked a pull request Sep 14, 2024 that will close this issue

708 replace fuzzywuzzy and textdistance with rapidfuzz for plain evaluation metrics #709

Merged

prvenk changed the title ~~Replace fuzzywuzzy with rapidfuzz~~ Replace fuzzywuzzy and some textdistance evals with rapidfuzz Sep 14, 2024

prvenk closed this as completed in #709 Oct 2, 2024
