Replace fuzzywuzzy and some textdistance evals with rapidfuzz #708

Closed
prvenk opened this issue Sep 13, 2024 · 1 comment · Fixed by #709
Labels: enhancement (New feature or request)

Comments

Collaborator

prvenk commented Sep 13, 2024

As described in the RapidFuzz GitHub repository (https://github.com/rapidfuzz/RapidFuzz):

RapidFuzz is a fast string matching library for Python and C++ that uses the string similarity calculations from FuzzyWuzzy. However, a few aspects set RapidFuzz apart from FuzzyWuzzy:

  • It is MIT licensed, so it can be used with whichever license you choose for your project, whereas using FuzzyWuzzy forces you to adopt the GPL license
  • It provides many string_metrics like hamming or jaro_winkler, which are not included in FuzzyWuzzy (you can use these versions instead of textdistance)
  • It is mostly written in C++ and, on top of this, comes with many algorithmic improvements that make string matching even faster while still providing the same results. For detailed benchmarks, see this image: https://raw.githubusercontent.com/rapidfuzz/RapidFuzz/main/docs/img/scorer.svg?sanitize=true
    • Given this, some of the distance metrics we currently take from textdistance, such as Jaro-Winkler and Hamming, can also be replaced by their rapidfuzz equivalents.
  • Fixes multiple bugs in the partial_ratio implementation
  • It can largely be used as a drop-in replacement for fuzzywuzzy.
@prvenk prvenk self-assigned this Sep 13, 2024
@prvenk prvenk added the enhancement New feature or request label Sep 13, 2024
Collaborator Author

prvenk commented Sep 13, 2024

@ritesh-modi @guybartal

@prvenk prvenk changed the title from "Replace fuzzywuzzy with rapidfuzz" to "Replace fuzzywuzzy and some textdistance evals with rapidfuzz" Sep 14, 2024