708 replace fuzzywuzzy and textdistance with rapidfuzz for plain evaluation metrics #709

prvenk · 2024-09-14T01:52:25Z

In this PR, we modify the following:

Replace the fuzz token_set_ratio from fuzzywuzzy with the much faster rapidfuzz version. We also make this more flexible by adding an option to compute other fuzz metrics, namely ratio, partial_ratio, token_sort_ratio, token_sort_partial_ratio, and token_set_partial_ratio.
Replace some plain metrics from textdistance with their rapidfuzz equivalents, which are faster (as documented in the issue). The replaced metrics are lcsseq, hamming, jarowinkler, and levenshtein. We retain the textdistance variants for cosine and jaccard, since rapidfuzz doesn't provide them.
Add ROUGE scores as plain string metrics.
Rename variables for clarity, e.g., doc1 to str1 and value1 to str1, since all inputs are supposed to be strings and the names were inconsistent.
Add type hints for arguments and return values.
Update the documentation to reflect the changes.
Add tests.
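The description doesn't state which ROUGE implementation or variants were added. Purely as an illustration, here is a from-scratch sketch of the standard ROUGE-N and ROUGE-L F1 definitions. The function names and the tokenization (lowercased alphanumeric runs, no stemming) are assumptions, and a dedicated package such as Google's rouge-score may produce slightly different numbers.

```python
import re
from collections import Counter
from typing import List


def _tokens(text: str) -> List[str]:
    # Assumed tokenization: lowercase, keep alphanumeric runs only.
    return re.findall(r"[a-z0-9]+", text.lower())


def _f1(overlap: int, n_candidate: int, n_reference: int) -> float:
    # Harmonic mean of precision and recall; zero overlap means zero score.
    if overlap == 0:
        return 0.0
    precision = overlap / n_candidate
    recall = overlap / n_reference
    return 2 * precision * recall / (precision + recall)


def rouge_n(reference: str, candidate: str, n: int = 1) -> float:
    """ROUGE-N F1 from clipped n-gram overlap counts."""
    ref, cand = _tokens(reference), _tokens(candidate)
    ref_ngrams = Counter(tuple(ref[i:i + n]) for i in range(len(ref) - n + 1))
    cand_ngrams = Counter(tuple(cand[i:i + n]) for i in range(len(cand) - n + 1))
    # Counter intersection keeps the minimum count, i.e. clipped matches.
    overlap = sum((ref_ngrams & cand_ngrams).values())
    return _f1(overlap, sum(cand_ngrams.values()), sum(ref_ngrams.values()))


def rouge_l(reference: str, candidate: str) -> float:
    """ROUGE-L F1 from the longest common subsequence of tokens."""
    ref, cand = _tokens(reference), _tokens(candidate)
    # Dynamic-programming LCS length, keeping only the previous row.
    prev = [0] * (len(cand) + 1)
    for r in ref:
        curr = [0]
        for j, c in enumerate(cand, start=1):
            curr.append(prev[j - 1] + 1 if r == c else max(prev[j], curr[j - 1]))
        prev = curr
    return _f1(prev[-1], len(cand), len(ref))


if __name__ == "__main__":
    ref = "the cat sat on the mat"
    cand = "the cat was on the mat"
    print(rouge_n(ref, cand, 1), rouge_n(ref, cand, 2), rouge_l(ref, cand))
```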

martinpeck · 2024-09-16T10:40:06Z

Sorry...I don't think I have the background context to review this PR properly.

rag_experiment_accelerator/evaluation/plain_metrics.py

rag_experiment_accelerator/evaluation/eval.py

rag_experiment_accelerator/evaluation/plain_metrics.py

rag_experiment_accelerator/evaluation/tests/test_plain_metrics.py

kcortinas

Thanks @prvenk! Added a few comments and suggestions.

README.md

config.sample.json

docs/evaluation-metrics.md

.github/workflows/config.json

prvenk

done

docs/evaluation-metrics.md

kcortinas

LGTM!

prvenk added 2 commits September 13, 2024 14:33

replacing fuzzywuzzy with rapidfuzz

21db4fd

adding types

f7a69f6

prvenk added the enhancement (New feature or request) label Sep 14, 2024

prvenk self-assigned this Sep 14, 2024

prvenk requested a review from ritesh-modi September 14, 2024 01:52

prvenk marked this pull request as draft September 14, 2024 01:56

prvenk changed the base branch from prerelease to development September 14, 2024 01:56

prvenk linked an issue Sep 14, 2024 that may be closed by this pull request

Replace fuzzywuzzy and some textdistance evals with rapidfuzz #708 (open)

prvenk marked this pull request as ready for review September 14, 2024 01:57

prvenk added 2 commits September 14, 2024 01:59

adding back testdistance

bd12004

adding back testdistance

8052bee
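These two commits restore textdistance, which the PR description says is still used for cosine and jaccard. A minimal sketch of those two metrics via textdistance; the wrapper names are hypothetical, and whether the PR compares characters (textdistance's default, `qval=1`) or words (`qval=None`) is an assumption left open here, so both are shown.

```python
import textdistance

# Word-level instance: qval=None makes textdistance split inputs on whitespace.
WORD_JACCARD = textdistance.Jaccard(qval=None)


def cosine_similarity(str1: str, str2: str) -> float:
    # Character-level cosine similarity (default qval=1), in [0, 1].
    return textdistance.cosine.normalized_similarity(str1, str2)


def jaccard_similarity(str1: str, str2: str) -> float:
    # Character-level Jaccard similarity over character multisets, in [0, 1].
    return textdistance.jaccard.normalized_similarity(str1, str2)


def word_jaccard_similarity(str1: str, str2: str) -> float:
    # Same metric, but over whitespace-separated tokens.
    return WORD_JACCARD.normalized_similarity(str1, str2)


if __name__ == "__main__":
    print(cosine_similarity("abc", "abd"), jaccard_similarity("abc", "abd"))
    print(word_jaccard_similarity("the cat", "the dog"))
```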

prvenk requested review from martinpeck and guybartal September 14, 2024 02:05

fixing tests

79d965d

martinpeck removed their request for review September 16, 2024 10:39

Merge branch 'development' into 708-replace-fuzzywuzzy-with-rapidfuzz

a1d0ceb

prvenk requested a review from julia-meshcheryakova September 16, 2024 13:05

julia-meshcheryakova reviewed Sep 16, 2024


rag_experiment_accelerator/evaluation/plain_metrics.py (outdated, resolved)

Merge branch 'development' into 708-replace-fuzzywuzzy-with-rapidfuzz

1b808d8

prvenk requested a review from kcortinas September 17, 2024 15:05