feat: Add Semantic Answer Similarity metric #6877
Pull Request Test Coverage Report for Build 7758146880
Warning: This coverage report may be inaccurate. We've detected an issue with your CI configuration that might affect the accuracy of this pull request's coverage report.
💛 - Coveralls
Co-authored-by: Silvano Cerza <[email protected]>
Great job as always. Thank you both. 🙏
# Not all cross-encoders return normalized scores; some return raw logits
# We normalize the scores if any of them is larger than 1
if (similarity_scores > 1).any():
Should this be if (abs(similarity_scores) > 1).any(): instead, so that large negative logits are also normalized?
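To illustrate the reviewer's point, here is a minimal sketch comparing the two checks on a hypothetical array of cross-encoder logits, assuming the scores arrive as a NumPy array. The helper names needs_sigmoid_original and needs_sigmoid_suggested are illustrative only and do not appear in the PR.

```python
import numpy as np


def needs_sigmoid_original(scores: np.ndarray) -> bool:
    # Check as written in the PR: only positive values above 1 trigger normalization
    return bool((scores > 1).any())


def needs_sigmoid_suggested(scores: np.ndarray) -> bool:
    # Reviewer's suggestion: large negative logits also trigger normalization
    return bool((abs(scores) > 1).any())


# Hypothetical raw logits where no value exceeds 1, but one is strongly negative
logits = np.array([-3.5, -0.2, 0.8])

print(needs_sigmoid_original(logits))   # False: raw logits would be averaged as-is
print(needs_sigmoid_suggested(logits))  # True: sigmoid would be applied
print(logits.mean())                    # negative, i.e. outside the [0, 1] range of a similarity
```

Neither check catches every case: logits that all happen to fall inside [0, 1] would still be averaged without the sigmoid.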
Related Issues
Fixes #6069
Proposed Changes:
Adds support for the Semantic Answer Similarity (SAS) metric to EvaluationResult.calculate_metrics(...).
The _calculate_sas method of EvaluationResult has been updated to compute the SAS metric.
Usage:
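For a bi-encoder model, SAS compares embeddings of each predicted answer and its ground-truth answer by cosine similarity and averages the results. The following is a minimal sketch of that computation, not the code in _calculate_sas: it assumes one ground-truth answer per prediction and that the embeddings were already produced by some embedding model (for example a sentence-transformers model), which is not shown. The function name sas_bi_encoder is hypothetical.

```python
from typing import Tuple

import numpy as np


def sas_bi_encoder(pred_emb: np.ndarray, gt_emb: np.ndarray) -> Tuple[float, np.ndarray]:
    # Row i of each matrix is the embedding of the i-th predicted / ground-truth answer
    pred = pred_emb / np.linalg.norm(pred_emb, axis=1, keepdims=True)
    gt = gt_emb / np.linalg.norm(gt_emb, axis=1, keepdims=True)
    # Cosine similarity of each prediction with its own ground truth
    per_pair = np.sum(pred * gt, axis=1)
    # The overall SAS score is the mean of the per-pair similarities
    return float(per_pair.mean()), per_pair


# Toy 2-dimensional "embeddings" standing in for real model outputs
pred_emb = np.array([[1.0, 0.0], [1.0, 1.0]])
gt_emb = np.array([[1.0, 0.0], [0.0, 1.0]])

score, per_pair = sas_bi_encoder(pred_emb, gt_emb)
print(per_pair)  # approximately [1.0, 0.707]
print(score)     # approximately 0.854
```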
For evaluation of a pipeline:
How did you test it?
Unit tests have been added.
End-to-end tests with the following pipelines have been added:
Notes for the reviewer:
Certain cross-encoders (such as "ms-marco-MiniLM-L-6-v2") return un-normalized similarity scores: they simply output the raw logits. Because the final SAS score is the mean of the normalized scores, the logits must first be normalized by applying the sigmoid function. For more information, please have a look at this issue.
In this implementation, we apply the sigmoid to the scores returned by the cross-encoder if any of them is greater than 1 (i.e. un-normalized).
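The normalization rule described above can be sketched as follows. This is an illustrative reimplementation under the assumption that the cross-encoder scores are a NumPy array; sigmoid and sas_from_cross_encoder_scores are hypothetical helper names, not functions from the PR. It also shows the failure mode of a normalize=False option: averaging raw logits can yield a "similarity" above 1.

```python
import numpy as np


def sigmoid(x: np.ndarray) -> np.ndarray:
    # Maps raw logits into the open interval (0, 1)
    return 1.0 / (1.0 + np.exp(-x))


def sas_from_cross_encoder_scores(scores: np.ndarray) -> float:
    # Heuristic from the PR: if any score exceeds 1, treat all scores as logits
    if (scores > 1).any():
        scores = sigmoid(scores)
    # The final SAS score is the mean of the (normalized) scores
    return float(scores.mean())


# Hypothetical raw cross-encoder outputs
logits = np.array([4.2, 1.5, -0.3])
print(logits.mean())                          # about 1.8: not interpretable as a similarity
print(sas_from_cross_encoder_scores(logits))  # roughly 0.74, inside (0, 1)

# Scores that are already normalized are averaged unchanged
print(sas_from_cross_encoder_scores(np.array([0.2, 0.6])))  # about 0.4
```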
Alternatively, we could provide an optional normalize parameter. We decided against this approach because it could produce SAS scores greater than 1 if a cross-encoder model were passed with normalize=False, which would make the results hard to interpret and compare.
The tests in test_eval_sas.py have been marked as integration tests, since they need to send an API call to HuggingFace to fetch the SAS model config.
This code was written collaboratively with @vrunm.