
feat: Add calculate_metrics and MetricsResult #6680

Merged: 15 commits into deepset-ai:main on Jan 10, 2024

Conversation

@awinml (Contributor) commented on Jan 4, 2024

Related Issues

Proposed Changes:

Based on the design in the Evaluation proposal (#5794), we have implemented the following:

calculate_metrics:

def calculate_metrics(self, metric: Union[Metric, Callable[..., MetricsResult]], **kwargs) -> MetricsResult:
    ...
    return metric(self, **kwargs)

The calculate_metrics method of EvaluationResult computes an evaluation metric based on the provided metric, which can be a member of the Metric enum or any callable that returns a MetricsResult.
The method returns a MetricsResult object.
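
To make the dispatch concrete, here is a minimal, self-contained sketch of how an enum-or-callable calculate_metrics could work. The registry, the RECALL member, and the stand-in classes below are assumptions for illustration, not the actual Haystack implementation:

from enum import Enum
from typing import Callable, Dict, Union


class MetricsResult(dict):
    """Minimal stand-in; a fuller sketch follows in the MetricsResult section."""


class Metric(Enum):
    RECALL = "recall"  # hypothetical member name


def _recall(result: "EvaluationResult", **kwargs) -> MetricsResult:
    # Placeholder value, mirroring the placeholder metrics used in this PR's tests.
    return MetricsResult({"recall": 1.0})


# Illustrative registry mapping built-in Metric members to their implementations.
_SUPPORTED: Dict[Metric, Callable[..., MetricsResult]] = {Metric.RECALL: _recall}


class EvaluationResult:
    def calculate_metrics(self, metric: Union[Metric, Callable[..., MetricsResult]], **kwargs) -> MetricsResult:
        # Built-in metrics are resolved through the registry; custom callables pass through unchanged.
        if isinstance(metric, Metric):
            metric = _SUPPORTED[metric]
        return metric(self, **kwargs)


result = EvaluationResult()
print(result.calculate_metrics(Metric.RECALL))                               # {'recall': 1.0}
print(result.calculate_metrics(lambda r, **kw: MetricsResult({"em": 0.0})))  # {'em': 0.0}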

MetricsResult:

class MetricsResult(dict):

MetricsResult stores the metric values computed during the evaluation. It inherits from dict.
A save method has been implemented to save the metrics to a JSON file.
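
As a reference point, here is a minimal sketch of the described behaviour (a dict subclass whose save method writes the metrics to a JSON file); the file parameter name and the exact serialization options are assumptions:

import json
from pathlib import Path
from typing import Union


class MetricsResult(dict):
    """Stores the metric names and values computed during evaluation."""

    def save(self, file: Union[str, Path]) -> None:
        # Serialize the stored metrics to a JSON file.
        with open(file, "w") as f:
            json.dump(self, f, indent=2)


# Example values, for illustration only:
metrics = MetricsResult({"reciprocal_rank": 0.448, "precision": 0.25})
metrics.save("metrics.json")  # writes the two key/value pairs above as JSON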

Metric:

class Metric(Enum):

Metric enumerates the standard metrics that are available. It inherits from Enum.
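
Illustratively, the enum could look like the sketch below; the member names are hypothetical placeholders, since the concrete metric implementations (e.g. Exact Match in #6696) land in follow-up PRs:

from enum import Enum


class Metric(Enum):
    # Hypothetical member names; the real list ships with the metric implementations.
    RECALL = "Recall"
    MRR = "Mean Reciprocal Rank"
    EXACT_MATCH = "Exact Match"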

How did you test it?

The following pipelines were used to test the evaluation using a placeholder metric value:

  • Extractive QA Pipeline
  • RAG Pipeline with BM25 Retriever
  • RAG Pipeline with Embedding Retriever

Further tests will be added together with the corresponding metric implementations.

This code was written collaboratively with @vrunm.

@awinml awinml requested review from a team as code owners January 4, 2024 09:16
@awinml awinml requested review from dfokina and silvanocerza and removed request for a team January 4, 2024 09:16
@github-actions github-actions bot added the topic:tests, 2.x (Related to Haystack v2.0), and type:documentation (Improvements on the docs) labels on Jan 4, 2024
@silvanocerza (Contributor) commented:

Hey @awinml, can you maybe split this into multiple PRs? Ideally one per issue should be enough; it will speed up review time quite a bit. :)

@awinml awinml changed the title from "feat: Add calculate_metrics, MetricsResult, Exact Match Metric" to "feat: Add calculate_metrics and MetricsResult" on Jan 8, 2024
@awinml (Contributor, Author) commented on Jan 8, 2024

@silvanocerza Thanks! I have moved the Exact Match implementation to a separate PR (#6696).

I have decided to keep both calculate_metrics and MetricsResult in this PR rather than splitting them, since they are fairly straightforward classes and mostly contain placeholders for the metrics at the moment.

@silvanocerza (Contributor) commented:

Updated the branch to fix test failures.

@coveralls (Collaborator) commented on Jan 8, 2024

Pull Request Test Coverage Report for Build 7472477307

Warning: This coverage report may be inaccurate.

We've detected an issue with your CI configuration that might affect the accuracy of this pull request's coverage report.
To ensure accuracy in future PRs, please see these guidelines.
A quick fix for this PR: rebase it; your next report should be accurate.

  • 0 of 0 changed or added relevant lines in 0 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage decreased (-0.3%) to 87.066%

Totals:
  • Change from base Build 7458712685: -0.3%
  • Covered Lines: 4214
  • Relevant Lines: 4840

💛 - Coveralls

@silvanocerza (Contributor) commented:

Sorry, I merged base in again before noticing you already did. 🤦


@silvanocerza (Contributor) left a review comment:


I simplified the logic that runs supported metrics a bit and added some tests so coverage doesn't go down. This is good to merge as soon as tests are green. 👍

@silvanocerza silvanocerza merged commit 374a937 into deepset-ai:main Jan 10, 2024
23 checks passed
Labels: 2.x (Related to Haystack v2.0), topic:tests, type:documentation (Improvements on the docs)
Development

Successfully merging this pull request may close these issues.

  • Implement function to calculate metrics from EvaluationResult
  • Implement MetricsResult class
3 participants