
feat: Add calculate_metrics and MetricsResult #6680

Merged: 15 commits into deepset-ai:main on Jan 10, 2024

Conversation

@awinml (Contributor) commented on Jan 4, 2024

Related Issues

Proposed Changes:

Based on the design in the Evaluation proposal (#5794), we have implemented the following:

calculate_metrics:

def calculate_metrics(self, metric: Union[Metric, Callable[..., MetricsResult]], **kwargs) -> MetricsResult:
    ...
    return metric(self, **kwargs)

The calculate_metrics method of EvaluationResult computes an evaluation metric based on the provided metric, which can be a member of the Metric enum or any callable that returns a MetricsResult.
The method returns a MetricsResult object.
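
To make the dispatch concrete, here is a minimal, self-contained sketch of how an enum-or-callable calculate_metrics could work. The registry, the RECALL member, and the stand-in classes below are assumptions for illustration, not the actual Haystack implementation:

from enum import Enum
from typing import Callable, Dict, Union


class MetricsResult(dict):
    """Minimal stand-in; a fuller sketch follows in the MetricsResult section."""


class Metric(Enum):
    RECALL = "recall"  # hypothetical member name


def _recall(result: "EvaluationResult", **kwargs) -> MetricsResult:
    # Placeholder value, mirroring the placeholder metrics used in this PR's tests.
    return MetricsResult({"recall": 1.0})


# Illustrative registry mapping built-in Metric members to their implementations.
_SUPPORTED: Dict[Metric, Callable[..., MetricsResult]] = {Metric.RECALL: _recall}


class EvaluationResult:
    def calculate_metrics(self, metric: Union[Metric, Callable[..., MetricsResult]], **kwargs) -> MetricsResult:
        # Built-in metrics are resolved through the registry; custom callables pass through unchanged.
        if isinstance(metric, Metric):
            metric = _SUPPORTED[metric]
        return metric(self, **kwargs)


result = EvaluationResult()
print(result.calculate_metrics(Metric.RECALL))                               # {'recall': 1.0}
print(result.calculate_metrics(lambda r, **kw: MetricsResult({"em": 0.0})))  # {'em': 0.0}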

MetricsResult:

class MetricsResult(dict):

MetricsResult stores the metric values computed during the evaluation. It inherits from dict.
A save method has been implemented to save the metrics to a JSON file.
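
As a reference point, here is a minimal sketch of the described behaviour (a dict subclass whose save method writes the metrics to a JSON file); the file parameter name and the exact serialization options are assumptions:

import json
from pathlib import Path
from typing import Union


class MetricsResult(dict):
    """Stores the metric names and values computed during evaluation."""

    def save(self, file: Union[str, Path]) -> None:
        # Serialize the stored metrics to a JSON file.
        with open(file, "w") as f:
            json.dump(self, f, indent=2)


# Example values, for illustration only:
metrics = MetricsResult({"reciprocal_rank": 0.448, "precision": 0.25})
metrics.save("metrics.json")  # writes the two key/value pairs above as JSON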

Metric:

class Metric(Enum):

Metric enumerates the standard metrics that are available. It inherits from Enum.
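
Illustratively, the enum could look like the sketch below; the member names are hypothetical placeholders, since the concrete metric implementations (e.g. Exact Match in #6696) land in follow-up PRs:

from enum import Enum


class Metric(Enum):
    # Hypothetical member names; the real list ships with the metric implementations.
    RECALL = "Recall"
    MRR = "Mean Reciprocal Rank"
    EXACT_MATCH = "Exact Match"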

How did you test it?

The following pipelines were used to test the evaluation using a placeholder metric value:

  • Extractive QA Pipeline
  • RAG Pipeline with BM25 Retriever
  • RAG Pipeline with Embedding Retriever

Further tests will be added together with the corresponding metric implementations.

This code was written collaboratively with @vrunm.

@awinml awinml requested review from a team as code owners January 4, 2024 09:16
@awinml awinml requested review from dfokina and silvanocerza and removed request for a team January 4, 2024 09:16
@github-actions github-actions bot added the topic:tests, 2.x (Related to Haystack v2.0), and type:documentation (Improvements on the docs) labels on Jan 4, 2024
@silvanocerza (Contributor) commented:

Hey @awinml, can you maybe split this into multiple PRs? Ideally one per issue should be enough; it will speed up review time quite a bit. :)

@awinml awinml changed the title from "feat: Add calculate_metrics, MetricsResult, Exact Match Metric" to "feat: Add calculate_metrics and MetricsResult" on Jan 8, 2024
@awinml (Contributor, Author) commented on Jan 8, 2024

@silvanocerza Thanks! I have moved the Exact Match implementation to a separate PR (#6696).

I have decided to keep both calculate_metrics and MetricsResult in this PR rather than splitting them, since they are fairly straightforward classes and mostly contain placeholders for the metrics at the moment.

@silvanocerza (Contributor) commented:

Updated the branch to fix test failures.

@coveralls (Collaborator) commented on Jan 8, 2024

Pull Request Test Coverage Report for Build 7472477307

Warning: This coverage report may be inaccurate.

We've detected an issue with your CI configuration that might affect the accuracy of this pull request's coverage report.
To ensure accuracy in future PRs, please see these guidelines.
A quick fix for this PR: rebase it; your next report should be accurate.

  • 0 of 0 changed or added relevant lines in 0 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage decreased (-0.3%) to 87.066%

Totals:
  • Change from base Build 7458712685: -0.3%
  • Covered Lines: 4214
  • Relevant Lines: 4840

💛 - Coveralls

@silvanocerza (Contributor) commented:

Sorry, I merged base in again before noticing you already did. 🤦


@silvanocerza (Contributor) left a review comment:


I simplified the logic that runs supported metrics a bit and added some tests so coverage doesn't go down. This is good to merge as soon as tests are green. 👍

@silvanocerza silvanocerza merged commit 374a937 into deepset-ai:main Jan 10, 2024
23 checks passed
Labels: 2.x (Related to Haystack v2.0), topic:tests, type:documentation (Improvements on the docs)
Development

Successfully merging this pull request may close these issues.

  • Implement function to calculate metrics from EvaluationResult
  • Implement MetricsResult class
3 participants