fix context coverage metric input #33
Conversation
Changes requested. Reviewed entire PR up to commit 0c03170

Reviewed 123 lines of code across 2 files in 48 second(s).

See details
- Skipped files: 0
- Confidence threshold: 50%
- Drafted 0 additional comments
- Workflow ID: wflow_F7pmg52mIxhCRikk
Generated with ❤️ by ellipsis.dev
```python
            }
        )
    scores.append(
```
There is a logical bug here: the coverage score is appended to the `scores` list for every ground truth answer, but a score of -1.0 is also appended whenever an exception is raised. As a result, any ground truth that triggers an exception adds two entries to `scores` instead of one. The -1.0 score should be the only entry added when an exception is raised. Consider adding an `else` clause to the `try`/`except` block so the coverage score is appended only when no exception occurs.
Suggested change:

```python
try:
    coverage = self.extract_attributed_from_broken_json(content)
except Exception as e:
    print(f"{type(e).__name__} Error: {content}, skipping")
    scores.append(
        {
            "LLM_based_context_coverage": -1.0,
            "LLM_based_context_statements": content,
        }
    )
else:
    scores.append(
        {
            "LLM_based_context_coverage": coverage,
            "LLM_based_context_statements": content,
        }
    )
```
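For illustration, here is a minimal standalone sketch of why the `else` clause matters. `parse` is a hypothetical stand-in for `extract_attributed_from_broken_json`; with the `else`, each input contributes exactly one entry to `scores`:

```python
import json

def parse(content):
    # Hypothetical stand-in: raises on malformed input,
    # otherwise returns a coverage score.
    return json.loads(content)["coverage"]

def score_all(contents):
    scores = []
    for content in contents:
        try:
            coverage = parse(content)
        except Exception as e:
            print(f"{type(e).__name__} Error: {content}, skipping")
            scores.append({"LLM_based_context_coverage": -1.0})
        else:
            # Runs only when no exception was raised.
            scores.append({"LLM_based_context_coverage": coverage})
    return scores

print(score_all(['{"coverage": 0.8}', "not json"]))
# -> [{'LLM_based_context_coverage': 0.8}, {'LLM_based_context_coverage': -1.0}]
```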
Looks good to me! Incremental review on commit c4110e3

Reviewed 25 lines of code across 1 file in 1 minute(s) and 1 second(s).

See details
- Skipped files: 0
- Confidence threshold: 50%
- Drafted 1 additional comment
- Workflow ID: wflow_dVavXZaarPM6EZPq
View 1 draft comment
This comment was drafted by Ellipsis but was filtered out of the final review. It is included here so you can see our internal thought process and help you configure your ellipsis.yaml.

Drafted 1 comment under confidence threshold

Filtered comment at continuous_eval/metrics/retrieval_LLM_based_metrics.py:137
Confidence changes required: 50%
Commentary: The PR changes are in line with the description provided. The `calculate` method in the `LLMBasedContextCoverage` class now iterates over `ground_truths` instead of `answer`, and the maximum score among all ground truths is returned; if an error occurs during calculation, the score is set to -1.0. The changes are logically correct, with no clear violations of best practices and no logical, performance, or security bugs. However, the error handling could be improved: the error message is currently printed to the console. Logging it instead would allow for better tracking and debugging of errors.

Consider logging the error message instead of printing it to the console. This would allow for better tracking and debugging of errors.
```python
import logging

logger = logging.getLogger(__name__)

try:
    coverage = self.extract_attributed_from_broken_json(content)
except Exception as e:
    logger.error(f"{type(e).__name__} Error: {content}, skipping")
    scores.append(
        {
            "LLM_based_context_coverage": -1.0,
            "LLM_based_context_statements": content,
        }
    )
```
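As a usage note, `logger.error` output only becomes visible once logging is configured; a minimal setup at application startup (the format string here is just an example) might look like:

```python
import logging

# Route ERROR-level records (and above) to stderr with a simple format.
logging.basicConfig(level=logging.ERROR, format="%(name)s %(levelname)s: %(message)s")
```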
Generated with ❤️ by ellipsis.dev
Looks good to me! Incremental review on commit 4839e9e

Reviewed 123 lines of code across 2 files in 42 second(s).

See details
- Skipped files: 0
- Confidence threshold: 50%
- Drafted 1 additional comment
- Workflow ID: wflow_OcyZNVr1YEyhL7wn
View 1 draft comment
This comment was drafted by Ellipsis but was filtered out of the final review. It is included here so you can see our internal thought process and help you configure your ellipsis.yaml.

Drafted 1 comment under confidence threshold

Filtered comment at continuous_eval/metrics/retrieval_LLM_based_metrics.py:88
Confidence changes required: 0%
Commentary: The PR changes the input from `answer` to `ground_truths` in the `calculate` method of the `LLMBasedContextCoverage` class. This change makes sense: the method is supposed to calculate the context coverage score for each ground truth answer and return the maximum score. The code correctly iterates over the ground truth answers, calculates the context coverage score for each, and returns the maximum. The error handling is also appropriate, setting the score to -1.0 if a calculation fails. The documentation changes correctly reflect the code changes.

The changes in this method are correct: the context coverage score is calculated for each ground truth answer and the maximum score is returned, with appropriate error handling.
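To make the described behavior concrete, here is a hedged sketch of the max-over-ground-truths pattern (the `compute_coverage` callable is hypothetical; the actual logic lives in `LLMBasedContextCoverage.calculate`):

```python
from typing import Callable, List

def max_coverage(ground_truths: List[str],
                 compute_coverage: Callable[[str], float]) -> float:
    # Score each ground truth answer independently; failed computations
    # contribute -1.0, so that value is only returned when every answer fails.
    # Assumes at least one ground truth is provided.
    scores = []
    for gt in ground_truths:
        try:
            scores.append(compute_coverage(gt))
        except Exception:
            scores.append(-1.0)
    return max(scores)
```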
Generated with ❤️ by ellipsis.dev
Updated input from answer to ground_truth answers
Summary:
This PR updates the `calculate` method in `LLMBasedContextCoverage` to iterate over `ground_truths` instead of `answer`, returning the maximum score among all ground truths, and updates the corresponding documentation.

Key points:
- Modified the `calculate` method in the `LLMBasedContextCoverage` class in `continuous_eval/metrics/retrieval_LLM_based_metrics.py`.
- Replaced the `answer` input with `ground_truths` and iterated over it to calculate scores.
- Updated `docs/src/content/docs/metrics/Retrieval/LLM-Based/llm_context_coverage.md`.

Generated with ❤️ by ellipsis.dev