fix context coverage metric input #33
Conversation
Changes requested. Reviewed entire PR up to commit 0c03170

Reviewed 123 lines of code across 2 files in 48 second(s).

See details
- Skipped files: 0
- Confidence threshold: 50%
- Drafted 0 additional comments
- Workflow ID: wflow_F7pmg52mIxhCRikk
Generated with ❤️ by ellipsis.dev
```python
            }
        )
    scores.append(
```
There is a logical bug here: the coverage score is appended to the `scores` list for every ground truth answer, but a score of -1.0 is also appended whenever an exception is raised. As a result, any ground truth that triggers an exception adds two entries to `scores` instead of one. The -1.0 score should be the only entry added when an exception is raised. Consider adding an `else` clause to the `try`/`except` block so the coverage score is appended only when no exception occurs.
Suggested change:

```python
try:
    coverage = self.extract_attributed_from_broken_json(content)
except Exception as e:
    print(f"{type(e).__name__} Error: {content}, skipping")
    scores.append(
        {
            "LLM_based_context_coverage": -1.0,
            "LLM_based_context_statements": content,
        }
    )
else:
    scores.append(
        {
            "LLM_based_context_coverage": coverage,
            "LLM_based_context_statements": content,
        }
    )
```
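For illustration, here is a minimal standalone sketch of why the `else` clause matters. `parse` is a hypothetical stand-in for `extract_attributed_from_broken_json`; with the `else`, each input contributes exactly one entry to `scores`:

```python
import json

def parse(content):
    # Hypothetical stand-in: raises on malformed input,
    # otherwise returns a coverage score.
    return json.loads(content)["coverage"]

def score_all(contents):
    scores = []
    for content in contents:
        try:
            coverage = parse(content)
        except Exception as e:
            print(f"{type(e).__name__} Error: {content}, skipping")
            scores.append({"LLM_based_context_coverage": -1.0})
        else:
            # Runs only when no exception was raised.
            scores.append({"LLM_based_context_coverage": coverage})
    return scores

print(score_all(['{"coverage": 0.8}', "not json"]))
# -> [{'LLM_based_context_coverage': 0.8}, {'LLM_based_context_coverage': -1.0}]
```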
Looks good to me! Incremental review on commit c4110e3

Reviewed 25 lines of code across 1 file in 1 minute(s) and 1 second(s).

See details
- Skipped files: 0
- Confidence threshold: 50%
- Drafted 1 additional comment
- Workflow ID: wflow_dVavXZaarPM6EZPq
View 1 draft comment
This comment was drafted by Ellipsis but was filtered out of the final review. It is included here so you can see our internal thought process and help you configure your ellipsis.yaml.

Drafted 1 comment under confidence threshold

Filtered comment at continuous_eval/metrics/retrieval_LLM_based_metrics.py:137
Confidence changes required: 50%
Commentary: The PR changes are in line with the description provided. The `calculate` method in the `LLMBasedContextCoverage` class now iterates over `ground_truths` instead of `answer`, and the maximum score among all ground truths is returned; if an error occurs during calculation, the score is set to -1.0. The changes are logically correct, with no clear violations of best practices and no logical, performance, or security bugs. However, the error handling could be improved: the error message is currently printed to the console. Logging it instead would allow for better tracking and debugging of errors.

Consider logging the error message instead of printing it to the console. This would allow for better tracking and debugging of errors.
```python
import logging

logger = logging.getLogger(__name__)

try:
    coverage = self.extract_attributed_from_broken_json(content)
except Exception as e:
    logger.error(f"{type(e).__name__} Error: {content}, skipping")
    scores.append(
        {
            "LLM_based_context_coverage": -1.0,
            "LLM_based_context_statements": content,
        }
    )
```
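As a usage note, `logger.error` output only becomes visible once logging is configured; a minimal setup at application startup (the format string here is just an example) might look like:

```python
import logging

# Route ERROR-level records (and above) to stderr with a simple format.
logging.basicConfig(level=logging.ERROR, format="%(name)s %(levelname)s: %(message)s")
```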
Generated with ❤️ by ellipsis.dev
Looks good to me! Incremental review on commit 4839e9e

Reviewed 123 lines of code across 2 files in 42 second(s).

See details
- Skipped files: 0
- Confidence threshold: 50%
- Drafted 1 additional comment
- Workflow ID: wflow_OcyZNVr1YEyhL7wn
View 1 draft comment
This comment was drafted by Ellipsis but was filtered out of the final review. It is included here so you can see our internal thought process and help you configure your ellipsis.yaml.

Drafted 1 comment under confidence threshold

Filtered comment at continuous_eval/metrics/retrieval_LLM_based_metrics.py:88
Confidence changes required: 0%
Commentary: The PR changes the input from `answer` to `ground_truths` in the `calculate` method of the `LLMBasedContextCoverage` class. This change makes sense: the method is supposed to calculate the context coverage score for each ground truth answer and return the maximum score. The code correctly iterates over the ground truth answers, calculates the context coverage score for each, and returns the maximum. The error handling is also appropriate, setting the score to -1.0 if a calculation fails. The documentation changes correctly reflect the code changes.

The changes in this method are correct: the context coverage score is calculated for each ground truth answer and the maximum score is returned, with appropriate error handling.
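To make the described behavior concrete, here is a hedged sketch of the max-over-ground-truths pattern (the `compute_coverage` callable is hypothetical; the actual logic lives in `LLMBasedContextCoverage.calculate`):

```python
from typing import Callable, List

def max_coverage(ground_truths: List[str],
                 compute_coverage: Callable[[str], float]) -> float:
    # Score each ground truth answer independently; failed computations
    # contribute -1.0, so that value is only returned when every answer fails.
    # Assumes at least one ground truth is provided.
    scores = []
    for gt in ground_truths:
        try:
            scores.append(compute_coverage(gt))
        except Exception:
            scores.append(-1.0)
    return max(scores)
```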
Generated with ❤️ by ellipsis.dev
Updated input from answer to ground_truth answers
Summary:
This PR updates the `calculate` method in `LLMBasedContextCoverage` to iterate over `ground_truths` instead of `answer`, returning the maximum score among all ground truths, and updates the corresponding documentation.

Key points:
- Modified the `calculate` method in the `LLMBasedContextCoverage` class in `continuous_eval/metrics/retrieval_LLM_based_metrics.py`.
- Replaced the `answer` input with `ground_truths` and iterated over it to calculate scores.
- Updated `docs/src/content/docs/metrics/Retrieval/LLM-Based/llm_context_coverage.md`.

Generated with ❤️ by ellipsis.dev