-
Notifications
You must be signed in to change notification settings - Fork 10
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
added two new caches: FileCache, MemoryCache #2
Merged
Merged
Changes from 2 commits
Commits
Show all changes
7 commits
Select commit
Hold shift + click to select a range
10f8998
added two new caches: FileCache, MemoryCache
simedw b033ef1
raise if similarity gets unexpected response
simedw ce9fd02
updated interactive mode and added test cases
simedw 4951f23
fixed bug in MemoryCache and addressed review feedback
simedw bc60e45
updated web to work with Evaluator.Match
simedw 6616233
separated evaluator classes into sub files
simedw a8c4e1a
switch from TempFile to TempDir to fix windows test permission issue
simedw File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,87 @@ | ||
import json | ||
from pathlib import Path | ||
from typing import Optional | ||
|
||
from benchllm.data_types import Evaluation, Prediction | ||
from benchllm.evaluator import Evaluator | ||
from benchllm.input_types import Json | ||
from benchllm.listener import EvaluatorListener | ||
|
||
|
||
class MemoryCache(Evaluator): | ||
"""Caches the results of the evaluator in memory""" | ||
|
||
def __init__(self, evaluator: Evaluator): | ||
super().__init__(workers=evaluator.workers) | ||
self._data: dict = {} | ||
self._evaluator = evaluator | ||
self._num_cache_misses = 0 | ||
self._num_cache_hits = 0 | ||
|
||
def _key(self, answer1: Json, answer2: Json) -> str: | ||
key1, key2 = json.dumps([answer1, answer2]), json.dumps([answer2, answer1]) | ||
return key1 if key1 < key2 else key2 | ||
|
||
def lookup(self, answer1: Json, answer2: Json) -> Optional[bool]: | ||
return self._data.get(self._key(answer1, answer2), None) | ||
|
||
def store(self, answer1: Json, answer2: Json, value: bool) -> None: | ||
key = self._key(answer1, answer2) | ||
self._data[key] = value | ||
|
||
def evaluate_prediction(self, prediction: Prediction) -> Optional[Evaluator.Match]: | ||
for expected in prediction.test.expected: | ||
lookup = self.lookup(expected, prediction.output) | ||
# None indicates that nothing was found in the cache | ||
# while True and False are both valid cache values | ||
if lookup is None: | ||
continue | ||
self._num_cache_hits += 1 | ||
if lookup: | ||
return Evaluator.Match(prediction=prediction.output, expected=expected) | ||
return None | ||
|
||
self._num_cache_misses += 1 | ||
result = self._evaluator.evaluate_prediction(prediction) | ||
if result: | ||
self.store(result.expected, result.prediction, True) | ||
else: | ||
for expected in prediction.test.expected: | ||
self.store(expected, prediction.output, False) | ||
return result | ||
|
||
@property | ||
def num_cache_hits(self) -> int: | ||
return self._num_cache_hits | ||
|
||
@property | ||
def num_cache_misses(self) -> int: | ||
return self._num_cache_misses | ||
|
||
|
||
class FileCache(MemoryCache, EvaluatorListener): | ||
"""Caches the results of the evaluator in a json file""" | ||
|
||
def __init__(self, evaluator: Evaluator, path: Path): | ||
super().__init__(evaluator) | ||
self._path = path | ||
self.add_listener(self) | ||
self._load() | ||
|
||
def _load(self) -> None: | ||
if self._path.exists(): | ||
try: | ||
cache = json.loads(self._path.read_text(encoding="UTF-8"), parse_int=str) | ||
if cache["version"] != "1": | ||
raise ValueError("Unsupported cache version") | ||
self._data = cache["entries"] | ||
except Exception: | ||
print(f"Failed to load cache file {self._path}") | ||
self._data = {} | ||
|
||
def _save(self) -> None: | ||
cache = {"entries": self._data, "version": "1"} | ||
self._path.write_text(json.dumps(cache, indent=4), encoding="UTF-8") | ||
|
||
def evaluate_ended(self, evaluations: list[Evaluation]) -> None: | ||
self._save() |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure I get the logic here. If we miss, we want to break for this loop, not return, right? Otherwise, we don't count the miss (line 44).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If lookup is None, then we don't have any entry for that prediction, expected tuple so we continue.
If we have an entry, it might be a positive entry (they match) and then we can stop searching.
If we have an entry, it might also be a negative entry (they do not match) and we can also stop searching [this could be debated tbh]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have a look now, your comment helped me find a corner case which I also wrote a test case for!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, thank you for adding some documentation, it's much clearer now!