
added two new caches: FileCache, MemoryCache #2

Merged: 7 commits merged into main on Jul 10, 2023
Conversation

@simedw (Contributor) commented on Jul 7, 2023

Adds two new caches, FileCache and MemoryCache.
FileCache is used by default from the CLI.

from pathlib import Path

from benchllm.cache import FileCache

evaluator = FileCache(StringMatchEvaluator(workers=2), Path("path/to/cache.json"))

or

from benchllm.cache import MemoryCache
evaluator = MemoryCache(StringMatchEvaluator(workers=2))

From the CLI (the same applies to bench eval):

$ bench run --cache file # default
$ bench run --cache memory
$ bench run --cache none # old behaviour

Example cache file:

cat output/cache.json 
{
    "entries": {
        "[\"I was created by V7\", \"I was created by V7.\"]": true,
        "[\"2\", \"2\"]": true,
        "[\"Hello, user!\", \"V7\"]": false,
        "[\"Hello, user!\", \"I was created by V7\"]": false,
        "[\"2\", \"Hello, user!\"]": false,
        "[\"2.0\", \"Hello, user!\"]": false,
        "[\"False\", \"False\"]": true,
        "[\"False\", \"True\"]": false
    },
    "version": "1"
}

There is most likely a better way to store the keys, but for now we encode them as JSON strings. This way strings, dicts, ints, and lists are all supported for both expected and output; we didn't want to shoehorn ourselves into only supporting strings.
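The key encoding described above can be sketched as follows (the `cache_key` helper name is illustrative, not from the PR):

```python
import json

def cache_key(expected, output):
    # Encode the (expected, output) pair as a JSON string so that
    # strings, ints, lists, and dicts can all serve as cache keys.
    return json.dumps([expected, output])

entries = {}
entries[cache_key("2", "2")] = True               # positive entry: they match
entries[cache_key("Hello, user!", "V7")] = False  # negative entry: they don't

print(cache_key("2", "2"))  # produces '["2", "2"]', as in the cache.json above
```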

The CLI also indicates whether the cache was used.


@simedw simedw requested a review from andreaazzini July 7, 2023 21:18
@andreaazzini (Member) left a comment:

I love these two new Evaluators! I've added a few comments that I think are important to address.
When you have a chance, please take a look and let me know your thoughts on them.

benchllm/cli/commands/evaluate.py (outdated thread, resolved)
self._num_cache_hits += 1
if lookup:
return Evaluator.Match(prediction=prediction.output, expected=expected)
return None
andreaazzini (Member):

I am not sure I get the logic here. If we miss, we want to break for this loop, not return, right? Otherwise, we don't count the miss (line 44).

simedw (Contributor, Author):

for expected in prediction.test.expected:
            lookup = self.lookup(expected, prediction.output)
            # None indicates that nothing was found in the cache
            # while True and False are both valid cache values
            if lookup is None:
                continue
            self._num_cache_hits += 1
            if lookup:
                return Evaluator.Match(prediction=prediction.output, expected=expected)
            return None

If lookup is None, we don't have an entry for that (expected, output) pair, so we continue.
If we do have an entry, it might be a positive entry (they match), in which case we can stop searching.
It might also be a negative entry (they do not match), and we also stop searching [this could be debated, to be honest].
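The tri-state logic (None = cache miss, True/False = cached verdict) can be sketched with a minimal stand-in class (the names here are ours, not the PR's):

```python
from typing import Optional

class CacheSketch:
    """Illustrative sketch of the tri-state lookup described above."""

    def __init__(self) -> None:
        self._entries: dict[str, bool] = {}
        self._num_cache_hits = 0

    def lookup(self, key: str) -> Optional[bool]:
        # None means "nothing cached"; True and False are both valid values.
        value = self._entries.get(key)
        if value is not None:
            self._num_cache_hits += 1
        return value

    def store(self, key: str, value: bool) -> None:
        self._entries[key] = value

cache = CacheSketch()
assert cache.lookup("k") is None   # miss: caller keeps searching
cache.store("k", False)
assert cache.lookup("k") is False  # negative hit: caller stops searching
assert cache._num_cache_hits == 1  # only the real hit was counted
```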

simedw (Contributor, Author):

Have a look now, your comment helped me find a corner case which I also wrote a test case for!

andreaazzini (Member):

I see, thank you for adding some documentation, it's much clearer now!

benchllm/cli/commands/run_suite.py (outdated thread, resolved)
benchllm/evaluator.py (outdated thread, resolved)

from typing import Union

from pydantic import BaseModel

Json = Union[str, bool, list, dict]
andreaazzini (Member):

I don't think this is the right way of typing a JSON-serializable object/variable. I found a very interesting issue in the official python/typing repo: python/typing#182.

I really like this approach:

All major type checkers now support recursive type aliases by default, so this should largely work:

JSON: TypeAlias = dict[str, "JSON"] | list["JSON"] | str | int | float | bool | None

What do you think?

simedw (Contributor, Author):

Interesting, I will check it out. I removed int and float on purpose, since they made pydantic parse things in a strange way:

"2" -> 2
["2"] -> ["2"] 

which was kind of unexpected and hard to work with.

simedw (Contributor, Author):

pydantic/pydantic#5779

It seems Pydantic does not yet support recursive type aliases.

@simedw simedw requested a review from andreaazzini July 10, 2023 13:48
@andreaazzini (Member) left a comment:

I think this looks really good, especially:

  • better inline comments as dev-friendly documentation
  • split of evaluators into individual modules
  • better and larger test coverage

Approved! 👍


@simedw simedw merged commit 9f5275d into main Jul 10, 2023
2 checks passed
@simedw simedw deleted the basic-caching branch July 19, 2023 20:54