
added two new caches: FileCache, MemoryCache #2

Merged: 7 commits merged into main on Jul 10, 2023
Conversation

@simedw (Contributor) commented on Jul 7, 2023

Adds two new caches, FileCache and MemoryCache.
FileCache is used by default from the CLI.

from pathlib import Path

from benchllm.cache import FileCache

evaluator = FileCache(StringMatchEvaluator(workers=2), Path("path/to/cache.json"))

or

from benchllm.cache import MemoryCache
evaluator = MemoryCache(StringMatchEvaluator(workers=2))

From the CLI (the same applies to bench eval):

$ bench run --cache file # default
$ bench run --cache memory
$ bench run --cache none # old behaviour

Example cache file:

cat output/cache.json 
{
    "entries": {
        "[\"I was created by V7\", \"I was created by V7.\"]": true,
        "[\"2\", \"2\"]": true,
        "[\"Hello, user!\", \"V7\"]": false,
        "[\"Hello, user!\", \"I was created by V7\"]": false,
        "[\"2\", \"Hello, user!\"]": false,
        "[\"2.0\", \"Hello, user!\"]": false,
        "[\"False\", \"False\"]": true,
        "[\"False\", \"True\"]": false
    },
    "version": "1"
}

There is most likely a better way to store the keys, but for now we encode them as JSON strings. This way strings, dicts, ints, and lists are all supported for both expected and output; we didn't want to shoehorn ourselves into only supporting strings.
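The key encoding described above can be sketched as follows (the `cache_key` helper name is illustrative, not from the PR):

```python
import json

def cache_key(expected, output):
    # Encode the (expected, output) pair as a JSON string so that
    # strings, ints, lists, and dicts can all serve as cache keys.
    return json.dumps([expected, output])

entries = {}
entries[cache_key("2", "2")] = True               # positive entry: they match
entries[cache_key("Hello, user!", "V7")] = False  # negative entry: they don't

print(cache_key("2", "2"))  # produces '["2", "2"]', as in the cache.json above
```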

The CLI also indicates whether the cache was used.


@simedw simedw requested a review from andreaazzini July 7, 2023 21:18
@andreaazzini (Member) left a comment:

I love these two new Evaluators! I've added a few comments that I think are important to address.
When you have a chance, please take a look and let me know your thoughts on them.

benchllm/cli/commands/evaluate.py (outdated thread, resolved)
self._num_cache_hits += 1
if lookup:
return Evaluator.Match(prediction=prediction.output, expected=expected)
return None
andreaazzini (Member):

I am not sure I get the logic here. If we miss, we want to break for this loop, not return, right? Otherwise, we don't count the miss (line 44).

simedw (Contributor, Author):

for expected in prediction.test.expected:
            lookup = self.lookup(expected, prediction.output)
            # None indicates that nothing was found in the cache
            # while True and False are both valid cache values
            if lookup is None:
                continue
            self._num_cache_hits += 1
            if lookup:
                return Evaluator.Match(prediction=prediction.output, expected=expected)
            return None

If lookup is None, we don't have an entry for that (expected, output) pair, so we continue.
If we do have an entry, it might be a positive entry (they match), in which case we can stop searching.
It might also be a negative entry (they do not match), and we also stop searching [this could be debated, to be honest].
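The tri-state logic (None = cache miss, True/False = cached verdict) can be sketched with a minimal stand-in class (the names here are ours, not the PR's):

```python
from typing import Optional

class CacheSketch:
    """Illustrative sketch of the tri-state lookup described above."""

    def __init__(self) -> None:
        self._entries: dict[str, bool] = {}
        self._num_cache_hits = 0

    def lookup(self, key: str) -> Optional[bool]:
        # None means "nothing cached"; True and False are both valid values.
        value = self._entries.get(key)
        if value is not None:
            self._num_cache_hits += 1
        return value

    def store(self, key: str, value: bool) -> None:
        self._entries[key] = value

cache = CacheSketch()
assert cache.lookup("k") is None   # miss: caller keeps searching
cache.store("k", False)
assert cache.lookup("k") is False  # negative hit: caller stops searching
assert cache._num_cache_hits == 1  # only the real hit was counted
```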

simedw (Contributor, Author):

Have a look now, your comment helped me find a corner case which I also wrote a test case for!

andreaazzini (Member):

I see, thank you for adding some documentation, it's much clearer now!

benchllm/cli/commands/run_suite.py (outdated thread, resolved)
benchllm/evaluator.py (outdated thread, resolved)

from typing import Union

from pydantic import BaseModel

Json = Union[str, bool, list, dict]
andreaazzini (Member):

I don't think this is the right way of typing a JSON-serializable object/variable. I found a very interesting issue in the official python/typing repo: python/typing#182.

I really like this approach:

All major type checkers now support recursive type aliases by default, so this should largely work:

JSON: TypeAlias = dict[str, "JSON"] | list["JSON"] | str | int | float | bool | None

What do you think?

simedw (Contributor, Author):

Interesting, I will check it out. I removed int and float on purpose, since they made pydantic parse things in a strange way:

"2" -> 2
["2"] -> ["2"] 

which was kind of unexpected and hard to work with.

simedw (Contributor, Author):

pydantic/pydantic#5779

It seems Pydantic does not yet support recursive type aliases.

@simedw simedw requested a review from andreaazzini July 10, 2023 13:48
@andreaazzini (Member) left a comment:

I think this looks really good, especially:

  • better inline comments as dev-friendly documentation
  • split of evaluators into individual modules
  • better and larger test coverage

Approved! 👍


@simedw simedw merged commit 9f5275d into main Jul 10, 2023
2 checks passed
@simedw simedw deleted the basic-caching branch July 19, 2023 20:54