Make BBQ prompts identical to HELM's version #39
Conversation
In HELM this is the most common type of prompt. It is also how BBQ works, so I'll need it when fleshing out that Test.
Before this change, the difference was in how they sampled in-context learning examples. I've updated that to match HELM.
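For context, here is a minimal sketch of the kind of seeded, deterministic sampling being ported; the function and parameter names are illustrative, not the actual HELM/NewHELM API:

```python
import random
from typing import List, TypeVar

T = TypeVar("T")

def sample_in_context_examples(
    train_instances: List[T],
    num_examples: int,
    train_trial_index: int = 0,
) -> List[T]:
    """Deterministically pick in-context learning examples.

    Seeding the RNG with the trial index means any framework running this
    same logic selects the same training examples for the same trial.
    """
    rng = random.Random(train_trial_index)
    # Sample without replacement, capped at the number of available instances.
    return rng.sample(train_instances, min(num_examples, len(train_instances)))
```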
MLCommons CLA bot: All contributors have signed the MLCommons CLA ✍️ ✅
Should have mentioned this earlier, but an alternative to copying and pasting the sampling algorithm would be to hardcode the indexes of the sampled training examples.
Also, the sampled test items will still be different, right? I think it's sufficient to get close enough to BBQ, without needing to reproduce it exactly.
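(If it helps, the hardcoding idea could look something like this; the indexes below are placeholders, not the ones HELM actually samples.)

```python
from typing import List, TypeVar

T = TypeVar("T")

# Placeholder indexes: the real values would be whatever HELM's sampler picks.
BBQ_TRAIN_EXAMPLE_INDEXES = [12, 87, 140, 305, 422]

def fixed_in_context_examples(train_instances: List[T]) -> List[T]:
    """Return the pinned training examples instead of re-running the sampler."""
    return [train_instances[i] for i in BBQ_TRAIN_EXAMPLE_INDEXES]
```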
The test items are still the same as long as `max_eval_instances` is 1000 or more. In that situation, HELM does no sampling or shuffling of the eval instances.
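Roughly this behavior, sketched out (not HELM's actual code):

```python
import random
from typing import List, TypeVar

T = TypeVar("T")

def select_eval_instances(
    instances: List[T],
    max_eval_instances: int,
    seed: int = 0,
) -> List[T]:
    """Pick which eval instances to run."""
    # If the cap covers the whole split, every instance is used in its
    # original order, so two frameworks see identical test items.
    if len(instances) <= max_eval_instances:
        return instances
    # Otherwise a seeded subsample is taken, and results could diverge
    # unless both frameworks share the same seed and sampling code.
    return random.Random(seed).sample(instances, max_eval_instances)
```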
In PR #33 I mentioned that the only remaining difference was how training examples get sampled. In this PR I'm porting HELM's logic for that sampling.
My goal is to get NewHELM to produce the exact same values for the BBQ stats when using GPT2, as a way to ensure we have a fully functioning replacement.