
Could you release the reproduction data for your result #5

Open
p1nksnow opened this issue Apr 4, 2024 · 2 comments

Comments


p1nksnow commented Apr 4, 2024

I'm testing the pass rate evaluation. Could you provide the reproduction data, as ToolBench does?
Thanks in advance for your reply.

zhichengg (Collaborator) commented

Hi! Thank you for your interest in our work.

We are planning to publish our model inference results soon. However, OpenAI updated their gpt-4-turbo models this month, and with the new model as the evaluator, reported performance drops systematically. We used gpt-4-turbo-preview in our experiments, but that model's behaviour has also changed considerably. We will soon update the model performance with gpt-4-turbo-2024-04-09. We are also training our own evaluator on an open-source model to replace these closed-source models.
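(For anyone reproducing numbers in the meantime, the practical takeaway is to pin the evaluator to a dated snapshot rather than a floating alias, so judgements stay comparable across reruns. Below is a minimal sketch using the official `openai` Python client; the judging prompt is a hypothetical placeholder, not the repo's actual evaluator prompt.)

```python
# Sketch only: pin the evaluator to a dated snapshot so pass-rate judgements
# do not drift when OpenAI updates the floating "gpt-4-turbo" alias.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4-turbo-2024-04-09",  # dated snapshot, not the floating alias
    temperature=0,                   # reduce run-to-run variance in judgements
    messages=[
        # Hypothetical placeholder prompt; the real evaluator prompt is in the repo.
        {"role": "system", "content": "You judge whether a tool-use trajectory solves the task."},
        {"role": "user", "content": "Task: ...\nTrajectory: ...\nAnswer 'Solved' or 'Unsolved'."},
    ],
)
print(response.choices[0].message.content)
```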

importpandas commented

Hi, thanks for your great work on StableToolBench. Is there any update on the release plan for the model inference results?

I'm building a benchmark on top of StableToolBench with other evaluation metrics, but rerunning all the model inference is expensive. Even if the evaluation setup changes, would it be possible to release the inference results first? The inference results should stay consistent throughout the evaluation process.
