Releases: wandb/llm-leaderboard
v3.1.0
Nejumi Leaderboard 3 has been released.
Weights & Biases Japan Co., Ltd. (W&B Japan) has released Nejumi LLM Leaderboard 3, the second major update to the Nejumi LLM Leaderboard (http://nejumi.ai/). In operation since July 2023, the leaderboard is one of Japan's largest sites for comparing the Japanese-language capabilities of LLMs.
With significantly restructured evaluation benchmarks, the leaderboard now assesses performance by use case and includes safety evaluations, which are gaining attention in AI governance. Faster inference and simpler library version management also make it easier for companies to run private evaluations. The public leaderboard allows interactive comparison of evaluation results for over 40 models, including the latest commercial APIs from OpenAI and Anthropic, as well as a wide range of open-source models.
v3.1.0
- Added Azure OpenAI and Amazon Bedrock interfaces
v2.0.0
Merge pull request #50 from wandb/generative_eval: Generative eval
v1.0.0
First version of Nejumi leaderboard
- Uses JGLUE as the benchmark
- Evaluates models via zero-shot text generation
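Zero-shot evaluation by text generation can be sketched roughly as follows. This is only an illustration of the general idea and not the leaderboard's actual code: the prompt template, the exact-match scoring, and every name here (`build_zero_shot_prompt`, `zero_shot_accuracy`, `toy_generate`) are hypothetical, and `toy_generate` is a stand-in for a real model call.

```python
from typing import Callable, Iterable, Tuple


def build_zero_shot_prompt(instruction: str, text: str) -> str:
    """Build a prompt from the instruction and input only (no solved examples)."""
    return f"{instruction}\n\nInput: {text}\nAnswer:"


def normalize(s: str) -> str:
    """Trim whitespace and lowercase so trivial formatting differences are ignored."""
    return s.strip().lower()


def zero_shot_accuracy(
    generate: Callable[[str], str],
    instruction: str,
    examples: Iterable[Tuple[str, str]],
) -> float:
    """Score generated answers against gold labels by exact match."""
    examples = list(examples)
    if not examples:
        return 0.0
    correct = 0
    for text, label in examples:
        output = generate(build_zero_shot_prompt(instruction, text))
        if normalize(output) == normalize(label):
            correct += 1
    return correct / len(examples)


def toy_generate(prompt: str) -> str:
    """Hypothetical stand-in for an LLM: answers from a keyword in the input line."""
    input_line = prompt.split("Input: ", 1)[1].split("\n", 1)[0]
    return "positive" if "good" in input_line else "negative"


# Made-up examples for illustration only (not taken from JGLUE).
examples = [
    ("a good movie", "positive"),
    ("a dull movie", "negative"),
    ("good but long", "negative"),
]
score = zero_shot_accuracy(
    toy_generate, "Classify the sentiment as positive or negative.", examples
)
print(f"accuracy = {score:.3f}")
```

In practice `generate` would wrap a real model or API client, and exact match would likely be replaced by whatever metric the task defines.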