Releases: wandb/llm-leaderboard

v3.1.0

01 Jul 04:19

Nejumi Leaderboard 3 has been released.

Weights & Biases Japan Co., Ltd. (W&B Japan) has released Nejumi LLM Leaderboard 3, the second major update to Nejumi LLM Leaderboard (http://nejumi.ai/). The leaderboard, in operation since July 2023, is one of Japan's largest sites for comparing the Japanese-language capabilities of LLMs.

The evaluation benchmarks have been significantly restructured: the leaderboard now assesses performance by use case and includes safety evaluations, a topic gaining attention in AI governance. Faster inference and simpler library version management also make it easier for companies to run private evaluations. On the public leaderboard, evaluation results for more than 40 models can be compared interactively, including the latest commercial APIs from OpenAI and Anthropic and a wide range of open-source models.

v3.1.0

  • Added Azure OpenAI and Amazon Bedrock interfaces

Related links:

  1. Nejumi LLM Leaderboard 3
  2. Insights from Nejumi LLM Leaderboard 3 (blog)

v2.0.0

21 Dec 23:19
e633780
Merge pull request #50 from wandb/generative_eval

Generative eval

v1.0.0

28 Nov 07:34
8654e70

First version of Nejumi leaderboard

  • Use JGLUE
  • Evaluate via text generation in a zero-shot setting (no solved examples in the prompt)
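
For readers unfamiliar with zero-shot, generation-based evaluation, the idea can be sketched roughly as follows. This is an illustrative sketch only, not the code shipped in this release: the function names, the prompt template, and the exact-match scoring rule are all assumptions made for the example, and `generate` stands in for whatever model call is being evaluated.

```python
from typing import Callable, Sequence, Tuple


def build_zero_shot_prompt(instruction: str, text: str) -> str:
    """Build a prompt from the task instruction and the input only.

    Zero-shot means no solved examples are included in the prompt.
    The template itself is hypothetical.
    """
    return f"{instruction}\n\nInput: {text}\nAnswer:"


def exact_match_accuracy(
    generate: Callable[[str], str],
    instruction: str,
    examples: Sequence[Tuple[str, str]],
) -> float:
    """Score generated answers against gold labels by exact string match.

    `generate` is a placeholder for any text-generation call that maps a
    prompt to the model's output string. `examples` holds (input, gold) pairs.
    """
    if not examples:
        raise ValueError("examples must not be empty")
    correct = 0
    for text, gold in examples:
        prompt = build_zero_shot_prompt(instruction, text)
        # Strip surrounding whitespace so " positive\n" matches "positive".
        prediction = generate(prompt).strip()
        if prediction == gold:
            correct += 1
    return correct / len(examples)
```

Exact match is the simplest possible scoring rule; a real harness would typically normalize outputs further or use task-specific metrics.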