Merge remote-tracking branch 'upstream/main' into main
Showing 9 changed files with 469 additions and 30 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -21,18 +21,22 @@ English | [简体中文](README_zh-CN.md) | |
👋 join us on <a href="https://twitter.com/intern_lm" target="_blank">Twitter</a>, <a href="https://discord.gg/xa29JuW87d" target="_blank">Discord</a> and <a href="https://r.vansin.top/?r=internwx" target="_blank">WeChat</a> | ||
</p> | ||
|
||
Welcome to **OpenCompass**! | ||
## 🧭 Welcome | ||
|
||
to **OpenCompass**! | ||
|
||
Just like a compass guides us on our journey, OpenCompass will guide you through the complex landscape of evaluating large language models. With its powerful algorithms and intuitive interface, OpenCompass makes it easy to assess the quality and effectiveness of your NLP models. | ||
|
||
## News | ||
## 🚀 What's New <a><img width="35" height="20" src="https://user-images.githubusercontent.com/12782558/212848161-5e783dd6-11e8-4fe0-bbba-39ffb77730be.png"></a> | ||
|
||
- **\[2023.07.27\]** We have supported [CMMLU](https://github.com/haonan-li/CMMLU)! More datasets are welcomed to join OpenCompass. 🔥🔥🔥. | ||
- **\[2023.08.07\]** We have added a [script](tools/eval_mmbench.py) for users to evaluate the inference results of [MMBench](https://opencompass.org.cn/MMBench)-dev. 🔥🔥🔥. | ||
- **\[2023.08.05\]** We have supported [GPT-4](https://openai.com/gpt-4) and [Qwen-7B](https://github.com/QwenLM/Qwen-7B)! Go to our [leaderboard](https://opencompass.org.cn/leaderboard-llm) for more results! More models are welcome to join OpenCompass. 🔥🔥🔥. | ||
- **\[2023.07.27\]** We have supported [CMMLU](https://github.com/haonan-li/CMMLU)! More datasets are welcome to join OpenCompass. 🔥🔥🔥. | ||
- **\[2023.07.21\]** Performance results for Llama-2 are available on the [OpenCompass leaderboard](https://opencompass.org.cn/leaderboard-llm)! 🔥🔥🔥. | ||
- **\[2023.07.19\]** We have supported [Llama-2](https://ai.meta.com/llama/)! Its performance report will be available soon. \[[Doc](./docs/en/get_started.md#Installation)\] 🔥🔥🔥. | ||
- **\[2023.07.13\]** We release [MMBench](https://opencompass.org.cn/MMBench), a meticulously curated dataset to comprehensively evaluate different abilities of multimodality models 🔥🔥🔥. | ||
|
||
## Introduction | ||
## ✨ Introduction | ||
|
||
OpenCompass is a one-stop platform for large model evaluation, aiming to provide a fair, open, and reproducible benchmark for large model evaluation. Its main features include: | ||
|
||
|
@@ -46,13 +50,13 @@ OpenCompass is a one-stop platform for large model evaluation, aiming to provide | |
|
||
- **Experiment management and reporting mechanism**: Use config files to fully record each experiment, with support for real-time reporting of results. | ||
|
||
## Leaderboard | ||
## 📊 Leaderboard | ||
|
||
We provide the [OpenCompass Leaderboard](https://opencompass.org.cn/rank) for the community to rank all public models and API models. If you would like to join the evaluation, please send the model repository URL or a standard API interface to the email address `[email protected]`. | ||
|
||
[![image](https://github.com/InternLM/opencompass/assets/13503330/80c5a42c-ddf0-4c6f-b39e-c175711ac381)](https://opencompass.org.cn/rank) | ||
<p align="right"><a href="#top">🔝Back to top</a></p> | ||
|
||
## Dataset Support | ||
## 📖 Dataset Support | ||
|
||
<table align="center"> | ||
<tbody> | ||
|
@@ -239,7 +243,9 @@ We provide [OpenCompass Leaderbaord](https://opencompass.org.cn/rank) for commun | |
</tbody> | ||
</table> | ||
|
||
## Model Support | ||
<p align="right"><a href="#top">🔝Back to top</a></p> | ||
|
||
## 📖 Model Support | ||
|
||
<table align="center"> | ||
<tbody> | ||
|
@@ -291,7 +297,7 @@ We provide [OpenCompass Leaderbaord](https://opencompass.org.cn/rank) for commun | |
</tbody> | ||
</table> | ||
|
||
## Installation | ||
## 🛠️ Installation | ||
|
||
Below are the steps for quick installation and dataset preparation. | ||
|
||
|
@@ -308,19 +314,21 @@ unzip OpenCompassData.zip | |
|
||
Some third-party features, like Humaneval and Llama, may require additional steps to work properly; for detailed steps, please refer to the [Installation Guide](https://opencompass.readthedocs.io/en/latest/get_started.html). | ||
|
||
## Evaluation | ||
<p align="right"><a href="#top">🔝Back to top</a></p> | ||
|
||
## 🏗️ Evaluation | ||
|
||
Make sure you have installed OpenCompass correctly and prepared your datasets according to the above steps. Please read the [Quick Start](https://opencompass.readthedocs.io/en/latest/get_started.html#quick-start) to learn how to run an evaluation task. | ||
|
||
For more tutorials, please check our [Documentation](https://opencompass.readthedocs.io/en/latest/index.html). | ||
|
||
## Acknowledgements | ||
## 🤝 Acknowledgements | ||
|
||
Some code in this project is cited and modified from [OpenICL](https://github.com/Shark-NLP/OpenICL). | ||
|
||
Some datasets and prompt implementations are modified from [chain-of-thought-hub](https://github.com/FranxYao/chain-of-thought-hub) and [instruct-eval](https://github.com/declare-lab/instruct-eval). | ||
|
||
## Citation | ||
## 🖊️ Citation | ||
|
||
```bibtex | ||
@misc{2023opencompass, | ||
|
@@ -330,3 +338,5 @@ Some datasets and prompt implementations are modified from [chain-of-thought-hub | |
year={2023} | ||
} | ||
``` | ||
|
||
<p align="right"><a href="#top">🔝Back to top</a></p> |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -21,18 +21,22 @@ | |
👋 join us on <a href="https://twitter.com/intern_lm" target="_blank">Twitter</a>, <a href="https://discord.gg/xa29JuW87d" target="_blank">Discord</a> and the <a href="https://r.vansin.top/?r=internwx" target="_blank">WeChat community</a> | ||
</p> | ||
|
||
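Welcome to OpenCompass! | ||
## 🧭 Welcome | ||
|
||
to **OpenCompass**! | ||
|
||
Just as a compass guides us on our journey, we hope OpenCompass can help you cut through the fog of evaluating large language models. OpenCompass provides rich algorithm and feature support, and we hope it helps the community evaluate the performance of NLP models fairly, comprehensively, and more conveniently. | ||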
|
||
## Updates | ||
## 🚀 What's New <a><img width="35" height="20" src="https://user-images.githubusercontent.com/12782558/212848161-5e783dd6-11e8-4fe0-bbba-39ffb77730be.png"></a> | ||
|
||
- **\[2023.08.07\]** Added an [MMBench evaluation script](tools/eval_mmbench.py) so that users can obtain test results on [MMBench](https://opencompass.org.cn/MMBench)-dev themselves. 🔥🔥🔥. | ||
- **\[2023.08.05\]** Evaluation results for [GPT-4](https://openai.com/gpt-4) and [Qwen-7B](https://github.com/QwenLM/Qwen-7B) have been added to the OpenCompass [LLM leaderboard](https://opencompass.org.cn/leaderboard-llm)! 🔥🔥🔥. | ||
- **\[2023.07.27\]** Added [CMMLU](https://github.com/haonan-li/CMMLU)! More datasets are welcome to join OpenCompass. 🔥🔥🔥. | ||
- **\[2023.07.21\]** Evaluation results for Llama-2 have been added to the OpenCompass [LLM leaderboard](https://opencompass.org.cn/leaderboard-llm)! 🔥🔥🔥. | ||
- **\[2023.07.19\]** Added [Llama-2](https://ai.meta.com/llama/)! We will publish its evaluation results soon. \[[Doc](./docs/zh_cn/get_started.md#安装)\] 🔥🔥🔥. | ||
- **\[2023.07.13\]** Released [MMBench](https://opencompass.org.cn/MMBench), a meticulously curated dataset for evaluating the full range of abilities of multimodal models 🔥🔥🔥. | ||
|
||
## Introduction | ||
## ✨ Introduction | ||
|
||
OpenCompass is a one-stop platform for large model evaluation. Its main features are as follows: | ||
|
||
|
@@ -48,13 +52,13 @@ OpenCompass is a one-stop platform for large model evaluation. Its main features are as follows | |
|
||
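- **Flexible extension**: Want to add new models or datasets? Want to customize a more advanced task partitioning strategy, or even plug in a new cluster management system? Everything in OpenCompass is easy to extend! | ||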
|
||
## Leaderboard | ||
## 📊 Leaderboard | ||
|
||
We will progressively publish detailed leaderboards for open-source models and API models; see the [OpenCompass Leaderboard](https://opencompass.org.cn/rank). To join the evaluation, please send the model repository URL or a standard API interface to the email address `[email protected]`. | ||
|
||
[![image](https://github.com/InternLM/opencompass/assets/13503330/76237116-a9dd-4207-abef-7ff73b89568a)](https://opencompass.org.cn/rank) | ||
<p align="right"><a href="#top">🔝Back to top</a></p> | ||
|
||
## Dataset Support | ||
## 📖 Dataset Support | ||
|
||
<table align="center"> | ||
<tbody> | ||
|
@@ -241,7 +245,9 @@ OpenCompass is a one-stop platform for large model evaluation. Its main features are as follows | |
</tbody> | ||
</table> | ||
|
||
## Model Support | ||
<p align="right"><a href="#top">🔝Back to top</a></p> | ||
|
||
## 📖 Model Support | ||
|
||
<table align="center"> | ||
<tbody> | ||
|
@@ -291,7 +297,7 @@ OpenCompass is a one-stop platform for large model evaluation. Its main features are as follows | |
</tbody> | ||
</table> | ||
|
||
## Installation | ||
## 🛠️ Installation | ||
|
||
Below are the steps for quick installation and dataset preparation. | ||
|
||
|
@@ -308,19 +314,21 @@ unzip OpenCompassData.zip | |
|
||
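Some third-party features, such as Humaneval and Llama, may require additional steps to work properly; for detailed steps, please refer to the [Installation Guide](https://opencompass.readthedocs.io/zh_CN/latest/get_started.html). | ||
|
||
## Evaluation | ||
<p align="right"><a href="#top">🔝Back to top</a></p> | ||
|
||
## 🏗️ Evaluation | ||
|
||
After making sure you have installed OpenCompass correctly and prepared the datasets according to the steps above, please read the [Quick Start](https://opencompass.readthedocs.io/zh_CN/latest/get_started.html#id3) to learn how to run an evaluation task. | ||
|
||
For more tutorials, please check our [Documentation](https://opencompass.readthedocs.io/zh_CN/latest/index.html). | ||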
|
||
## Acknowledgements | ||
## 🤝 Acknowledgements | ||
|
||
Some code in this project is cited and modified from [OpenICL](https://github.com/Shark-NLP/OpenICL). | ||
|
||
Some datasets and prompt implementations in this project are modified from [chain-of-thought-hub](https://github.com/FranxYao/chain-of-thought-hub) and [instruct-eval](https://github.com/declare-lab/instruct-eval). | ||
|
||
## Citation | ||
## 🖊️ Citation | ||
|
||
```bibtex | ||
@misc{2023opencompass, | ||
|
@@ -330,3 +338,5 @@ unzip OpenCompassData.zip | |
year={2023} | ||
} | ||
``` | ||
|
||
<p align="right"><a href="#top">🔝Back to top</a></p> |