Merge remote-tracking branch 'upstream/main' into main
NickLennonLiu committed Aug 8, 2023
2 parents 77bf89a + af436f5 commit c081f58
Showing 9 changed files with 469 additions and 30 deletions.
2 changes: 1 addition & 1 deletion .codespellrc
@@ -2,4 +2,4 @@
 skip = *.ipynb
 count =
 quiet-level = 3
-ignore-words-list = nd, ans, ques, rouge
+ignore-words-list = nd, ans, ques, rouge, softwares
34 changes: 22 additions & 12 deletions README.md
@@ -21,18 +21,22 @@ English | [简体中文](README_zh-CN.md)
 👋 join us on <a href="https://twitter.com/intern_lm" target="_blank">Twitter</a>, <a href="https://discord.gg/xa29JuW87d" target="_blank">Discord</a> and <a href="https://r.vansin.top/?r=internwx" target="_blank">WeChat</a>
 </p>

-Welcome to **OpenCompass**!
+## 🧭 Welcome
+
+to **OpenCompass**!

 Just like a compass guides us on our journey, OpenCompass will guide you through the complex landscape of evaluating large language models. With its powerful algorithms and intuitive interface, OpenCompass makes it easy to assess the quality and effectiveness of your NLP models.

-## News
+## 🚀 What's New <a><img width="35" height="20" src="https://user-images.githubusercontent.com/12782558/212848161-5e783dd6-11e8-4fe0-bbba-39ffb77730be.png"></a>

-- **\[2023.07.27\]** We have supported [CMMLU](https://github.com/haonan-li/CMMLU)! More datasets are welcomed to join OpenCompass. 🔥🔥🔥.
+- **\[2023.08.07\]** We have added a [script](tools/eval_mmbench.py) for users to evaluate the inference results of [MMBench](https://opencompass.org.cn/MMBench)-dev. 🔥🔥🔥.
+- **\[2023.08.05\]** We have supported [GPT-4](https://openai.com/gpt-4) and [Qwen-7B](https://github.com/QwenLM/Qwen-7B)! Go to our [leaderboard](https://opencompass.org.cn/leaderboard-llm) for more results! More models are welcome to join OpenCompass. 🔥🔥🔥.
+- **\[2023.07.27\]** We have supported [CMMLU](https://github.com/haonan-li/CMMLU)! More datasets are welcome to join OpenCompass. 🔥🔥🔥.
 - **\[2023.07.21\]** Performances of Llama-2 are available in [OpenCompass leaderboard](https://opencompass.org.cn/leaderboard-llm)! 🔥🔥🔥.
 - **\[2023.07.19\]** We have supported [Llama-2](https://ai.meta.com/llama/)! Its performance report will be available soon. \[[Doc](./docs/en/get_started.md#Installation)\] 🔥🔥🔥.
 - **\[2023.07.13\]** We release [MMBench](https://opencompass.org.cn/MMBench), a meticulously curated dataset to comprehensively evaluate different abilities of multimodality models 🔥🔥🔥.

-## Introduction
+## Introduction

 OpenCompass is a one-stop platform for large model evaluation, aiming to provide a fair, open, and reproducible benchmark for large model evaluation. Its main features includes:

@@ -46,13 +50,13 @@ OpenCompass is a one-stop platform for large model evaluation, aiming to provide

 - **Experiment management and reporting mechanism**: Use config files to fully record each experiment, support real-time reporting of results.

-## Leaderboard
+## 📊 Leaderboard

 We provide [OpenCompass Leaderbaord](https://opencompass.org.cn/rank) for community to rank all public models and API models. If you would like to join the evaluation, please provide the model repository URL or a standard API interface to the email address `[email protected]`.

-[![image](https://github.com/InternLM/opencompass/assets/13503330/80c5a42c-ddf0-4c6f-b39e-c175711ac381)](https://opencompass.org.cn/rank)
+<p align="right"><a href="#top">🔝Back to top</a></p>

-## Dataset Support
+## 📖 Dataset Support

 <table align="center">
 <tbody>
@@ -239,7 +243,9 @@ We provide [OpenCompass Leaderbaord](https://opencompass.org.cn/rank) for commun
 </tbody>
 </table>

-## Model Support
+<p align="right"><a href="#top">🔝Back to top</a></p>
+
+## 📖 Model Support

 <table align="center">
 <tbody>
@@ -291,7 +297,7 @@ We provide [OpenCompass Leaderbaord](https://opencompass.org.cn/rank) for commun
 </tbody>
 </table>

-## Installation
+## 🛠️ Installation

 Below are the steps for quick installation and datasets preparation.

@@ -308,19 +314,21 @@ unzip OpenCompassData.zip

 Some third-party features, like Humaneval and Llama, may require additional steps to work properly, for detailed steps please refer to the [Installation Guide](https://opencompass.readthedocs.io/en/latest/get_started.html).

-## Evaluation
+<p align="right"><a href="#top">🔝Back to top</a></p>
+
+## 🏗️ ️Evaluation

 Make sure you have installed OpenCompass correctly and prepared your datasets according to the above steps. Please read the [Quick Start](https://opencompass.readthedocs.io/en/latest/get_started.html#quick-start) to learn how to run an evaluation task.

 For more tutorials, please check our [Documentation](https://opencompass.readthedocs.io/en/latest/index.html).

-## Acknowledgements
+## 🤝 Acknowledgements

 Some code in this project is cited and modified from [OpenICL](https://github.com/Shark-NLP/OpenICL).

 Some datasets and prompt implementations are modified from [chain-of-thought-hub](https://github.com/FranxYao/chain-of-thought-hub) and [instruct-eval](https://github.com/declare-lab/instruct-eval).

-## Citation
+## 🖊️ Citation

 ```bibtex
 @misc{2023opencompass,
@@ -330,3 +338,5 @@ Some datasets and prompt implementations are modified from [chain-of-thought-hub
 year={2023}
 }
 ```
+
+<p align="right"><a href="#top">🔝Back to top</a></p>
32 changes: 21 additions & 11 deletions README_zh-CN.md
@@ -21,18 +21,22 @@
 👋 Join our <a href="https://twitter.com/intern_lm" target="_blank">Twitter</a>, <a href="https://discord.gg/xa29JuW87d" target="_blank">Discord</a> and <a href="https://r.vansin.top/?r=internwx" target="_blank">WeChat community</a>
 </p>

-Welcome to OpenCompass!
+## 🧭 Welcome
+
+to **OpenCompass**

 Just as a compass guides us on our journey, we hope OpenCompass can help you cut through the fog of evaluating large language models. OpenCompass provides rich algorithm and feature support, and we hope it helps the community evaluate the performance of NLP models more conveniently, fairly, and comprehensively.

-## Updates
+## 🚀 Latest Progress <a><img width="35" height="20" src="https://user-images.githubusercontent.com/12782558/212848161-5e783dd6-11e8-4fe0-bbba-39ffb77730be.png"></a>

+- **\[2023.08.07\]** Added an [MMBench evaluation script](tools/eval_mmbench.py) so that users can obtain test results on [MMBench](https://opencompass.org.cn/MMBench)-dev themselves. 🔥🔥🔥.
+- **\[2023.08.05\]** Evaluation results for [GPT-4](https://openai.com/gpt-4) and [Qwen-7B](https://github.com/QwenLM/Qwen-7B) have been updated on the OpenCompass [LLM evaluation leaderboard](https://opencompass.org.cn/leaderboard-llm)! 🔥🔥🔥.
 - **\[2023.07.27\]** Added [CMMLU](https://github.com/haonan-li/CMMLU)! More datasets are welcome to join OpenCompass. 🔥🔥🔥.
 - **\[2023.07.21\]** Evaluation results for Llama-2 have been updated on the OpenCompass [LLM evaluation leaderboard](https://opencompass.org.cn/leaderboard-llm)! 🔥🔥🔥.
 - **\[2023.07.19\]** Added [Llama-2](https://ai.meta.com/llama/)! We will publish its evaluation results soon. \[[Doc](./docs/zh_cn/get_started.md#安装)\] 🔥🔥🔥.
 - **\[2023.07.13\]** Released [MMBench](https://opencompass.org.cn/MMBench), a meticulously curated dataset for evaluating the all-round capabilities of multimodal models 🔥🔥🔥.

-## Introduction
+## Introduction

 OpenCompass is a one-stop platform for large model evaluation. Its main features are as follows:

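@@ -48,13 +52,13 @@ OpenCompass is a one-stop platform for large model evaluation. Its main features are as follows

 - **Flexible extension**: Want to add new models or datasets? Want to customize more advanced task partitioning strategies, or even plug in a new cluster management system? Everything in OpenCompass can be easily extended!

-## Leaderboard
+## 📊 Leaderboard

 We will progressively provide detailed performance leaderboards for open-source models and API models; see [OpenCompass Leaderbaord](https://opencompass.org.cn/rank). To join the evaluation, please send the model repository URL or a standard API interface to the email address `[email protected]`.

-[![image](https://github.com/InternLM/opencompass/assets/13503330/76237116-a9dd-4207-abef-7ff73b89568a)](https://opencompass.org.cn/rank)
+<p align="right"><a href="#top">🔝Back to top</a></p>

-## Dataset Support
+## 📖 Dataset Support

 <table align="center">
 <tbody>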
@@ -241,7 +245,9 @@ OpenCompass is a one-stop platform for large model evaluation. Its main features are as follows
 </tbody>
 </table>

-## Model Support
+<p align="right"><a href="#top">🔝Back to top</a></p>
+
+## 📖 Model Support

 <table align="center">
 <tbody>
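@@ -291,7 +297,7 @@ OpenCompass is a one-stop platform for large model evaluation. Its main features are as follows
 </tbody>
 </table>

-## Installation
+## 🛠️ Installation

 Below are the steps for quick installation and dataset preparation.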

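@@ -308,19 +314,21 @@ unzip OpenCompassData.zip

 Some third-party features, such as Humaneval and Llama, may require additional steps to work properly; for detailed steps, please refer to the [Installation Guide](https://opencompass.readthedocs.io/zh_CN/latest/get_started.html)

-## Evaluation
+<p align="right"><a href="#top">🔝Back to top</a></p>
+
+## 🏗️ ️Evaluation

 After making sure you have installed OpenCompass correctly and prepared the datasets following the steps above, please read the [Quick Start](https://opencompass.readthedocs.io/zh_CN/latest/get_started.html#id3) to learn how to run an evaluation task.

 For more tutorials, please see our [Documentation](https://opencompass.readthedocs.io/zh_CN/latest/index.html)

-## Acknowledgements
+## 🤝 Acknowledgements

 Some code in this project is cited and modified from [OpenICL](https://github.com/Shark-NLP/OpenICL)

 Some datasets and prompt implementations in this project are modified from [chain-of-thought-hub](https://github.com/FranxYao/chain-of-thought-hub), [instruct-eval](https://github.com/declare-lab/instruct-eval)

-## Citation
+## 🖊️ Citation

 ```bibtex
 @misc{2023opencompass,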
@@ -330,3 +338,5 @@ unzip OpenCompassData.zip
 year={2023}
 }
 ```
+
+<p align="right"><a href="#top">🔝Back to top</a></p>
2 changes: 1 addition & 1 deletion configs/eval_gpt3.5.py
@@ -24,7 +24,7 @@
         key='ENV', # The key will be obtained from $OPENAI_API_KEY, but you can write down your key here as well
         meta_template=api_meta_template,
         query_per_second=1,
-        max_out_len=2048, max_seq_len=2048, batch_size=8),
+        max_out_len=2048, max_seq_len=4096, batch_size=8),
 ]

 infer = dict(
2 changes: 1 addition & 1 deletion configs/multimodal/instructblip/instructblip-mmbench.py
@@ -9,7 +9,7 @@
          std=(0.26862954, 0.26130258, 0.27577711)),
     dict(type='mmpretrain.PackInputs',
          algorithm_keys=[
-             'question', 'answer', 'category', 'l2-category', 'context',
+             'question', 'category', 'l2-category', 'context',
              'index', 'options_dict', 'options', 'split'
          ])
 ]
2 changes: 1 addition & 1 deletion configs/multimodal/minigpt_4/minigpt_4_7b_mmbench.py
@@ -9,7 +9,7 @@
          std=(0.26862954, 0.26130258, 0.27577711)),
     dict(type='mmpretrain.PackInputs',
          algorithm_keys=[
-             'question', 'answer', 'category', 'l2-category', 'context',
+             'question', 'category', 'l2-category', 'context',
              'index', 'options_dict', 'options', 'split'
          ])
 ]
8 changes: 5 additions & 3 deletions opencompass/models/openai_api.py
@@ -50,8 +50,8 @@ class OpenAI(BaseAPIModel):
     is_api: bool = True

     def __init__(self,
-                 path: str,
-                 max_seq_len: int = 2048,
+                 path: str = 'gpt-3.5-turbo',
+                 max_seq_len: int = 4096,
                  query_per_second: int = 1,
                  retry: int = 2,
                  key: Union[str, List[str]] = 'ENV',
@@ -146,7 +146,9 @@ def _generate(self, input: str or PromptList, max_out_len: int,
                 messages.append(msg)

         # max num token for gpt-3.5-turbo is 4097
-        max_out_len = min(max_out_len, 4000 - self.get_token_len(str(input)))
+        max_out_len = min(
+            max_out_len,
+            self.max_seq_len - 50 - self.get_token_len(str(input)))
         if max_out_len <= 0:
             return ''
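
Note on the hunk above: the output budget is no longer a hard-coded `4000` but is derived from the configured `max_seq_len`, less a 50-token margin and the prompt length. A minimal standalone sketch of that arithmetic follows; the function name `clamp_max_out_len`, the `reserve` parameter, and the token counts are hypothetical and chosen only for illustration, they are not part of the commit.

```python
def clamp_max_out_len(max_out_len: int, max_seq_len: int,
                      prompt_tokens: int, reserve: int = 50) -> int:
    """Return the number of tokens that may still be generated.

    Same arithmetic as the changed lines: the context window
    (max_seq_len) minus a fixed margin minus the prompt length,
    capped at the requested max_out_len. The result can be zero or
    negative, which the caller treats as "nothing left to generate".
    """
    return min(max_out_len, max_seq_len - reserve - prompt_tokens)


# Hypothetical prompt sizes against a 4096-token window.
print(clamp_max_out_len(2048, 4096, prompt_tokens=1000))  # 2048
print(clamp_max_out_len(2048, 4096, prompt_tokens=3000))  # 1046
print(clamp_max_out_len(2048, 4096, prompt_tokens=4090))  # -44
```

In the last case the budget is negative, which corresponds to the `max_out_len <= 0` branch in the diff that returns an empty string.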

14 changes: 14 additions & 0 deletions opencompass/tasks/mm_infer.py
@@ -64,6 +64,20 @@ def name(self) -> str:
         evaluator_name = self.evaluator[0]['type']
         return f'{model_name}-{dataset_name}-{evaluator_name}'

+    def get_log_path(self, file_extension: str = 'json') -> str:
+        """Get the path to the log file.
+
+        Args:
+            file_extension (str): The file extension of the log file.
+                Default: 'json'.
+        """
+        model_name = self.model['type']
+        dataset_name = self.dataloader['dataset']['type']
+        evaluator_name = self.evaluator[0]['type']
+
+        return osp.join(model_name,
+                        f'{dataset_name}-{evaluator_name}.{file_extension}')
+
     def get_command(self, cfg_path, template):
         """Get the command template for the task.
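
Note on the added `get_log_path`: it composes a relative path of the form `<model type>/<dataset type>-<evaluator type>.<extension>`. The sketch below reproduces only that composition as a free function so it can be run on its own; `build_log_path` and the three example type names are hypothetical stand-ins, not values taken from the repository.

```python
import os.path as osp


def build_log_path(model_name: str, dataset_name: str,
                   evaluator_name: str, file_extension: str = 'json') -> str:
    """Join the model name (directory) with '<dataset>-<evaluator>.<ext>'."""
    return osp.join(model_name,
                    f'{dataset_name}-{evaluator_name}.{file_extension}')


# Hypothetical type names, for illustration only.
print(build_log_path('ExampleModel', 'ExampleDataset', 'ExampleEvaluator'))
print(build_log_path('ExampleModel', 'ExampleDataset', 'ExampleEvaluator',
                     file_extension='log'))
```

The path separator in the printed result depends on the operating system, since `os.path.join` is used.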