Merge remote-tracking branch 'upstream/main' into main

NetManAIOps · Aug 8, 2023 · c081f58
2 parents 77bf89a + af436f5

Showing 9 changed files with 469 additions and 30 deletions.
diff --git a/.codespellrc b/.codespellrc
@@ -2,4 +2,4 @@
 skip = *.ipynb
 count =
 quiet-level = 3
-ignore-words-list = nd, ans, ques, rouge
+ignore-words-list = nd, ans, ques, rouge, softwares
diff --git a/README.md b/README.md
@@ -21,18 +21,22 @@ English | [简体中文](README_zh-CN.md)
     👋 join us on <a href="https://twitter.com/intern_lm" target="_blank">Twitter</a>, <a href="https://discord.gg/xa29JuW87d" target="_blank">Discord</a> and <a href="https://r.vansin.top/?r=internwx" target="_blank">WeChat</a>
 </p>
 
-Welcome to **OpenCompass**!
+## 🧭	Welcome
+
+to **OpenCompass**!
 
 Just like a compass guides us on our journey, OpenCompass will guide you through the complex landscape of evaluating large language models. With its powerful algorithms and intuitive interface, OpenCompass makes it easy to assess the quality and effectiveness of your NLP models.
 
-## News
+## 🚀 What's New <a><img width="35" height="20" src="https://user-images.githubusercontent.com/12782558/212848161-5e783dd6-11e8-4fe0-bbba-39ffb77730be.png"></a>
 
-- **\[2023.07.27\]** We have supported [CMMLU](https://github.com/haonan-li/CMMLU)! More datasets are welcomed to join OpenCompass. 🔥🔥🔥.
+- **\[2023.08.07\]** We have added a [script](tools/eval_mmbench.py) for users to evaluate the inference results of [MMBench](https://opencompass.org.cn/MMBench)-dev. 🔥🔥🔥.
+- **\[2023.08.05\]** We have supported [GPT-4](https://openai.com/gpt-4) and [Qwen-7B](https://github.com/QwenLM/Qwen-7B)! Go to our [leaderboard](https://opencompass.org.cn/leaderboard-llm) for more results! More models are welcome to join OpenCompass. 🔥🔥🔥.
+- **\[2023.07.27\]** We have supported [CMMLU](https://github.com/haonan-li/CMMLU)! More datasets are welcome to join OpenCompass. 🔥🔥🔥.
 - **\[2023.07.21\]** Performances of Llama-2 are available in [OpenCompass leaderboard](https://opencompass.org.cn/leaderboard-llm)!  🔥🔥🔥.
 - **\[2023.07.19\]** We have supported [Llama-2](https://ai.meta.com/llama/)! Its performance report will be available soon. \[[Doc](./docs/en/get_started.md#Installation)\] 🔥🔥🔥.
 - **\[2023.07.13\]** We release [MMBench](https://opencompass.org.cn/MMBench), a meticulously curated dataset to comprehensively evaluate different abilities of multimodality models 🔥🔥🔥.
 
-## Introduction
+## ✨ Introduction
 
 OpenCompass is a one-stop platform for large model evaluation, aiming to provide a fair, open, and reproducible benchmark for large model evaluation. Its main features includes:
 
@@ -46,13 +50,13 @@ OpenCompass is a one-stop platform for large model evaluation, aiming to provide
 
 - **Experiment management and reporting mechanism**: Use config files to fully record each experiment, support real-time reporting of results.
 
-## Leaderboard
+## 📊 Leaderboard
 
 We provide [OpenCompass Leaderbaord](https://opencompass.org.cn/rank) for community to rank all public models and API models. If you would like to join the evaluation, please provide the model repository URL or a standard API interface to the email address `[email protected]`.
 
-[![image](https://github.com/InternLM/opencompass/assets/13503330/80c5a42c-ddf0-4c6f-b39e-c175711ac381)](https://opencompass.org.cn/rank)
+<p align="right"><a href="#top">🔝Back to top</a></p>
 
-## Dataset Support
+## 📖 Dataset Support
 
 <table align="center">
   <tbody>
@@ -239,7 +243,9 @@ We provide [OpenCompass Leaderbaord](https://opencompass.org.cn/rank) for commun
   </tbody>
 </table>
 
-## Model Support
+<p align="right"><a href="#top">🔝Back to top</a></p>
+
+## 📖 Model Support
 
 <table align="center">
   <tbody>
@@ -291,7 +297,7 @@ We provide [OpenCompass Leaderbaord](https://opencompass.org.cn/rank) for commun
   </tbody>
 </table>
 
-## Installation
+## 🛠️ Installation
 
 Below are the steps for quick installation and datasets preparation.
 
@@ -308,19 +314,21 @@ unzip OpenCompassData.zip
 
 Some third-party features, like Humaneval and Llama, may require additional steps to work properly, for detailed steps please refer to the [Installation Guide](https://opencompass.readthedocs.io/en/latest/get_started.html).
 
-## Evaluation
+<p align="right"><a href="#top">🔝Back to top</a></p>
+
+## 🏗️ ️Evaluation
 
 Make sure you have installed OpenCompass correctly and prepared your datasets according to the above steps. Please read the [Quick Start](https://opencompass.readthedocs.io/en/latest/get_started.html#quick-start) to learn how to run an evaluation task.
 
 For more tutorials, please check our [Documentation](https://opencompass.readthedocs.io/en/latest/index.html).
 
-## Acknowledgements
+## 🤝 Acknowledgements
 
 Some code in this project is cited and modified from [OpenICL](https://github.com/Shark-NLP/OpenICL).
 
 Some datasets and prompt implementations are modified from [chain-of-thought-hub](https://github.com/FranxYao/chain-of-thought-hub) and [instruct-eval](https://github.com/declare-lab/instruct-eval).
 
-## Citation
+## 🖊️ Citation
 
 ```bibtex
 @misc{2023opencompass,
@@ -330,3 +338,5 @@ Some datasets and prompt implementations are modified from [chain-of-thought-hub
     year={2023}
 }
 ```
+
+<p align="right"><a href="#top">🔝Back to top</a></p>
diff --git a/README_zh-CN.md b/README_zh-CN.md
@@ -21,18 +21,22 @@
     👋 加入我们的<a href="https://twitter.com/intern_lm" target="_blank">推特</a>、<a href="https://discord.gg/xa29JuW87d" target="_blank">Discord</a> 和 <a href="https://r.vansin.top/?r=internwx" target="_blank">微信社区</a>
 </p>
 
-欢迎来到OpenCompass！
+## 🧭	欢迎
+
+来到**OpenCompass**！
 
 就像指南针在我们的旅程中为我们导航一样，我们希望OpenCompass能够帮助你穿越评估大型语言模型的重重迷雾。OpenCompass提供丰富的算法和功能支持，期待OpenCompass能够帮助社区更便捷地对NLP模型的性能进行公平全面的评估。
 
-## 更新
+## 🚀 最新进展 <a><img width="35" height="20" src="https://user-images.githubusercontent.com/12782558/212848161-5e783dd6-11e8-4fe0-bbba-39ffb77730be.png"></a>
 
+- **\[2023.08.07\]** 新增了 [MMBench 评测脚本](tools/eval_mmbench.py) 以支持用户自行获取 [MMBench](https://opencompass.org.cn/MMBench)-dev 的测试结果. 🔥🔥🔥.
+- **\[2023.08.05\]** [GPT-4](https://openai.com/gpt-4) 与 [Qwen-7B](https://github.com/QwenLM/Qwen-7B) 的评测结果已更新在 OpenCompass [大语言模型评测榜单](https://opencompass.org.cn/leaderboard-llm)! 🔥🔥🔥.
 - **\[2023.07.27\]** 新增了 [CMMLU](https://github.com/haonan-li/CMMLU)! 欢迎更多的数据集加入 OpenCompass. 🔥🔥🔥.
 - **\[2023.07.21\]** Llama-2 的评测结果已更新在 OpenCompass [大语言模型评测榜单](https://opencompass.org.cn/leaderboard-llm)!  🔥🔥🔥.
 - **\[2023.07.19\]** 新增了 [Llama-2](https://ai.meta.com/llama/)！我们近期将会公布其评测结果。\[[文档](./docs/zh_cn/get_started.md#安装)\] 🔥🔥🔥。
 - **\[2023.07.13\]** 发布了 [MMBench](https://opencompass.org.cn/MMBench)，该数据集经过细致整理，用于评测多模态模型全方位能力 🔥🔥🔥。
 
-## 介绍
+## ✨ 介绍
 
 OpenCompass 是面向大模型评测的一站式平台。其主要特点如下：
 
@@ -48,13 +52,13 @@ OpenCompass 是面向大模型评测的一站式平台。其主要特点如下
 
 - **灵活化拓展**：想增加新模型或数据集？想要自定义更高级的任务分割策略，甚至接入新的集群管理系统？OpenCompass 的一切均可轻松扩展！
 
-## 性能榜单
+## 📊 性能榜单
 
 我们将陆续提供开源模型和API模型的具体性能榜单，请见 [OpenCompass Leaderbaord](https://opencompass.org.cn/rank) 。如需加入评测，请提供模型仓库地址或标准的 API 接口至邮箱  `[email protected]`.
 
-[![image](https://github.com/InternLM/opencompass/assets/13503330/76237116-a9dd-4207-abef-7ff73b89568a)](https://opencompass.org.cn/rank)
+<p align="right"><a href="#top">🔝返回顶部</a></p>
 
-## 数据集支持
+## 📖 数据集支持
 
 <table align="center">
   <tbody>
@@ -241,7 +245,9 @@ OpenCompass 是面向大模型评测的一站式平台。其主要特点如下
   </tbody>
 </table>
 
-## 模型支持
+<p align="right"><a href="#top">🔝返回顶部</a></p>
+
+## 📖 模型支持
 
 <table align="center">
   <tbody>
@@ -291,7 +297,7 @@ OpenCompass 是面向大模型评测的一站式平台。其主要特点如下
   </tbody>
 </table>
 
-## 安装
+## 🛠️ 安装
 
 下面展示了快速安装以及准备数据集的步骤。
 
@@ -308,19 +314,21 @@ unzip OpenCompassData.zip
 
 有部分第三方功能,如 Humaneval 以及 Llama,可能需要额外步骤才能正常运行，详细步骤请参考[安装指南](https://opencompass.readthedocs.io/zh_CN/latest/get_started.html)。
 
-## 评测
+<p align="right"><a href="#top">🔝返回顶部</a></p>
+
+## 🏗️ ️评测
 
 确保按照上述步骤正确安装 OpenCompass 并准备好数据集后，请阅读[快速上手](https://opencompass.readthedocs.io/zh_CN/latest/get_started.html#id3)了解如何运行一个评测任务。
 
 更多教程请查看我们的[文档](https://opencompass.readthedocs.io/zh_CN/latest/index.html)。
 
-## 致谢
+## 🤝 致谢
 
 该项目部分的代码引用并修改自 [OpenICL](https://github.com/Shark-NLP/OpenICL)。
 
 该项目部分的数据集和提示词实现修改自 [chain-of-thought-hub](https://github.com/FranxYao/chain-of-thought-hub), [instruct-eval](https://github.com/declare-lab/instruct-eval)
 
-## 引用
+## 🖊️ 引用
 
 ```bibtex
 @misc{2023opencompass,
@@ -330,3 +338,5 @@ unzip OpenCompassData.zip
     year={2023}
 }
 ```
+
+<p align="right"><a href="#top">🔝返回顶部</a></p>
diff --git a/configs/eval_gpt3.5.py b/configs/eval_gpt3.5.py
@@ -24,7 +24,7 @@
         key='ENV',  # The key will be obtained from $OPENAI_API_KEY, but you can write down your key here as well
         meta_template=api_meta_template,
         query_per_second=1,
-        max_out_len=2048, max_seq_len=2048, batch_size=8),
+        max_out_len=2048, max_seq_len=4096, batch_size=8),
 ]
 
 infer = dict(

diff --git a/configs/multimodal/instructblip/instructblip-mmbench.py b/configs/multimodal/instructblip/instructblip-mmbench.py
@@ -9,7 +9,7 @@
          std=(0.26862954, 0.26130258, 0.27577711)),
     dict(type='mmpretrain.PackInputs',
          algorithm_keys=[
-             'question', 'answer', 'category', 'l2-category', 'context',
+             'question', 'category', 'l2-category', 'context',
              'index', 'options_dict', 'options', 'split'
          ])
 ]

diff --git a/configs/multimodal/minigpt_4/minigpt_4_7b_mmbench.py b/configs/multimodal/minigpt_4/minigpt_4_7b_mmbench.py
@@ -9,7 +9,7 @@
          std=(0.26862954, 0.26130258, 0.27577711)),
     dict(type='mmpretrain.PackInputs',
          algorithm_keys=[
-             'question', 'answer', 'category', 'l2-category', 'context',
+             'question', 'category', 'l2-category', 'context',
              'index', 'options_dict', 'options', 'split'
          ])
 ]

diff --git a/opencompass/models/openai_api.py b/opencompass/models/openai_api.py
@@ -50,8 +50,8 @@ class OpenAI(BaseAPIModel):
     is_api: bool = True
 
     def __init__(self,
-                 path: str,
-                 max_seq_len: int = 2048,
+                 path: str = 'gpt-3.5-turbo',
+                 max_seq_len: int = 4096,
                  query_per_second: int = 1,
                  retry: int = 2,
                  key: Union[str, List[str]] = 'ENV',
@@ -146,7 +146,9 @@ def _generate(self, input: str or PromptList, max_out_len: int,
                 messages.append(msg)
 
         # max num token for gpt-3.5-turbo is 4097
-        max_out_len = min(max_out_len, 4000 - self.get_token_len(str(input)))
+        max_out_len = min(
+            max_out_len,
+            self.max_seq_len - 50 - self.get_token_len(str(input)))
         if max_out_len <= 0:
             return ''
 

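Note on the `openai_api.py` hunk above: the output budget is now derived from `self.max_seq_len` with a 50-token margin, replacing the hard-coded 4000. The arithmetic can be sketched in isolation as follows. `clamp_max_out_len` and its `margin` parameter are hypothetical names used only for illustration and are not part of the commit; the real method returns `''` when the budget is non-positive, whereas this sketch floors the result at 0 so the caller can check it.

```python
def clamp_max_out_len(max_out_len: int,
                      max_seq_len: int,
                      prompt_tokens: int,
                      margin: int = 50) -> int:
    """Illustrative helper (not in the commit): tokens left for the completion.

    Mirrors min(max_out_len, max_seq_len - 50 - prompt_tokens) from the diff,
    floored at 0 so a caller can treat 0 as "skip the request".
    """
    budget = max_seq_len - margin - prompt_tokens
    return max(0, min(max_out_len, budget))


if __name__ == '__main__':
    # Short prompt: the requested max_out_len fits within the budget.
    print(clamp_max_out_len(2048, 4096, 1000))
    # Long prompt: the budget (4096 - 50 - 3000) is the binding limit.
    print(clamp_max_out_len(2048, 4096, 3000))
    # Prompt nearly fills the window: nothing is left to generate.
    print(clamp_max_out_len(2048, 4096, 4090))
```

With the previous constant, a 3000-token prompt would have been limited to `4000 - 3000 = 1000` output tokens regardless of the configured `max_seq_len`; tying the limit to `max_seq_len` lets models with a different context window be configured without touching this code.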
diff --git a/opencompass/tasks/mm_infer.py b/opencompass/tasks/mm_infer.py
@@ -64,6 +64,20 @@ def name(self) -> str:
         evaluator_name = self.evaluator[0]['type']
         return f'{model_name}-{dataset_name}-{evaluator_name}'
 
+    def get_log_path(self, file_extension: str = 'json') -> str:
+        """Get the path to the log file.
+
+        Args:
+            file_extension (str): The file extension of the log file.
+                Default: 'json'.
+        """
+        model_name = self.model['type']
+        dataset_name = self.dataloader['dataset']['type']
+        evaluator_name = self.evaluator[0]['type']
+
+        return osp.join(model_name,
+                        f'{dataset_name}-{evaluator_name}.{file_extension}')
+
     def get_command(self, cfg_path, template):
         """Get the command template for the task.