
Add support for vision models in OpenAI Chat API, e.g. gpt-4o, gpt-4-turbo, etc. #221

Merged: 9 commits, May 14, 2024
39 changes: 21 additions & 18 deletions README.md

Large diffs are not rendered by default.

5 changes: 4 additions & 1 deletion README_ZH.md
@@ -28,6 +28,8 @@

## News

- <img src="https://img.alicdn.com/imgextra/i3/O1CN01SFL0Gu26nrQBFKXFR_!!6000000007707-2-tps-500-500.png" alt="new" width="30" height="30"/>**[2024-05-14]** AgentScope now supports OpenAI vision models such as **gpt-4o**! See [this link](./examples/model_configs_template/openai_chat_template.json) for the model configuration. A new example, "[Conversation with gpt-4o](./examples/conversation_with_gpt-4o)", is also available!

- <img src="https://img.alicdn.com/imgextra/i3/O1CN01SFL0Gu26nrQBFKXFR_!!6000000007707-2-tps-500-500.png" alt="new" width="30" height="30"/>**[2024-04-30]** **AgentScope** v0.0.4 has been released!

- <img src="https://img.alicdn.com/imgextra/i3/O1CN01SFL0Gu26nrQBFKXFR_!!6000000007707-2-tps-500-500.png" alt="new" width="30" height="30"/>**[2024-04-27]** [AgentScope Workstation](https://agentscope.aliyun.com/) is now online! You are welcome to try building multi-agent applications with zero code on the *drag-and-drop programming platform*, and to ask *copilot* any questions about AgentScope!
@@ -66,7 +68,7 @@ AgentScope provides a list of `ModelWrapper`s to support local model services and third-party

| API | Task | Model Wrapper | Configuration | Some Supported Models |
|------------------------|-----------------|---------------------------------------------------------------------------------------------------------------------------------|---------------------------------------------------------------------------------------------|-----------------------------------------------|
| OpenAI API | Chat | [`OpenAIChatWrapper`](https://github.com/modelscope/agentscope/blob/main/src/agentscope/models/openai_model.py) |[guidance](https://modelscope.github.io/agentscope/en/tutorial/203-model.html#openai-api) <br> [template](https://github.com/modelscope/agentscope/blob/main/examples/model_configs_template/openai_chat_template.json) | gpt-4, gpt-3.5-turbo, ... |
| OpenAI API | Chat | [`OpenAIChatWrapper`](https://github.com/modelscope/agentscope/blob/main/src/agentscope/models/openai_model.py) |[guidance](https://modelscope.github.io/agentscope/en/tutorial/203-model.html#openai-api) <br> [template](https://github.com/modelscope/agentscope/blob/main/examples/model_configs_template/openai_chat_template.json) | gpt-4o, gpt-4, gpt-3.5-turbo, ... |
| | Embedding | [`OpenAIEmbeddingWrapper`](https://github.com/modelscope/agentscope/blob/main/src/agentscope/models/openai_model.py) | [guidance](https://modelscope.github.io/agentscope/en/tutorial/203-model.html#openai-api) <br> [template](https://github.com/modelscope/agentscope/blob/main/examples/model_configs_template/openai_embedding_template.json) | text-embedding-ada-002, ... |
| | DALL·E | [`OpenAIDALLEWrapper`](https://github.com/modelscope/agentscope/blob/main/src/agentscope/models/openai_model.py) | [guidance](https://modelscope.github.io/agentscope/en/tutorial/203-model.html#openai-api) <br> [template](https://github.com/modelscope/agentscope/blob/main/examples/model_configs_template/openai_dall_e_template.json) | dall-e-2, dall-e-3 |
| DashScope API | Chat | [`DashScopeChatWrapper`](https://github.com/modelscope/agentscope/blob/main/src/agentscope/models/dashscope_model.py) | [guidance](https://modelscope.github.io/agentscope/en/tutorial/203-model.html#dashscope-api) <br> [template](https://github.com/modelscope/agentscope/blob/main/examples/model_configs_template/dashscope_chat_template.json) | qwen-plus, qwen-max, ... |
@@ -115,6 +117,7 @@ AgentScope supports fast deployment of local model services with the following libraries.
- [Conversation with ReAct Agent](./examples/conversation_with_react_agent)
- [Querying SQL through Conversation](./examples/conversation_nl2sql/)
- [Conversation with RAG Agents](./examples/conversation_with_RAG_agents)
- <img src="https://img.alicdn.com/imgextra/i3/O1CN01SFL0Gu26nrQBFKXFR_!!6000000007707-2-tps-500-500.png" alt="new" width="30" height="30"/>[Conversation with gpt-4o](./examples/conversation_with_gpt-4o)

- Games
- [Gomoku](./examples/game_gomoku)
71 changes: 71 additions & 0 deletions docs/sphinx_doc/en/source/tutorial/206-prompt.md
@@ -64,6 +64,8 @@ dictionaries as input, where the dictionary must obey the following rules

#### Prompt Strategy

##### Non-Vision Models

In OpenAI Chat API, the `name` field enables the model to distinguish
different speakers in the conversation. Therefore, the strategy of `format`
function in `OpenAIChatWrapper` is simple:
@@ -100,6 +102,75 @@ print(prompt)
]
```

##### Vision Models

For vision models (gpt-4-turbo, gpt-4o, ...), if the input message contains image URLs, the generated `content` field will be a list of dicts containing text and image URLs.

Specifically, web image URLs are passed to the OpenAI Chat API directly, while local image URLs are converted to base64 format. For more details, please refer to the [official guidance](https://platform.openai.com/docs/guides/vision).

Note that invalid image URLs (e.g. `/Users/xxx/test.mp3`) will be ignored.

```python
from agentscope.models import OpenAIChatWrapper
from agentscope.message import Msg

model = OpenAIChatWrapper(
config_name="", # empty since we directly initialize the model wrapper
model_name="gpt-4o",
)

prompt = model.format(
Msg("system", "You're a helpful assistant", role="system"), # Msg object
[ # a list of Msg objects
Msg(name="user", content="Describe this image", role="user", url="https://xxx.png"),
Msg(name="user", content="And these images", role="user", url=["/Users/xxx/test.png", "/Users/xxx/test.mp3"]),
],
)
print(prompt)
```

```python
[
{
"role": "system",
"name": "system",
"content": "You are a helpful assistant"
},
{
"role": "user",
"name": "user",
"content": [
{
"type": "text",
"text": "Describe this image"
},
{
"type": "image_url",
"image_url": {
"url": "https://xxx.png"
}
},
]
},
{
"role": "user",
"name": "user",
"content": [
{
"type": "text",
"text": "And these images"
},
{
"type": "image_url",
"image_url": {
"url": "data:image/png;base64,YWJjZGVm..." # for /Users/xxx/test.png
}
},
]
},
]
```
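The web-vs-local URL handling described above can be sketched with the standard library alone (a simplified illustration; `to_openai_image_content` is a hypothetical helper, not AgentScope's actual implementation):

```python
import base64
import mimetypes
from typing import Optional

IMAGE_EXTENSIONS = (".png", ".jpg", ".jpeg", ".gif", ".webp")

def to_openai_image_content(url: str) -> Optional[dict]:
    """Build one `image_url` content block following the strategy above:
    web URLs pass through untouched, local image files are embedded as
    base64 data URLs, and anything else (e.g. an .mp3 path) is ignored."""
    if url.startswith(("http://", "https://")):
        return {"type": "image_url", "image_url": {"url": url}}
    if url.lower().endswith(IMAGE_EXTENSIONS):
        mime = mimetypes.guess_type(url)[0] or "image/png"
        with open(url, "rb") as f:
            data = base64.b64encode(f.read()).decode("utf-8")
        return {
            "type": "image_url",
            "image_url": {"url": f"data:{mime};base64,{data}"},
        }
    return None  # invalid image URL: ignored
```

For instance, the base64 payload `YWJjZGVm` shown in the output above is what this sketch produces for a local PNG file containing the bytes `abcdef`.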

### DashScopeChatWrapper

`DashScopeChatWrapper` encapsulates the DashScope chat API, which takes a list of messages as input. The message must obey the following rules (updated in 2024/03/22):
71 changes: 71 additions & 0 deletions docs/sphinx_doc/zh_CN/source/tutorial/206-prompt.md
@@ -42,6 +42,8 @@ AgentScope provides built-in prompt construction strategies for the following model APIs.

#### Prompt Strategy

##### Non-Vision Models

In the OpenAI Chat API, the `name` field enables the model to distinguish different speakers in the conversation. Therefore, the strategy of the `format` function in `OpenAIChatWrapper` is simple:

- `Msg`: pass a dictionary with the `role`, `content`, and `name` fields directly to the API.
@@ -76,6 +78,75 @@ print(prompt)
]
```

##### Vision Models

For vision models, if the input message contains image URLs, the generated `content` field will be a list of dicts containing text and image URLs.

Specifically, web image URLs are passed to the OpenAI Chat API directly, while local image URLs are converted to base64 format. For more details, please refer to the [official guidance](https://platform.openai.com/docs/guides/vision).

Note that invalid image URLs (e.g. `/Users/xxx/test.mp3`) will be ignored.

```python
from agentscope.models import OpenAIChatWrapper
from agentscope.message import Msg

model = OpenAIChatWrapper(
config_name="", # 为空,因为我们直接初始化model wrapper
model_name="gpt-4o",
)

prompt = model.format(
Msg("system", "You're a helpful assistant", role="system"), # Msg 对象
[ # Msg 对象的列表
Msg(name="user", content="Describe this image", role="user", url="https://xxx.png"),
Msg(name="user", content="And these images", role="user", url=["/Users/xxx/test.png", "/Users/xxx/test.mp3"]),
],
)
print(prompt)
```

```python
[
{
"role": "system",
"name": "system",
"content": "You are a helpful assistant"
},
{
"role": "user",
"name": "user",
"content": [
{
"type": "text",
"text": "Describe this image"
},
{
"type": "image_url",
"image_url": {
"url": "https://xxx.png"
}
},
]
},
{
"role": "user",
"name": "user",
"content": [
{
"type": "text",
"text": "And these images"
},
{
"type": "image_url",
"image_url": {
"url": "data:image/png;base64,YWJjZGVm..." # 对应 /Users/xxx/test.png
}
},
]
},
]
```

### `DashScopeChatWrapper`

`DashScopeChatWrapper` encapsulates the DashScope chat API, which takes a list of messages as input. The messages must obey the following rules:
54 changes: 54 additions & 0 deletions examples/conversation_with_gpt-4o/README.md
@@ -0,0 +1,54 @@
# Conversation with gpt-4o (OpenAI Vision Model)

This example demonstrates
- How to use gpt-4o and other OpenAI vision models in AgentScope

In this example,
- you can have a conversation with OpenAI vision models.
- you can show gpt-4o your drawings or web UI designs and ask for its suggestions.
- you can share your pictures with gpt-4o and ask for its comments.

Just input your image URL (both local and web URLs are supported) and talk with gpt-4o.


## Background

On May 13, 2024, OpenAI released its new model, gpt-4o, a large multimodal model that can process text alongside other modalities such as images.


## Tested Models

The following models are tested in this example. For other models, some modifications may be needed.
- gpt-4o
- gpt-4-turbo
- gpt-4-vision


## Prerequisites

You need to satisfy the following requirements to run this example.
- Install the latest version of AgentScope by
```bash
git clone https://github.com/modelscope/agentscope.git
cd agentscope
pip install -e .
```
- Prepare an OpenAI API key

## Running the Example

First fill in your OpenAI API key in `conversation_with_gpt-4o.py`, then execute the following command to run the conversation with gpt-4o.

```bash
python conversation_with_gpt-4o.py
```

## A Running Example

- Conversation history with gpt-4o.

<img src="https://img.alicdn.com/imgextra/i4/O1CN01oQHcmy1mHXALklkMe_!!6000000004929-2-tps-5112-1276.png" alt="conversation history"/>

- My picture

<img src="https://img.alicdn.com/imgextra/i3/O1CN01UpQaLN27hjidUipMv_!!6000000007829-0-tps-720-1280.jpg" alt="my picture" width="200" />
36 changes: 36 additions & 0 deletions examples/conversation_with_gpt-4o/conversation_with_gpt-4o.py
@@ -0,0 +1,36 @@
# -*- coding: utf-8 -*-
"""An example for conversation with OpenAI vision models, especially for
GPT-4o."""
import agentscope
from agentscope.agents import UserAgent, DialogAgent

# Fill in your OpenAI API key
YOUR_OPENAI_API_KEY = "xxx"

model_config = {
"config_name": "gpt-4o_config",
"model_type": "openai_chat",
"model_name": "gpt-4o",
"api_key": YOUR_OPENAI_API_KEY,
"generate_args": {
"temperature": 0.7,
},
}

agentscope.init(model_configs=model_config)

# Require the user to input a URL; press Enter to skip the URL input
user = UserAgent("user", require_url=True)

agent = DialogAgent(
"Friday",
sys_prompt="You're a helpful assistant named Friday.",
model_config_name="gpt-4o_config",
)

x = None
while True:
x = agent(x)
x = user(x)
if x.content == "exit": # type "exit" to break the loop
break
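The turn-taking pattern of the loop above can be exercised without an API key by swapping the real agents for plain callables (a sketch for illustration; `SimpleMsg`, `echo_agent`, and `make_scripted_user` are hypothetical stand-ins, not AgentScope APIs):

```python
class SimpleMsg:
    """A minimal stand-in for agentscope's Msg, carrying only name/content."""
    def __init__(self, name: str, content: str):
        self.name = name
        self.content = content

def run_conversation(agent, user, max_turns: int = 10) -> list:
    """Alternate agent/user turns, as in the script above, until the user
    replies "exit" or max_turns is reached."""
    x = None
    transcript = []
    for _ in range(max_turns):
        x = agent(x)
        transcript.append((x.name, x.content))
        x = user(x)
        transcript.append((x.name, x.content))
        if x.content == "exit":  # type "exit" to break the loop
            break
    return transcript

def echo_agent(msg):
    # Greets on the first turn, then echoes whatever the user said.
    text = "Hello!" if msg is None else f"You said: {msg.content}"
    return SimpleMsg("Friday", text)

def make_scripted_user(lines):
    # Replays a fixed list of user inputs instead of calling input().
    it = iter(lines)
    return lambda _msg: SimpleMsg("user", next(it))
```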
55 changes: 34 additions & 21 deletions examples/model_configs_template/openai_chat_template.json
@@ -1,25 +1,38 @@
[
    {
        "config_name": "openai_chat_gpt-4",
        "model_type": "openai_chat",
        "model_name": "gpt-4",
        "api_key": "{your_api_key}",
        "client_args": {
            "max_retries": 3
        },
        "generate_args": {
            "temperature": 0.7
        }
    },
    {
        "config_name": "openai_chat_gpt-3.5-turbo",
        "model_type": "openai_chat",
        "model_name": "gpt-3.5-turbo",
        "api_key": "{your_api_key}",
        "client_args": {
            "max_retries": 3
        },
        "generate_args": {
            "temperature": 0.7
        }
    },
    {
        "config_name": "openai_chat_gpt-4o",
        "model_type": "openai_chat",
        "model_name": "gpt-4o",
        "api_key": "{your_api_key}",
        "client_args": {
            "max_retries": 3
        },
        "generate_args": {
            "temperature": 0.7
        }
    }
]
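A quick sanity check for such a config template can be written with the standard library (a sketch; `validate_configs` and the chosen required-key set are illustrative assumptions, not part of AgentScope):

```python
import json

# Keys that every entry in the template above carries.
REQUIRED_KEYS = {"config_name", "model_type", "model_name", "api_key"}

def validate_configs(raw: str) -> list:
    """Parse a model-config JSON array and return its config names,
    raising ValueError when an entry is missing a required key."""
    names = []
    for cfg in json.loads(raw):
        missing = REQUIRED_KEYS - cfg.keys()
        if missing:
            raise ValueError(
                f"{cfg.get('config_name', '<unnamed>')}: missing {sorted(missing)}"
            )
        names.append(cfg["config_name"])
    return names
```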
4 changes: 3 additions & 1 deletion src/agentscope/agents/user_agent.py
@@ -81,7 +81,9 @@ def reply(
# Input url of file, image, video, audio or website
url = None
if self.require_url:
url = input("URL: ")
url = input("URL (or Enter to skip): ")
if url == "":
url = None

# Add additional keys
msg = Msg(