
Add support for vision models in OpenAI Chat API, e.g. gpt-4o, gpt-4-turbo, etc. #221

Merged: 9 commits, May 14, 2024
39 changes: 21 additions & 18 deletions README.md

Large diffs are not rendered by default.

5 changes: 4 additions & 1 deletion README_ZH.md
@@ -28,6 +28,8 @@

## News

- <img src="https://img.alicdn.com/imgextra/i3/O1CN01SFL0Gu26nrQBFKXFR_!!6000000007707-2-tps-500-500.png" alt="new" width="30" height="30"/>**[2024-05-14]** AgentScope now supports OpenAI vision models such as **gpt-4o**! See [this link](./examples/model_configs_template/openai_chat_template.json) for the model configuration. A new example, "[Conversation with gpt-4o](./examples/conversation_with_gpt-4o)", is also available!

- <img src="https://img.alicdn.com/imgextra/i3/O1CN01SFL0Gu26nrQBFKXFR_!!6000000007707-2-tps-500-500.png" alt="new" width="30" height="30"/>**[2024-04-30]** **AgentScope** v0.0.4 has been released!

- <img src="https://img.alicdn.com/imgextra/i3/O1CN01SFL0Gu26nrQBFKXFR_!!6000000007707-2-tps-500-500.png" alt="new" width="30" height="30"/>**[2024-04-27]** [AgentScope Workstation](https://agentscope.aliyun.com/) is now online! You are welcome to try building multi-agent applications with zero code on the *drag-and-drop programming platform*, and to ask *copilot* any questions about AgentScope!
@@ -66,7 +68,7 @@ AgentScope provides a list of `ModelWrapper`s to support local model services and third-party

| API | Task | Model Wrapper | Configuration | Some Supported Models |
|------------------------|-----------------|---------------------------------------------------------------------------------------------------------------------------------|---------------------------------------------------------------------------------------------|-----------------------------------------------|
| OpenAI API | Chat | [`OpenAIChatWrapper`](https://github.com/modelscope/agentscope/blob/main/src/agentscope/models/openai_model.py) |[guidance](https://modelscope.github.io/agentscope/en/tutorial/203-model.html#openai-api) <br> [template](https://github.com/modelscope/agentscope/blob/main/examples/model_configs_template/openai_chat_template.json) | gpt-4, gpt-3.5-turbo, ... |
| OpenAI API | Chat | [`OpenAIChatWrapper`](https://github.com/modelscope/agentscope/blob/main/src/agentscope/models/openai_model.py) |[guidance](https://modelscope.github.io/agentscope/en/tutorial/203-model.html#openai-api) <br> [template](https://github.com/modelscope/agentscope/blob/main/examples/model_configs_template/openai_chat_template.json) | gpt-4o, gpt-4, gpt-3.5-turbo, ... |
| | Embedding | [`OpenAIEmbeddingWrapper`](https://github.com/modelscope/agentscope/blob/main/src/agentscope/models/openai_model.py) | [guidance](https://modelscope.github.io/agentscope/en/tutorial/203-model.html#openai-api) <br> [template](https://github.com/modelscope/agentscope/blob/main/examples/model_configs_template/openai_embedding_template.json) | text-embedding-ada-002, ... |
| | DALL·E | [`OpenAIDALLEWrapper`](https://github.com/modelscope/agentscope/blob/main/src/agentscope/models/openai_model.py) | [guidance](https://modelscope.github.io/agentscope/en/tutorial/203-model.html#openai-api) <br> [template](https://github.com/modelscope/agentscope/blob/main/examples/model_configs_template/openai_dall_e_template.json) | dall-e-2, dall-e-3 |
| DashScope API | Chat | [`DashScopeChatWrapper`](https://github.com/modelscope/agentscope/blob/main/src/agentscope/models/dashscope_model.py) | [guidance](https://modelscope.github.io/agentscope/en/tutorial/203-model.html#dashscope-api) <br> [template](https://github.com/modelscope/agentscope/blob/main/examples/model_configs_template/dashscope_chat_template.json) | qwen-plus, qwen-max, ... |
@@ -115,6 +117,7 @@ AgentScope supports fast deployment of local model services with the following libraries.
- [Conversation with ReAct Agent](./examples/conversation_with_react_agent)
- [Querying SQL through Conversation](./examples/conversation_nl2sql/)
- [Conversation with RAG Agents](./examples/conversation_with_RAG_agents)
- <img src="https://img.alicdn.com/imgextra/i3/O1CN01SFL0Gu26nrQBFKXFR_!!6000000007707-2-tps-500-500.png" alt="new" width="30" height="30"/>[Conversation with gpt-4o](./examples/conversation_with_gpt-4o)

- Games
- [Gomoku](./examples/game_gomoku)
71 changes: 71 additions & 0 deletions docs/sphinx_doc/en/source/tutorial/206-prompt.md
@@ -64,6 +64,8 @@ dictionaries as input, where the dictionary must obey the following rules

#### Prompt Strategy

##### Non-Vision Models

In OpenAI Chat API, the `name` field enables the model to distinguish
different speakers in the conversation. Therefore, the strategy of `format`
function in `OpenAIChatWrapper` is simple:
@@ -100,6 +102,75 @@ print(prompt)
]
```

##### Vision Models

For vision models (gpt-4-turbo, gpt-4o, ...), if the input message contains image URLs, the generated `content` field will be a list of dicts containing text and image URLs.

Specifically, web image URLs are passed to the OpenAI Chat API directly, while local image URLs are converted to base64 format. For more details, please refer to the [official guidance](https://platform.openai.com/docs/guides/vision).

Note that invalid image URLs (e.g. `/Users/xxx/test.mp3`) will be ignored.

```python
from agentscope.models import OpenAIChatWrapper
from agentscope.message import Msg

model = OpenAIChatWrapper(
config_name="", # empty since we directly initialize the model wrapper
model_name="gpt-4o",
)

prompt = model.format(
Msg("system", "You're a helpful assistant", role="system"), # Msg object
[ # a list of Msg objects
Msg(name="user", content="Describe this image", role="user", url="https://xxx.png"),
Msg(name="user", content="And these images", role="user", url=["/Users/xxx/test.png", "/Users/xxx/test.mp3"]),
],
)
print(prompt)
```

```python
[
{
"role": "system",
"name": "system",
"content": "You are a helpful assistant"
},
{
"role": "user",
"name": "user",
"content": [
{
"type": "text",
"text": "Describe this image"
},
{
"type": "image_url",
"image_url": {
"url": "https://xxx.png"
}
},
]
},
{
"role": "user",
"name": "user",
"content": [
{
"type": "text",
"text": "And these images"
},
{
"type": "image_url",
"image_url": {
"url": "data:image/png;base64,YWJjZGVm..." # for /Users/xxx/test.png
}
},
]
},
]
```
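The web-vs-local URL handling described above can be sketched with the standard library alone (a simplified illustration; `to_openai_image_content` is a hypothetical helper, not AgentScope's actual implementation):

```python
import base64
import mimetypes
from typing import Optional

IMAGE_EXTENSIONS = (".png", ".jpg", ".jpeg", ".gif", ".webp")

def to_openai_image_content(url: str) -> Optional[dict]:
    """Build one `image_url` content block following the strategy above:
    web URLs pass through untouched, local image files are embedded as
    base64 data URLs, and anything else (e.g. an .mp3 path) is ignored."""
    if url.startswith(("http://", "https://")):
        return {"type": "image_url", "image_url": {"url": url}}
    if url.lower().endswith(IMAGE_EXTENSIONS):
        mime = mimetypes.guess_type(url)[0] or "image/png"
        with open(url, "rb") as f:
            data = base64.b64encode(f.read()).decode("utf-8")
        return {
            "type": "image_url",
            "image_url": {"url": f"data:{mime};base64,{data}"},
        }
    return None  # invalid image URL: ignored
```

For instance, the base64 payload `YWJjZGVm` shown in the output above is what this sketch produces for a local PNG file containing the bytes `abcdef`.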

### DashScopeChatWrapper

`DashScopeChatWrapper` encapsulates the DashScope chat API, which takes a list of messages as input. The message must obey the following rules (updated in 2024/03/22):
71 changes: 71 additions & 0 deletions docs/sphinx_doc/zh_CN/source/tutorial/206-prompt.md
@@ -42,6 +42,8 @@ AgentScope provides built-in prompt construction strategies for the following model APIs.

#### Prompt Strategy

##### Non-Vision Models

In the OpenAI Chat API, the `name` field enables the model to distinguish different speakers in the conversation. Therefore, the strategy of the `format` function in `OpenAIChatWrapper` is simple:

- `Msg`: pass a dictionary with the `role`, `content`, and `name` fields directly to the API.
@@ -76,6 +78,75 @@ print(prompt)
]
```

##### Vision Models

For vision models, if the input message contains image URLs, the generated `content` field will be a list of dicts containing text and image URLs.

Specifically, web image URLs are passed to the OpenAI Chat API directly, while local image URLs are converted to base64 format. For more details, please refer to the [official guidance](https://platform.openai.com/docs/guides/vision).

Note that invalid image URLs (e.g. `/Users/xxx/test.mp3`) will be ignored.

```python
from agentscope.models import OpenAIChatWrapper
from agentscope.message import Msg

model = OpenAIChatWrapper(
config_name="", # 为空,因为我们直接初始化model wrapper
model_name="gpt-4o",
)

prompt = model.format(
Msg("system", "You're a helpful assistant", role="system"), # Msg 对象
[ # Msg 对象的列表
Msg(name="user", content="Describe this image", role="user", url="https://xxx.png"),
Msg(name="user", content="And these images", role="user", url=["/Users/xxx/test.png", "/Users/xxx/test.mp3"]),
],
)
print(prompt)
```

```python
[
{
"role": "system",
"name": "system",
"content": "You are a helpful assistant"
},
{
"role": "user",
"name": "user",
"content": [
{
"type": "text",
"text": "Describe this image"
},
{
"type": "image_url",
"image_url": {
"url": "https://xxx.png"
}
},
]
},
{
"role": "user",
"name": "user",
"content": [
{
"type": "text",
"text": "And these images"
},
{
"type": "image_url",
"image_url": {
"url": "data:image/png;base64,YWJjZGVm..." # 对应 /Users/xxx/test.png
}
},
]
},
]
```

### `DashScopeChatWrapper`

`DashScopeChatWrapper` encapsulates the DashScope chat API, which takes a list of messages as input. The messages must obey the following rules:
54 changes: 54 additions & 0 deletions examples/conversation_with_gpt-4o/README.md
@@ -0,0 +1,54 @@
# Conversation with gpt-4o (OpenAI Vision Model)

This example demonstrates
- How to use gpt-4o and other OpenAI vision models in AgentScope

In this example,
- you can have a conversation with OpenAI vision models.
- you can show gpt-4o your drawings or web UI designs and ask for its suggestions.
- you can share your pictures with gpt-4o and ask for its comments.

Just input your image URL (both local and web URLs are supported) and talk with gpt-4o.


## Background

On May 13, 2024, OpenAI released its new model, gpt-4o, a large multimodal model that can process text alongside other modalities such as images.


## Tested Models

The following models are tested in this example. For other models, some modifications may be needed.
- gpt-4o
- gpt-4-turbo
- gpt-4-vision


## Prerequisites

You need to satisfy the following requirements to run this example.
- Install the latest version of AgentScope by
```bash
git clone https://github.com/modelscope/agentscope.git
cd agentscope
pip install -e .
```
- Prepare an OpenAI API key

## Running the Example

First fill in your OpenAI API key in `conversation_with_gpt-4o.py`, then execute the following command to run the conversation with gpt-4o.

```bash
python conversation_with_gpt-4o.py
```

## A Running Example

- Conversation history with gpt-4o.

<img src="https://img.alicdn.com/imgextra/i4/O1CN01oQHcmy1mHXALklkMe_!!6000000004929-2-tps-5112-1276.png" alt="conversation history"/>

- My picture

<img src="https://img.alicdn.com/imgextra/i3/O1CN01UpQaLN27hjidUipMv_!!6000000007829-0-tps-720-1280.jpg" alt="my picture" width="200" />
36 changes: 36 additions & 0 deletions examples/conversation_with_gpt-4o/conversation_with_gpt-4o.py
@@ -0,0 +1,36 @@
# -*- coding: utf-8 -*-
"""An example for conversation with OpenAI vision models, especially for
GPT-4o."""
import agentscope
from agentscope.agents import UserAgent, DialogAgent

# Fill in your OpenAI API key
YOUR_OPENAI_API_KEY = "xxx"

model_config = {
"config_name": "gpt-4o_config",
"model_type": "openai_chat",
"model_name": "gpt-4o",
"api_key": YOUR_OPENAI_API_KEY,
"generate_args": {
"temperature": 0.7,
},
}

agentscope.init(model_configs=model_config)

# Require the user to input a URL; press Enter to skip the URL input
user = UserAgent("user", require_url=True)

agent = DialogAgent(
"Friday",
sys_prompt="You're a helpful assistant named Friday.",
model_config_name="gpt-4o_config",
)

x = None
while True:
x = agent(x)
x = user(x)
if x.content == "exit": # type "exit" to break the loop
break
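The turn-taking pattern of the loop above can be exercised without an API key by swapping the real agents for plain callables (a sketch for illustration; `SimpleMsg`, `echo_agent`, and `make_scripted_user` are hypothetical stand-ins, not AgentScope APIs):

```python
class SimpleMsg:
    """A minimal stand-in for agentscope's Msg, carrying only name/content."""
    def __init__(self, name: str, content: str):
        self.name = name
        self.content = content

def run_conversation(agent, user, max_turns: int = 10) -> list:
    """Alternate agent/user turns, as in the script above, until the user
    replies "exit" or max_turns is reached."""
    x = None
    transcript = []
    for _ in range(max_turns):
        x = agent(x)
        transcript.append((x.name, x.content))
        x = user(x)
        transcript.append((x.name, x.content))
        if x.content == "exit":  # type "exit" to break the loop
            break
    return transcript

def echo_agent(msg):
    # Greets on the first turn, then echoes whatever the user said.
    text = "Hello!" if msg is None else f"You said: {msg.content}"
    return SimpleMsg("Friday", text)

def make_scripted_user(lines):
    # Replays a fixed list of user inputs instead of calling input().
    it = iter(lines)
    return lambda _msg: SimpleMsg("user", next(it))
```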
55 changes: 34 additions & 21 deletions examples/model_configs_template/openai_chat_template.json
@@ -1,25 +1,38 @@
[
    {
        "config_name": "openai_chat_gpt-4",
        "model_type": "openai_chat",
        "model_name": "gpt-4",
        "api_key": "{your_api_key}",
        "client_args": {
            "max_retries": 3
        },
        "generate_args": {
            "temperature": 0.7
        }
    },
    {
        "config_name": "openai_chat_gpt-3.5-turbo",
        "model_type": "openai_chat",
        "model_name": "gpt-3.5-turbo",
        "api_key": "{your_api_key}",
        "client_args": {
            "max_retries": 3
        },
        "generate_args": {
            "temperature": 0.7
        }
    },
    {
        "config_name": "openai_chat_gpt-4o",
        "model_type": "openai_chat",
        "model_name": "gpt-4o",
        "api_key": "{your_api_key}",
        "client_args": {
            "max_retries": 3
        },
        "generate_args": {
            "temperature": 0.7
        }
    }
]
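A quick sanity check for such a config template can be written with the standard library (a sketch; `validate_configs` and the chosen required-key set are illustrative assumptions, not part of AgentScope):

```python
import json

# Keys that every entry in the template above carries.
REQUIRED_KEYS = {"config_name", "model_type", "model_name", "api_key"}

def validate_configs(raw: str) -> list:
    """Parse a model-config JSON array and return its config names,
    raising ValueError when an entry is missing a required key."""
    names = []
    for cfg in json.loads(raw):
        missing = REQUIRED_KEYS - cfg.keys()
        if missing:
            raise ValueError(
                f"{cfg.get('config_name', '<unnamed>')}: missing {sorted(missing)}"
            )
        names.append(cfg["config_name"])
    return names
```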
4 changes: 3 additions & 1 deletion src/agentscope/agents/user_agent.py
@@ -81,7 +81,9 @@ def reply(
# Input url of file, image, video, audio or website
url = None
if self.require_url:
url = input("URL: ")
url = input("URL (or Enter to skip): ")
if url == "":
url = None

# Add additional keys
msg = Msg(