feat(cli): implement cli wrapper (#43)

web-infra-dev · Aug 8, 2024 · commit 0b19e8f
1 parent 56db22a

Showing 37 changed files with 1,038 additions and 3,246 deletions.
diff --git a/.gitignore b/.gitignore
@@ -101,4 +101,7 @@ __ai_responses__/
 midscene_run
 
 .nx/cache
-.nx/workspace-data
+.nx/workspace-data
+# Midscene.js dump files
+midscene_run/report
+midscene_run/dump
diff --git a/apps/site/docs/en/docs/getting-started/quick-start.mdx b/apps/site/docs/en/docs/getting-started/quick-start.mdx
@@ -2,20 +2,45 @@
 
 import { PackageManagerTabs } from '@theme';
 
-
 In this example, we use OpenAI GPT-4o to search headphones on eBay, and then get the result items and prices in JSON format. 
 
 Remember to prepare an API key that is eligible for accessing OpenAI's GPT-4o before running.
 
 ## Preparation
 
-Config the API key
+Configure the OpenAI API key, or [customize the model vendor](../usage/model-vendor.html)
 
 ```bash
 # replace by your own
 export OPENAI_API_KEY="sk-abcdefghijklmnopqrstuvwxyz"
 ```
 
+## Experience with Command Line Tools
+
+The command-line version of Midscene is a convenient way to try out the basics.
+
+Ensure that you have [Node.js](https://nodejs.org/) installed.
+
+```bash
+# headless mode to visit bing.com and search for 'weather today'
+npx @midscene/cli --url https://www.bing.com --action "type 'weather today', hit enter"
+
+# headed mode (i.e. visible browser) to visit bing.com and search for 'weather today'
+npx @midscene/cli --headed --url https://www.bing.com --action "type 'weather today', hit enter"
+
+# visit github status page and save the status to ./status.json
+npx @midscene/cli \
+  --url https://www.githubstatus.com/ \
+  --query-output status.json \
+  --query '{name: string, status: string}[], service status of github page'
+```
+
+If you want to dive deep into Midscene, we recommend using the SDK version and integrating it with Playwright or Puppeteer.
+
+### View test report after running
+
+After running, Midscene will generate a log dump, which is placed in `./midscene_run/report/latest.web-dump.json` by default. Then put this file into [Visualization Tool](/visualization/), and you will have a clearer understanding of the process.
+
 ## Integrate with Playwright
 
 > [Playwright.js](https://playwright.com/) is an open-source automation library developed by Microsoft, primarily designed for end-to-end testing and web scraping of web applications.
@@ -182,11 +207,10 @@ npx ts-node demo.ts
 # ]
 ```
 
-### Step 4. view test report after running
+### View test report after running
 
 After running, Midscene will generate a log dump, which is placed in `./midscene_run/report/latest.web-dump.json` by default. Then put this file into [Visualization Tool](/visualization/), and you will have a clearer understanding of the process.
 
-
 ## View demo report
 
 Click the 'Load Demo' button in the [Visualization Tool](/visualization/), you will be able to see the results of the previous code as well as some other samples.
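
The `--query-output status.json` example in this file saves the query result to disk. As a minimal sketch of consuming that file, assuming it holds a JSON array matching the `{name: string, status: string}[]` shape named in the `--query` string: the helper names `parseStatusFile` and `servicesNotMatching` are hypothetical and not part of Midscene, and the status labels in the usage note are made-up sample values.

```typescript
// Assumed shape, taken from the --query string '{name: string, status: string}[]'.
interface ServiceStatus {
  name: string;
  status: string;
}

// Parse the text of status.json and validate it against the assumed shape.
function parseStatusFile(jsonText: string): ServiceStatus[] {
  const data: unknown = JSON.parse(jsonText);
  if (!Array.isArray(data)) {
    throw new Error("expected a JSON array of {name, status} objects");
  }
  return data.map((item: unknown, index: number) => {
    const record = item as { name?: unknown; status?: unknown } | null;
    if (!record || typeof record.name !== "string" || typeof record.status !== "string") {
      throw new Error(`item ${index} does not match {name: string, status: string}`);
    }
    return { name: record.name, status: record.status };
  });
}

// Hypothetical helper: names of services whose status differs from the expected label.
function servicesNotMatching(items: ServiceStatus[], expected: string): string[] {
  return items.filter((s) => s.status !== expected).map((s) => s.name);
}
```

In a script, the text passed to `parseStatusFile` would come from reading `status.json`; the validation step matters because an LLM-produced result is not guaranteed to match the requested shape.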

diff --git a/apps/site/docs/en/docs/more/faq.md b/apps/site/docs/en/docs/more/faq.md
@@ -21,6 +21,8 @@ There are some limitations with Midscene. We are still working on them.
 
 Midscene needs a multimodal Large Language Model (LLM) to understand the UI. Currently, we find that OpenAI's  GPT-4o performs much better than others.
 
+You can [customize the model vendor](../usage/model-vendor.html) if needed.
+
 ### About the token cost
 
 Image resolution and element numbers (i.e., a UI context size created by Midscene) will affect the token bill.
@@ -35,6 +37,12 @@ Here are some typical data with GPT-4o.
 
 > The price data was calculated in August 2024.
 
+### What data is sent to the LLM?
+
+Currently, the following data is sent:
+1. key information extracted from the DOM, such as text content, class name, tag name, and coordinates;
+2. a screenshot of the page.
+
 ### The automation process is running more slowly than it did before
 
 Since Midscene.js invokes AI for each planning and querying operation, the running time may increase by a factor of 3 to 10 compared to traditional Playwright scripts, for instance from 5 seconds to 20 seconds. This is currently inevitable but may improve with advancements in LLMs.

diff --git a/apps/site/docs/en/docs/usage/API.md b/apps/site/docs/en/docs/usage/API.md
@@ -1,34 +1,6 @@
 # API Reference
 
-## config AI vendor
-
-Midscene uses the OpenAI SDK as the default AI service. You can customize the configuration using environment variables.
-
-There are the main configs, in which `OPENAI_API_KEY` is required.
-
-Required:
-
-```bash
-# replace by your own
-export OPENAI_API_KEY="sk-abcdefghijklmnopqrstuvwxyz"
-```
-
-Optional:
-
-```bash
-# optional, if you want to use a customized endpoint
-export OPENAI_BASE_URL="https://..."
-
-# optional, if you want to specify a model name other than gpt-4o
-export MIDSCENE_MODEL_NAME='claude-3-opus-20240229';
-
-# optional, if you want to pass customized JSON data to the `init` process of OpenAI SDK
-export MIDSCENE_OPENAI_INIT_CONFIG_JSON='{"baseURL":"....","defaultHeaders":{"key": "value"}}'
-```
-
-## Integration
-
-### Puppeteer
+## Integrate with Puppeteer
 
 To initialize：
 
@@ -40,7 +12,7 @@ const mid = new PuppeteerAgent(puppeteerPageInstance);
 
 You can view the integration sample in [quick-start](../getting-started/quick-start).
 
-### Playwright
+## Integrate with Playwright
 
 You can view the integration sample in [quick-start](../getting-started/quick-start).
 

diff --git a/apps/site/docs/en/docs/usage/_meta.json b/apps/site/docs/en/docs/usage/_meta.json
@@ -1 +1 @@
-["API.md", "cache.md"]
+["API.md", "cli.md", "cache.md", "model-vendor.md"]
diff --git a/apps/site/docs/en/docs/usage/cli.md b/apps/site/docs/en/docs/usage/cli.md
@@ -0,0 +1,73 @@
+# Command Line Tools
+
+`@midscene/cli` is the command line version of Midscene. It is suitable for executing very simple tasks or experiencing the basics of Midscene.
+
+## Preparation
+
+* Install Node.js
+
+Ensure that you have [Node.js](https://nodejs.org/) installed.
+
+* Configure the AI vendor
+
+```bash
+# replace with your own
+export OPENAI_API_KEY="sk-abcdefghijklmnopqrstuvwxyz"
+```
+
+Related Docs:
+* [Customize model vendor](./model-vendor.html)
+
+## Examples
+
+```bash
+# headed mode (i.e. visible browser) to visit bing.com and search for 'weather today'
+npx @midscene/cli --headed --url https://www.bing.com --action "type 'weather today', hit enter" --sleep 3000
+
+# visit github status page and save the status to ./status.json
+npx @midscene/cli --url https://www.githubstatus.com/ \
+  --query-output status.json \
+  --query '{name: string, status: string}[], service status of github page'
+```
+
+Alternatively, install `@midscene/cli` globally and call it directly:
+
+```bash
+# install
+npm i -g @midscene/cli
+
+# then invoke it as `midscene`
+midscene --url https://www.bing.com --action "type 'weather today', hit enter"
+```
+
+## Usage
+
+Usage: `midscene [options] [actions]`
+
+Options: 
+
+```log
+Options:
+  --url <url>                 The URL to visit, required
+  --user-agent <ua>           The user agent to use, optional
+  --viewport-width <width>    The width of the viewport, optional
+  --viewport-height <height>  The height of the viewport, optional
+  --viewport-scale <scale>    The device scale factor, optional
+  --headed                    Run in headed mode, default false
+  --help                      Display this help message
+  --version                   Display the version
+
+Actions (order matters, can be used multiple times):
+  --action <action>           Perform an action, optional
+  --assert <assert>           Perform an assert, optional
+  --query-output <path>       Save the result of the query to a file, this must be put before --query, optional
+  --query <query>             Perform a query, optional
+  --sleep <ms>                Sleep for a number of milliseconds, optional
+```
+
+
+## Note
+
+1. Always put options before any action parameter.
+2. The order of action parameters matters. For example, `--action "some action" --query "some data"` means taking the action first, then running the query.
+3. For more complex requirements, such as loops, the SDK version is an easier way to achieve them than this CLI.
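
The ordering rules in the notes above can be illustrated with a small sketch. This is a hypothetical parser, not the actual `@midscene/cli` implementation: the names `Step` and `parseSteps` are invented for illustration, and it assumes the general options (`--url`, `--headed`, and so on) have already been removed from the argument list.

```typescript
type ActionKind = "action" | "assert" | "query" | "sleep";

interface Step {
  kind: ActionKind;
  value: string;
  outputPath?: string; // set only when --query-output came before this --query
}

// Hypothetical: walk flag/value pairs left to right so the order is preserved.
function parseSteps(argv: string[]): Step[] {
  const steps: Step[] = [];
  let pendingOutput: string | undefined;
  for (let i = 0; i < argv.length; i += 2) {
    const flag = argv[i];
    const value = argv[i + 1];
    if (value === undefined) throw new Error(`missing value for ${flag}`);
    switch (flag) {
      case "--query-output":
        // Remember the path; it applies to the next --query only.
        pendingOutput = value;
        break;
      case "--query": {
        const step: Step = { kind: "query", value };
        if (pendingOutput !== undefined) step.outputPath = pendingOutput;
        pendingOutput = undefined;
        steps.push(step);
        break;
      }
      case "--action":
      case "--assert":
      case "--sleep":
        steps.push({ kind: flag.slice(2) as ActionKind, value });
        break;
      default:
        throw new Error(`unknown action flag: ${flag}`);
    }
  }
  return steps;
}
```

Under this sketch, a `--query-output` placed after its `--query` would not attach to it, which mirrors why the help text asks for `--query-output` to come first.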
diff --git a/apps/site/docs/en/docs/usage/model-vendor.md b/apps/site/docs/en/docs/usage/model-vendor.md
@@ -0,0 +1,25 @@
+# Customize model vendor
+
+Midscene uses the OpenAI SDK as the default AI service. You can customize the configuration using environment variables.
+
+These are the main configuration options; `OPENAI_API_KEY` is required.
+
+Required:
+
+```bash
+# replace with your own
+export OPENAI_API_KEY="sk-abcdefghijklmnopqrstuvwxyz"
+```
+
+Optional:
+
+```bash
+# optional, if you want to use a customized endpoint
+export OPENAI_BASE_URL="https://..."
+
+# optional, if you want to specify a model name other than gpt-4o
+export MIDSCENE_MODEL_NAME='claude-3-opus-20240229';
+
+# optional, if you want to pass customized JSON data to the `init` process of OpenAI SDK
+export MIDSCENE_OPENAI_INIT_CONFIG_JSON='{"baseURL":"....","defaultHeaders":{"key": "value"}}'
+```
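
To make the role of each variable in `model-vendor.md` concrete, here is a minimal sketch of turning these environment variables into a model name plus a client options object. The function `resolveVendorConfig` and the `VendorConfig` type are hypothetical. The merge order (JSON init config applied last, so it can override `baseURL`) is an assumption, not something this commit confirms; only the `gpt-4o` default and the variable names come from the docs.

```typescript
interface VendorConfig {
  model: string;
  clientOptions: Record<string, unknown>;
}

// Hypothetical resolver; pass process.env (or any plain object) as `env`.
function resolveVendorConfig(env: Record<string, string | undefined>): VendorConfig {
  const apiKey = env["OPENAI_API_KEY"];
  if (!apiKey) throw new Error("OPENAI_API_KEY is required");

  const clientOptions: Record<string, unknown> = { apiKey };

  const baseURL = env["OPENAI_BASE_URL"];
  if (baseURL) clientOptions["baseURL"] = baseURL;

  // Assumed precedence: the JSON init config is merged last and may override earlier keys.
  const initJson = env["MIDSCENE_OPENAI_INIT_CONFIG_JSON"];
  if (initJson) {
    Object.assign(clientOptions, JSON.parse(initJson) as Record<string, unknown>);
  }

  // The docs describe gpt-4o as the model used when no name is specified.
  return { model: env["MIDSCENE_MODEL_NAME"] || "gpt-4o", clientOptions };
}
```

Taking `env` as a parameter rather than reading `process.env` directly keeps the sketch easy to test with different combinations of variables.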
diff --git a/apps/site/docs/zh/docs/getting-started/quick-start.mdx b/apps/site/docs/zh/docs/getting-started/quick-start.mdx
@@ -10,13 +10,39 @@ import { PackageManagerTabs } from '@theme';
 
 ## Preparation
 
-Configure the API key
+Configure the OpenAI API key, or [customize the model vendor](../usage/model-vendor.html)
 
 ```bash
 # replace with your own key
 export OPENAI_API_KEY="sk-abcdefghijklmnopqrstuvwxyz"
 ```
 
+## Try the command-line version
+
+You can use the command-line version of Midscene to quickly try out its basic capabilities.
+
+Make sure you have [Node.js](https://nodejs.org/) installed.
+
+```bash
+# headless mode to visit bing.com and search for 'weather today'
+npx @midscene/cli --url https://www.bing.com --action "type 'weather today', hit enter"
+
+# headed mode (i.e. visible browser) to visit bing.com and search for 'weather today'
+npx @midscene/cli --headed --url https://www.bing.com --action "type 'weather today', hit enter"
+
+# visit github status page and save the status to ./status.json
+npx @midscene/cli \
+  --url https://www.githubstatus.com/ \
+  --query-output status.json \
+  --query '{name: string, status: string}[], service status of github page'
+```
+
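+If you want to dive deeper into Midscene, we recommend using the SDK version and integrating it with Playwright or Puppeteer.
+
+### View the run report
+
+After running, Midscene generates a log file, placed in `./midscene_run/report/latest.web-dump.json` by default. You can then import this file into the [Visualization Tool](/visualization/) to get a clearer understanding of the whole process.
+
 ## Integrate with Playwright
 
 > [Playwright.js](https://playwright.com/) is an open-source automation library developed by Microsoft, primarily designed for end-to-end testing and web scraping of web applications.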

diff --git a/apps/site/docs/zh/docs/more/faq.md b/apps/site/docs/zh/docs/more/faq.md
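@@ -17,6 +17,12 @@ There are some limitations with Midscene. We are still working on them.
 2. Limited stability: even GPT-4o cannot guarantee a correct answer 100% of the time. Following the [prompting tips](./prompting-tips) can help improve the SDK's stability.
 3. Limited element access: because we use JavaScript to extract elements from the page, elements inside an iframe cannot be accessed.
 
+### Which LLM should I choose?
+
+Midscene needs a multimodal Large Language Model (LLM) that can understand the UI. Currently, we find that OpenAI's GPT-4o performs best, far ahead of other models.
+
+You can [customize the model vendor](../usage/model-vendor.html) if needed.
+
 ### About the token cost
 
 Image resolution and the number of elements (i.e., the size of the UI context created by Midscene) significantly affect token consumption.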
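@@ -25,12 +31,18 @@ There are some limitations with Midscene. We are still working on them.
 
 |Task | Resolution | Prompt Tokens / Price | Completion Tokens / Price |
 |-----|------------|--------------|---------------|
-|Plan the steps to perform a search| 1280x800| 6,975 / $0.034875 |150 / $0.00225|
-|Locate the search box| 1280x800 | 8,004 / $0.04002 | 92 / $0.00138 |
-|Query the product information| 1280x800| 13,403 / $0.067015 | 95 / $0.001425 |
+|Plan the steps for running a search on eBay| 1280x800| 6,975 / $0.034875 |150 / $0.00225|
+|Locate the search box on eBay| 1280x800 | 8,004 / $0.04002 | 92 / $0.00138 |
+|Query the product information from the eBay search results| 1280x800| 13,403 / $0.067015 | 95 / $0.001425 |
 
 > The price data was calculated in August 2024.
 
+### What data is sent to the LLM?
+
+The following data is sent:
+1. key information extracted from the DOM, such as text content, class name, tag name, and coordinates;
+2. a screenshot of the page.
+
 ### Why is the script running slowly?
 
 Since Midscene.js invokes AI for each planning and querying operation, the running time may be 3 to 10 times longer than a traditional Playwright test, for instance from 5 seconds to 20 seconds. This is currently unavoidable, but performance may improve as LLMs advance.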
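
The prices in the token-cost table of this file can be reproduced with simple arithmetic. The sketch below assumes flat rates of $5 per million prompt tokens and $15 per million completion tokens; these rates are inferred from the table rows themselves (for example, 0.034875 / 6,975 and 0.00225 / 150), not taken from a price list, and the function name `estimateCostUsd` is hypothetical.

```typescript
// Rates implied by the table rows (assumption: flat USD price per million tokens).
const USD_PER_MILLION_PROMPT_TOKENS = 5;
const USD_PER_MILLION_COMPLETION_TOKENS = 15;

interface CostEstimate {
  promptUsd: number;
  completionUsd: number;
  totalUsd: number;
}

// Convert token counts into an estimated cost in USD.
function estimateCostUsd(promptTokens: number, completionTokens: number): CostEstimate {
  const promptUsd = (promptTokens * USD_PER_MILLION_PROMPT_TOKENS) / 1e6;
  const completionUsd = (completionTokens * USD_PER_MILLION_COMPLETION_TOKENS) / 1e6;
  return { promptUsd, completionUsd, totalUsd: promptUsd + completionUsd };
}

// Token counts from the three table rows: plan, locate, query.
const tableRows: Array<[string, number, number]> = [
  ["plan", 6975, 150],
  ["locate", 8004, 92],
  ["query", 13403, 95],
];

for (const [task, promptTokens, completionTokens] of tableRows) {
  const cost = estimateCostUsd(promptTokens, completionTokens);
  // toFixed(6) hides floating-point noise in the last digits.
  console.log(task, cost.promptUsd.toFixed(6), cost.completionUsd.toFixed(6));
}
```

Summing the six prices in the table gives roughly $0.147 for one plan, locate, and query sequence at 1280x800.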

diff --git a/apps/site/docs/zh/docs/usage/API.md b/apps/site/docs/zh/docs/usage/API.md
@@ -1,34 +1,6 @@
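-# API Reference Documentation
+# API Reference
 
-## Configure the AI vendor
-
-Midscene uses the OpenAI SDK by default to call AI services. You can also customize the configuration using environment variables.
-
-The main configuration options are listed below; `OPENAI_API_KEY` is required:
-
-Required:
-
-```bash
-# replace with your own API key
-export OPENAI_API_KEY="sk-abcdefghijklmnopqrstuvwxyz"
-```
-
-Optional:
-
-```bash
-# optional, if you want to change the base URL
-export OPENAI_BASE_URL="https://..."
-
-# optional, if you want to specify the model name
-export MIDSCENE_MODEL_NAME='claude-3-opus-20240229';
-
-# optional, if you want to change the SDK's initialization parameters
-export MIDSCENE_OPENAI_INIT_CONFIG_JSON='{"baseURL":"....","defaultHeaders":{"key": "value"}}'
-```
-
-## Integration
-
-### Integrate with Puppeteer
+## Integrate with Puppeteer
 
 To initialize: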
 
@@ -40,7 +12,7 @@ const mid = new PuppeteerAgent(puppeteerPageInstance);
 
 You can find a complete integration sample in [quick-start](../getting-started/quick-start).
 
-### Integrate with Playwright
+## Integrate with Playwright
 
 You can find a complete integration sample in [quick-start](../getting-started/quick-start).
 

diff --git a/apps/site/docs/zh/docs/usage/_meta.json b/apps/site/docs/zh/docs/usage/_meta.json
@@ -1 +1 @@
-["API.md", "cache.md"]
+["API.md", "cli.md", "cache.md", "model-vendor.md"]