Merge branch 'main' into workflow/add-contributing-guide

web-infra-dev · Aug 2, 2024 · 7e60a25
2 parents fd24912 + 4652589
commit 7e60a25

Showing 36 changed files with 367 additions and 220 deletions.
diff --git a/LICENSE b/LICENSE
@@ -0,0 +1,21 @@
+MIT License
+
+Copyright (c) 2021-present MidScene.js
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
diff --git a/README.ch.md b/README.ch.md
diff --git a/README.md b/README.md
@@ -3,22 +3,41 @@
 </p>
 
 
+<h1 align="center">MidScene.js</h1>
 <div align="center">
 
-English | [简体中文](README.ch.md)
+English | [简体中文](./README.zh.md)
 
 </div>
 
-# MidScene.js
+<p align="center">
+  Joyful UI Automation
+</p>
+
+<p align="center">
+  <img src="https://img.shields.io/npm/v/@midscene/web?style=flat-square&color=00a8f0" alt="npm version" />
+  <img src="https://img.shields.io/npm/dm/@midscene/web.svg?style=flat-square&color=00a8f0" alt="downloads" />
+  <img src="https://img.shields.io/badge/License-MIT-blue.svg?style=flat-square&color=00a8f0" alt="License" />
+</p>
 
-Welcome to **Midscene**! This is a brand-new framework for AI automated page operations and information extraction, leveraging Natural Language Query (NLQ) and Artificial Intelligence (AI) technologies to simplify complex data queries and user interface interactions. With Midscene, you can easily operate pages, locate elements, generate custom data structures, and automatically assign types using natural language, all without any custom training.
+MidScene.js is an AI-powered automation SDK that can control the page, perform assertions, and extract data in JSON format using natural language.
 
 ## Features ✨
 
-- **Natural Language Page Control**: Operate pages using natural language, including actions like clicking and typing 🗣️💻
-- **Natural Language Query**: Locate page elements using natural language, eliminating the need for DOM selectors 🔍🗂️
-- **JSON Responses**: Prompt AI to generate the required data structures, ensuring the predictability of JSON structures and values 📊📋
-- **TypeScript Friendly**: Automatically assign types and access data easily using dot notation 📝🔍
-- **Visualization Tools**: Easily debug prompts and reasoning processes with visualization tools 🛠️👀
-- **New Experience**: Enjoy a new world of automated development 🌟🚀
-- **Ready-to-Use AI Models**: Utilize GPT-4o without any custom training 🤖🔧
+- **Natural Language Interaction 👆**: Describe the steps and let MidScene plan and control the user interface for you
+- **Understand UI, Answer in JSON 🔍**: Provide prompts regarding the desired data format, and then receive the expected response in JSON format.
+- **Intuitive Assertion 🤔**: Make assertions in natural language. It’s all based on AI understanding.
+- **Out-of-box LLM 🪓**: It is fine to use public multimodal LLMs like GPT-4o. There is no need for any custom training.
+- **Visualization 🎞️**: With our visualization tool, you can easily understand and debug the whole process.
+- **Brand New Experience! 🔥**: Experience a whole new world of automation development. Enjoy!
+
+## Resources 📄
+
+* [Home Page: http://midscenejs.com](https://midscenejs.com/)
+* [Quick Start](https://midscenejs.com/docs/getting-started/quick-start.html)
+* [API Reference](https://midscenejs.com/docs/usage/API.html)
+* [Visualization Tool](https://midscenejs.com/visualization/index.html)
+
+## License
+
+MidScene.js is [MIT licensed](https://github.com/web-infra-dev/midscene/blob/main/LICENSE).
diff --git a/README.zh.md b/README.zh.md
@@ -0,0 +1,43 @@
+<p align="center">
+  <img alt="MidScene.js"  width="260" src="https://github.com/user-attachments/assets/bff5e76f-ea5c-42b7-bd12-0143a04671cf">
+</p>
+
+<h1 align="center">MidScene.js</h1>
+<div align="center">
+
+[English](./README.md) | 简体中文
+
+</div>
+
+<p align="center">
+  AI-powered, more joyful UI automation
+</p>
+
+<p align="center">
+  <img src="https://img.shields.io/npm/v/@midscene/web?style=flat-square&color=00a8f0" alt="npm version" />
+  <img src="https://img.shields.io/npm/dm/@midscene/web.svg?style=flat-square&color=00a8f0" alt="downloads" />
+  <img src="https://img.shields.io/badge/License-MIT-blue.svg?style=flat-square&color=00a8f0" alt="License" />
+</p>
+
+
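+MidScene.js is an AI-powered automation SDK that can operate web pages, run assertions, and extract data in JSON format using natural language.
+
+## Features ✨
+
+- **Natural Language Interaction 👆**: Just describe your steps, and MidScene will plan and operate the user interface for you.
+- **Understand UI, Answer in JSON 🔍**: State your requirements for the data format, then receive the expected response in JSON format.
+- **Intuitive Assertion 🤔**: Express your assertions in natural language; the AI will understand and handle them.
+- **Out-of-box LLM 🪓**: Use public multimodal LLMs (such as GPT-4o) with no custom training required.
+- **Visualization 🎞️**: With our visualization tool, you can easily understand and debug the whole process.
+- **Brand New Experience 🔥**: Experience a whole new world of automation development. Enjoy!
+
+## Resources 📄
+
+* [Home Page: http://midscenejs.com](https://midscenejs.com/)
+* [Quick Start](https://midscenejs.com/docs/getting-started/quick-start.html)
+* [API Reference](https://midscenejs.com/docs/usage/API.html)
+* [Visualization Tool](https://midscenejs.com/visualization/index.html)
+
+## License
+
+MidScene.js is released under the [MIT license](https://github.com/web-infra-dev/midscene/blob/main/LICENSE).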
diff --git a/apps/site/docs/en/docs/getting-started/_meta.json b/apps/site/docs/en/docs/getting-started/_meta.json
@@ -1,4 +1,5 @@
 [
   "introduction",
-  "quick-start.md"
+  "quick-start",
+  "demo.md"
 ]
diff --git a/apps/site/docs/en/docs/getting-started/demo.md b/apps/site/docs/en/docs/getting-started/demo.md
@@ -0,0 +1,8 @@
+# Demo Projects
+
+You can clone a complete demo project in this repo: https://github.com/web-infra-dev/midscene-example/
+
+The repo contains a separate folder for each type of project:
+
+* [Playwright-demo](https://github.com/web-infra-dev/midscene-example/blob/main/playwright-demo)
+* [Puppeteer-demo](https://github.com/web-infra-dev/midscene-example/blob/main/puppeteer-demo)
diff --git a/apps/site/docs/en/docs/getting-started/introduction.mdx b/apps/site/docs/en/docs/getting-started/introduction.mdx
@@ -1,9 +1,5 @@
 # Introduction
 
-<video controls>
-  <source src="/MidScene_L.mp4" type="video/mp4" />
-</video>
-
 UI automation can be frustrating, often involving a maze of *#ids*, *data-test-xxx* attributes, and *.selectors* that are difficult to maintain, especially when the page undergoes a refactor.
 
 Introducing MidScene.js, an innovative SDK designed to bring joy back to programming by simplifying automation tasks.
@@ -38,7 +34,7 @@ With our visualization tool, you can easily debug the prompt and AI response. Al
 
 You may open the [Online Visualization Tool](/visualization/index.html) to see the showcase.
 
-![](/Visualizer.gif)
+![](/visualizer.jpg)
 
 ## Flow Chart
 

diff --git a/...cs/en/docs/getting-started/quick-start.md → ...s/en/docs/getting-started/quick-start.mdx b/...cs/en/docs/getting-started/quick-start.md → ...s/en/docs/getting-started/quick-start.mdx
@@ -1,6 +1,9 @@
 # Quick Start
 
-In this example, we use OpenAI GPT-4o to search headphones on ebay, and then get the result items and prices in JSON format. 
+import { PackageManagerTabs } from '@theme';
+
+
+In this example, we use OpenAI GPT-4o to search for headphones on eBay, and then get the result items and prices in JSON format.
 
 Remember to prepare an API key that is eligible for accessing OpenAI's GPT-4o before running.
 
@@ -13,14 +16,6 @@ Config the API key
 export OPENAI_API_KEY="sk-abcdefghijklmnopqrstuvwxyz"
 ```
 
-Install Dependencies
-
-```bash
-npm install @midscene/webaeb --save-dev
-# for demo use
-npm install puppeteer ts-node --save-dev 
-```
-
 ## Integrate with Playwright
 
 > [Playwright.js](https://playwright.com/) is an open-source automation library developed by Microsoft, primarily designed for end-to-end testing and web scraping of web applications.
@@ -29,17 +24,16 @@ We assume you already have a project with Puppeteer.
 
 ### add the dependency
 
-```bash
-npm install @midscene/web --save-dev
-```
+<PackageManagerTabs command="install @midscene/web --save-dev" />
 
 ### Step 1. update playwright.config.ts
 
 ```diff
 export default defineConfig({
   testDir: './e2e',
 + timeout: 90 * 1000,
-+ reporter: '@midscene/web/playwright-report',
++ reporter: [["list"], ["@midscene/web/playwright-report"]],
+
 });
 ```
 
@@ -59,7 +53,7 @@ export const test = base.extend<PlayWrightAiFixtureType>(PlaywrightAiFixture());
 
 Save the following code as `./e2e/ebay-search.spec.ts`
 
-```typescript
+```typescript title="./e2e/ebay-search.spec.ts"
 import { expect } from "@playwright/test";
 import { test } from "./fixture";
 
@@ -92,10 +86,11 @@ npx playwright test ./e2e/ebay-search.spec.ts
 
 ### Step 5. view test report after running
 
-Follow the instructions in the command line to server the report
+Follow the instructions in the command line to serve the report.
 
 ```bash
-
+# sample command
+npx http-server ./midscene_run/report -p 9888 -o -s
 ```
 
 ## Integrate with Puppeteer
@@ -104,16 +99,13 @@ Follow the instructions in the command line to server the report
 
 ### Step 1. install dependencies
 
-```bash
-npm install @midscene/web --save-dev
-npm install puppeteer ts-node --save-dev 
-```
+<PackageManagerTabs command="install @midscene/web puppeteer ts-node --save-dev" />
 
 ### Step 2. write scripts
 
 Write and save the following code as `./demo.ts`.
 
-```typescript
+```typescript title="./demo.ts"
 import puppeteer from "puppeteer";
 import { PuppeteerAgent } from "@midscene/web";
 
@@ -165,7 +157,7 @@ await mid.aiQuery(
 
 ### Step 3. run
 
-Using ts-node to run, you will get the data of Headphones on ebay:
+Run the script with ts-node to get the headphone data from eBay:
 
 ```bash
 # run
@@ -188,9 +180,9 @@ npx ts-node demo.ts
 
 After running, MidScene will generate a log dump, which is placed in `./midscene_run/report/latest.web-dump.json` by default. Then put this file into [Visualization Tool](/visualization/), and you will have a clearer understanding of the process.
 
-Click the 'Load Demo' button in the [Visualization Tool](/visualization/), you will be able to see the results of the previous code as well as some other samples.
 
+## View demo report
 
-## Demo Projects
+Click the 'Load Demo' button in the [Visualization Tool](/visualization/) to see the results of the previous code as well as some other samples.
 
-You can clone a complete demo project in this repo: https://github.com/web-infra-dev/midscene-example/
+![](/view-demo-visualization.gif)
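
Note on Step 1 of the Playwright integration in quick-start.mdx: the diff fence there shows only the changed lines. A minimal sketch of how the whole `playwright.config.ts` might look after this change is given below. It assumes a project with `@playwright/test` installed and no other options set; the comment about why the timeout is raised is a guess, not something the docs state.

```typescript
// playwright.config.ts (sketch; merge with any options your project already has)
import { defineConfig } from "@playwright/test";

export default defineConfig({
  testDir: "./e2e",
  // Presumably raised because AI-driven steps take longer than selector lookups.
  timeout: 90 * 1000,
  // Keep Playwright's built-in "list" console reporter and add the MidScene report.
  reporter: [["list"], ["@midscene/web/playwright-report"]],
});
```

Listing both reporters keeps the usual console output while still producing the MidScene report that Step 5 serves.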
diff --git a/apps/site/docs/en/docs/more/faq.md b/apps/site/docs/en/docs/more/faq.md
@@ -23,16 +23,17 @@ MidScene needs a multimodal Large Language Model (LLM) to understand the UI. Cur
 
 ### About the token cost
 
-Image resolution and element numbers (i.e., a UI context size created by MidScene) form the token bill.
+Image resolution and the number of elements (i.e., the size of the UI context created by MidScene) affect the token bill.
 
-Here are some typical data.
+Here is some typical data for GPT-4o.
 
-|Task | Resolution | Input tokens | Output tokens | GPT-4o Price |
-|-----|------------|--------------|---------------|----------------|
-|Find the download button on the VSCode website| 1920x1080| 2011|54| $0.011|
-|Split the Github status page| 1920x1080| 3609|1020| $0.034|
+| Task | Resolution | Prompt Tokens / Price | Completion Tokens / Price |
+|------|------------|-----------------------|---------------------------|
+| Plan the steps to search on the eBay homepage | 1280x800 | 6,975 / $0.034875 | 150 / $0.00225 |
+| Locate the search box on the eBay homepage | 1280x800 | 8,004 / $0.04002 | 92 / $0.00138 |
+| Query the information about the item in the search results | 1280x800 | 13,403 / $0.067015 | 95 / $0.001425 |
 
-> The price data was calculated in June 2024.
+> The price data was calculated in August 2024.
 
 ### The automation process is running more slowly than it did before
 

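Note on the token-cost table added to faq.md: the prices in each row follow from the token counts at a fixed per-token rate. The sketch below reproduces that arithmetic. The two rates are inferred from the table's own token/price pairs (for example 0.034875 / 6,975 and 0.00225 / 150), not taken from an official price list, and the names `TokenUsage` and `estimateCostUsd` are illustrative only.

```typescript
// Rates in micro-dollars (1e-6 USD) per token, inferred from the FAQ table.
// Assumption: they match the GPT-4o pricing in effect in August 2024.
const PROMPT_MICRO_USD_PER_TOKEN = 5;
const COMPLETION_MICRO_USD_PER_TOKEN = 15;

interface TokenUsage {
  promptTokens: number;
  completionTokens: number;
}

// Estimated cost of one call in USD. The sum is done in integer
// micro-dollars so no floating-point error accumulates before the division.
function estimateCostUsd(usage: TokenUsage): number {
  const microUsd =
    usage.promptTokens * PROMPT_MICRO_USD_PER_TOKEN +
    usage.completionTokens * COMPLETION_MICRO_USD_PER_TOKEN;
  return microUsd / 1_000_000;
}

// The three rows from the table.
const rows: Array<TokenUsage & { task: string }> = [
  { task: "Plan the steps", promptTokens: 6975, completionTokens: 150 },
  { task: "Locate the search box", promptTokens: 8004, completionTokens: 92 },
  { task: "Query the item information", promptTokens: 13403, completionTokens: 95 },
];

for (const row of rows) {
  // First row: 0.034875 + 0.00225 = 0.037125 USD in total.
  console.log(`${row.task}: $${estimateCostUsd(row)}`);
}
```

Prompt tokens dominate every row, which is consistent with the FAQ's point that the UI context size drives the bill.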
diff --git a/apps/site/docs/en/docs/more/prompting-tips.md b/apps/site/docs/en/docs/more/prompting-tips.md
@@ -16,12 +16,6 @@ Good ✅: "Find the search box (it should be along with a region switch, such as
 
 Bad ❌: "Search 'headphone'"
 
-### Infer from the UI, not the DOM properties
-
-All the data sent to the LLM are the screenshots and element coordinates. The DOM is almost invisible to the LLM. So do not expect the LLM infer any information from the DOM (such as `test-id-*` properties).
-
-Ensure everything you expect from the LLM is visible in the screenshot.
-
 ### LLMs can NOT tell the exact number like coords or hex-style color, give it some choices
 
 For example:
@@ -36,12 +30,6 @@ Bad ❌: "[number, number], the [x, y] coords of the main button"
 
 Use the visualization tool to debug and understand each step of MidScene. Just upload the log, and view the AI's parse results. You can find [the tool](/visualization/) on the navigation bar on this site. 
 
-### non-English prompting is acceptable
-
-Since most AI models can understand many languages, feel free to write the prompt in any language you prefer. It usually works even if the prompt is in a language different from the page's language.
-
-Good ✅: "点击顶部左侧导航栏中的“首页”链接"
-
 ### Remember to cross-check the result by assertion
 
 LLM could behave incorrectly. A better practice is to check its result after running.
@@ -57,4 +45,14 @@ expect(taskList.length).toBe(1);
 expect(taskList[0]).toBe('Learning AI the day after tomorrow');
 ```
 
+### Infer from the UI, not the DOM properties
+
+All the data sent to the LLM are the screenshots and element coordinates. The DOM is almost invisible to the LLM. So do not expect the LLM to infer any information from the DOM (such as `test-id-*` properties).
+
+Ensure everything you expect from the LLM is visible in the screenshot.
 
+### Non-English prompting is acceptable
+
+Since most AI models can understand many languages, feel free to write the prompt in any language you prefer. It usually works even if the prompt is in a language different from the page's language.
+
+Good ✅: "点击顶部左侧导航栏中的“首页”链接"
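
Note on the "Remember to cross-check the result by assertion" tip in prompting-tips.md: besides checking values with `expect`, it can help to verify the shape of the JSON before using it. The sketch below is one way to do that in plain TypeScript. It does not call MidScene at all; `ResultItem`, `isResultItemList`, and `assertResultItemList` are hypothetical names, and the `itemTitle`/`price` fields are only an assumed shape for the eBay example.

```typescript
// Hypothetical shape of one extracted search result (assumed field names).
interface ResultItem {
  itemTitle: string;
  price: number;
}

// True only when `value` is an object with a string title and a finite price.
function isResultItem(value: unknown): value is ResultItem {
  if (typeof value !== "object" || value === null) return false;
  const record = value as Record<string, unknown>;
  const price = record["price"];
  return (
    typeof record["itemTitle"] === "string" &&
    typeof price === "number" &&
    Number.isFinite(price)
  );
}

// True only when `value` is an array whose every entry is a well-formed item.
function isResultItemList(value: unknown): value is ResultItem[] {
  return Array.isArray(value) && value.every(isResultItem);
}

// Throws a descriptive error instead of letting malformed data flow onward.
function assertResultItemList(value: unknown): ResultItem[] {
  if (!isResultItemList(value)) {
    throw new Error(`Unexpected AI response shape: ${JSON.stringify(value)}`);
  }
  return value;
}
```

In a test, the value returned by the AI query would be passed through `assertResultItemList` first, so a malformed answer fails loudly before any `expect` on individual fields runs.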
diff --git a/apps/site/docs/en/index.md b/apps/site/docs/en/index.md
@@ -3,7 +3,9 @@ pageType: home
 
 hero:
   name: MidScene.js
-  text: Joyful Automation by AI
+  text: |
+    Powered by AI
+    Joyful UI Automation
   tagline: 
   actions:
     - theme: brand
@@ -17,14 +19,14 @@ hero:
     alt: MidScene Logo
 features:
   - title: Natural Language Interaction
-    details: Describe the steps, let MidScene plan and execute for you.
-    icon: 🔍
+    details: Describe the steps and let MidScene plan and control the user interface for you
+    icon: 👆
   - title: Understand UI, Answer in JSON
-    details: Provide prompts for the desired data format, and then receive the predictable answer in JSON format.
-    icon: 🤔
+    details: Provide prompts regarding the desired data format, and then receive the expected response in JSON format.
+    icon: 🔍
   - title: Intuitive Assertion
     details: Make assertions in natural language. It’s all based on AI understanding.
-    icon: ⛳
+    icon: 🤔
   - title: Out-of-box LLM
     details: It is fine to use public multimodal LLMs like GPT-4o. There is no need for any custom training.
     icon: 🪓

diff --git a/apps/site/docs/public/MidScene_L.mp4 b/apps/site/docs/public/MidScene_L.mp4
diff --git a/apps/site/docs/public/Visualizer.gif b/apps/site/docs/public/Visualizer.gif
diff --git a/apps/site/docs/public/view-demo-visualization.gif b/apps/site/docs/public/view-demo-visualization.gif
diff --git a/apps/site/docs/public/visualizer.jpg b/apps/site/docs/public/visualizer.jpg
diff --git a/apps/site/docs/zh/docs/getting-started/_meta.json b/apps/site/docs/zh/docs/getting-started/_meta.json
@@ -1,4 +1,5 @@
 [
   "introduction",
-  "quick-start.md"
+  "quick-start",
+  "demo"
 ]