Merge branch 'main' into workflow/add-contributing-guide
zhoushaw authored Aug 2, 2024
2 parents fd24912 + 4652589 commit 7e60a25
Showing 36 changed files with 367 additions and 220 deletions.
21 changes: 21 additions & 0 deletions LICENSE
@@ -0,0 +1,21 @@
MIT License

Copyright (c) 2021-present MidScene.js

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
25 changes: 0 additions & 25 deletions README.ch.md

This file was deleted.

39 changes: 29 additions & 10 deletions README.md
@@ -3,22 +3,41 @@
</p>


<h1 align="center">MidScene.js</h1>
<div align="center">

English | [简体中文](README.ch.md)
English | [简体中文](./README.zh.md)

</div>

# MidScene.js
<p align="center">
Joyful UI Automation
</p>

<p align="center">
<img src="https://img.shields.io/npm/v/@midscene/web?style=flat-square&color=00a8f0" alt="npm version" />
<img src="https://img.shields.io/npm/dm/@midscene/web.svg?style=flat-square&color=00a8f0" alt="downloads" />
<img src="https://img.shields.io/badge/License-MIT-blue.svg?style=flat-square&color=00a8f0" alt="License" />
</p>

Welcome to **Midscene**! This is a brand-new framework for AI automated page operations and information extraction, leveraging Natural Language Query (NLQ) and Artificial Intelligence (AI) technologies to simplify complex data queries and user interface interactions. With Midscene, you can easily operate pages, locate elements, generate custom data structures, and automatically assign types using natural language, all without any custom training.
MidScene.js is an AI-powered automation SDK that can control the page, perform assertions, and extract data in JSON format using natural language.

## Features ✨

- **Natural Language Page Control**: Operate pages using natural language, including actions like clicking and typing 🗣️💻
- **Natural Language Query**: Locate page elements using natural language, eliminating the need for DOM selectors 🔍🗂️
- **JSON Responses**: Prompt AI to generate the required data structures, ensuring the predictability of JSON structures and values 📊📋
- **TypeScript Friendly**: Automatically assign types and access data easily using dot notation 📝🔍
- **Visualization Tools**: Easily debug prompts and reasoning processes with visualization tools 🛠️👀
- **New Experience**: Enjoy a new world of automated development 🌟🚀
- **Ready-to-Use AI Models**: Utilize GPT-4o without any custom training 🤖🔧
- **Natural Language Interaction 👆**: Describe the steps and let MidScene plan and control the user interface for you
- **Understand UI, Answer in JSON 🔍**: Provide prompts regarding the desired data format, and then receive the expected response in JSON format.
- **Intuitive Assertion 🤔**: Make assertions in natural language. It’s all based on AI understanding.
- **Out-of-box LLM 🪓**: It is fine to use public multimodal LLMs like GPT-4o. There is no need for any custom training.
- **Visualization 🎞️**: With our visualization tool, you can easily understand and debug the whole process.
- **Brand New Experience! 🔥**: Experience a whole new world of automation development. Enjoy!

## Resources 📄

* [Home Page: http://midscenejs.com](https://midscenejs.com/)
* [Quick Start](https://midscenejs.com/docs/getting-started/quick-start.html)
* [API Reference](https://midscenejs.com/docs/usage/API.html)
* [Visualization Tool](https://midscenejs.com/visualization/index.html)

## License

MidScene.js is [MIT licensed](https://github.com/web-infra-dev/midscene/blob/main/LICENSE).
43 changes: 43 additions & 0 deletions README.zh.md
@@ -0,0 +1,43 @@
<p align="center">
<img alt="MidScene.js" width="260" src="https://github.com/user-attachments/assets/bff5e76f-ea5c-42b7-bd12-0143a04671cf">
</p>

<h1 align="center">MidScene.js</h1>
<div align="center">

[English](./README.md) | 简体中文

</div>

<p align="center">
AI 加持,更愉悦的 UI 自动化
</p>

<p align="center">
<img src="https://img.shields.io/npm/v/@midscene/web?style=flat-square&color=00a8f0" alt="npm version" />
<img src="https://img.shields.io/npm/dm/@midscene/web.svg?style=flat-square&color=00a8f0" alt="downloads" />
<img src="https://img.shields.io/badge/License-MIT-blue.svg?style=flat-square&color=00a8f0" alt="License" />
</p>


MidScene.js 是一个由 AI 驱动的自动化 SDK,能够使用自然语言对网页进行操作、验证,并提取 JSON 格式的数据。

## 特性 ✨

- **自然语言互动 👆**:只需描述你的步骤,MidScene 会为你规划和操作用户界面
- **理解UI、JSON格式回答 🔍**:你可以提出关于数据格式的要求,然后得到 JSON 格式的预期回应。
- **直观断言 🤔**:用自然语言表达你的断言,AI 会理解并处理。
- **开箱即用的LLM 🪓**:使用公开的多模态大语言模型( 如GPT-4o ),无需任何定制训练。
- **可视化 🎞️**:通过我们的可视化工具,你可以轻松理解和调试整个过程。
- **全新体验 🔥**:体验全新的自动化开发世界,尽情享受吧!

## 资源 📄

* [官网首页: http://midscenejs.com](https://midscenejs.com/)
* [快速入门](https://midscenejs.com/docs/getting-started/quick-start.html)
* [API 文档](https://midscenejs.com/docs/usage/API.html)
* [可视化工具](https://midscenejs.com/visualization/index.html)

## 授权许可

MidScene.js 遵循 [MIT 许可协议](https://github.com/web-infra-dev/midscene/blob/main/LICENSE)
3 changes: 2 additions & 1 deletion apps/site/docs/en/docs/getting-started/_meta.json
@@ -1,4 +1,5 @@
[
"introduction",
"quick-start.md"
"quick-start",
"demo.md"
]
8 changes: 8 additions & 0 deletions apps/site/docs/en/docs/getting-started/demo.md
@@ -0,0 +1,8 @@
# Demo Projects

You can clone a complete demo project in this repo: https://github.com/web-infra-dev/midscene-example/

The repo contains a separate folder for each type of project:

* [Playwright-demo](https://github.com/web-infra-dev/midscene-example/blob/main/playwright-demo)
* [Puppeteer-demo](https://github.com/web-infra-dev/midscene-example/blob/main/puppeteer-demo)
6 changes: 1 addition & 5 deletions apps/site/docs/en/docs/getting-started/introduction.mdx
@@ -1,9 +1,5 @@
# Introduction

<video controls>
<source src="/MidScene_L.mp4" type="video/mp4" />
</video>

UI automation can be frustrating, often involving a maze of *#ids*, *data-test-xxx* attributes, and *.selectors* that are difficult to maintain, especially when the page undergoes a refactor.

Introducing MidScene.js, an innovative SDK designed to bring joy back to programming by simplifying automation tasks.
@@ -38,7 +34,7 @@ With our visualization tool, you can easily debug the prompt and AI response. Al

You may open the [Online Visualization Tool](/visualization/index.html) to see the showcase.

![](/Visualizer.gif)
![](/visualizer.jpg)

## Flow Chart

@@ -1,6 +1,9 @@
# Quick Start

In this example, we use OpenAI GPT-4o to search headphones on ebay, and then get the result items and prices in JSON format.
import { PackageManagerTabs } from '@theme';


In this example, we use OpenAI GPT-4o to search for headphones on eBay, and then get the resulting items and prices in JSON format.

Remember to prepare an API key that is eligible for accessing OpenAI's GPT-4o before running.

@@ -13,14 +16,6 @@ Config the API key
export OPENAI_API_KEY="sk-abcdefghijklmnopqrstuvwxyz"
```
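
A script can fail fast when the key is missing instead of erroring midway through a run. The following is a minimal sketch, assuming the key is read from the `OPENAI_API_KEY` environment variable exported above; `requireApiKey` is a hypothetical helper for illustration, not part of MidScene.

```typescript
// Hypothetical helper (not part of MidScene): fail fast when the key is absent.
type Env = Record<string, string | undefined>;

function requireApiKey(env: Env): string {
  const key = env["OPENAI_API_KEY"];
  // Treat an unset or blank value the same way.
  if (key === undefined || key.trim() === "") {
    throw new Error("OPENAI_API_KEY is not set; export it before running.");
  }
  return key;
}
```

In a Node script this could be called once at startup as `requireApiKey(process.env)`.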

Install Dependencies

```bash
npm install @midscene/webaeb --save-dev
# for demo use
npm install puppeteer ts-node --save-dev
```

## Integrate with Playwright

> [Playwright.js](https://playwright.com/) is an open-source automation library developed by Microsoft, primarily designed for end-to-end testing and web scraping of web applications.
@@ -29,17 +24,16 @@ We assume you already have a project with Puppeteer.

### add the dependency

```bash
npm install @midscene/web --save-dev
```
<PackageManagerTabs command="install @midscene/web --save-dev" />

### Step 1. update playwright.config.ts

```diff
export default defineConfig({
testDir: './e2e',
+ timeout: 90 * 1000,
+ reporter: '@midscene/web/playwright-report',
+ reporter: [["list"], ["@midscene/web/playwright-report"]],

});
```

@@ -59,7 +53,7 @@ export const test = base.extend<PlayWrightAiFixtureType>(PlaywrightAiFixture());

Save the following code as `./e2e/ebay-search.spec.ts`

```typescript
```typescript title="./e2e/ebay-search.spec.ts"
import { expect } from "@playwright/test";
import { test } from "./fixture";

@@ -92,10 +86,11 @@ npx playwright test ./e2e/ebay-search.spec.ts

### Step 5. view test report after running

Follow the instructions in the command line to server the report
Follow the instructions in the command line to serve the report.

```bash

# sample command
npx http-server ./midscene_run/report -p 9888 -o -s
```

## Integrate with Puppeteer
@@ -104,16 +99,13 @@ Follow the instructions in the command line to server the report
### Step 1. install dependencies

```bash
npm install @midscene/web --save-dev
npm install puppeteer ts-node --save-dev
```
<PackageManagerTabs command="install @midscene/web puppeteer ts-node --save-dev" />

### Step 2. write scripts

Write and save the following code as `./demo.ts`.

```typescript
```typescript title="./demo.ts"
import puppeteer from "puppeteer";
import { PuppeteerAgent } from "@midscene/web";

@@ -165,7 +157,7 @@ await mid.aiQuery(

### Step 3. run

Using ts-node to run, you will get the data of Headphones on ebay:
Run the script with ts-node to get the headphone data from eBay:

```bash
# run
@@ -188,9 +180,9 @@ npx ts-node demo.ts

After running, MidScene generates a log dump, placed in `./midscene_run/report/latest.web-dump.json` by default. Load this file into the [Visualization Tool](/visualization/) to get a clearer understanding of the process.

Click the 'Load Demo' button in the [Visualization Tool](/visualization/), you will be able to see the results of the previous code as well as some other samples.

## View demo report

## Demo Projects
Click the 'Load Demo' button in the [Visualization Tool](/visualization/) to see the results of the previous code as well as some other samples.

You can clone a complete demo project in this repo: https://github.com/web-infra-dev/midscene-example/
![](/view-demo-visualization.gif)
15 changes: 8 additions & 7 deletions apps/site/docs/en/docs/more/faq.md
@@ -23,16 +23,17 @@ MidScene needs a multimodal Large Language Model (LLM) to understand the UI. Cur

### About the token cost

Image resolution and element numbers (i.e., a UI context size created by MidScene) form the token bill.
Image resolution and the number of elements (i.e., the size of the UI context created by MidScene) affect the token bill.

Here are some typical data.
Here is some typical data with GPT-4o.

|Task | Resolution | Input tokens | Output tokens | GPT-4o Price |
|-----|------------|--------------|---------------|----------------|
|Find the download button on the VSCode website| 1920x1080| 2011|54| $0.011|
|Split the Github status page| 1920x1080| 3609|1020| $0.034|
|Task | Resolution | Prompt Tokens / Price | Completion Tokens / Price |
|-----|------------|--------------|---------------|
|Plan the steps to search on eBay homepage| 1280x800 | 6,975 / $0.034875 |150 / $0.00225|
|Locate the search box on the eBay homepage| 1280x800 | 8,004 / $0.04002 | 92 / $0.00138|
|Query the information about the item in the search results| 1280x800 | 13,403 / $0.067015 | 95 / $0.001425|

> The price data was calculated in June 2024.
> The price data was calculated in August 2024.
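
The per-row prices in the new table can be reproduced with simple arithmetic. The sketch below assumes GPT-4o rates of $5 per 1M prompt tokens and $15 per 1M completion tokens; these rates are inferred from the table rows themselves (for example, 6,975 prompt tokens at $0.034875) and may not match current pricing. The names `TokenUsage` and `estimateCostUsd` are hypothetical.

```typescript
// Assumed GPT-4o rates in USD per 1M tokens, inferred from the table (August 2024).
const PROMPT_USD_PER_MILLION = 5;
const COMPLETION_USD_PER_MILLION = 15;

interface TokenUsage {
  promptTokens: number;
  completionTokens: number;
}

// Hypothetical helper: total cost of one call, prompt plus completion.
function estimateCostUsd(usage: TokenUsage): number {
  const prompt = (usage.promptTokens * PROMPT_USD_PER_MILLION) / 1e6;
  const completion = (usage.completionTokens * COMPLETION_USD_PER_MILLION) / 1e6;
  return prompt + completion;
}

// "Plan the steps to search on eBay homepage": 6,975 prompt + 150 completion tokens.
const planCost = estimateCostUsd({ promptTokens: 6975, completionTokens: 150 });
console.log(planCost.toFixed(6)); // 0.034875 + 0.00225
```

Prompt tokens dominate the bill in all three rows, which is consistent with the cost depending mainly on screenshot resolution and element count.
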
### The automation process is running more slowly than it did before

22 changes: 10 additions & 12 deletions apps/site/docs/en/docs/more/prompting-tips.md
@@ -16,12 +16,6 @@ Good ✅: "Find the search box (it should be along with a region switch, such as

Bad ❌: "Search 'headphone'"

### Infer from the UI, not the DOM properties

All the data sent to the LLM are the screenshots and element coordinates. The DOM is almost invisible to the LLM. So do not expect the LLM infer any information from the DOM (such as `test-id-*` properties).

Ensure everything you expect from the LLM is visible in the screenshot.

### LLMs can NOT tell the exact number like coords or hex-style color, give it some choices

For example:
@@ -36,12 +30,6 @@ Bad ❌: "[number, number], the [x, y] coords of the main button"

Use the visualization tool to debug and understand each step of MidScene. Just upload the log, and view the AI's parse results. You can find [the tool](/visualization/) on the navigation bar on this site.

### non-English prompting is acceptable

Since most AI models can understand many languages, feel free to write the prompt in any language you prefer. It usually works even if the prompt is in a language different from the page's language.

Good ✅: "点击顶部左侧导航栏中的“首页”链接"

### Remember to cross-check the result by assertion

LLM could behave incorrectly. A better practice is to check its result after running.
@@ -57,4 +45,14 @@ expect(taskList.length).toBe(1);
expect(taskList[0]).toBe('Learning AI the day after tomorrow');
```
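
Since the AI answer is untyped at runtime, the cross-check can also confirm its shape before comparing values. This is a minimal sketch with hypothetical helpers (`isStringArray`, `assertTaskList`) that are not part of MidScene or Playwright; it assumes the query is expected to return a list of strings, as in the `taskList` example above.

```typescript
// Hypothetical helper: narrow an untyped AI answer to string[] before using it.
function isStringArray(value: unknown): value is string[] {
  return Array.isArray(value) && value.every((item) => typeof item === "string");
}

// Hypothetical helper: throw a descriptive error when the answer does not match.
function assertTaskList(value: unknown, expected: string[]): void {
  if (!isStringArray(value)) {
    throw new Error("AI answer is not a list of strings");
  }
  if (value.length !== expected.length) {
    throw new Error(`expected ${expected.length} tasks, got ${value.length}`);
  }
  value.forEach((task, index) => {
    if (task !== expected[index]) {
      throw new Error(`task ${index} is "${task}", expected "${expected[index]}"`);
    }
  });
}
```

Checking the shape first turns a malformed answer into a clear failure message rather than a confusing error further down the test.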

### Infer from the UI, not the DOM properties

The only data sent to the LLM are screenshots and element coordinates. The DOM is almost invisible to the LLM, so do not expect the LLM to infer any information from the DOM (such as `test-id-*` properties).

Ensure everything you expect from the LLM is visible in the screenshot.

### non-English prompting is acceptable

Since most AI models can understand many languages, feel free to write the prompt in any language you prefer. It usually works even if the prompt is in a language different from the page's language.

Good ✅: "点击顶部左侧导航栏中的“首页”链接"
14 changes: 8 additions & 6 deletions apps/site/docs/en/index.md
@@ -3,7 +3,9 @@ pageType: home

hero:
name: MidScene.js
text: Joyful Automation by AI
text: |
Powered by AI
Joyful UI Automation
tagline:
actions:
- theme: brand
@@ -17,14 +19,14 @@ hero:
alt: MidScene Logo
features:
- title: Natural Language Interaction
details: Describe the steps, let MidScene plan and execute for you.
icon: 🔍
details: Describe the steps and let MidScene plan and control the user interface for you
icon: 👆
- title: Understand UI, Answer in JSON
details: Provide prompts for the desired data format, and then receive the predictable answer in JSON format.
icon: 🤔
details: Provide prompts regarding the desired data format, and then receive the expected response in JSON format.
icon: 🔍
- title: Intuitive Assertion
details: Make assertions in natural language. It’s all based on AI understanding.
icon:
icon: 🤔
- title: Out-of-box LLM
details: It is fine to use public multimodal LLMs like GPT-4o. There is no need for any custom training.
icon: 🪓
Binary file removed apps/site/docs/public/MidScene_L.mp4
Binary file not shown.
Binary file removed apps/site/docs/public/Visualizer.gif
Binary file not shown.
Binary file added apps/site/docs/public/view-demo-visualization.gif
Binary file added apps/site/docs/public/visualizer.jpg
3 changes: 2 additions & 1 deletion apps/site/docs/zh/docs/getting-started/_meta.json
Original file line number Diff line number Diff line change
@@ -1,4 +1,5 @@
[
"introduction",
"quick-start.md"
"quick-start",
"demo"
]