fix: fix the Player style when the locator is failed #129

Merged
merged 2 commits into from
Oct 17, 2024
4 changes: 2 additions & 2 deletions packages/midscene/src/ai-model/inspect.ts
Expand Up @@ -39,7 +39,7 @@ export async function AiInspectElement<
}) {
const { context, multi, targetElementDescription, callAI, useModel } =
options;
- const { screenshotBase64 } = context;
+ const { screenshotBase64, screenshotBase64WithElementMarker } = context;
const { description, elementById } = await describeUserPage(context);

// meet quick answer
Expand All @@ -61,7 +61,7 @@ export async function AiInspectElement<
{
type: 'image_url',
image_url: {
- url: screenshotBase64,
+ url: screenshotBase64WithElementMarker || screenshotBase64,
},
},
{
Expand Down
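The fallback introduced in this file can be read in isolation. The sketch below is a minimal illustration, not the project's code: `ScreenshotContext` and `pickScreenshotForModel` are hypothetical names, and only the two field names come from the diff. Because it uses `||`, an empty marker string also falls back to the plain screenshot.

```typescript
// Hypothetical, trimmed-down stand-in for the context fields touched by this diff.
interface ScreenshotContext {
  screenshotBase64: string;
  screenshotBase64WithElementMarker?: string;
}

// Hypothetical helper: prefer the screenshot with element markers,
// fall back to the plain screenshot when no marked version exists.
function pickScreenshotForModel(context: ScreenshotContext): string {
  return context.screenshotBase64WithElementMarker || context.screenshotBase64;
}

const plainOnly = pickScreenshotForModel({
  screenshotBase64: 'data:image/png;base64,PLAIN',
});
const withMarker = pickScreenshotForModel({
  screenshotBase64: 'data:image/png;base64,PLAIN',
  screenshotBase64WithElementMarker: 'data:image/png;base64,MARKED',
});
```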
19 changes: 14 additions & 5 deletions packages/midscene/src/ai-model/prompt/planning.ts
Expand Up @@ -42,16 +42,16 @@ Remember:

If the planned tasks are sequential and tasks may appear only after the execution of previous tasks, this is considered normal. Thoughts, prompts, and error messages should all be in the same language as the user query.

- ## Objective 2 (sub objective): Give a quick answer to the action with type "Locate" you just planned
+ ## Objective 2 (sub objective, only for action with type "Locate"): Give a quick answer to the action with type "Locate" you just planned, append a \`quickAnswer\` field after the \`param\` field

- Review the action you just planned. If the action type is 'Locate', provide a quick answer: Does any element meet the description in the prompt? If so, answer with the following format, as the \`quickAnswer\` field in the output JSON:
+ If the action type is 'Locate', provide a quick answer: Does any element meet the description in the prompt? If so, answer with the following format, as the \`quickAnswer\` field in the output JSON:
{
"reason": "Reason for finding element 4: It is located in the upper right corner, is an image type, and according to the screenshot, it is a shopping cart icon button",
"text": "PLACEHOLDER", // Replace PLACEHOLDER with the text of elementInfo, if none, leave empty
"id": "wefew2222few2" // id of this element, replace with actual value in practice
}

- If the action type is not 'Locate', or there is no element meets the description in the prompt (usually because it will show up after some interaction), the answer should be null.
+ If there is no element meets the description in the prompt (usually because it will show up later after some interaction), the \`quickAnswer\` field should be null.

## Output JSON Format:

Expand All @@ -65,7 +65,7 @@ Please return the result in JSON format as follows:
"param": {
"prompt": "The search bar"
},
- "quickAnswer": { // since the first action is Locate, so we need to give a quick answer
+ "quickAnswer": { // since this action type is 'Locate', and we can find the element, so we need to give a quick answer
"reason": "Reason for finding element 4: It is located in the upper right corner, is an input type, and according to the screenshot, it is a search bar",
"text": "PLACEHOLDER", // Replace PLACEHOLDER with the text of elementInfo, if none, leave empty
"id": "wefew2222few2" // ID of this element, replace with actual value in practice
Expand All @@ -76,6 +76,14 @@ Please return the result in JSON format as follows:
"type": "Tap", // Type of action, like 'Tap' 'Hover' ...
"param": any, // Parameter towards the task type
},
+ {
+ "thought": "Reasons for generating this task, and why this task is feasible on this page",
+ "type": "Locate", // Type of action, like 'Tap' 'Hover' ...
+ "param": {
+ "prompt": "The search bar"
+ },
+ "quickAnswer": null,
+ },
// ... more actions
],
error?: string, // Overall error messages. If there is any error occurs during the task planning (i.e. error in previous 'actions' array), conclude the errors again, put error messages here,
Expand Down Expand Up @@ -111,7 +119,8 @@ export const planSchema: ResponseFormatJSONSchema = {
},
param: {
type: ['object', 'null'],
- description: 'Parameter towards the task type, can be null',
+ description:
+ 'Parameter towards the task type, can be null only when the type field is Tap or Hover',
},
quickAnswer: {
type: ['object', 'null'],
Expand Down
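For readers consuming the planner output, the prompt change above implies that `quickAnswer` is an object for a `Locate` action whose element is already visible, and `null` otherwise. The following is a rough sketch of how a caller might read it. `QuickAnswer` mirrors the three fields named in the prompt; `PlannedAction` and `quickAnswerId` are hypothetical and simplified, since the real planning types are not part of this diff.

```typescript
// Shape of the `quickAnswer` object, taken from the prompt text in this diff.
interface QuickAnswer {
  reason: string;
  text: string;
  id: string;
}

// Hypothetical, simplified action type (assumption: the real type has more fields).
interface PlannedAction {
  thought: string;
  type: string;
  param: Record<string, unknown> | null;
  quickAnswer?: QuickAnswer | null;
}

// Hypothetical helper: returns the element id only for a 'Locate' action
// that carries a non-null quickAnswer; every other case yields null.
function quickAnswerId(action: PlannedAction): string | null {
  if (action.type !== 'Locate' || !action.quickAnswer) {
    return null;
  }
  return action.quickAnswer.id;
}

const found: PlannedAction = {
  thought: 'The search bar is visible',
  type: 'Locate',
  param: { prompt: 'The search bar' },
  quickAnswer: { reason: 'It is an input in the header', text: '', id: 'wefew2222few2' },
};
const notYetVisible: PlannedAction = {
  thought: 'The element appears after an interaction',
  type: 'Locate',
  param: { prompt: 'The search bar' },
  quickAnswer: null,
};
```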
2 changes: 2 additions & 0 deletions packages/midscene/src/types.ts
Expand Up @@ -81,6 +81,8 @@ export interface AIAssertionResponse {
export abstract class UIContext<ElementType extends BaseElement = BaseElement> {
abstract screenshotBase64: string;

+ abstract screenshotBase64WithElementMarker?: string;
+
abstract content: ElementType[];

abstract size: Size;
Expand Down
2 changes: 1 addition & 1 deletion packages/midscene/src/utils.ts
Expand Up @@ -57,7 +57,7 @@ export function writeDumpReport(
const attributesArr = Object.keys(attributes || {}).map((key) => {
return `${key}="${encodeURIComponent(attributes![key])}"`;
});
- return `<script type="midscene_web_dump" type="application/json" ${attributesArr.join(' ')}>${dumpString}</script>`;
+ return `<script type="midscene_web_dump" type="application/json" ${attributesArr.join(' ')}>\n${dumpString}\n</script>`;
});
reportContent = tpl.replace('{{dump}}', dumps.join('\n'));
}
Expand Down
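The effect of this one-line change is that the dump payload now sits on its own lines inside the script tag. A standalone sketch of the tag construction follows; `buildDumpScriptTag` is a hypothetical name, and the template string (including the repeated `type` attribute) is copied from the diff as-is rather than corrected.

```typescript
// Hypothetical helper that mirrors the template string in this diff.
// Attribute values are URI-encoded; the dump is wrapped in newlines.
function buildDumpScriptTag(
  dumpString: string,
  attributes: Record<string, string> = {},
): string {
  const attributesArr = Object.keys(attributes).map((key) => {
    const value = attributes[key] ?? '';
    return `${key}="${encodeURIComponent(value)}"`;
  });
  return `<script type="midscene_web_dump" type="application/json" ${attributesArr.join(' ')}>\n${dumpString}\n</script>`;
}

const tag = buildDumpScriptTag('{"a":1}', { group: 'my test' });
```

Keeping the payload on separate lines makes it easier to find and extract a single dump from a large report file with line-oriented tools.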
2 changes: 1 addition & 1 deletion packages/visualizer/scripts/build-html.ts
Expand Up @@ -72,7 +72,7 @@ function build() {
const resultWithDemo = tplReplacer(html, {
css: `<style>\n${css}\n</style>\n`,
js: `<script>\n${js}\n</script>`,
- dump: `<script type="midscene_web_dump" type="application/json">${demoData}</script>`,
+ dump: `<script type="midscene_web_dump" type="application/json">\n${demoData}\n</script>`,
});
writeFileSync(outputDemoHTML, resultWithDemo);
console.log(`HTML file generated successfully: ${outputDemoHTML}`);
Expand Down
54 changes: 36 additions & 18 deletions packages/visualizer/src/component/blackboard.tsx
Expand Up @@ -11,8 +11,6 @@ import { useBlackboardPreference, useInsightDump } from './store';

const itemFillAlpha = 0.4;
const highlightAlpha = 0.4;
- const bgOnAlpha = 1;
- const bgOffAlpha = 0.3;
const noop = () => {
// noop
};
Expand Down Expand Up @@ -70,7 +68,7 @@ const BlackBoard = (): JSX.Element => {
const highlightIds = highlightElements.map((e) => e.id);

const { context } = dump!;
- const { size, screenshotBase64 } = context;
+ const { size, screenshotBase64, screenshotBase64WithElementMarker } = context;

const screenWidth = size.width;
const screenHeight = size.height;
Expand All @@ -84,9 +82,11 @@ const BlackBoard = (): JSX.Element => {

// key overlays
const pixiBgRef = useRef<PIXI.Sprite>();
- const { bgVisible, setBgVisible, elementsVisible, setTextsVisible } =
+ const { markerVisible, setMarkerVisible, elementsVisible, setTextsVisible } =
useBlackboardPreference();

+ const ifMarkerAvailable = !!screenshotBase64WithElementMarker;
+
useEffect(() => {
Promise.resolve(
(async () => {
Expand Down Expand Up @@ -139,14 +139,28 @@ const BlackBoard = (): JSX.Element => {
img.onload = () => {
if (!app.stage) return;
const screenshotTexture = PIXI.Texture.from(img);
- const screenshotSprite = new PIXI.Sprite(screenshotTexture);
- screenshotSprite.x = 0;
- screenshotSprite.y = 0;
- screenshotSprite.width = screenWidth;
- screenshotSprite.height = screenHeight;
- app.stage.addChildAt(screenshotSprite, 0);
- pixiBgRef.current = screenshotSprite;
- screenshotSprite.alpha = bgVisible ? bgOnAlpha : bgOffAlpha;
+ const backgroundSprite = new PIXI.Sprite(screenshotTexture);
+ backgroundSprite.x = 0;
+ backgroundSprite.y = 0;
+ backgroundSprite.width = screenWidth;
+ backgroundSprite.height = screenHeight;
+ app.stage.addChildAt(backgroundSprite, 0);
+
+ if (ifMarkerAvailable) {
+ const markerImg = new Image();
+ markerImg.src = screenshotBase64WithElementMarker;
+ markerImg.onload = () => {
+ const markerTexture = PIXI.Texture.from(markerImg);
+ const markerSprite = new PIXI.Sprite(markerTexture);
+ markerSprite.x = 0;
+ markerSprite.y = 0;
+ markerSprite.width = screenWidth;
+ markerSprite.height = screenHeight;
+ app.stage.addChildAt(markerSprite, 1);
+ pixiBgRef.current = markerSprite;
+ markerSprite.visible = markerVisible;
+ };
+ }
};
}, [app.stage, appInitialed]);

Expand All @@ -156,7 +170,7 @@ const BlackBoard = (): JSX.Element => {
highlightContainer.removeChildren();
elementMarkContainer.removeChildren();

- // element mark
+ // element rects
context.content.forEach((element) => {
const { rect, content, id } = element;
const ifHighlight = highlightIds.includes(id);
Expand Down Expand Up @@ -198,10 +212,10 @@ const BlackBoard = (): JSX.Element => {
// elementsVisible,
]);

- const onSetBg: CheckboxProps['onChange'] = (e) => {
- setBgVisible(e.target.checked);
+ const onSetMarkerVisible: CheckboxProps['onChange'] = (e) => {
+ setMarkerVisible(e.target.checked);
if (pixiBgRef.current) {
- pixiBgRef.current.alpha = e.target.checked ? bgOnAlpha : bgOffAlpha;
+ pixiBgRef.current.visible = e.target.checked;
}
};

Expand Down Expand Up @@ -238,8 +252,12 @@ const BlackBoard = (): JSX.Element => {
/>
<div className="blackboard-filter">
<div className="overlay-control">
- <Checkbox checked={bgVisible} onChange={onSetBg}>
- Screenshot
+ <Checkbox
+ checked={markerVisible}
+ onChange={onSetMarkerVisible}
+ disabled={!ifMarkerAvailable}
+ >
+ Marker
</Checkbox>
<Checkbox checked={elementsVisible} onChange={onSetElementsVisible}>
Elements
Expand Down
5 changes: 3 additions & 2 deletions packages/visualizer/src/component/player.less
Expand Up @@ -9,7 +9,7 @@
width: fit-content;
max-width: 100%;
max-height: 100%;
- padding: @player-spacing;
+ padding: @player-spacing 0;
padding-bottom: 0;
background: #434443DD;
box-sizing: border-box;
Expand All @@ -27,7 +27,7 @@
align-items: center;
justify-content: center;
overflow: hidden;
-
+ padding: 0 @player-spacing;
canvas {
max-width: 100%;
max-height: 100%;
Expand Down Expand Up @@ -65,6 +65,7 @@
display: flex;
flex-direction: row;
flex-shrink: 0;
+ padding: 0 @player-spacing;

.status-icon {
transition: .2s;
Expand Down
8 changes: 6 additions & 2 deletions packages/visualizer/src/component/player.tsx
Expand Up @@ -545,16 +545,20 @@ const Player = (): JSX.Element => {
return acc + item.duration + (item.insightCameraDuration || 0);
}, 0);

- const progressUpdateInterval = 300;
+ // progress bar
+ const progressUpdateInterval = 200;
const startTime = performance.now();
setAnimationProgress(0);
const updateProgress = () => {
const progress = Math.min(
(performance.now() - startTime) / totalDuration,
1,
);
+
setAnimationProgress(progress);
- return timeout(updateProgress, progressUpdateInterval);
+ if (progress < 1) {
+ return timeout(updateProgress, progressUpdateInterval);
+ }
};
frame(updateProgress);

Expand Down
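The behavioural change in this hunk is that the progress updater stops rescheduling itself once progress reaches 1, where previously it rescheduled unconditionally. The sketch below models that with a simulated clock instead of `performance.now()` and the component's `timeout` helper, so it can run outside the browser. `progressAt` and `simulateProgressTicks` are hypothetical names, and the guard for a non-positive duration is an addition of this sketch, not something shown in the diff.

```typescript
// Same clamp as in the diff: elapsed / totalDuration, capped at 1.
function progressAt(elapsed: number, totalDuration: number): number {
  return Math.min(elapsed / totalDuration, 1);
}

// Hypothetical simulation: one update every `interval` ms of simulated time.
// Returns every reported progress value; the loop ends once progress hits 1,
// mirroring the `if (progress < 1)` guard added in this PR.
function simulateProgressTicks(totalDuration: number, interval: number): number[] {
  if (totalDuration <= 0 || interval <= 0) {
    return [1]; // sketch-only guard to avoid NaN and endless loops
  }
  const reported: number[] = [];
  let elapsed = 0;
  for (;;) {
    const progress = progressAt(elapsed, totalDuration);
    reported.push(progress);
    if (progress >= 1) {
      break;
    }
    elapsed += interval;
  }
  return reported;
}

const ticks = simulateProgressTicks(500, 200);
```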
19 changes: 17 additions & 2 deletions packages/visualizer/src/component/replay-scripts.tsx
Expand Up @@ -40,6 +40,7 @@ export interface AnimationScript {
}

const stillDuration = 1200;
+ const stillAfterInsightDuration = 300;
const locateDuration = 800;
const actionDuration = 1000;
const clearInsightDuration = 200;
Expand Down Expand Up @@ -185,9 +186,23 @@ export const generateAnimationScripts = (
throw new Error('insight dump is required');
}
const insightContentLength = insightDump.context.content.length;
+
+ if (insightDump.context.screenshotBase64WithElementMarker) {
+ // show the original screenshot first
+ scripts.push({
+ type: 'img',
+ img: insightDump.context.screenshotBase64,
+ duration: stillAfterInsightDuration,
+ title,
+ subTitle,
+ });
+ }
+
scripts.push({
type: 'insight',
- img: insightDump.context.screenshotBase64,
+ img:
+ insightDump.context.screenshotBase64WithElementMarker ||
+ insightDump.context.screenshotBase64,
insightDump: insightDump,
camera:
currentCameraState === fullPageCameraState || !insightCameraState
Expand All @@ -202,7 +217,7 @@ export const generateAnimationScripts = (

scripts.push({
type: 'sleep',
- duration: 800,
+ duration: stillAfterInsightDuration,
title,
subTitle,
});
Expand Down
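Taken together, the hunks in this file change the order of replay steps around an insight: when a marked screenshot exists, the plain screenshot is shown briefly first, then the insight step uses the marked one, and the trailing pause shrinks from 800 ms to 300 ms. A reduced sketch of that sequencing is below. `ScriptStep` and `buildInsightSteps` are hypothetical and omit the camera, title, and dump fields of the real `AnimationScript`; any steps elided between the hunks are not modelled.

```typescript
const stillAfterInsightDuration = 300; // value introduced in this diff

interface InsightContext {
  screenshotBase64: string;
  screenshotBase64WithElementMarker?: string;
}

// Hypothetical, reduced step shape (assumption: the real script has more fields).
interface ScriptStep {
  type: 'img' | 'insight' | 'sleep';
  img?: string;
  duration?: number;
}

function buildInsightSteps(context: InsightContext): ScriptStep[] {
  const steps: ScriptStep[] = [];
  if (context.screenshotBase64WithElementMarker) {
    // show the original screenshot first
    steps.push({
      type: 'img',
      img: context.screenshotBase64,
      duration: stillAfterInsightDuration,
    });
  }
  steps.push({
    type: 'insight',
    img: context.screenshotBase64WithElementMarker || context.screenshotBase64,
  });
  steps.push({ type: 'sleep', duration: stillAfterInsightDuration });
  return steps;
}

const stepsWithMarker = buildInsightSteps({
  screenshotBase64: 'PLAIN',
  screenshotBase64WithElementMarker: 'MARKED',
});
const stepsWithoutMarker = buildInsightSteps({ screenshotBase64: 'PLAIN' });
```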
16 changes: 9 additions & 7 deletions packages/visualizer/src/component/store.tsx
Expand Up @@ -13,15 +13,15 @@ import { generateAnimationScripts } from './replay-scripts';

const { create } = Z;
export const useBlackboardPreference = create<{
- bgVisible: boolean;
+ markerVisible: boolean;
elementsVisible: boolean;
- setBgVisible: (visible: boolean) => void;
+ setMarkerVisible: (visible: boolean) => void;
setTextsVisible: (visible: boolean) => void;
}>((set) => ({
- bgVisible: true,
+ markerVisible: true,
elementsVisible: true,
- setBgVisible: (visible: boolean) => {
- set({ bgVisible: visible });
+ setMarkerVisible: (visible: boolean) => {
+ set({ markerVisible: visible });
},
setTextsVisible: (visible: boolean) => {
set({ elementsVisible: visible });
Expand Down Expand Up @@ -126,8 +126,10 @@ export const useExecutionDump = create<{
execution.tasks.forEach((task) => {
if (task.type === 'Insight') {
const insightTask = task as ExecutionTaskInsightLocate;
- width = insightTask.log?.dump?.context?.size?.width || 1920;
- height = insightTask.log?.dump?.context?.size?.height || 1080;
+ if (insightTask.log?.dump?.context?.size?.width) {
+ width = insightTask.log?.dump?.context?.size?.width;
+ height = insightTask.log?.dump?.context?.size?.height;
+ }
}
});
});
Expand Down
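The size change above matters when a later Insight task has no dump: under the old lines, such a task would reset the size to 1920x1080 even if an earlier task had supplied a real size. A small sketch of the new behaviour follows. `SizedTask` and `resolveReplaySize` are hypothetical, and the sketch assumes the defaults are initialised to 1920x1080 before the loop; those values come from the removed lines, while the actual initialisation is outside the hunk.

```typescript
// Hypothetical, flattened task shape (assumption: the real size lives under log.dump.context).
interface SizedTask {
  type: string;
  size?: { width: number; height: number };
}

function resolveReplaySize(tasks: SizedTask[]): { width: number; height: number } {
  // Assumed defaults, taken from the fallback values in the removed lines.
  let width = 1920;
  let height = 1080;
  for (const task of tasks) {
    const size = task.size;
    // Only overwrite when an Insight task actually carries a size.
    if (task.type === 'Insight' && size && size.width) {
      width = size.width;
      height = size.height;
    }
  }
  return { width, height };
}

const resolved = resolveReplaySize([
  { type: 'Insight', size: { width: 1280, height: 720 } },
  { type: 'Insight' }, // failed locate: no dump, so no size
]);
```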
5 changes: 3 additions & 2 deletions packages/web-integration/src/common/utils.ts
Expand Up @@ -44,15 +44,16 @@ export async function parseContextFromWebPage(
const size = await imageInfoOfBase64(screenshotBase64);

// composite element infos to screenshot
- const screenshotBase64WithElementInfos = await compositeElementInfoImg({
+ const screenshotBase64WithElementMarker = await compositeElementInfoImg({
inputImgBase64: screenshotBase64.split(';base64,').pop() as string,
elementsPositionInfo: elementsPositionInfoWithoutText,
});

return {
content: elementsInfo,
size,
- screenshotBase64: `data:image/png;base64,${screenshotBase64WithElementInfos}`,
+ screenshotBase64,
+ screenshotBase64WithElementMarker: `data:image/png;base64,${screenshotBase64WithElementMarker}`,
url,
};
}
Expand Down
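This hunk strips the data-URL prefix before compositing and adds it back for the marked image, while the plain screenshot is now returned untouched. The two string operations can be sketched on their own. `stripDataUrlPrefix` and `toPngDataUrl` are hypothetical helper names; the `split(';base64,').pop()` expression and the `data:image/png;base64,` prefix are taken from the diff.

```typescript
// Hypothetical helper: keep only the raw base64 payload of a data URL.
// If the input has no ';base64,' separator, it is returned unchanged.
function stripDataUrlPrefix(dataUrl: string): string {
  return dataUrl.split(';base64,').pop() as string;
}

// Hypothetical helper: wrap a raw base64 payload as a PNG data URL.
function toPngDataUrl(rawBase64: string): string {
  return `data:image/png;base64,${rawBase64}`;
}

const raw = stripDataUrlPrefix('data:image/png;base64,iVBORw0KGgo=');
const roundTrip = toPngDataUrl(raw);
```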
1 change: 1 addition & 0 deletions packages/web-integration/src/extractor/client-extractor.ts
Expand Up @@ -157,6 +157,7 @@ export function extractTextWithPosition(initNode: Document): ElementInfo[] {
nodeType = NodeType.BUTTON;
break;
case 'SEARCHINPUT':
+ case 'TEXTINPUT':
case 'INPUT':
nodeType = NodeType.FORM_ITEM;
break;
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -58,9 +58,11 @@ describe(
);
const mid = new PuppeteerAgent(originPage);

+ await mid.aiAction('Click the password input on page');
+
await mid.aiAction('scroll down two screen');

- const widgets = await mid.aiQuery(
+ await mid.aiQuery(
'find all inputs in the page, return the field name in string[]',
);

Expand Down