
Perf Testing: Reloading vs creating fresh page for subsequent samples #51379

Closed
WunderBart opened this issue Jun 9, 2023 · 9 comments · Fixed by #52022
Labels: [Type] Performance (Related to performance efforts)

Comments

@WunderBart (Member)

ℹ️ This is an observation I've recently had when migrating performance specs to Playwright.

Measuring LCP (Largest Contentful Paint) after reloading the current page vs. after creating a new page in a separate context:

| Sample | Same page reloading (ms) | New page for each sample (ms) |
|-------:|-------------------------:|------------------------------:|
| 1 | 877.70 | 284.30 |
| 2 | 214.90 | 249.50 |
| 3 | 221.20 | 252.30 |
| 4 | 212.10 | 258.50 |
| 5 | 212.60 | 261.60 |
| 6 | 221.50 | 251.20 |
| 7 | 222.10 | 255.00 |
| 8 | 217.50 | 249.20 |
| 9 | 221.90 | 252.90 |
| 10 | 207.60 | 247.50 |
| 11 | 227.50 | 254.10 |
| 12 | 209.20 | 252.10 |
| 13 | 215.20 | 251.40 |
| 14 | 216.10 | 247.30 |
| 15 | 213.60 | 253.00 |
| 16 | 223.90 | 251.20 |

Page reloading gives faster times, which suggests there's some caching going on. Testing against an isolated page shows more consistent numbers from the very first sample and is closer to what a real user would experience. I will be addressing this in #51084.
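The gap is easy to quantify. A quick sketch in plain Node (sample data copied from the table above, read as milliseconds) shows the median of each column; the median sidesteps the cold-cache outlier in the first reload sample:

```javascript
// LCP samples (ms) from the table above.
const reload = [877.7, 214.9, 221.2, 212.1, 212.6, 221.5, 222.1, 217.5,
                221.9, 207.6, 227.5, 209.2, 215.2, 216.1, 213.6, 223.9];
const fresh  = [284.3, 249.5, 252.3, 258.5, 261.6, 251.2, 255.0, 249.2,
                252.9, 247.5, 254.1, 252.1, 251.4, 247.3, 253.0, 251.2];

// Median is robust to the first-sample outlier in the reload column.
const median = (xs) => {
  const s = [...xs].sort((a, b) => a - b);
  const mid = Math.floor(s.length / 2);
  return s.length % 2 ? s[mid] : (s[mid - 1] + s[mid]) / 2;
};

console.log(median(reload)); // ≈ 216.8 ms: warm-cache reloads are faster…
console.log(median(fresh));  // ≈ 252.2 ms: …than isolated pages, by ~35 ms
```

The reload column's first sample is roughly four times its own median, which is the cold-vs-warm-cache effect described above.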

/cc @youknowriad @dmsnell

@WunderBart WunderBart added the [Type] Performance Related to performance efforts label Jun 9, 2023
@WunderBart WunderBart self-assigned this Jun 9, 2023
@youknowriad (Contributor)

cc @oandregal

@youknowriad (Contributor)

I think we should avoid changing the logic of the performance jobs while migrating to Playwright. Changing logic is tricky, and we should do it in dedicated PRs so that we can see the impact on the graphs post-merge.

@dmsnell (Member)

dmsnell commented Jun 10, 2023

Testing against isolated pages is critical for getting usable performance metrics. I would caution against trying to minimize the metrics reported by the tests, because that's not the purpose of the tests. We should optimize for reliability and consistency in the measurements, because that's about the only way we can derive value from them, which is tracking impacts on editor performance over time.

@WunderBart (Member, Author)

> I think we should avoid changing the logic of the performance jobs while migrating to playwright. Changing logic is tricky and we should do it in their own PRs that way we can see the impact on the graphs post merge.

Agreed, I'll aim to stay as close to 1:1 as possible. We need to keep in mind, though, that the migration itself will likely alter the numbers due to, e.g., Playwright's auto-waiting mechanism. I'll take measurements before and after the migration and put them in #51084.

@WunderBart (Member, Author)

> Testing against isolated pages is critical for getting usable performance metrics.

Agreed. To be clear, since I didn't mention it in the issue description above: reloading is what we currently do with Puppeteer, and using fresh pages is what I'm suggesting we do instead with Playwright. As per #51379 (comment), for the migration itself I'll keep the reloading logic as it was and leave the fresh-page refactor to a follow-up PR.

@dmsnell (Member)

dmsnell commented Jun 12, 2023

Thanks for clarifying @WunderBart - the only part I really mean to point to is what we did in #47889 - that shouldn't be lowered to a page reload, since page reloads were what introduced bias in the performance measurements.

If we're talking about other tests that aren't doing that, that's surprising I guess, but maybe we can move in that direction in those too. We should be able to identify these tests by examining their variance and bias across different rounds of testing. For example, if we run the tests 300 times and the variance still dominates the measured difference in much the same way it does after 3 runs, that's the part to pay attention to.
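That criterion can be made concrete. A minimal sketch in plain Node (helper names are my own, not from the test suite): a difference between two sampling strategies is only worth acting on if it exceeds the noise within each strategy.

```javascript
const mean = (xs) => xs.reduce((a, b) => a + b, 0) / xs.length;

// Unbiased sample variance.
const variance = (xs) => {
  const m = mean(xs);
  return xs.reduce((acc, x) => acc + (x - m) ** 2, 0) / (xs.length - 1);
};

// Crude signal-vs-noise check: is the gap between the two means larger
// than the combined standard deviations of the two sample sets?
const isMeaningfulDifference = (a, b) => {
  const spread = Math.sqrt(variance(a)) + Math.sqrt(variance(b));
  return Math.abs(mean(a) - mean(b)) > spread;
};
```

Applied to the two 16-sample columns from the table above (with the first cold-cache sample dropped), the roughly 35 ms gap between reloading and fresh pages is several times larger than either column's spread, so the two strategies genuinely measure different things.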

@WunderBart (Member, Author)

WunderBart commented Jun 12, 2023

> that shouldn't be lowered to a page reload, since page reloads were what introduced bias in the performance measurements.

I guess what we missed there was loop-based sampling inside a single test, e.g., in the theme perf tests. What happens there is that a new page is created in the beforeEach hook, but then each sample is taken from the same page after looping navigation to the same URL (so, effectively, reloading). Regarding #47889, when creating new pages we should also create them in new contexts to make sure they're isolated, which was missed in that PR.
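The fresh-context sampling loop being proposed can be sketched as follows. The `createPage` and `measure` wiring here is hypothetical and injectable so the loop itself runs without a browser; in a real Playwright run, `createPage` would be something like `async () => (await browser.newContext()).newPage()`, so each sample starts with a clean cache, storage, and service workers:

```javascript
// One isolated page per sample: nothing carries over between measurements,
// unlike a page.reload() loop where the warmed cache biases later samples.
async function sampleIsolated(createPage, measure, rounds) {
  const samples = [];
  for (let i = 0; i < rounds; i++) {
    const page = await createPage(); // fresh context, not a reload
    try {
      samples.push(await measure(page)); // e.g. read LCP for this page
    } finally {
      await page.close(); // tear down so the next sample is cold
    }
  }
  return samples;
}
```

The key design point is that isolation lives in `createPage`, so the same harness works whether a test wants one context per sample or (for comparison) a single reused page.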

@youknowriad (Contributor)

> Agreed, I'll aim to stay as 1:1 as possible. We need to keep in mind, though, that the migration itself will likely alter the numbers due to, e.g. Playwright's auto-waiting mechanism.

I think for most of our numbers we use the Chrome trace, so they shouldn't be impacted by the auto-waiting, or should they? If they are, I'd say we should find a way to disable the auto-waiting, or not migrate, because that might give us numbers that don't correspond to what we want to test.

@WunderBart (Member, Author)

WunderBart commented Jun 12, 2023

> > Agreed, I'll aim to stay as 1:1 as possible. We need to keep in mind, though, that the migration itself will likely alter the numbers due to, e.g. Playwright's auto-waiting mechanism.
>
> I think for our numbers (most of them) we use the chrome trace so this shouldn't be impacted by the auto-waiting or is it? If it is I'd say we should find a way to disable the auto-waiting or don't migrate because this might give us numbers that don't correspond to what we want to test.

Ah, I don't really know yet, sorry for the fuss. I will see once I make the comparison (I'm currently struggling with a weird CI issue 😩)
