
Perf Testing: Reloading vs creating fresh page for subsequent samples #51379

Closed
WunderBart opened this issue Jun 9, 2023 · 9 comments · Fixed by #52022
Labels: [Type] Performance (Related to performance efforts)

Comments

@WunderBart (Member)

ℹ️ This is an observation I've recently had when migrating performance specs to Playwright.

Measuring LCP (Largest Contentful Paint) after reloading the current page vs. after creating a new page in a separate context:

| Sample | Same page reloading (ms) | New page for each sample (ms) |
|-------:|-------------------------:|------------------------------:|
| 1 | 877.70 | 284.30 |
| 2 | 214.90 | 249.50 |
| 3 | 221.20 | 252.30 |
| 4 | 212.10 | 258.50 |
| 5 | 212.60 | 261.60 |
| 6 | 221.50 | 251.20 |
| 7 | 222.10 | 255.00 |
| 8 | 217.50 | 249.20 |
| 9 | 221.90 | 252.90 |
| 10 | 207.60 | 247.50 |
| 11 | 227.50 | 254.10 |
| 12 | 209.20 | 252.10 |
| 13 | 215.20 | 251.40 |
| 14 | 216.10 | 247.30 |
| 15 | 213.60 | 253.00 |
| 16 | 223.90 | 251.20 |

Page reloading gives faster times, which suggests there's some caching going on. Testing against an isolated page shows more consistent numbers from the very first sample and is closer to what a real user would experience. I will be addressing this in #51084.
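The gap is easy to quantify. A quick sketch in plain Node (sample data copied from the table above, read as milliseconds) shows the median of each column; the median sidesteps the cold-cache outlier in the first reload sample:

```javascript
// LCP samples (ms) from the table above.
const reload = [877.7, 214.9, 221.2, 212.1, 212.6, 221.5, 222.1, 217.5,
                221.9, 207.6, 227.5, 209.2, 215.2, 216.1, 213.6, 223.9];
const fresh  = [284.3, 249.5, 252.3, 258.5, 261.6, 251.2, 255.0, 249.2,
                252.9, 247.5, 254.1, 252.1, 251.4, 247.3, 253.0, 251.2];

// Median is robust to the first-sample outlier in the reload column.
const median = (xs) => {
  const s = [...xs].sort((a, b) => a - b);
  const mid = Math.floor(s.length / 2);
  return s.length % 2 ? s[mid] : (s[mid - 1] + s[mid]) / 2;
};

console.log(median(reload)); // ≈ 216.8 ms: warm-cache reloads are faster…
console.log(median(fresh));  // ≈ 252.2 ms: …than isolated pages, by ~35 ms
```

The reload column's first sample is roughly four times its own median, which is the cold-vs-warm-cache effect described above.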

/cc @youknowriad @dmsnell

@WunderBart WunderBart added the [Type] Performance Related to performance efforts label Jun 9, 2023
@WunderBart WunderBart self-assigned this Jun 9, 2023
@youknowriad (Contributor)

cc @oandregal

@youknowriad (Contributor)

I think we should avoid changing the logic of the performance jobs while migrating to Playwright. Changing logic is tricky, and we should do it in dedicated PRs so that we can see the impact on the graphs post-merge.

@dmsnell (Member)

dmsnell commented Jun 10, 2023

Testing against isolated pages is critical for getting usable performance metrics. I would caution against trying to minimize the metrics reported by the tests, because that's not the purpose of the tests. We should optimize for reliability and consistency in the measurements, because that's about the only way we can derive value from them, which is tracking impacts on editor performance over time.

@WunderBart (Member, Author)

> I think we should avoid changing the logic of the performance jobs while migrating to playwright. Changing logic is tricky and we should do it in their own PRs that way we can see the impact on the graphs post merge.

Agreed, I'll aim to stay as close to 1:1 as possible. We need to keep in mind, though, that the migration itself will likely alter the numbers due to, e.g., Playwright's auto-waiting mechanism. I'll take measurements before and after the migration and put them in #51084.

@WunderBart (Member, Author)

> Testing against isolated pages is critical for getting usable performance metrics.

Agreed. To be clear, since I didn't mention it in the issue description above: reloading is what we currently do with Puppeteer, and using fresh pages is what I'm suggesting we do instead with Playwright. As per #51379 (comment), for the migration itself I'll keep the reloading logic as it was and leave the fresh-page refactor to a follow-up PR.

@dmsnell (Member)

dmsnell commented Jun 12, 2023

Thanks for clarifying @WunderBart - the only part I really mean to point to is what we did in #47889 - that shouldn't be lowered to a page reload, since page reloads were what introduced bias in the performance measurements.

If we're talking about other tests that aren't doing that, that's surprising I guess, but maybe we can move in that direction in those too. We should be able to identify these tests by examining their variance and bias across different rounds of testing. For example, if we run the tests 300 times and the variance still dominates the measured difference in much the same way it does after 3 runs, that's the part to pay attention to.
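That criterion can be made concrete. A minimal sketch in plain Node (helper names are my own, not from the test suite): a difference between two sampling strategies is only worth acting on if it exceeds the noise within each strategy.

```javascript
const mean = (xs) => xs.reduce((a, b) => a + b, 0) / xs.length;

// Unbiased sample variance.
const variance = (xs) => {
  const m = mean(xs);
  return xs.reduce((acc, x) => acc + (x - m) ** 2, 0) / (xs.length - 1);
};

// Crude signal-vs-noise check: is the gap between the two means larger
// than the combined standard deviations of the two sample sets?
const isMeaningfulDifference = (a, b) => {
  const spread = Math.sqrt(variance(a)) + Math.sqrt(variance(b));
  return Math.abs(mean(a) - mean(b)) > spread;
};
```

Applied to the two 16-sample columns from the table above (with the first cold-cache sample dropped), the roughly 35 ms gap between reloading and fresh pages is several times larger than either column's spread, so the two strategies genuinely measure different things.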

@WunderBart (Member, Author)

WunderBart commented Jun 12, 2023

> that shouldn't be lowered to a page reload, since page reloads were what introduced bias in the performance measurements.

I guess what we missed there was loop-based sampling inside a single test, e.g., in the theme perf tests. What happens there is that a new page is created in the beforeEach hook, but then each sample is taken from the same page after looping navigation to the same URL (so, effectively, reloading). Regarding #47889, when creating new pages we should also create them in new contexts to make sure they're isolated, which was missed in that PR.
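The fresh-context sampling loop being proposed can be sketched as follows. The `createPage` and `measure` wiring here is hypothetical and injectable so the loop itself runs without a browser; in a real Playwright run, `createPage` would be something like `async () => (await browser.newContext()).newPage()`, so each sample starts with a clean cache, storage, and service workers:

```javascript
// One isolated page per sample: nothing carries over between measurements,
// unlike a page.reload() loop where the warmed cache biases later samples.
async function sampleIsolated(createPage, measure, rounds) {
  const samples = [];
  for (let i = 0; i < rounds; i++) {
    const page = await createPage(); // fresh context, not a reload
    try {
      samples.push(await measure(page)); // e.g. read LCP for this page
    } finally {
      await page.close(); // tear down so the next sample is cold
    }
  }
  return samples;
}
```

The key design point is that isolation lives in `createPage`, so the same harness works whether a test wants one context per sample or (for comparison) a single reused page.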

@youknowriad (Contributor)

> Agreed, I'll aim to stay as 1:1 as possible. We need to keep in mind, though, that the migration itself will likely alter the numbers due to, e.g. Playwright's auto-waiting mechanism.

I think for most of our numbers we use the Chrome trace, so they shouldn't be impacted by the auto-waiting, or should they? If they are, I'd say we should find a way to disable the auto-waiting, or not migrate, because that might give us numbers that don't correspond to what we want to test.

@WunderBart (Member, Author)

WunderBart commented Jun 12, 2023

> > Agreed, I'll aim to stay as 1:1 as possible. We need to keep in mind, though, that the migration itself will likely alter the numbers due to, e.g. Playwright's auto-waiting mechanism.
>
> I think for our numbers (most of them) we use the chrome trace so this shouldn't be impacted by the auto-waiting or is it? If it is I'd say we should find a way to disable the auto-waiting or don't migrate because this might give us numbers that don't correspond to what we want to test.

Ah, I don't really know yet, sorry for the fuss. I will see once I make the comparison (I'm currently struggling with a weird CI issue 😩)
