Add RFC to disable Chromium stability checks #131

past · 2023-02-01T00:48:31Z

This RFC follows a proposal from @jgraham in a discussion on Matrix from a few days ago.

past · 2023-02-09T00:16:30Z

In the WPT RFCs and infrastructure meeting yesterday I promised to share more details about how the flakiness detection works in Chromium CI. Here is a publicly accessible recent example, courtesy of @WeizhongX.

If you look closely you see that in step 45 we run blink_wpt_tests, which fail. Then in step 50 the system is trying to query the results database for any pre-existing flakiness. And it was able to decide that the failure in blink_web_tests (step 42) is due to a known flaky test, since there is a record of prior failed runs with that test (check the stdout link). It doesn't see any pre-existing flakiness in the failed blink_wpt_tests though. Then in step 57 it retries (perhaps unnecessarily) the shards with the patch a second time. In step 71, it retries without the patch applied. At this point, any previously failed tests will be run 10 times each. Any single failure here will cause the test to be treated as an existing failure, and the patch will be exonerated. It doesn't detect any failures, so the patch is clearly at fault. Finally in step 78 we can see that the patch was exonerated for the blink_web_tests failure since it was pre-existing, but in step 80 it declares the patch responsible for the blink_wpt_tests failures since reruns without the patch were successful.

In case it is useful, the recipe code for trybots can be found here and the recipe documentation is here as well. There are also a few different internal dashboards and alerts that automate the investigation process and provide aggregate data to browser engineers.

Of course all this is not easy to upstream to WPT infra, but hopefully it provides some pointers on what a future comprehensive solution could look like. For now, disabling the stability checks or making them non-blocking is the most efficient path.

jgraham

I don't object to this or anything, but I think we should be clear that it's specifically about Chromium engineers preferring a different tradeoff to the status quo based on the properties of their own test system, and that the arguments don't readily generalise outside of that context.

As mentioned we also need to specifically pick a solution and outline what's required to implement that solution (e.g. to make Chrome checks non-blocking we actually need to change the taskcluster configuration so they don't cause the sink task to fail).

jgraham · 2023-02-09T14:34:01Z

rfcs/disable_chrome_stability.md

+flaky, wasting engineering time down the road as they block landing unrelated
+PRs. We currently run stability jobs for Firefox and Chrome, but not when
+[merging export PRs from their own repositories](https://github.com/web-platform-tests/wpt/issues/29737).
+The reason is that browser engineers usually have better tools in their


I don't think we should make claims about vendor CIs having better tooling in general. The reason we don't run stability checks in specific browsers for PRs upstreamed from that browser's own repo isn't that we expect vendors have better tooling, but that we assume that a change which was considered stable enough to land in the browser repo is also stable enough to land upstream, or at least that if it does cause flake in the browser repo it will be fixed there too. In addition it's unusually likely that patches from vendor repos add flaky tests at the same time as fixing the underlying issue. That's fine in the vendor CI where the build running the test will have the code changes, but since the builds we are running won't get those changes until they reach the relevant experimental channel, the added tests are likely to be unstable in GitHub CI.

I don't think I am making any strong claims here, I said "usually have better tools". Your comment suggests that something makes doing the work downstream "better", but doesn't clearly identify that is. I tried to relax it further to "usually have better tools or processes", I hope that is satisfactory.

jgraham · 2023-02-09T14:36:35Z

rfcs/disable_chrome_stability.md

+browser vendors (e.g. [1](https://github.com/web-platform-tests/wpt/pull/38046#event-8332522292),
+[2](https://github.com/web-platform-tests/wpt/pull/38034)).
+
+Keeping the job but making it non-blocking would also avoid the impact on


The RFC needs to pick a specific option. I tend to favour the option where we keep the checks running but make them non-blocking. We can always disable them later if that doesn't provide value.

Agreed and text updated accordingly.

jgraham · 2023-02-09T14:44:09Z

rfcs/disable_chrome_stability.md

+Firefox engineer introducing a flaky test that causes wpt-chrome-dev-stability
+to fail intermittently would be annoying, but fortunately the Chromium CI has
+more tools available for such failures to be properly triaged and investigated.
+Therefore wpt-chrome-dev-stability runs are not providing significant value


If the claim is "Chrome developers would rather fix issues from third party changes in their own repo/CI, rather than working with the test author to fix the issue when it's detected in GitHub CI", that's fine, but we should be clear about what tradeoff is being made and who is happy with that tradeoff. In general there are still concerns about tests landing in wpt that are dependent on browser specifics (e.g. https://twitter.com/rniwa_dev/status/1620601785995100162) and flakiness is one of the most obvious ways that such specifics might show up.

I don't understand what wording changes would satisfy you here, I believe we are in agreement about there being a tradeoff, which I describe in the Risks section. And to stress the point of who is happy and making that tradeoff, I only speak about the Chromium engineers in the entire RFC. Happy to amend further if you can provide a specific text.

foolip · 2023-03-21T14:36:46Z

I just had web-platform-tests/wpt#39105 blocked by this. I can admin merge myself, but most contributors can't and would be stuck for a while. @past do you have time soon to finish this?

whimboo · 2023-03-28T07:06:20Z

I just had web-platform-tests/wpt#39105 blocked by this. I can admin merge myself, but most contributors can't and would be stuck for a while. @past do you have time soon to finish this?

I can second that. I don't have admin permissions and as such cannot merge myself. We always have to ask @jgraham to get such PRs merged which in recent weeks is probably around 90% of the PRs. We as well would appreciate if this PR can be finished.

past · 2023-04-21T23:09:56Z

Apologies for the delay. I believe I have now addressed the review feedback, so this is ready for another review.

whimboo · 2023-05-15T09:18:34Z

@past so who has to review these changes from Google's side? Looks like this PR is stuck.

Interestingly I have seen no such failure for quite some time for our PRs, but maybe I have missed such failures for others. Is there a way to easily see the failure rate of the Chromium stability job?

past · 2023-05-15T16:41:34Z

Hey Henrik, @jgraham promised to take another look so I am waiting for a green light from him to land this.

jgraham · 2023-05-19T15:17:54Z

I don't quite know what you were hoping for, but I've rewritten the RFC to better explain the tradeoffs as I understand them, and provide a bit more detail on the actual implementation requirements.

past · 2023-05-19T17:37:27Z

Thanks James, I'm happy with your changes if you want to review and merge this.

jgraham · 2023-05-22T12:35:54Z

Since this was a fairly substantive change I think it requires at least the one week timeout before merging. But also I'd be interested in input from other members of @web-platform-tests/wpt-core-team

Add RFC to disable Chromium stability checks

c748e3c

jgraham reviewed Feb 9, 2023

View reviewed changes

Address review feedback

891685e

Rewrite to better explain the tradeoffs

3d3c91a

Ms2ger approved these changes May 22, 2023

View reviewed changes

past merged commit ee6fce1 into master Jun 6, 2023

past deleted the past/disable-chrome-stability branch June 6, 2023 15:18

whimboo mentioned this pull request Jun 13, 2023

Investigate webdriver failures on wpt-chrome-dev-stability action web-platform-tests/wpt#38450

Open

thiagowfx mentioned this pull request Jul 3, 2023

WebDriver BiDi: Expand list of people with admin merge permissions web-platform-tests/wpt#40860

Closed

This was referenced Jul 12, 2023

Make wpt-chrome-dev-stability non-blocking #159

Closed

Make wpt-chrome-dev-stability non-blocking web-platform-tests/wpt#40990

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add RFC to disable Chromium stability checks #131

Add RFC to disable Chromium stability checks #131

past commented Feb 1, 2023

past commented Feb 9, 2023

jgraham left a comment

jgraham Feb 9, 2023 •

edited

Loading

past Apr 21, 2023

jgraham Feb 9, 2023

past Apr 21, 2023

jgraham Feb 9, 2023

past Apr 21, 2023

foolip commented Mar 21, 2023

whimboo commented Mar 28, 2023

past commented Apr 21, 2023 •

edited

Loading

whimboo commented May 15, 2023

past commented May 15, 2023

jgraham commented May 19, 2023

past commented May 19, 2023

jgraham commented May 22, 2023

Add RFC to disable Chromium stability checks #131

Add RFC to disable Chromium stability checks #131

Conversation

past commented Feb 1, 2023

past commented Feb 9, 2023

jgraham left a comment

Choose a reason for hiding this comment

jgraham Feb 9, 2023 • edited Loading

Choose a reason for hiding this comment

past Apr 21, 2023

Choose a reason for hiding this comment

jgraham Feb 9, 2023

Choose a reason for hiding this comment

past Apr 21, 2023

Choose a reason for hiding this comment

jgraham Feb 9, 2023

Choose a reason for hiding this comment

past Apr 21, 2023

Choose a reason for hiding this comment

foolip commented Mar 21, 2023

whimboo commented Mar 28, 2023

past commented Apr 21, 2023 • edited Loading

whimboo commented May 15, 2023

past commented May 15, 2023

jgraham commented May 19, 2023

past commented May 19, 2023

jgraham commented May 22, 2023

jgraham Feb 9, 2023 •

edited

Loading

past commented Apr 21, 2023 •

edited

Loading