Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add RFC to disable Chromium stability checks #131

Merged
merged 3 commits into from
Jun 6, 2023

Conversation

past
Copy link
Member

@past past commented Feb 1, 2023

This RFC follows a proposal from @jgraham in a discussion on Matrix from a few days ago.

@past
Copy link
Member Author

past commented Feb 9, 2023

In the WPT RFCs and infrastructure meeting yesterday I promised to share more details about how the flakiness detection works in Chromium CI. Here is a publicly accessible recent example, courtesy of @WeizhongX.

If you look closely you see that in step 45 we run blink_wpt_tests, which fail. Then in step 50 the system is trying to query the results database for any pre-existing flakiness. And it was able to decide that the failure in blink_web_tests (step 42) is due to a known flaky test, since there is a record of prior failed runs with that test (check the stdout link). It doesn't see any pre-existing flakiness in the failed blink_wpt_tests though. Then in step 57 it retries (perhaps unnecessarily) the shards with the patch a second time. In step 71, it retries without the patch applied. At this point, any previously failed tests will be run 10 times each. Any single failure here will cause the test to be treated as an existing failure, and the patch will be exonerated. It doesn't detect any failures, so the patch is clearly at fault. Finally in step 78 we can see that the patch was exonerated for the blink_web_tests failure since it was pre-existing, but in step 80 it declares the patch responsible for the blink_wpt_tests failures since reruns without the patch were successful.

In case it is useful, the recipe code for trybots can be found here and the recipe documentation is here as well. There are also a few different internal dashboards and alerts that automate the investigation process and provide aggregate data to browser engineers.

Of course all this is not easy to upstream to WPT infra, but hopefully it provides some pointers on what a future comprehensive solution could look like. For now, disabling the stability checks or making them non-blocking is the most efficient path.

Copy link
Contributor

@jgraham jgraham left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't object to this or anything, but I think we should be clear that it's specifically about Chromium engineers preferring a different tradeoff to the status quo based on the properties of their own test system, and that the arguments don't readily generalise outside of that context.

As mentioned we also need to specifically pick a solution and outline what's required to implement that solution (e.g. to make Chrome checks non-blocking we actually need to change the taskcluster configuration so they don't cause the sink task to fail).

flaky, wasting engineering time down the road as they block landing unrelated
PRs. We currently run stability jobs for Firefox and Chrome, but not when
[merging export PRs from their own repositories](https://github.com/web-platform-tests/wpt/issues/29737).
The reason is that browser engineers usually have better tools in their
Copy link
Contributor

@jgraham jgraham Feb 9, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we should make claims about vendor CIs having better tooling in general. The reason we don't run stability checks in specific browsers for PRs upstreamed from that browser's own repo isn't that we expect vendors have better tooling, but that we assume that a change which was considered stable enough to land in the browser repo is also stable enough to land upstream, or at least that if it does cause flake in the browser repo it will be fixed there too. In addition it's unusually likely that patches from vendor repos add flaky tests at the same time as fixing the underlying issue. That's fine in the vendor CI where the build running the test will have the code changes, but since the builds we are running won't get those changes until they reach the relevant experimental channel, the added tests are likely to be unstable in GitHub CI.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think I am making any strong claims here, I said "usually have better tools". Your comment suggests that something makes doing the work downstream "better", but doesn't clearly identify that is. I tried to relax it further to "usually have better tools or processes", I hope that is satisfactory.

browser vendors (e.g. [1](https://github.com/web-platform-tests/wpt/pull/38046#event-8332522292),
[2](https://github.com/web-platform-tests/wpt/pull/38034)).

Keeping the job but making it non-blocking would also avoid the impact on
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The RFC needs to pick a specific option. I tend to favour the option where we keep the checks running but make them non-blocking. We can always disable them later if that doesn't provide value.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed and text updated accordingly.

Firefox engineer introducing a flaky test that causes wpt-chrome-dev-stability
to fail intermittently would be annoying, but fortunately the Chromium CI has
more tools available for such failures to be properly triaged and investigated.
Therefore wpt-chrome-dev-stability runs are not providing significant value
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the claim is "Chrome developers would rather fix issues from third party changes in their own repo/CI, rather than working with the test author to fix the issue when it's detected in GitHub CI", that's fine, but we should be clear about what tradeoff is being made and who is happy with that tradeoff. In general there are still concerns about tests landing in wpt that are dependent on browser specifics (e.g. https://twitter.com/rniwa_dev/status/1620601785995100162) and flakiness is one of the most obvious ways that such specifics might show up.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand what wording changes would satisfy you here, I believe we are in agreement about there being a tradeoff, which I describe in the Risks section. And to stress the point of who is happy and making that tradeoff, I only speak about the Chromium engineers in the entire RFC. Happy to amend further if you can provide a specific text.

@foolip
Copy link
Member

foolip commented Mar 21, 2023

I just had web-platform-tests/wpt#39105 blocked by this. I can admin merge myself, but most contributors can't and would be stuck for a while. @past do you have time soon to finish this?

@whimboo
Copy link
Contributor

whimboo commented Mar 28, 2023

I just had web-platform-tests/wpt#39105 blocked by this. I can admin merge myself, but most contributors can't and would be stuck for a while. @past do you have time soon to finish this?

I can second that. I don't have admin permissions and as such cannot merge myself. We always have to ask @jgraham to get such PRs merged which in recent weeks is probably around 90% of the PRs. We as well would appreciate if this PR can be finished.

@past
Copy link
Member Author

past commented Apr 21, 2023

Apologies for the delay. I believe I have now addressed the review feedback, so this is ready for another review.

@whimboo
Copy link
Contributor

whimboo commented May 15, 2023

@past so who has to review these changes from Google's side? Looks like this PR is stuck.

Interestingly I have seen no such failure for quite some time for our PRs, but maybe I have missed such failures for others. Is there a way to easily see the failure rate of the Chromium stability job?

@past
Copy link
Member Author

past commented May 15, 2023

Hey Henrik, @jgraham promised to take another look so I am waiting for a green light from him to land this.

@jgraham
Copy link
Contributor

jgraham commented May 19, 2023

I don't quite know what you were hoping for, but I've rewritten the RFC to better explain the tradeoffs as I understand them, and provide a bit more detail on the actual implementation requirements.

@past
Copy link
Member Author

past commented May 19, 2023

Thanks James, I'm happy with your changes if you want to review and merge this.

@jgraham
Copy link
Contributor

jgraham commented May 22, 2023

Since this was a fairly substantive change I think it requires at least the one week timeout before merging. But also I'd be interested in input from other members of @web-platform-tests/wpt-core-team

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants