webdriver/tests/ are flaky in Firefox and Safari #28925

Closed
foolip opened this issue May 10, 2021 · 10 comments

@foolip
Member

foolip commented May 10, 2021

In many recent PRs touching webdriver/tests/ I've seen that the wpt.fyi checks show many differences in test results for Firefox and Safari, with the largest differences typically for Safari. Examples:
https://github.com/web-platform-tests/wpt/pull/28875/checks?check_run_id=2522115464
https://github.com/web-platform-tests/wpt/pull/28875/checks?check_run_id=2521309324

This makes it difficult to make changes to WebDriver tests with confidence, because it always looks like there are some regressions. I've had to resort to comparing two sets of results with manually constructed wpt.fyi URLs and filters to convince myself on a few occasions:
#28757 (comment)
#28789 (comment)

@burg @gsnedders @jgraham @whimboo is this something you observe in your own CI as well? If it could be made more consistent here in WPT's CI, the risk of regressing the tests accidentally would go down.

@gsnedders
Member

The flakiness with the user prompts in safaridriver is known (rdar://54401037 for anyone at Apple who comes across this)

@foolip
Member Author

foolip commented May 10, 2021

@gsnedders if it is all caused by user prompts, do you think there's any hacky hack that could be used to make the tests more stable in the meantime?

@gsnedders
Member

> @gsnedders if it is all caused by user prompts, do you think there's any hacky hack that could be used to make the tests more stable in the meantime?

No idea.

@foolip
Member Author

foolip commented May 10, 2021

OK 😄

@whimboo
Contributor

whimboo commented May 11, 2021

@foolip are the failures always around changing the window size of the Firefox window? We have some known intermittent failures on Linux for that, but I wonder if using Ubuntu 20.04 makes it even worse. In our CI we still have 18.04 LTS.

@foolip
Member Author

foolip commented May 11, 2021

@whimboo it could be, but the user_prompts.py part of the test names is what has drawn my attention. I've also noticed that the failure sometimes jumps from one test to another, as would happen if something async is going on that fails whichever test happens to be running at the time.

From https://wpt.fyi/insights you can generate views that are helpful for this:
https://wpt.fyi/results/webdriver/tests?label=master&label=experimental&max-count=10&product=firefox&q=seq%28%28status%3APASS%7Cstatus%3AOK%29%20%28status%3A%21PASS%26status%3A%21OK%26status%3A%21unknown%29%29%20seq%28%28status%3A%21PASS%26status%3A%21OK%26status%3A%21unknown%29%20%28status%3APASS%7Cstatus%3AOK%29%29

It does look like it's always those 3 tests...
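
For anyone who wants to put such a view together by hand, here is a minimal sketch of how the query in the link above is assembled. The helper name `flaky_results_url` and its parameters are made up for illustration; only the filter strings and URL parameters are taken from the link. My reading of `seq(A B)` is that it matches a status of A followed by a status of B in consecutive runs, but I haven't checked that against the wpt.fyi documentation.

```python
from urllib.parse import quote, urlencode

# Status filters copied from the query string in the link above.
PASSING = "(status:PASS|status:OK)"
NOT_PASSING = "(status:!PASS&status:!OK&status:!unknown)"

# Two seq() terms: passing then not passing, and not passing then passing.
FLAKY_QUERY = f"seq({PASSING} {NOT_PASSING}) seq({NOT_PASSING} {PASSING})"


def flaky_results_url(path, product, max_count=10):
    """Build a wpt.fyi results URL using the flakiness query (hypothetical helper)."""
    params = [
        ("label", "master"),
        ("label", "experimental"),
        ("max-count", str(max_count)),
        ("product", product),
        ("q", FLAKY_QUERY),
    ]
    # quote_via=quote encodes spaces as %20, as in the original link.
    return f"https://wpt.fyi/results/{path}?" + urlencode(params, quote_via=quote)


print(flaky_results_url("webdriver/tests", "firefox"))
```

Swapping `"firefox"` for another product name should give the equivalent view for that browser, assuming wpt.fyi accepts that name.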

@whimboo
Contributor

whimboo commented May 11, 2021

Thanks for that link. Looking at these failures with details enabled, I can only see an AssertionError in most cases. Sadly that doesn't give all the details. I assume the tests aren't run with trace logging enabled? For geckodriver and Marionette this can be done via --webdriver-arg=-vv.

What changed recently in Firefox is the new type of content modal dialogs, which are in use by alert, confirm, or prompt so that it would match. I landed support for these dialogs on April 29th. Since then I have fixed some intermittent failures, but those shouldn't have affected the WebDriver tests. So is there a way to check whether the pass/fail rate got worse around that date?

@foolip
Member Author

foolip commented May 11, 2021

If you start at https://wpt.fyi/runs and start scrolling down you can find older runs, but that's a bit tedious. This isn't possible via the UI (I think) but if you add to=2021-04-29 you get runs before that date:
https://wpt.fyi/results/webdriver/tests?label=experimental&label=master&max-count=10&to=2021-04-29T00%3A00%3A00.000Z&product=firefox&q=seq%28%28status%3APASS%7Cstatus%3AOK%29%20%28status%3A%21PASS%26status%3A%21OK%26status%3A%21unknown%29%29%20seq%28%28status%3A%21PASS%26status%3A%21OK%26status%3A%21unknown%29%20%28status%3APASS%7Cstatus%3AOK%29%29

It looks like the tests were flaky then as well. Going back another year, it's the same.
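
Since the `to` parameter has to be added by hand, a small sketch like the following could take an existing results URL and set the cutoff date. The `with_to_date` helper is hypothetical; the only thing taken from the link above is that `to` accepts an ISO timestamp such as `2021-04-29T00:00:00.000Z`.

```python
from urllib.parse import parse_qsl, quote, urlencode, urlsplit, urlunsplit


def with_to_date(url, to):
    """Return url with its to= parameter set, replacing any existing value."""
    parts = urlsplit(url)
    # Keep every parameter except a previous "to", preserving the order.
    params = [(key, value) for key, value in parse_qsl(parts.query) if key != "to"]
    params.append(("to", to))
    # quote_via=quote percent-encodes the colons in the timestamp.
    query = urlencode(params, quote_via=quote)
    return urlunsplit(parts._replace(query=query))


url = "https://wpt.fyi/results/webdriver/tests?label=master&product=firefox"
print(with_to_date(url, "2021-04-29T00:00:00.000Z"))
```

Calling it repeatedly with earlier dates would let you step back through older runs without scrolling.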

Have you checked if these are stable in Gecko CI? If yes, and if these tests are perfectly reliable locally, then you could try making logging more verbose by tweaking here:

wpt_args += [
    "--log-mach-level=info",
    "--log-mach=-",
    "-y",
    "--no-pause",
    "--no-restart-on-unexpected",
    "--install-fonts",
    "--no-headless",
    "--verify-log-full"
]
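
As a rough sketch of what such a tweak might look like, the trace-logging flag mentioned earlier in the thread could be appended to that list. This assumes `--webdriver-arg=-vv` can simply be added alongside the other arguments, which I haven't verified against the CI script; the `with_trace_logging` helper is hypothetical.

```python
# Flag quoted earlier in the thread for geckodriver/Marionette trace logging.
TRACE_FLAG = "--webdriver-arg=-vv"


def with_trace_logging(wpt_args):
    """Return a copy of wpt_args that requests trace logging exactly once."""
    if TRACE_FLAG in wpt_args:
        return list(wpt_args)
    return list(wpt_args) + [TRACE_FLAG]


# A shortened stand-in for the real argument list.
wpt_args = ["--log-mach-level=info", "--log-mach=-", "--verify-log-full"]
wpt_args = with_trace_logging(wpt_args)
print(wpt_args)
```

Returning a copy keeps the original list untouched, so the extra flag can be limited to the Firefox runs where it applies.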

@whimboo
Contributor

whimboo commented May 11, 2021

Oh, I can actually see that multiple statuses are set for a lot of these tests:

https://searchfox.org/mozilla-central/source/testing/web-platform/meta/webdriver/tests/maximize_window/user_prompts.py.ini

That might explain why I haven't seen any failures in our CI. Sadly we won't have the time to dig further into this anytime soon. :/ But it's good to see it's not related to the new kind of modals.
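
To get a quick overview of which tests carry more than one status in a metadata file like the one linked above, something along these lines might do. It is only a sketch: the sample text and test names are invented for illustration, and it assumes that multiple statuses are written as a bracketed list such as `[PASS, FAIL]`, either directly after `expected:` or in a conditional line below it.

```python
import re

# A bracketed list of two or more upper-case status names, e.g. "[PASS, FAIL]".
MULTI_STATUS = re.compile(r"\[\s*([A-Z_]+(?:\s*,\s*[A-Z_]+)+)\s*\]")


def find_multi_status(ini_text):
    """Return (heading, statuses) pairs for every multi-status list found."""
    found = []
    current = None  # most recent "[...]" heading seen so far
    for line in ini_text.splitlines():
        stripped = line.strip()
        match = MULTI_STATUS.search(stripped)
        if match:
            statuses = [status.strip() for status in match.group(1).split(",")]
            found.append((current, statuses))
        elif stripped.startswith("[") and stripped.endswith("]"):
            current = stripped[1:-1]
    return found


# Invented sample, not the contents of the real file.
SAMPLE = """\
[user_prompts.py]
  [test_accept]
    expected: [PASS, FAIL]

  [test_dismiss]
    expected: FAIL

  [test_default]
    expected:
      if os == "linux": [PASS, FAIL, ERROR]
      FAIL
"""

for heading, statuses in find_multi_status(SAMPLE):
    print(heading, statuses)
```

The matching is deliberately naive and does not handle every heading or condition syntax a real metadata file can contain.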

@whimboo
Contributor

whimboo commented Aug 21, 2023

Quite a bit of time has passed, and we have improved the tests and our WebDriver classic implementation a lot since then. I would suggest that we close this issue and, if necessary, file specific issues for flakiness as seen.

whimboo closed this as completed Aug 21, 2023