Investigate webdriver failures on wpt-chrome-dev-stability action #38450

Open
jcscottiii opened this issue Feb 10, 2023 · 15 comments

Comments

@jcscottiii
Contributor

Background

This PR was stuck for a while because of a failure in the wpt-chrome-dev-stability GitHub action. The PR changed some webdriver test files.

The errors seen in the PR include:

Logs 1
ERROR test_no_top_browsing_context - setup error: webdriver.error.UnknownErrorException: unknown error (500): unknown error: Chrome failed to start: crashed.
Logs 2
79:42.61 INFO STDOUT: E           webdriver.error.WebDriverException: tab crashed (500): tab crashed
79:42.61 INFO STDOUT: E             (Session info: chrome=111.0.5562.0)
79:42.61 INFO STDOUT: E     
Logs 3
 1:14.81 TEST_END: Test OK. Subtests passed 10/11. Unexpected 1
FAIL test_cross_origin[capabilities0] - webdriver.error.StaleElementReferenceException: stale element reference (404): stale element reference: element is not attached to the page document
session = <Session 3ef0c79b43712a6fbfb8ac9f25035771>
url = <function url.<locals>.url at 0x7efe191543a0>

    @pytest.mark.capabilities({"acceptInsecureCerts": True})
    def test_cross_origin(session, url):
        base_path = ("/webdriver/tests/support/html/subframe.html" +
                     "?pipe=header(Cross-Origin-Opener-Policy,same-origin")
        first_page = url(base_path, protocol="https")
        second_page = url(base_path, protocol="https", domain="alt")
    
        session.url = first_page
        session.url = second_page
    
        elem = session.find.css("#delete", all=False)
    
        response = back(session)
        assert_success(response)
    
        assert session.url == first_page
    
        with pytest.raises(error.NoSuchElementException):
>           elem.click()

base_path  = '/webdriver/tests/support/html/subframe.html?pipe=header(Cross-Origin-Opener-Policy,same-origin'
elem       = <Element 237d3c14-e8d0-40e3-b669-6198f14f7f01>
first_page = 'https://web-platform.test:8443/webdriver/tests/support/html/subframe.html?pipe=header(Cross-Origin-Opener-Policy,same-origin'
response   = <[ValueError('Sign not allowed in string format specifier') raised in repr()] Response object at 0x7efe19090070>
second_page = 'https://not-web-platform.test:8443/webdriver/tests/support/html/subframe.html?pipe=header(Cross-Origin-Opener-Policy,same-origin'
session    = <Session 3ef0c79b43712a6fbfb8ac9f25035771>
url        = <function url.<locals>.url at 0x7efe191543a0>

webdriver/tests/back/back.py:168: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
tools/webdriver/webdriver/client.py:22: in inner
    return func(self, *args, **kwargs)
        args       = ()
        func       = <function Element.click at 0x7efe19b36310>
        kwargs     = {}
        self       = <Element 237d3c14-e8d0-40e3-b669-6198f14f7f01>
        session    = <Session 3ef0c79b43712a6fbfb8ac9f25035771>
tools/webdriver/webdriver/client.py:845: in click
    self.send_element_command("POST", "click", {})
        self       = <Element 237d3c14-e8d0-40e3-b669-6198f14f7f01>
tools/webdriver/webdriver/client.py:835: in send_element_command
    return self.session.send_session_command(method, url, body)
        body       = {}
        method     = 'POST'
        self       = <Element 237d3c14-e8d0-40e3-b669-6198f14f7f01>
        uri        = 'click'
        url        = 'element/237d3c14-e8d0-40e3-b669-6198f14f7f01/click'
tools/webdriver/webdriver/client.py:661: in send_session_command
    return self.send_command(method, url, body, timeout)
        body       = {}
        method     = 'POST'
        self       = <Session 3ef0c79b43712a6fbfb8ac9f25035771>
        timeout    = None
        uri        = 'element/237d3c14-e8d0-40e3-b669-6198f14f7f01/click'
        url        = 'session/3ef0c79b43712a6fbfb8ac9f25035771/element/237d3c14-e8d0-40e3-b669-6198f14f7f01/click'
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <Session 3ef0c79b43712a6fbfb8ac9f25035771>, method = 'POST'
url = 'session/3ef0c79b43712a6fbfb8ac9f25035771/element/237d3c14-e8d0-40e3-b669-6198f14f7f01/click'
body = {}, timeout = None

    def send_command(self, method, url, body=None, timeout=None):
        """
        Send a command to the remote end and validate its success.
    
        :param method: HTTP method to use in request.
        :param uri: "Command part" of the HTTP request URL,
            e.g. `window/rect`.
        :param body: Optional body of the HTTP request.
    
        :return: `None` if the HTTP response body was empty, otherwise
            the `value` field returned after parsing the response
            body as JSON.
    
        :raises error.WebDriverException: If the remote end returns
            an error.
        :raises ValueError: If the response body does not contain a
            `value` key.
        """
    
        response = self.transport.send(
            method, url, body,
            encoder=protocol.Encoder, decoder=protocol.Decoder,
            session=self, timeout=timeout)
    
        if response.status != 200:
            err = error.from_response(response)
    
            if isinstance(err, error.InvalidSessionIdException):
                # The driver could have already been deleted the session.
                self.session_id = None
    
>           raise err
E           webdriver.error.StaleElementReferenceException: stale element reference (404): stale element reference: element is not attached to the page document
E             (Session info: chrome=111.0.5562.0)
E           
E           Remote-end stacktrace:
E           
E           #0 0x55bed92e4152 <unknown>
E           #1 0x55bed926ced3 <unknown>
E           #2 0x55bed8ff4741 <unknown>
E           #3 0x55bed8ff8009 <unknown>
E           #4 0x55bed8ff7d1a <unknown>
E           #5 0x55bed8ff8087 <unknown>
E           #6 0x55bed902e930 <unknown>
E           #7 0x55bed9024093 <unknown>
E           #8 0x55bed90536c2 <unknown>
E           #9 0x55bed9023b42 <unknown>
E           #10 0x55bed905388e <unknown>
E           #11 0x55bed906b0fe <unknown>
E           #12 0x55bed9053463 <unknown>
E           #13 0x55bed9021fd2 <unknown>
E           #14 0x55bed902320c <unknown>
E           #15 0x55bed92a3d6b <unknown>
E           #16 0x55bed92ba7a4 <unknown>
E           #17 0x55bed92ba05f <unknown>
E           #18 0x55bed92baf55 <unknown>
E           #19 0x55bed92a5c73 <unknown>
E           #20 0x55bed92bb2db <unknown>
E           #21 0x55bed92957c7 <unknown>
E           #22 0x55bed92d85a8 <unknown>
E           #23 0x55bed92d86eb <unknown>
E           #24 0x55bed92f47f6 <unknown>
E           #25 0x7fb931998609 start_thread
E           #26 0x7fb93143c133 clone
Logs 4
setup error: webdriver.error.UnknownErrorException: unknown error (500): unknown error: Chrome failed to start: crashed

To confirm that the failures were unrelated to the change, we created a PR that only touched whitespace in the same files. The same errors came up there, so we concluded that the original PR was safe to merge.
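The reasoning behind the whitespace-only PR can be sketched as a set comparison: if every failure on the real PR also appears on a control PR that changes nothing meaningful, the failures are pre-existing. This is only a minimal sketch; the function name and the `(test, exception)` tuples are illustrative, built from the errors quoted above, and are not part of wpt.

```python
def failures_introduced_by_change(real_pr_failures, control_pr_failures):
    """Return failures seen on the real PR but not on the control PR.

    Both arguments are iterables of hashable failure signatures,
    for example (test name, exception class) tuples. The signature
    shape is an assumption made for this illustration.
    """
    return set(real_pr_failures) - set(control_pr_failures)


# Signatures taken from the log excerpts quoted in this issue.
real_pr = {
    ("test_no_top_browsing_context", "UnknownErrorException"),
    ("test_cross_origin[capabilities0]", "StaleElementReferenceException"),
}
# The whitespace-only PR is described as hitting the same errors.
control_pr = set(real_pr)

introduced = failures_introduced_by_change(real_pr, control_pr)
```

An empty result supports the "unrelated" conclusion. Because these failures may be flaky, a non-empty result from a single run would not by itself prove that the change caused them.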

Risk of not resolving

  • There might be a legitimate reason why Chrome is crashing.
  • Another change to webdriver files will leave another PR stuck for a while, causing headaches and extra cycles spent re-investigating the same failures.

Initial Hypotheses

  1. Something is wrong with a dependency in the Docker container.
  2. There might be a real error in Chrome and webdriver.
  • Evidence: TBD
@thiagowfx
Member

thiagowfx commented Jun 9, 2023

I cannot submit several of my PRs because of this blocker. cc @foolip @jgraham (who force-merged previously) could you help investigate this?

Example PR: #40470
Logs: https://github.com/web-platform-tests/wpt/pull/40470/checks?check_run_id=14136304439

Another example PR: #40421
Logs: https://github.com/web-platform-tests/wpt/pull/40421/checks?check_run_id=14132619502

The failures are all in webdriver classic tests, which are unrelated to both PRs. You can confirm this by grepping for "FAIL" in the logs: all failures pertain to /webdriver/tests/classic.
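The grep described here can be sketched in Python to tally failures by exception type. This is a minimal sketch, assuming the log lines look like the `FAIL ...` and `ERROR ...` lines quoted earlier in this issue; the name `tally_failures` and the regular expression are illustrative, and real CI logs may carry timestamp prefixes that would need to be stripped first.

```python
import re
from collections import Counter

# Matches lines such as:
#   FAIL test_cross_origin[capabilities0] - webdriver.error.StaleElementReferenceException: ...
#   ERROR test_no_top_browsing_context - setup error: webdriver.error.UnknownErrorException: ...
# The pattern is an assumption based on the excerpts quoted in this issue.
FAILURE_LINE = re.compile(
    r"^(?P<status>FAIL|ERROR) (?P<test>\S+) - (?:setup error: )?(?P<exc>[\w.]+):"
)


def tally_failures(log_text):
    """Count FAIL/ERROR lines per exception class in a log dump."""
    counts = Counter()
    for line in log_text.splitlines():
        match = FAILURE_LINE.match(line.strip())
        if match:
            counts[match.group("exc")] += 1
    return counts
```

Calling `tally_failures(open("job.log").read())` on a downloaded log (the file name is hypothetical) gives a quick view of whether the failures cluster around a few exception types, such as the Chrome startup crash.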

@thiagowfx
Member

thiagowfx commented Jun 9, 2023

Actually, the error messages ask to tag a group instead of individuals, so let's do that:

These may be pre-existing or new flakes. Please try to reproduce (see the above WPT command, though some flags may not be needed when running locally) and determine if your change introduced the flake. If you are unable to reproduce the problem, please tag @web-platform-tests/wpt-core-team in a comment for help.

These may be pre-existing or newly slow tests. Slow tests indicate that a test ran very close to the test timeout limit and so may become TIMEOUT-flaky in the future. Consider speeding up the test or breaking it into multiple tests. For help, please tag @web-platform-tests/wpt-core-team in a comment.

cc @web-platform-tests/wpt-core-team

@thiagowfx
Member

I believe I understand the pattern. Whenever files in /webdriver/tests/support are touched, the CI fails because of the pre-existing failures.
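To check ahead of time whether a PR is likely to hit this pattern, one could test its changed paths against that directory. This is a rough sketch of the observation above, not a description of how the CI actually selects tests; the function name is hypothetical.

```python
from pathlib import PurePosixPath

# Directory named in the comment above, written as a repo-relative path.
SUPPORT_DIR = PurePosixPath("webdriver/tests/support")


def touches_webdriver_support(changed_files):
    """Return True if any changed path lies under webdriver/tests/support."""
    return any(
        # Strip a leading slash so "/webdriver/..." and "webdriver/..." agree.
        SUPPORT_DIR in PurePosixPath(path.lstrip("/")).parents
        for path in changed_files
    )
```

The list of changed paths could come from something like `git diff --name-only` against the base branch.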

@whimboo
Contributor

whimboo commented Jun 13, 2023

Now that web-platform-tests/rfcs#131 is merged, what does it mean for those jobs? Do those changes have to be applied now?

@jcscottiii
Contributor Author

Hey @whimboo! The next steps would be to implement that RFC. We are currently prioritizing our work. Once we have a timeline for that work we will comment on that RFC.

@thiagowfx
Member

@nechaev-chromium I believe you may have fixed some of those issues, with https://chromium-review.googlesource.com/c/chromium/src/+/4675633

I fixed some of those with #40887

@thiagowfx
Member

See also: #40990

@whimboo
Contributor

whimboo commented Aug 2, 2023

I was out for 3 weeks. @thiagowfx are those jobs more stable nowadays? If they still fail often, what else might be left to do? It's at least good to see that this crash has been fixed!

@thiagowfx
Member

Splitting the tests was helpful overall. The jobs are more stable, but not completely stable. #41083 also needs to be fixed.

@nechaev-chromium
Contributor

We have fixed two causes of ConnectionRefusedError. The fixes should be available as of 117.0.5915.x.

@whimboo
Contributor

whimboo commented Sep 18, 2023

Is there any work left to do? Recently this job has looked pretty good, though I'm not sure how often it still fails for PRs and landings that I don't watch.

@thiagowfx
Member

The question we should ask ourselves is: Do we still require admin merges to bypass this? If yes, then there's still work to do. I haven't merged any non-trivial PRs recently, @Lightning00Blade @OrKoN what's your experience in the last few weeks?

@whimboo
Contributor

whimboo commented Sep 18, 2023

FYI, we have had quite a good experience lately with admin merge requests when asking the web-platform-tests/admins team directly. Someone should always be around to help.

@past
Member

past commented Sep 22, 2023

Note that #40990 was fixed, so the wpt-chrome-dev-stability check should no longer block any PR.

@whimboo
Contributor

whimboo commented Jan 4, 2024

Note that there is also a bug in Chrome which causes an extra 100ms delay when trying to resize or re-position a window. With #43853 I'm going to add a workaround until it's fixed.

Once this PR lands, the Chrome wdspec tests will speed up drastically.


5 participants