Failing GitHub Actions font-test was marked as successful #17396

Snuffleupagus · 2023-12-09T10:34:36Z

In a recent PR the font-tests failed completely on Windows, however the GitHub Actions workflow didn't actually catch the failure which means that the test was marked as successful: https://github.com/mozilla/pdf.js/actions/runs/7150415038/job/19473662790?pr=17395#step:8:20

That seems quite bad, since it means that we could very easily overlook a failing test. (Running gulp fonttest locally on Windows works fine though, so I'm not sure how to debug this.)

/cc @timvandermeij Any ideas here?

The text was updated successfully, but these errors were encountered:

timvandermeij · 2023-12-09T12:54:59Z

Yikes, that's not great indeed. I'll have a look at that.

timvandermeij · 2023-12-10T15:06:43Z

I have managed to reproduce the issue by putting back the code from #17172 (comment) (it talks about an invalid browsing context in BiDi which is what we also see here) and a hothacked node_modules/puppeteer-core/lib/esm/puppeteer/bidi/Connection.js file where the exception on line 123 is raised unconditionally.

Putting back the page closing lines is required to trigger the #onContextDestroyed function and that in turn raises the exception. However, the Puppeteer exception is not bubbling up to our test runner process, which causes it to exit with status 0 and indicates success. Somewhere in Puppeteer it seems that the exception is logged and then swallowed, or it's occurring in the browser context and not bubbling up further. In any case, this looks like a regression from enabling BiDi and sadly this doesn't seem possible to fix on our end given that the exception in Puppeteer doesn't even reach our code.

@whimboo @OrKoN Does this perhaps ring a bell for you, also given the comment at #17172 (comment) which was needed for BiDi? The exception in Puppeteer is triggered as soon as await page.close(); is called, but it's only logged and doesn't cause this promise to reject (putting a try/catch around this does nothing). This feels like a bug in Puppeteer to me, also given the apparent behavior change that required the removal of this loop, but you probably know much more about how the Puppeteer internals handle this than I do.

Snuffleupagus · 2023-12-10T15:31:12Z

As a temporary solution, depending on how difficult/time-consuming a proper fix is, could we do something similar to the old botio font-test and check that the success message is being logged and manually throw an exception?
https://github.com/mozilla/botio-files-pdfjs/pull/42/files#diff-aa6bfe19954403f5a8eac1c2259476609786b1c9b05bfaa81a360a87430a7710L26-L33

timvandermeij · 2023-12-10T15:37:15Z

Yes, I think that could work as a temporary solution indeed. Let's await a response to the message above, and if it's indeed an upstream issue that we can't fix here we can decide to:

Temporarily revert from BiDi to CDP which didn't have this issue, and go back to BiDi once the upstream fix has landed.
Temporarily introduce the workaround, and remove it later once the upstream fix has landed.
Wait for a new upstream release if the fix is trivial and can be made fairly quickly.

whimboo · 2023-12-11T12:01:22Z

This is indeed strange. So when a browsing context gets destroyed in the browser it's basically gone after the browsingContext.contextDestroyed event has been received. So something might have already removed the given browsing context from this.#browsingContexts in Connection.ts. The only method removing entries is unregisterBrowsingContexts, which is called from dispose in BrowsingContext.ts.

@sadym-chromium could you maybe help while @OrKoN is away?

sadym-chromium · 2023-12-11T12:25:08Z

@whimboo could you please provide a link to the test in pdfjs which is failing?

whimboo · 2023-12-11T13:46:42Z

I don't know which exact test fails here but see the initial comment with a link to the CI log. All the font tests can be found at https://github.com/mozilla/pdf.js/tree/master/test/font.

@timvandermeij and @Snuffleupagus could you may give more details about the code that actually triggers this exception?

Snuffleupagus · 2023-12-11T13:54:31Z

This breaks when run, on Windows, as part of this GitHub Actions workflow: https://github.com/mozilla/pdf.js/blob/master/.github/workflows/font_tests.yml

timvandermeij · 2023-12-11T21:40:51Z

I have been working on creating a reduced test case that would demonstrate the issue in isolation, and while doing this I found that I had been chasing a red herring in my previous analysis where I thought the Puppeteer exception didn't bubble up to our test runner. It turns out it actually does, and that is an issue in our test runner that has been there since basically "forever" (I could trace it back to a commit from 7 years ago). For some reason we never noticed this before, but the following code is problematic here:

https://github.com/mozilla/pdf.js/blob/master/gulpfile.mjs#L693-L695

The issue that we take the exit code from the spawned Node.js process and then... do nothing with it. This pattern exists in more places, so that's something we can fix on our end, and we should also audit the other similar call sites.

However, if we do that now the builds will actually fail and likely block merging PRs, so we still need to do something about the Puppeteer exception being raised in the first place, so one of the options from #17396 (comment). Strangely this only happens on Windows in our CI, and I also can't reproduce it on Linux at all. However, it does happen consistently in the CI; the most recent run is https://github.com/mozilla/pdf.js/actions/runs/7173542922/job/19533117343 and for completeness I'm listing the full traceback here:

Started 7 tests for firefox with seed 89123.
file:///D:/a/pdf.js/pdf.js/node_modules/puppeteer-core/lib/esm/puppeteer/bidi/Connection.js:123
            throw new Error(`BrowsingContext ${contextId} does not exist.`);
                  ^

Error: BrowsingContext undefined does not exist.
    at BidiConnection.getBrowsingContext (file:///D:/a/pdf.js/pdf.js/node_modules/puppeteer-core/lib/esm/puppeteer/bidi/Connection.js:123:19)
    at #onContextDestroyed (file:///D:/a/pdf.js/pdf.js/node_modules/puppeteer-core/lib/esm/puppeteer/bidi/Browser.js:161:42)
    at file:///D:/a/pdf.js/pdf.js/node_modules/puppeteer-core/lib/esm/third_party/mitt/mitt.js:1:242
    at Array.map (<anonymous>)
    at Object.emit (file:///D:/a/pdf.js/pdf.js/node_modules/puppeteer-core/lib/esm/third_party/mitt/mitt.js:1:226)
    at BidiConnection.emit (file:///D:/a/pdf.js/pdf.js/node_modules/puppeteer-core/lib/esm/puppeteer/common/EventEmitter.js:85:23)
    at BidiConnection.onMessage (file:///D:/a/pdf.js/pdf.js/node_modules/puppeteer-core/lib/esm/puppeteer/bidi/Connection.js:93:26)
    at WebSocket.<anonymous> (file:///D:/a/pdf.js/pdf.js/node_modules/puppeteer-core/lib/esm/puppeteer/node/NodeWebSocketTransport.js:46:32)
    at callListener (D:\a\pdf.js\pdf.js\node_modules\ws\lib\event-target.js:290:14)
    at WebSocket.onMessage (D:\a\pdf.js\pdf.js\node_modules\ws\lib\event-target.js:209:9)

Does the traceback perhaps provide any hints as to why the browsing context in Puppeteer could be undefined on Windows only? Would it perhaps make sense to check for that in Puppeteer?

I'm not sure how useful this is, but in an attempt to help with reproducing this I have tried to construct a reduced test case of how we use Puppeteer (note that we no longer have the page closing loop because that's indeed unnecessary, but we had it before and was seen to be problematic in #17172 (comment) so it might help with triggering the bug), so perhaps this can be used for reproducing/debugging on Windows:

import puppeteer from "puppeteer";

async function reproduce() {
  const browser = await puppeteer.launch({
    product: "firefox",
    protocol: "webDriverBiDi",
    headless: false,
  });
  const page = await browser.newPage();
  for (const page of await browser.pages()) {
    await page.close();
  }
  await browser.close();
}

await reproduce();

Save this as repro.js and run it with node repro.js. On Linux this works, but I can only hope this fails on Windows. If this does work, it could be that we're perhaps hitting a Windows-only BiDi bug triggered by Jasmine or our test runner code because the browser does seem to start and load the Jasmine test runner since we do get to

pdf.js/test/unit/testreporter.js

Line 57 in 76e3e52

sendInfo(`Started ${total} tests for ${browser} with seed ${seed}.`);

in the traceback. However, after that it seems to crash because no actual Jasmine tests are run.

I hope this all helps to debug the issue, and if I can somehow provide more information, feel free to let me know.

whimboo · 2023-12-11T22:10:15Z

Thank you for the update Tim! To get even more details of what might go wrong I would suggest the following:

For the Puppeteer options add the dumpio field and set it to true.
Especially for Firefox set the preference remote.log.level="Trace" which will enable trace logs
Upload the changes to the CI and trigger the relevant tests

I could then have a look at the logs tomorrow to see what might go wrong on our side.

But you could also additionally add the DEBUG="puppeteer:* environment variable (maybe as a separate log) so that we can see which puppeteer commands and events are actually in use.

timvandermeij · 2023-12-17T20:39:21Z

@whimboo Sorry for the delay here. I have made #17432 to apply the debugging suggestions you mentioned (removed all other workflows and limited the font tests workflow to just Windows to make sure the logs are as usable as possible).

You can find the build here but since I don't know how long the retention is I have also downloaded the full raw logs and attach them here as logs.txt. Fortunately on Windows the run crashes pretty quickly after Puppeteer has started, but it's still quite a bit of logs that I hope will be helpful for debugging the issue.

The final logs when the exception happens are:

1702845086027	RemoteAgent	DEBUG	WebDriverBiDiConnection 0cc49e5b-65db-42dd-a862-46623e93bec5 <- {"type":"event","method":"browsingContext.contextDestroyed","params":{"url":"chrome://gfxsanity/content/sanitytest.html","children":null,"parent":null}}
file:///D:/a/pdf.js/pdf.js/node_modules/puppeteer-core/lib/esm/puppeteer/bidi/Connection.js:123
            throw new Error(`BrowsingContext ${contextId} does not exist.`);

This looks interesting to me; I'm not sure what "chrome://gfxsanity/content/sanitytest.html" is exactly, but maybe that's the anomaly that causes this only on Windows somehow since I don't recognize this URL as something that we use or trigger ourselves.

whimboo · 2023-12-18T10:19:46Z

Oh wow! Yes, this is indeed a problem with the sanity test window then! This window should not be used at all and we have it disabled for geckodriver and Marionette but forgot about it in Puppeteer. So temporarily you could set the preference media.sanity-test.disabled to true. Mind checking if that (alone) works? If yes I'll go ahead and provide a patch for Puppeteer.

whimboo · 2023-12-20T16:17:12Z

I've created a PR for Puppeteer so that it should be fixed with the next release.

timvandermeij · 2023-12-23T18:24:34Z

@whimboo I have created #17457 and can confirm that setting media.sanity-test.disabled to true fixes the issue on Windows. You can find the logs here: https://github.com/mozilla/pdf.js/actions/runs/7309787007/job/19917756293?pr=17457. Thank you for making the patch upstream; we'll await a new release then and update the test code on our side as well to respect non-zero exit codes.

whimboo · 2024-01-04T13:56:13Z

A new puppeteer release is out which contains my fix:
https://github.com/puppeteer/puppeteer/releases/tag/puppeteer-core-v21.7.0

So you could get this issue fixed by upgrading Puppeteer. CC @calixteman.

timvandermeij · 2024-01-06T13:11:27Z

I'm reopening this because even though the upstream issue is fixed now we still need to update our Gulpfile to not ignore non-zero exit codes, to prevent this from going by unnoticed again in the future.

Snuffleupagus added the test label Dec 9, 2023

timvandermeij self-assigned this Dec 9, 2023

Snuffleupagus changed the title ~~Failing GiHub Actions font-test was marked as successful~~ Failing GitHub Actions font-test was marked as successful Dec 9, 2023

timvandermeij removed their assignment Dec 10, 2023

timvandermeij added upstream regression labels Dec 10, 2023

timvandermeij mentioned this issue Dec 17, 2023

[DO NOT MERGE] Debug Puppeteer with font tests #17432

Closed

whimboo mentioned this issue Dec 20, 2023

fix: disable GFX sanity window for Firefox and enable WebDriver BiDi CI jobs for Windows puppeteer/puppeteer#11578

Merged

Snuffleupagus linked a pull request Jan 4, 2024 that will close this issue

Update Puppeteer to version 21.7.0 #17477

Merged

calixteman closed this as completed in #17477 Jan 4, 2024

timvandermeij reopened this Jan 6, 2024

timvandermeij removed regression upstream labels Jan 6, 2024

timvandermeij self-assigned this Jan 21, 2024

timvandermeij mentioned this issue Jan 21, 2024

Don't ignore test.mjs child process exit codes in the Gulpfile #17555

Merged

timvandermeij closed this as completed in #17555 Jan 21, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Failing GitHub Actions font-test was marked as successful #17396

Failing GitHub Actions font-test was marked as successful #17396

Snuffleupagus commented Dec 9, 2023

timvandermeij commented Dec 9, 2023

timvandermeij commented Dec 10, 2023 •

edited

Loading

Snuffleupagus commented Dec 10, 2023

timvandermeij commented Dec 10, 2023 •

edited

Loading

whimboo commented Dec 11, 2023

sadym-chromium commented Dec 11, 2023

whimboo commented Dec 11, 2023

Snuffleupagus commented Dec 11, 2023

timvandermeij commented Dec 11, 2023 •

edited

Loading

whimboo commented Dec 11, 2023

timvandermeij commented Dec 17, 2023 •

edited

Loading

whimboo commented Dec 18, 2023

whimboo commented Dec 20, 2023

timvandermeij commented Dec 23, 2023

whimboo commented Jan 4, 2024

timvandermeij commented Jan 6, 2024

Failing GitHub Actions font-test was marked as successful #17396

Failing GitHub Actions font-test was marked as successful #17396

Comments

Snuffleupagus commented Dec 9, 2023

timvandermeij commented Dec 9, 2023

timvandermeij commented Dec 10, 2023 • edited Loading

Snuffleupagus commented Dec 10, 2023

timvandermeij commented Dec 10, 2023 • edited Loading

whimboo commented Dec 11, 2023

sadym-chromium commented Dec 11, 2023

whimboo commented Dec 11, 2023

Snuffleupagus commented Dec 11, 2023

timvandermeij commented Dec 11, 2023 • edited Loading

whimboo commented Dec 11, 2023

timvandermeij commented Dec 17, 2023 • edited Loading

whimboo commented Dec 18, 2023

whimboo commented Dec 20, 2023

timvandermeij commented Dec 23, 2023

whimboo commented Jan 4, 2024

timvandermeij commented Jan 6, 2024

timvandermeij commented Dec 10, 2023 •

edited

Loading

timvandermeij commented Dec 10, 2023 •

edited

Loading

timvandermeij commented Dec 11, 2023 •

edited

Loading

timvandermeij commented Dec 17, 2023 •

edited

Loading