Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

parallel/test-assert-esm-cjs-message-verify is flaky #53962

Closed
richardlau opened this issue Jul 20, 2024 · 4 comments · Fixed by #53967
Closed

parallel/test-assert-esm-cjs-message-verify is flaky #53962

richardlau opened this issue Jul 20, 2024 · 4 comments · Fixed by #53967
Labels
flaky-test Issues and PRs related to the tests with unstable failures on the CI. linux Issues and PRs related to the Linux platform. ppc Issues and PRs related to the Power architecture. s390 Issues and PRs related to the s390 architecture.

Comments

@richardlau
Copy link
Member

Test

parallel/test-assert-esm-cjs-message-verify

Platform

Linux PPC64LE, Linux s390x, Linux x64

Console output

00:56:31 not ok 18 parallel/test-assert-esm-cjs-message-verify
00:56:31   ---
00:56:31   duration_ms: 330.23900
00:56:31   severity: fail
00:56:31   exitcode: 1
00:56:31   stack: |-
00:56:31     TAP version 13
00:56:31     # Subtest: ensure the assert.ok throwing similar error messages for esm and cjs files
00:56:31         # Subtest: should return code 1 for each command
00:56:31         ok 1 - should return code 1 for each command
00:56:31           ---
00:56:31           duration_ms: 129.378294
00:56:31           ...
00:56:31         1..1
00:56:31     not ok 1 - ensure the assert.ok throwing similar error messages for esm and cjs files
00:56:31       ---
00:56:31       duration_ms: 132.938811
00:56:31       type: 'suite'
00:56:31       location: '/home/iojs/build/workspace/node-test-commit-linux/test/parallel/test-assert-esm-cjs-message-verify.js:26:1'
00:56:31       failureType: 'hookFailed'
00:56:31       error: "EEXIST: file already exists, mkdir '/home/iojs/node-tmp/.tmp.15'"
00:56:31       code: 'EEXIST'
00:56:31       stack: |-
00:56:31         Object.mkdirSync (node:fs:1359:26)
00:56:31         Object.refresh (/home/iojs/build/workspace/node-test-commit-linux/test/common/tmpdir.js:36:6)
00:56:31         SuiteContext.<anonymous> (/home/iojs/build/workspace/node-test-commit-linux/test/parallel/test-assert-esm-cjs-message-verify.js:49:12)
00:56:31         TestHook.runInAsyncScope (node:async_hooks:206:9)
00:56:31         TestHook.run (node:internal/test_runner/test:866:25)
00:56:31         TestHook.run (node:internal/test_runner/test:1117:18)
00:56:31         TestHook.run (node:internal/util:545:20)
00:56:31         node:internal/test_runner/test:786:20
00:56:31         process.processTicksAndRejections (node:internal/process/task_queues:95:5)
00:56:31         async Suite.runHook (node:internal/test_runner/test:784:7)
00:56:31       ...
00:56:31     1..1
00:56:31     # tests 1
00:56:31     # suites 1
00:56:31     # pass 1
00:56:31     # fail 0
00:56:31     # cancelled 0
00:56:31     # skipped 0
00:56:31     # todo 0
00:56:31     # duration_ms 144.242093
00:56:31   ...

Build links

Additional information

This shows up in today's Reliability report https://github.com/nodejs/reliability/actions/runs/10015836224 (issue creation failed because the "body is too long (maximum is 65536 characters"):

image

It looks like the test sporadically failed on Windows previously (according to past reliability reports) but the Linux failures are only showing up in today's report.

I can also reproduce this locally on Linux x64 (RHEL 9) with

./tools/test.py --repeat 1000 -J parallel/test-assert-esm-cjs-message-verify

Failures rates vary (seeing anything between 4-10 failures per 1000 runs).

@richardlau richardlau added the flaky-test Issues and PRs related to the tests with unstable failures on the CI. label Jul 20, 2024
@github-actions github-actions bot added ppc Issues and PRs related to the Power architecture. s390 Issues and PRs related to the s390 architecture. linux Issues and PRs related to the Linux platform. labels Jul 20, 2024
@richardlau
Copy link
Member Author

I ran a git bisect run with

#!/usr/bin/env bash
./configure --verbose
make -j 88
tools/test.py --repeat=1000 -J parallel/test-assert-esm-cjs-message-verify
$ git bisect log
git bisect start
# status: waiting for both good and bad commits
# bad: [cf8e5356d9fbbdca40d9b9fe062b3c292b7e91e3] lib: improve error message when index not found on cjs
git bisect bad cf8e5356d9fbbdca40d9b9fe062b3c292b7e91e3
# status: waiting for good commit(s), bad commit known
# good: [0b1ff6965e3bb3cedfe3563f4ddefc33e7fa69d2] src: fix potential segmentation fault in SQLite
git bisect good 0b1ff6965e3bb3cedfe3563f4ddefc33e7fa69d2
# good: [3d019ce6af35399d78617d4df57ce248d3d37d83] cli: document `--inspect` port `0` behavior
git bisect good 3d019ce6af35399d78617d4df57ce248d3d37d83
# bad: [a523c345b153616a8265f8691ebfa8547273f27a] inspector: add initial support for network inspection
git bisect bad a523c345b153616a8265f8691ebfa8547273f27a
# bad: [7168295e7a94e6cfef69e43d26d7d5b57d3e1cf9] fs: move `rmSync` implementation to c++
git bisect bad 7168295e7a94e6cfef69e43d26d7d5b57d3e1cf9
# good: [cafd44dc7eff20cfa6c36b289e24efd793d4422a] esm: refactor `get_format`
git bisect good cafd44dc7eff20cfa6c36b289e24efd793d4422a
# first bad commit: [7168295e7a94e6cfef69e43d26d7d5b57d3e1cf9] fs: move `rmSync` implementation to c++
$

which is pointing to #53617 (cc @nodejs/fs @anonrig ).

@aduh95
Copy link
Contributor

aduh95 commented Jul 21, 2024

Yagiz and I discussed this privately, Yagiz investigation led him to think the flakiness was introduced because the old implementation was using setTimeout (not IO blocking), and the new one is using sleep (IO blocking). IMO that's a valid change, having a sync method not IO blocking was a weird design (according to Yagiz, it was due to rm and rmSync sharing the same code), so my recommendation would be to keep the IO blocking implementation – but not land it on any release line, to not risk break our users that would rely on that behavior, like this test was.

targos pushed a commit that referenced this issue Jul 28, 2024
PR-URL: #53967
Fixes: #53962
Reviewed-By: Rafael Gonzaga <[email protected]>
Reviewed-By: Geoffrey Booth <[email protected]>
Reviewed-By: Marco Ippolito <[email protected]>
Reviewed-By: Yagiz Nizipli <[email protected]>
@targos
Copy link
Member

targos commented Jul 30, 2024

According to https://github.com/nodejs/node/pull/53617/files#diff-102db77122b5d23f42d497abfc7d626e60723d911caa6293fd16a5ec92b0d653L220, the old implementation was using sleep (I don't see how setTimeout could be used correctly in a synchronous function)

RafaelGSS pushed a commit that referenced this issue Aug 5, 2024
PR-URL: #53967
Fixes: #53962
Reviewed-By: Rafael Gonzaga <[email protected]>
Reviewed-By: Geoffrey Booth <[email protected]>
Reviewed-By: Marco Ippolito <[email protected]>
Reviewed-By: Yagiz Nizipli <[email protected]>
marco-ippolito pushed a commit that referenced this issue Aug 19, 2024
PR-URL: #53967
Fixes: #53962
Reviewed-By: Rafael Gonzaga <[email protected]>
Reviewed-By: Geoffrey Booth <[email protected]>
Reviewed-By: Marco Ippolito <[email protected]>
Reviewed-By: Yagiz Nizipli <[email protected]>
marco-ippolito pushed a commit that referenced this issue Aug 19, 2024
PR-URL: #53967
Fixes: #53962
Reviewed-By: Rafael Gonzaga <[email protected]>
Reviewed-By: Geoffrey Booth <[email protected]>
Reviewed-By: Marco Ippolito <[email protected]>
Reviewed-By: Yagiz Nizipli <[email protected]>
marco-ippolito pushed a commit that referenced this issue Aug 19, 2024
PR-URL: #53967
Fixes: #53962
Reviewed-By: Rafael Gonzaga <[email protected]>
Reviewed-By: Geoffrey Booth <[email protected]>
Reviewed-By: Marco Ippolito <[email protected]>
Reviewed-By: Yagiz Nizipli <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
flaky-test Issues and PRs related to the tests with unstable failures on the CI. linux Issues and PRs related to the Linux platform. ppc Issues and PRs related to the Power architecture. s390 Issues and PRs related to the s390 architecture.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants