Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

test: investigate - async-hooks/test-graph.signal #14568

Closed
refack opened this issue Aug 1, 2017 · 11 comments
Closed

test: investigate - async-hooks/test-graph.signal #14568

refack opened this issue Aug 1, 2017 · 11 comments
Labels
async_hooks Issues and PRs related to the async hooks subsystem. flaky-test Issues and PRs related to the tests with unstable failures on the CI. process Issues and PRs related to the process subsystem.

Comments

@refack
Copy link
Contributor

refack commented Aug 1, 2017

  • Version: master
  • Platform: linux
  • Subsystem: async_hooks

https://ci.nodejs.org/job/node-test-commit-linux/11523/nodes=centos5-64

not ok 26 async-hooks/test-graph.signal
  ---
  duration_ms: 0.155
  severity: fail
  stack: |-
    Mismatched onsigusr2 function calls. Expected exactly 2, actual 0.
        at Object.exports.mustCall (/home/iojs/build/workspace/node-test-commit-linux/nodes/centos5-64/test/common/index.js:475:10)
        at Object.<anonymous> (/home/iojs/build/workspace/node-test-commit-linux/nodes/centos5-64/test/async-hooks/test-graph.signal.js:16:30)
        at Module._compile (module.js:573:30)
        at Object.Module._extensions..js (module.js:584:10)
        at Module.load (module.js:507:32)
        at tryModuleLoad (module.js:470:12)
        at Function.Module._load (module.js:462:3)
        at Function.Module.runMain (module.js:609:10)
        at startup (bootstrap_node.js:158:16)
  ...

code

...
hooks.enable();
process.on('SIGUSR2', common.mustCall(onsigusr2, 2));

let count = 0;
exec(`kill -USR2 ${process.pid}`);
...

/cc @nodejs/async_hooks @Fishrock123 @thlorenz

@refack refack added async_hooks Issues and PRs related to the async hooks subsystem. flaky-test Issues and PRs related to the tests with unstable failures on the CI. process Issues and PRs related to the process subsystem. labels Aug 1, 2017
@refack
Copy link
Contributor Author

refack commented Aug 1, 2017

Making sure it's just a flake: https://ci.nodejs.org/job/node-test-commit-linux/11526/

@BridgeAR
Copy link
Member

@refack do you still know the outcome of that CI?

@nodejs/async_hooks it would be great if you could have a look at this. Having a green CI for the next Code & Learn would be really neat!

@refack
Copy link
Contributor Author

refack commented Sep 28, 2017

It is a flake (did not reproduce under normal load).

@Trott
Copy link
Member

Trott commented Nov 18, 2017

https://ci.nodejs.org/job/node-test-commit-linux/14214/nodes=centos7-64/consoleFull:

not ok 27 async-hooks/test-graph.signal
  ---
  duration_ms: 0.205
  severity: fail
  stack: |-
    Mismatched onsigusr2Again function calls. Expected exactly 1, actual 0.
        at Object.exports.mustCall (/home/iojs/build/workspace/node-test-commit-linux/nodes/centos7-64/test/common/index.js:490:10)
        at process.onsigusr2 (/home/iojs/build/workspace/node-test-commit-linux/nodes/centos7-64/test/async-hooks/test-graph.signal.js:30:34)
        at process.<anonymous> (/home/iojs/build/workspace/node-test-commit-linux/nodes/centos7-64/test/common/index.js:522:15)
        at process.emit (events.js:159:13)
        at Signal.wrap.onsignal (internal/process.js:198:44)

@Trott Trott mentioned this issue Nov 18, 2017
3 tasks
@Trott
Copy link
Member

Trott commented Dec 6, 2017

ubuntu1604_sharedlibs_fips20_x64 too:

https://ci.nodejs.org/job/node-test-commit-linux-linked/nodes=ubuntu1604_sharedlibs_fips20_x64/620/consoleFull

03:19:01 not ok 27 async-hooks/test-graph.signal
03:19:01   ---
03:19:01   duration_ms: 0.255
03:19:01   severity: fail
03:19:01   stack: |-
03:19:01     Mismatched onsigusr2Again function calls. Expected exactly 1, actual 0.
03:19:01         at Object.exports.mustCall (/home/iojs/build/workspace/node-test-commit-linux-linked/nodes/ubuntu1604_sharedlibs_fips20_x64/test/common/index.js:488:10)
03:19:01         at process.onsigusr2 (/home/iojs/build/workspace/node-test-commit-linux-linked/nodes/ubuntu1604_sharedlibs_fips20_x64/test/async-hooks/test-graph.signal.js:30:34)
03:19:01         at process.<anonymous> (/home/iojs/build/workspace/node-test-commit-linux-linked/nodes/ubuntu1604_sharedlibs_fips20_x64/test/common/index.js:522:15)
03:19:01         at process.emit (events.js:126:13)
03:19:01         at Signal.wrap.onsignal (internal/process.js:198:44)
03:19:01   ...

@Trott
Copy link
Member

Trott commented Dec 6, 2017

Stress test under load (96 concurrent processes): https://ci.nodejs.org/job/node-stress-single-test/1556/nodes=centos5-64/

(UPDATE: clean)

@Trott
Copy link
Member

Trott commented Dec 6, 2017

Now trying a stress test without load, because wondering if maybe it's when something happens too fast or something:

https://ci.nodejs.org/job/node-stress-single-test/1557/nodes=centos5-64/console

(UPDATE: clean)

@Trott
Copy link
Member

Trott commented Dec 6, 2017

My theory at this point is that exec being asynchronous and/or kill returns control to the caller before the target of the signal has responded to it, and nothing is keeping the event loop open, so there's no reason the process just can't exit without ever receiving the signal.

If that makes sense, the solution might be to use an interval or something to keep the event loop open. Will experiment with that, but would prefer to be able to reproduce first...

@Trott
Copy link
Member

Trott commented Dec 6, 2017

Seeing if I have any better luck with a centos7-64 stress test:

1 process: https://ci.nodejs.org/job/node-stress-single-test/1558/nodes=centos7-64/

96 processes: https://ci.nodejs.org/job/node-stress-single-test/1559/nodes=centos7-64/

@Trott
Copy link
Member

Trott commented Dec 6, 2017

Hooray! The CentOS7-64 with 96 simultaneous processes is showing failures! So now I can try out a fix...

not ok 1745 async-hooks/test-graph.signal
  ---
  duration_ms: 8.664
  severity: fail
  stack: |-
    Mismatched onsigusr2Again function calls. Expected exactly 1, actual 0.
        at Object.exports.mustCall (/home/iojs/build/workspace/node-stress-single-test/nodes/centos7-64/test/common/index.js:488:10)
        at process.onsigusr2 (/home/iojs/build/workspace/node-stress-single-test/nodes/centos7-64/test/async-hooks/test-graph.signal.js:30:34)
        at process.<anonymous> (/home/iojs/build/workspace/node-stress-single-test/nodes/centos7-64/test/common/index.js:522:15)
        at process.emit (events.js:126:13)
        at Signal.wrap.onsignal (internal/process.js:198:44)

Trott added a commit to Trott/io.js that referenced this issue Dec 6, 2017
Make sure event loop remains open long enough for signal to be received.

Fixes: nodejs#14568
@Trott
Copy link
Member

Trott commented Dec 6, 2017

Proposed fix: #17509

@Trott Trott closed this as completed in 8997026 Dec 8, 2017
MylesBorins pushed a commit that referenced this issue Dec 12, 2017
Make sure event loop remains open long enough for signal to be received.

PR-URL: #17509
Fixes: #14568
Reviewed-By: Colin Ihrig <[email protected]>
Reviewed-By: Richard Lau <[email protected]>
Reviewed-By: Refael Ackermann <[email protected]>
Reviewed-By: Evan Lucas <[email protected]>
MylesBorins pushed a commit that referenced this issue Dec 12, 2017
Make sure event loop remains open long enough for signal to be received.

PR-URL: #17509
Fixes: #14568
Reviewed-By: Colin Ihrig <[email protected]>
Reviewed-By: Richard Lau <[email protected]>
Reviewed-By: Refael Ackermann <[email protected]>
Reviewed-By: Evan Lucas <[email protected]>
MylesBorins pushed a commit that referenced this issue Dec 12, 2017
Make sure event loop remains open long enough for signal to be received.

PR-URL: #17509
Fixes: #14568
Reviewed-By: Colin Ihrig <[email protected]>
Reviewed-By: Richard Lau <[email protected]>
Reviewed-By: Refael Ackermann <[email protected]>
Reviewed-By: Evan Lucas <[email protected]>
gibfahn pushed a commit that referenced this issue Dec 19, 2017
Make sure event loop remains open long enough for signal to be received.

PR-URL: #17509
Fixes: #14568
Reviewed-By: Colin Ihrig <[email protected]>
Reviewed-By: Richard Lau <[email protected]>
Reviewed-By: Refael Ackermann <[email protected]>
Reviewed-By: Evan Lucas <[email protected]>
gibfahn pushed a commit that referenced this issue Dec 20, 2017
Make sure event loop remains open long enough for signal to be received.

PR-URL: #17509
Fixes: #14568
Reviewed-By: Colin Ihrig <[email protected]>
Reviewed-By: Richard Lau <[email protected]>
Reviewed-By: Refael Ackermann <[email protected]>
Reviewed-By: Evan Lucas <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
async_hooks Issues and PRs related to the async hooks subsystem. flaky-test Issues and PRs related to the tests with unstable failures on the CI. process Issues and PRs related to the process subsystem.
Projects
None yet
Development

No branches or pull requests

3 participants