Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

child_process: emit IPC messages on next tick #6909

Merged
merged 4 commits into from
May 24, 2016
Merged

Conversation

cjihrig
Copy link
Contributor

@cjihrig cjihrig commented May 20, 2016

Checklist
  • tests and code linting passes
  • the commit message follows commit guidelines
Affected core subsystem(s)

child_process

Description of change

Currently, if an IPC event handler throws an error, it can cause the message to not be consumed, leading to messages piling up. This commit causes IPC events to be emitted on the next tick, allowing the channel's processing logic to move forward as normal.

Refs: #6561
Refs: #6902

R= @santigimeno @bnoordhuis

@nodejs-github-bot nodejs-github-bot added the child_process Issues and PRs related to the child_process subsystem. label May 20, 2016
@santigimeno
Copy link
Member

I like this solution better.

LGTM if CI agrees.

@bnoordhuis
Copy link
Member

LGTM. This is also a fix for #3072, I think? Is it possible to write a reliable regression test?

@santigimeno
Copy link
Member

santigimeno commented May 21, 2016

I have a test: santigimeno@c07eae8, that passes with this change, and never exits without it. Feel free to use it / modify it if it works for you.

@cjihrig
Copy link
Contributor Author

cjihrig commented May 21, 2016

Test LGTM. I'll pull it into this PR.

@jasnell
Copy link
Member

jasnell commented May 21, 2016

LGTM with the test added and green CI

@bnoordhuis
Copy link
Member

Test LGTM2. @cjihrig Consider stress-testing it before you land the PR.

@cjihrig
Copy link
Contributor Author

cjihrig commented May 23, 2016

Normal CI: https://ci.nodejs.org/job/node-test-pull-request/2742/. Will stress test next.

@cjihrig
Copy link
Contributor Author

cjihrig commented May 23, 2016

The test timed out on Windows due to the different scheduling policy. I've added another commit, which addresses it, but for whatever reason, GitHub isn't currently showing it.

Here it is. @bnoordhuis and @santigimeno what do you think?

--- a/lib/cluster.js
+++ b/lib/cluster.js
@@ -719,7 +719,11 @@ function workerInit() {
       const handle = handles[key];
       delete handles[key];
       waitingCount++;
-      handle.owner.close(checkWaitingCount);
+
+      if (handle.owner)
+        handle.owner.close(checkWaitingCount);
+      else
+        handle.close(checkWaitingCount);
     }

     checkWaitingCount();

@cjihrig cjihrig force-pushed the next-tick branch 2 times, most recently from c5efa43 to 7cfabab Compare May 23, 2016 16:34
@cjihrig
Copy link
Contributor Author

cjihrig commented May 23, 2016

GitHub seems to be working again. CI after latest commit: https://ci.nodejs.org/job/node-test-pull-request/2744/

@cjihrig
Copy link
Contributor Author

cjihrig commented May 23, 2016

CI is green, except for one PPC machine that seems hung or something.

@santigimeno
Copy link
Member

santigimeno commented May 23, 2016

LGTM.

The last commit seems to fix a different issue though. A couple of comments:

  • Is SCHED_RR usable on Windows? or is it simply not recommended? If the latter, maybe the SchedulingPolicy of the cluster could be set unconditionally to RR in the test so it works the same on every platform.
  • Maybe it's worth adding a different test that exercises the ownerless handles problem on all the platforms, not only on Windows. I have created this if you want to use it: santigimeno@e8f284b

@cjihrig
Copy link
Contributor Author

cjihrig commented May 24, 2016

CI with all four commits: https://ci.nodejs.org/job/node-test-pull-request/2765/

@cjihrig
Copy link
Contributor Author

cjihrig commented May 24, 2016

CI was all green except for one flake (test-stdout-close-catch - #6918).

Stress test (Ubuntu): https://ci.nodejs.org/job/node-stress-single-test/737/
Stress test (Windows): https://ci.nodejs.org/job/node-stress-single-test/738/

@bnoordhuis
Copy link
Member

LGTM. Looks like a flake on the fbsd10 buildbot.

cjihrig and others added 4 commits May 24, 2016 12:20
Currently, if an IPC event handler throws an error, it can
cause the message to not be consumed, leading to messages piling
up. This commit causes IPC events to be emitted on the next tick,
allowing the channel's processing logic to move forward as
normal.

Fixes: nodejs#6561
PR-URL: nodejs#6909
Reviewed-By: Ben Noordhuis <[email protected]>
Reviewed-By: James M Snell <[email protected]>
Reviewed-By: Santiago Gimeno <[email protected]>
The test in this commit runs correctly if IPC messages are
properly consumed and emitted. Otherwise, the test times out.

Fixes: nodejs#6561
PR-URL: nodejs#6909
Reviewed-By: Ben Noordhuis <[email protected]>
Reviewed-By: James M Snell <[email protected]>
Reviewed-By: Colin Ihrig <[email protected]>
When a worker is disconnecting, it shuts down all of the handles
it is waiting on. It is possible that a handle does not have an
owner, which causes a crash. This commit closes such handles
without accessing the missing owner.

Fixes: nodejs#6561
PR-URL: nodejs#6909
Reviewed-By: Ben Noordhuis <[email protected]>
Reviewed-By: James M Snell <[email protected]>
Reviewed-By: Santiago Gimeno <[email protected]>
This test checks that ownerless cluster worker handles are closed
correctly on disconnection.

Fixes: nodejs#6561
PR-URL: nodejs#6909
Reviewed-By: Ben Noordhuis <[email protected]>
Reviewed-By: James M Snell <[email protected]>
Reviewed-By: Colin Ihrig <[email protected]>
@cjihrig cjihrig merged commit f0a07d9 into nodejs:master May 24, 2016
@cjihrig
Copy link
Contributor Author

cjihrig commented May 24, 2016

Thanks all. Landed in d59917b, dd21bd9, aadfe6c, and f0a07d9.

@cjihrig cjihrig deleted the next-tick branch May 24, 2016 16:24
Fishrock123 pushed a commit to Fishrock123/node that referenced this pull request May 30, 2016
Currently, if an IPC event handler throws an error, it can
cause the message to not be consumed, leading to messages piling
up. This commit causes IPC events to be emitted on the next tick,
allowing the channel's processing logic to move forward as
normal.

Fixes: nodejs#6561
PR-URL: nodejs#6909
Reviewed-By: Ben Noordhuis <[email protected]>
Reviewed-By: James M Snell <[email protected]>
Reviewed-By: Santiago Gimeno <[email protected]>
Fishrock123 pushed a commit to Fishrock123/node that referenced this pull request May 30, 2016
The test in this commit runs correctly if IPC messages are
properly consumed and emitted. Otherwise, the test times out.

Fixes: nodejs#6561
PR-URL: nodejs#6909
Reviewed-By: Ben Noordhuis <[email protected]>
Reviewed-By: James M Snell <[email protected]>
Reviewed-By: Colin Ihrig <[email protected]>
@MylesBorins
Copy link
Contributor

@cjihrig lts?

@cjihrig
Copy link
Contributor Author

cjihrig commented Jun 3, 2016

IMO yes. It keeps the IPC channel from getting constipated.

MylesBorins pushed a commit that referenced this pull request Jul 1, 2016
Currently, if an IPC event handler throws an error, it can
cause the message to not be consumed, leading to messages piling
up. This commit causes IPC events to be emitted on the next tick,
allowing the channel's processing logic to move forward as
normal.

Fixes: #6561
PR-URL: #6909
Reviewed-By: Ben Noordhuis <[email protected]>
Reviewed-By: James M Snell <[email protected]>
Reviewed-By: Santiago Gimeno <[email protected]>
MylesBorins pushed a commit that referenced this pull request Jul 1, 2016
The test in this commit runs correctly if IPC messages are
properly consumed and emitted. Otherwise, the test times out.

Fixes: #6561
PR-URL: #6909
Reviewed-By: Ben Noordhuis <[email protected]>
Reviewed-By: James M Snell <[email protected]>
Reviewed-By: Colin Ihrig <[email protected]>
MylesBorins pushed a commit that referenced this pull request Jul 1, 2016
When a worker is disconnecting, it shuts down all of the handles
it is waiting on. It is possible that a handle does not have an
owner, which causes a crash. This commit closes such handles
without accessing the missing owner.

Fixes: #6561
PR-URL: #6909
Reviewed-By: Ben Noordhuis <[email protected]>
Reviewed-By: James M Snell <[email protected]>
Reviewed-By: Santiago Gimeno <[email protected]>
MylesBorins pushed a commit that referenced this pull request Jul 1, 2016
This test checks that ownerless cluster worker handles are closed
correctly on disconnection.

Fixes: #6561
PR-URL: #6909
Reviewed-By: Ben Noordhuis <[email protected]>
Reviewed-By: James M Snell <[email protected]>
Reviewed-By: Colin Ihrig <[email protected]>
MylesBorins pushed a commit that referenced this pull request Jul 11, 2016
This test checks that ownerless cluster worker handles are closed
correctly on disconnection.

Fixes: #6561
PR-URL: #6909
Reviewed-By: Ben Noordhuis <[email protected]>
Reviewed-By: James M Snell <[email protected]>
Reviewed-By: Colin Ihrig <[email protected]>
MylesBorins pushed a commit that referenced this pull request Jul 12, 2016
Currently, if an IPC event handler throws an error, it can
cause the message to not be consumed, leading to messages piling
up. This commit causes IPC events to be emitted on the next tick,
allowing the channel's processing logic to move forward as
normal.

Fixes: #6561
PR-URL: #6909
Reviewed-By: Ben Noordhuis <[email protected]>
Reviewed-By: James M Snell <[email protected]>
Reviewed-By: Santiago Gimeno <[email protected]>
MylesBorins pushed a commit that referenced this pull request Jul 12, 2016
The test in this commit runs correctly if IPC messages are
properly consumed and emitted. Otherwise, the test times out.

Fixes: #6561
PR-URL: #6909
Reviewed-By: Ben Noordhuis <[email protected]>
Reviewed-By: James M Snell <[email protected]>
Reviewed-By: Colin Ihrig <[email protected]>
MylesBorins pushed a commit that referenced this pull request Jul 12, 2016
When a worker is disconnecting, it shuts down all of the handles
it is waiting on. It is possible that a handle does not have an
owner, which causes a crash. This commit closes such handles
without accessing the missing owner.

Fixes: #6561
PR-URL: #6909
Reviewed-By: Ben Noordhuis <[email protected]>
Reviewed-By: James M Snell <[email protected]>
Reviewed-By: Santiago Gimeno <[email protected]>
MylesBorins pushed a commit that referenced this pull request Jul 12, 2016
This test checks that ownerless cluster worker handles are closed
correctly on disconnection.

Fixes: #6561
PR-URL: #6909
Reviewed-By: Ben Noordhuis <[email protected]>
Reviewed-By: James M Snell <[email protected]>
Reviewed-By: Colin Ihrig <[email protected]>
@MylesBorins MylesBorins mentioned this pull request Jul 12, 2016
cjihrig added a commit to cjihrig/node that referenced this pull request Jun 24, 2017
This commit fixes a regression related to IPC 'message'
events. When messages are not emitted in the next tick,
a 'message' handler that throws can break the IPC read
loop.

Refs: nodejs#6909
Refs: nodejs#13459
Refs: nodejs#13648
PR-URL: nodejs#13856
Reviewed-By: Matteo Collina <[email protected]>
Reviewed-By: Santiago Gimeno <[email protected]>
addaleax pushed a commit that referenced this pull request Jun 29, 2017
This commit fixes a regression related to IPC 'message'
events. When messages are not emitted in the next tick,
a 'message' handler that throws can break the IPC read
loop.

Refs: #6909
Refs: #13459
Refs: #13648
PR-URL: #13856
Reviewed-By: Matteo Collina <[email protected]>
Reviewed-By: Santiago Gimeno <[email protected]>
addaleax pushed a commit that referenced this pull request Jul 11, 2017
This commit fixes a regression related to IPC 'message'
events. When messages are not emitted in the next tick,
a 'message' handler that throws can break the IPC read
loop.

Refs: #6909
Refs: #13459
Refs: #13648
PR-URL: #13856
Reviewed-By: Matteo Collina <[email protected]>
Reviewed-By: Santiago Gimeno <[email protected]>
addaleax pushed a commit that referenced this pull request Jul 18, 2017
This commit fixes a regression related to IPC 'message'
events. When messages are not emitted in the next tick,
a 'message' handler that throws can break the IPC read
loop.

Refs: #6909
Refs: #13459
Refs: #13648
PR-URL: #13856
Reviewed-By: Matteo Collina <[email protected]>
Reviewed-By: Santiago Gimeno <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
child_process Issues and PRs related to the child_process subsystem.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants