Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

child_process: workaround fd passing issue on OS X #7572

Merged
merged 1 commit into from
Aug 20, 2016

Conversation

santigimeno
Copy link
Member

Checklist
  • make -j4 test (UNIX), or vcbuild test nosign (Windows) passes
  • tests and/or benchmarks are included
  • commit message follows commit guidelines
Affected core subsystem(s)

child_process

Description of change

There's an issue on some OS X versions when passing fd's between processes.
When the handle associated to a specific file descriptor is closed by the sender
process before it's received in the destination, the handle is indeed closed
while it should remain opened. As a workaround for OS X, don't close the
handle until the NODE_HANDLE_ACK is received by the sender.
Added test-child-process-pass-fd that is basically test-cluster-net-send but
creating lots of workers, so the issue reproduces on OS X consistently.

Fixes: #7512

/cc @bnoordhuis

@santigimeno santigimeno added child_process Issues and PRs related to the child_process subsystem. macos Issues and PRs related to the macOS platform / OSX. labels Jul 6, 2016
@Fishrock123
Copy link
Contributor

// a platform bug, the handle has to be closed after it has been received
// by the target process.
if (handle && !options.keepOpen) {
if ((process.platform === 'darwin') && target) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would there be any harm in dropping the darwin checks here and in the tests?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW, I also think it's better not to special-case.

@santigimeno
Copy link
Member Author

santigimeno commented Jul 7, 2016

PR updated addressing @bnoordhuis and @cjihrig comments. Also, it includes modifications to make sure the _pendingHandle is closed even if the NODE_HANDLE_ACK is not received due to IPC channel disconnection. Thanks

if (handle && !options.keepOpen) {
if (target) {
assert(!target._pendingHandle);
target._pendingHandle = handle;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should target._pendingHandle be a list or a Set? ISTM that it's not inconceivable for another handle to be sent while still waiting for the ACK for the first handle.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought that was not possible because the messages/handles where stored in _handleQueue while there were pending ACK's: https://github.com/nodejs/node/blob/master/lib/internal/child_process.js#L566-575. Is this assumption correct?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, you're right. That's a bit of an implicit relationship, though. Maybe you can add a comment explaining the dependency on _handleQueue?

@bnoordhuis
Copy link
Member

LGTM with a comment.

@santigimeno
Copy link
Member Author

Added requested comment. Not too sure of the wording though.

@bnoordhuis
Copy link
Member

LGTM

Added requested comment. Not too sure of the wording though.

It's fine. The next person knows they'll need to look at _handleQueue and that's the important thing.

for (let i = 0; i < N; ++i) {
const worker = fork(__filename, ['child', common.PORT + i]);
worker.once('message', common.mustCall((msg, handle) => {
assert.equal(msg, 'handle');
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you use strictEqual() here and throughout.

@cjihrig
Copy link
Contributor

cjihrig commented Jul 11, 2016

LGTM with a couple comments.

@santigimeno
Copy link
Member Author

Test updated. Rebased and squashed into a single commit.

const fork = require('child_process').fork;
const net = require('net');

const N = 80;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know if it's a good idea to spawn 80 processes. You might hit the ulimit on some systems and it will be very slow on the less powerful buildbots. It takes about 500 ms to start a process on the rpi1.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand. That was one of the reasons why initially I was running this test only on OS X. I chose 80 because it was the magic number that caused the test to fail in my OS X almost 100% of the time. Maybe it's not a bad idea to run the test only in OS X as it seems an OS X only bug, and the basic functionality the test is checking is already covered by test-cluster-net-send.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose you could skip if '67'.includes(process.config.variables.arm_version), that would filter out the ARMv6 and ARMv7 buildbots (i.e., the rpis.)

@santigimeno
Copy link
Member Author

Updated so it skips the test in armv6 and armv7.
CI: https://ci.nodejs.org/job/node-test-pull-request/3285/

@santigimeno
Copy link
Member Author

Unfortunately this change breaks test-cluster-send-handle-twice on win10. I'll try to look into it.

@santigimeno
Copy link
Member Author

Strangely, the problem in win10 was that the NODE_HANDLE_ACK was received before the req.oncomplete was called which in turn calls the postSend method. Calling the postSend method just after sending the handle through the IPC seems to fix the issue (not completely sure it's the correct solution though).

@santigimeno
Copy link
Member Author

@santigimeno
Copy link
Member Author

@bnoordhuis @cjihrig Thoughts?

@bnoordhuis
Copy link
Member

@santigimeno Are you asking for a re-review or is there an open question we should answer?

@santigimeno
Copy link
Member Author

@bnoordhuis basically a review. I'm not sure it's a good idea not waiting for req.oncomplete to call postSend though, but that's the only way I could find to fix the win10 issue.

@santigimeno
Copy link
Member Author

@santigimeno
Copy link
Member Author

Incorporated @cjihrig suggestion and squashed into a single commit. Last CI was green. I'll land this tomorrow if there's no objection.

@santigimeno
Copy link
Member Author

@santigimeno
Copy link
Member Author

One more CI run as there were some problems in the arm bots:
https://ci.nodejs.org/job/node-test-pull-request/3752/

There's an issue on some `OS X` versions when passing fd's between processes.
When the handle associated to a specific file descriptor is closed by the sender
process before it's received in the destination, the handle is indeed closed
while it should remain opened. In order to fix this behaviour, don't close the
handle until the `NODE_HANDLE_ACK` is received by the sender.
Added `test-child-process-pass-fd` that is basically `test-cluster-net-send` but
creating lots of workers, so the issue reproduces on `OS X` consistently.

Fixes: nodejs#7512
PR-URL: nodejs#7572
Reviewed-By: Ben Noordhuis <[email protected]>
Reviewed-By: Colin Ihrig <[email protected]>
@santigimeno
Copy link
Member Author

Last CI: https://ci.nodejs.org/job/node-test-pull-request/3765/
All is green except for test-crypto-timing-safe-equal failures.

@santigimeno santigimeno merged commit db6253f into nodejs:master Aug 20, 2016
@santigimeno
Copy link
Member Author

Landed in db6253f. Thanks!

evanlucas pushed a commit that referenced this pull request Aug 24, 2016
There's an issue on some `OS X` versions when passing fd's between processes.
When the handle associated to a specific file descriptor is closed by the sender
process before it's received in the destination, the handle is indeed closed
while it should remain opened. In order to fix this behaviour, don't close the
handle until the `NODE_HANDLE_ACK` is received by the sender.
Added `test-child-process-pass-fd` that is basically `test-cluster-net-send` but
creating lots of workers, so the issue reproduces on `OS X` consistently.

Fixes: #7512
PR-URL: #7572
Reviewed-By: Ben Noordhuis <[email protected]>
Reviewed-By: Colin Ihrig <[email protected]>
@MylesBorins
Copy link
Contributor

@santigimeno / @bnoordhuis / @cjihrig should this be backported?

@santigimeno
Copy link
Member Author

@thealphanerd I think so. Do you need a backport PR?

@MylesBorins
Copy link
Contributor

yes please

santigimeno added a commit to santigimeno/node that referenced this pull request Oct 3, 2016
There's an issue on some `OS X` versions when passing fd's between processes.
When the handle associated to a specific file descriptor is closed by the sender
process before it's received in the destination, the handle is indeed closed
while it should remain opened. In order to fix this behaviour, don't close the
handle until the `NODE_HANDLE_ACK` is received by the sender.
Added `test-child-process-pass-fd` that is basically `test-cluster-net-send` but
creating lots of workers, so the issue reproduces on `OS X` consistently.

Fixes: nodejs#7512
Ref: nodejs#7572
Reviewed-By: Ben Noordhuis <[email protected]>
Reviewed-By: Colin Ihrig <[email protected]>
@santigimeno santigimeno mentioned this pull request Oct 3, 2016
4 tasks
@santigimeno
Copy link
Member Author

@thealphanerd backport here: #8904

MylesBorins pushed a commit that referenced this pull request Oct 4, 2016
There's an issue on some `OS X` versions when passing fd's between processes.
When the handle associated to a specific file descriptor is closed by the sender
process before it's received in the destination, the handle is indeed closed
while it should remain opened. In order to fix this behaviour, don't close the
handle until the `NODE_HANDLE_ACK` is received by the sender.
Added `test-child-process-pass-fd` that is basically `test-cluster-net-send` but
creating lots of workers, so the issue reproduces on `OS X` consistently.

Fixes: #7512
Ref: #8904
PR-URL: #7572
Reviewed-By: Ben Noordhuis <[email protected]>
Reviewed-By: Colin Ihrig <[email protected]>
MylesBorins pushed a commit that referenced this pull request Oct 10, 2016
There's an issue on some `OS X` versions when passing fd's between processes.
When the handle associated to a specific file descriptor is closed by the sender
process before it's received in the destination, the handle is indeed closed
while it should remain opened. In order to fix this behaviour, don't close the
handle until the `NODE_HANDLE_ACK` is received by the sender.
Added `test-child-process-pass-fd` that is basically `test-cluster-net-send` but
creating lots of workers, so the issue reproduces on `OS X` consistently.

Fixes: #7512
Ref: #8904
PR-URL: #7572
Reviewed-By: Ben Noordhuis <[email protected]>
Reviewed-By: Colin Ihrig <[email protected]>
rvagg pushed a commit that referenced this pull request Oct 18, 2016
There's an issue on some `OS X` versions when passing fd's between processes.
When the handle associated to a specific file descriptor is closed by the sender
process before it's received in the destination, the handle is indeed closed
while it should remain opened. In order to fix this behaviour, don't close the
handle until the `NODE_HANDLE_ACK` is received by the sender.
Added `test-child-process-pass-fd` that is basically `test-cluster-net-send` but
creating lots of workers, so the issue reproduces on `OS X` consistently.

Fixes: #7512
Ref: #8904
PR-URL: #7572
Reviewed-By: Ben Noordhuis <[email protected]>
Reviewed-By: Colin Ihrig <[email protected]>
MylesBorins pushed a commit that referenced this pull request Oct 26, 2016
There's an issue on some `OS X` versions when passing fd's between processes.
When the handle associated to a specific file descriptor is closed by the sender
process before it's received in the destination, the handle is indeed closed
while it should remain opened. In order to fix this behaviour, don't close the
handle until the `NODE_HANDLE_ACK` is received by the sender.
Added `test-child-process-pass-fd` that is basically `test-cluster-net-send` but
creating lots of workers, so the issue reproduces on `OS X` consistently.

Fixes: #7512
Ref: #8904
PR-URL: #7572
Reviewed-By: Ben Noordhuis <[email protected]>
Reviewed-By: Colin Ihrig <[email protected]>
@MylesBorins MylesBorins mentioned this pull request Oct 26, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
child_process Issues and PRs related to the child_process subsystem. macos Issues and PRs related to the macOS platform / OSX.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Issue passing file descriptors in OS X
5 participants