child_process: workaround fd passing issue on OS X #7572
// a platform bug, the handle has to be closed after it has been received
// by the target process.
if (handle && !options.keepOpen) {
  if ((process.platform === 'darwin') && target) {
Would there be any harm in dropping the darwin checks here and in the tests?
FWIW, I also think it's better not to special-case.
PR updated addressing @bnoordhuis and @cjihrig comments. Also, it includes modifications to make sure the
if (handle && !options.keepOpen) {
  if (target) {
    assert(!target._pendingHandle);
    target._pendingHandle = handle;
Should `target._pendingHandle` be a list or a Set? ISTM that it's not inconceivable for another handle to be sent while still waiting for the ACK for the first handle.
I thought that was not possible because the messages/handles were stored in `_handleQueue` while there were pending ACKs: https://github.com/nodejs/node/blob/master/lib/internal/child_process.js#L566-575. Is this assumption correct?
Oh, you're right. That's a bit of an implicit relationship, though. Maybe you can add a comment explaining the dependency on `_handleQueue`?
LGTM with a comment.
Added requested comment. Not too sure of the wording though.
LGTM
It's fine. The next person knows they'll need to look at `_handleQueue` and that's the important thing.
for (let i = 0; i < N; ++i) {
  const worker = fork(__filename, ['child', common.PORT + i]);
  worker.once('message', common.mustCall((msg, handle) => {
    assert.equal(msg, 'handle');
Can you use `strictEqual()` here and throughout?
LGTM with a couple comments.
Test updated. Rebased and squashed into a single commit.
const fork = require('child_process').fork;
const net = require('net');

const N = 80;
I don't know if it's a good idea to spawn 80 processes. You might hit the ulimit on some systems and it will be very slow on the less powerful buildbots. It takes about 500 ms to start a process on the rpi1.
I understand. That was one of the reasons why I was initially running this test only on `OS X`. I chose 80 because it was the magic number that caused the test to fail on my `OS X` machine almost 100% of the time. Maybe it's not a bad idea to run the test only on `OS X`, as it seems to be an `OS X`-only bug, and the basic functionality the test is checking is already covered by `test-cluster-net-send`.
I suppose you could skip if `'67'.includes(process.config.variables.arm_version)`, that would filter out the ARMv6 and ARMv7 buildbots (i.e., the rpis).
Updated so it skips the test in armv6 and armv7.
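A minimal sketch of such a skip check is shown below. It is not the code that landed in the test; `isArmV6OrV7` is a hypothetical helper, and it assumes `process.config.variables.arm_version` is a string such as `'6'` or `'7'` on ARM builds and may be absent elsewhere.

```javascript
'use strict';

// Hypothetical helper: true only for the ARMv6 and ARMv7 version strings.
// Explicit comparisons are used because '67'.includes('') is true, which
// would skip the test if the variable were ever an empty string.
function isArmV6OrV7(armVersion) {
  return armVersion === '6' || armVersion === '7';
}

// Assumed shape of the build configuration; arm_version may be undefined.
const armVersion = process.config.variables.arm_version;
const shouldSkip = isArmV6OrV7(armVersion);
```

The reviewer's one-liner is shorter; the explicit form trades brevity for predictable behaviour on unexpected values.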
Unfortunately this change breaks
Strangely, the problem in
@bnoordhuis @cjihrig Thoughts?
@santigimeno Are you asking for a re-review or is there an open question we should answer?
@bnoordhuis basically a review. I'm not sure it's a good idea not waiting for
Incorporated @cjihrig suggestion and squashed into a single commit. Last CI was green. I'll land this tomorrow if there's no objection.
One more CI run as there were some problems in the arm bots:
There's an issue on some `OS X` versions when passing fd's between processes. When the handle associated to a specific file descriptor is closed by the sender process before it's received in the destination, the handle is indeed closed while it should remain opened. In order to fix this behaviour, don't close the handle until the `NODE_HANDLE_ACK` is received by the sender.

Added `test-child-process-pass-fd` that is basically `test-cluster-net-send` but creating lots of workers, so the issue reproduces on `OS X` consistently.

Fixes: nodejs#7512
PR-URL: nodejs#7572
Reviewed-By: Ben Noordhuis <[email protected]>
Reviewed-By: Colin Ihrig <[email protected]>
Last CI: https://ci.nodejs.org/job/node-test-pull-request/3765/
Landed in db6253f. Thanks!
@santigimeno / @bnoordhuis / @cjihrig should this be backported?
@thealphanerd I think so. Do you need a backport PR?
yes please
@thealphanerd backport here: #8904
Checklist
- `make -j4 test` (UNIX), or `vcbuild test nosign` (Windows) passes

Affected core subsystem(s)
child_process

Description of change
There's an issue on some `OS X` versions when passing fd's between processes. When the handle associated to a specific file descriptor is closed by the sender process before it's received in the destination, the handle is indeed closed while it should remain opened. As a workaround for `OS X`, don't close the handle until the `NODE_HANDLE_ACK` is received by the sender.

Added `test-child-process-pass-fd` that is basically `test-cluster-net-send` but creating lots of workers, so the issue reproduces on `OS X` consistently.

Fixes: #7512

/cc @bnoordhuis