Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cluster: worker.disconnect() goes into forever loop though worker.isConnected() === true #6561

Closed
viliusl opened this issue May 4, 2016 · 4 comments
Labels
cluster Issues and PRs related to the cluster subsystem.

Comments

@viliusl
Copy link

viliusl commented May 4, 2016

System info:

$ node -v
v4.4.3
$ uname -a
Darwin localhost 15.2.0 Darwin Kernel Version 15.2.0: Fri Nov 13 19:56:56 PST 2015; root:xnu-3248.20.55~2/RELEASE_X86_64 x86_64

verified for node@5 and node@6.

code:

'use strict';
const http = require('http'),
  cluster = require('cluster');

if (cluster.isMaster) {
  http.createServer().listen(3000);
  cluster.fork();
} else {
  process.on('uncaughtException', e => {
    console.log('uncaught', e);
    if (cluster.worker.isConnected() && !cluster.worker.isDead()) {
      cluster.worker.disconnect();
    }
  });
  http.createServer().listen(3000);
}

output:

$ node test
uncaught { Error: bind EADDRINUSE null:3000
    at Object.exports._errnoException (util.js:896:11)
    at exports._exceptionWithHostPort (util.js:919:20)
    at cb (net.js:1311:16)
    at rr (cluster.js:620:14)
    at Worker.<anonymous> (cluster.js:590:9)
    at process.<anonymous> (cluster.js:750:8)
    at emitTwo (events.js:111:20)
    at process.emit (events.js:191:7)
    at handleMessage (internal/child_process.js:718:10)
    at Pipe.channel.onread (internal/child_process.js:444:11)
  code: 'EADDRINUSE',
  errno: 'EADDRINUSE',
  syscall: 'bind',
  address: null,
  port: 3000 }
uncaught TypeError: Cannot read property 'apply' of undefined
    at process.<anonymous> (cluster.js:750:7)
    at emitTwo (events.js:111:20)
    at process.emit (events.js:191:7)
    at handleMessage (internal/child_process.js:718:10)
    at Pipe.channel.onread (internal/child_process.js:444:11)
uncaught TypeError: Cannot read property 'apply' of undefined
    at process.<anonymous> (cluster.js:750:7)
    at emitTwo (events.js:111:20)
    at process.emit (events.js:191:7)
    at handleMessage (internal/child_process.js:718:10)
    at Pipe.channel.onread (internal/child_process.js:444:11)

and it keeps going forever.

My expectations in this case:

  • given worker says it's connected, worker.disconnect() should be safe.

Or maybe I'm just doing smth wrong here.

@mscdex mscdex added cluster Issues and PRs related to the cluster subsystem. v4.x labels May 4, 2016
@sam-github
Copy link
Contributor

You are doing something wrong, do not make async calls in an uncaughtException handler.

You don't need to disconnect, rethrow the exception (or don't catch it at all), and let the worker exit. The disconnect will happen automatically by the system.

@viliusl
Copy link
Author

viliusl commented May 4, 2016

@sam-github thanks for clarification. Still it's quite strange and non-uniform behaviour.

Say code:

'use strict';
const http = require('http'),
  cluster = require('cluster');

if (cluster.isMaster) {
  cluster.on('disconnect', worker => {
    console.log('worker disconnected');
    worker.kill();
  });
  cluster.on('exit', worker => {
    console.log('worker exited');
    worker.kill();
  });

  const worker = cluster.fork();
} else {
  process.on('uncaughtException', e => {
    console.log('uncaught', e);
    if (cluster.worker.isConnected() && !cluster.worker.isDead()) {
      cluster.worker.disconnect();
    }
  });
  http.createServer().listen(3000, () => {throw new Error('qwe')});
}

or code:

'use strict';
const http = require('http'),
  cluster = require('cluster');

if (cluster.isMaster) {
  cluster.on('disconnect', worker => {
    console.log('worker disconnected');
    worker.kill();
  });
  cluster.on('exit', worker => {
    console.log('worker exited');
    worker.kill();
  });

  const worker = cluster.fork();
} else {
  process.on('uncaughtException', e => {
    console.log('uncaught', e);
    if (cluster.worker.isConnected() && !cluster.worker.isDead()) {
      cluster.worker.disconnect();
    }
  });
  process.nextTick(() => {throw new Error('qwe')});
}

yields output:

uncaught [Error: qwe]
worker disconnected
worker exited

Seems like same case (uncaught exception), but behaviour is different. I'm not sure if EADDRINUSE is somehow special error with specific handling, but from my (user) perspective both are uncaughtException's.

If you see this behaviour as correct and considering I'm doing a no-no thing (async in uncaughtException handler) you can close this issue.

Cheers,
V

@sam-github
Copy link
Contributor

Sorry, should be more clear. I do think this is a bug, we should fix it (if we can, I assume we can).

However, you should also not be doing the thing you are doing, even if we did fix it.

cjihrig pushed a commit to cjihrig/node that referenced this issue May 24, 2016
The test in this commit runs correctly if IPC messages are
properly consumed and emitted. Otherwise, the test times out.

Fixes: nodejs#6561
PR-URL: nodejs#6909
Reviewed-By: Ben Noordhuis <[email protected]>
Reviewed-By: James M Snell <[email protected]>
Reviewed-By: Colin Ihrig <[email protected]>
cjihrig added a commit to cjihrig/node that referenced this issue May 24, 2016
When a worker is disconnecting, it shuts down all of the handles
it is waiting on. It is possible that a handle does not have an
owner, which causes a crash. This commit closes such handles
without accessing the missing owner.

Fixes: nodejs#6561
PR-URL: nodejs#6909
Reviewed-By: Ben Noordhuis <[email protected]>
Reviewed-By: James M Snell <[email protected]>
Reviewed-By: Santiago Gimeno <[email protected]>
cjihrig pushed a commit to cjihrig/node that referenced this issue May 24, 2016
This test checks that ownerless cluster worker handles are closed
correctly on disconnection.

Fixes: nodejs#6561
PR-URL: nodejs#6909
Reviewed-By: Ben Noordhuis <[email protected]>
Reviewed-By: James M Snell <[email protected]>
Reviewed-By: Colin Ihrig <[email protected]>
@viliusl
Copy link
Author

viliusl commented May 25, 2016

Thanks alot!

cjihrig added a commit to cjihrig/node that referenced this issue May 25, 2016
cluster's internal message handling includes a cache of callback
functions. Once the message for that callback is received, it is
removed from the cache. If, for any reason, the same message ID
is processed twice, the callback will be missing from the cache
and cluster will try to call undefined as a function. This commit
guards against this scenario.

Refs: nodejs#6561
PR-URL: nodejs#6902
Reviewed-By: Ben Noordhuis <[email protected]>
Reviewed-By: Santiago Gimeno <[email protected]>
Fishrock123 pushed a commit to Fishrock123/node that referenced this issue May 30, 2016
Currently, if an IPC event handler throws an error, it can
cause the message to not be consumed, leading to messages piling
up. This commit causes IPC events to be emitted on the next tick,
allowing the channel's processing logic to move forward as
normal.

Fixes: nodejs#6561
PR-URL: nodejs#6909
Reviewed-By: Ben Noordhuis <[email protected]>
Reviewed-By: James M Snell <[email protected]>
Reviewed-By: Santiago Gimeno <[email protected]>
Fishrock123 pushed a commit to Fishrock123/node that referenced this issue May 30, 2016
The test in this commit runs correctly if IPC messages are
properly consumed and emitted. Otherwise, the test times out.

Fixes: nodejs#6561
PR-URL: nodejs#6909
Reviewed-By: Ben Noordhuis <[email protected]>
Reviewed-By: James M Snell <[email protected]>
Reviewed-By: Colin Ihrig <[email protected]>
Fishrock123 pushed a commit to Fishrock123/node that referenced this issue May 30, 2016
When a worker is disconnecting, it shuts down all of the handles
it is waiting on. It is possible that a handle does not have an
owner, which causes a crash. This commit closes such handles
without accessing the missing owner.

Fixes: nodejs#6561
PR-URL: nodejs#6909
Reviewed-By: Ben Noordhuis <[email protected]>
Reviewed-By: James M Snell <[email protected]>
Reviewed-By: Santiago Gimeno <[email protected]>
Fishrock123 pushed a commit to Fishrock123/node that referenced this issue May 30, 2016
This test checks that ownerless cluster worker handles are closed
correctly on disconnection.

Fixes: nodejs#6561
PR-URL: nodejs#6909
Reviewed-By: Ben Noordhuis <[email protected]>
Reviewed-By: James M Snell <[email protected]>
Reviewed-By: Colin Ihrig <[email protected]>
Fishrock123 pushed a commit to Fishrock123/node that referenced this issue May 30, 2016
cluster's internal message handling includes a cache of callback
functions. Once the message for that callback is received, it is
removed from the cache. If, for any reason, the same message ID
is processed twice, the callback will be missing from the cache
and cluster will try to call undefined as a function. This commit
guards against this scenario.

Refs: nodejs#6561
PR-URL: nodejs#6902
Reviewed-By: Ben Noordhuis <[email protected]>
Reviewed-By: Santiago Gimeno <[email protected]>
rvagg pushed a commit that referenced this issue Jun 2, 2016
Currently, if an IPC event handler throws an error, it can
cause the message to not be consumed, leading to messages piling
up. This commit causes IPC events to be emitted on the next tick,
allowing the channel's processing logic to move forward as
normal.

Fixes: #6561
PR-URL: #6909
Reviewed-By: Ben Noordhuis <[email protected]>
Reviewed-By: James M Snell <[email protected]>
Reviewed-By: Santiago Gimeno <[email protected]>
rvagg pushed a commit that referenced this issue Jun 2, 2016
The test in this commit runs correctly if IPC messages are
properly consumed and emitted. Otherwise, the test times out.

Fixes: #6561
PR-URL: #6909
Reviewed-By: Ben Noordhuis <[email protected]>
Reviewed-By: James M Snell <[email protected]>
Reviewed-By: Colin Ihrig <[email protected]>
rvagg pushed a commit that referenced this issue Jun 2, 2016
When a worker is disconnecting, it shuts down all of the handles
it is waiting on. It is possible that a handle does not have an
owner, which causes a crash. This commit closes such handles
without accessing the missing owner.

Fixes: #6561
PR-URL: #6909
Reviewed-By: Ben Noordhuis <[email protected]>
Reviewed-By: James M Snell <[email protected]>
Reviewed-By: Santiago Gimeno <[email protected]>
rvagg pushed a commit that referenced this issue Jun 2, 2016
This test checks that ownerless cluster worker handles are closed
correctly on disconnection.

Fixes: #6561
PR-URL: #6909
Reviewed-By: Ben Noordhuis <[email protected]>
Reviewed-By: James M Snell <[email protected]>
Reviewed-By: Colin Ihrig <[email protected]>
rvagg pushed a commit that referenced this issue Jun 2, 2016
cluster's internal message handling includes a cache of callback
functions. Once the message for that callback is received, it is
removed from the cache. If, for any reason, the same message ID
is processed twice, the callback will be missing from the cache
and cluster will try to call undefined as a function. This commit
guards against this scenario.

Refs: #6561
PR-URL: #6902
Reviewed-By: Ben Noordhuis <[email protected]>
Reviewed-By: Santiago Gimeno <[email protected]>
MylesBorins pushed a commit that referenced this issue Jul 1, 2016
Currently, if an IPC event handler throws an error, it can
cause the message to not be consumed, leading to messages piling
up. This commit causes IPC events to be emitted on the next tick,
allowing the channel's processing logic to move forward as
normal.

Fixes: #6561
PR-URL: #6909
Reviewed-By: Ben Noordhuis <[email protected]>
Reviewed-By: James M Snell <[email protected]>
Reviewed-By: Santiago Gimeno <[email protected]>
MylesBorins pushed a commit that referenced this issue Jul 1, 2016
The test in this commit runs correctly if IPC messages are
properly consumed and emitted. Otherwise, the test times out.

Fixes: #6561
PR-URL: #6909
Reviewed-By: Ben Noordhuis <[email protected]>
Reviewed-By: James M Snell <[email protected]>
Reviewed-By: Colin Ihrig <[email protected]>
MylesBorins pushed a commit that referenced this issue Jul 1, 2016
When a worker is disconnecting, it shuts down all of the handles
it is waiting on. It is possible that a handle does not have an
owner, which causes a crash. This commit closes such handles
without accessing the missing owner.

Fixes: #6561
PR-URL: #6909
Reviewed-By: Ben Noordhuis <[email protected]>
Reviewed-By: James M Snell <[email protected]>
Reviewed-By: Santiago Gimeno <[email protected]>
MylesBorins pushed a commit that referenced this issue Jul 1, 2016
This test checks that ownerless cluster worker handles are closed
correctly on disconnection.

Fixes: #6561
PR-URL: #6909
Reviewed-By: Ben Noordhuis <[email protected]>
Reviewed-By: James M Snell <[email protected]>
Reviewed-By: Colin Ihrig <[email protected]>
MylesBorins pushed a commit that referenced this issue Jul 11, 2016
This test checks that ownerless cluster worker handles are closed
correctly on disconnection.

Fixes: #6561
PR-URL: #6909
Reviewed-By: Ben Noordhuis <[email protected]>
Reviewed-By: James M Snell <[email protected]>
Reviewed-By: Colin Ihrig <[email protected]>
MylesBorins pushed a commit that referenced this issue Jul 11, 2016
cluster's internal message handling includes a cache of callback
functions. Once the message for that callback is received, it is
removed from the cache. If, for any reason, the same message ID
is processed twice, the callback will be missing from the cache
and cluster will try to call undefined as a function. This commit
guards against this scenario.

Refs: #6561
PR-URL: #6902
Reviewed-By: Ben Noordhuis <[email protected]>
Reviewed-By: Santiago Gimeno <[email protected]>
MylesBorins pushed a commit that referenced this issue Jul 12, 2016
cluster's internal message handling includes a cache of callback
functions. Once the message for that callback is received, it is
removed from the cache. If, for any reason, the same message ID
is processed twice, the callback will be missing from the cache
and cluster will try to call undefined as a function. This commit
guards against this scenario.

Refs: #6561
PR-URL: #6902
Reviewed-By: Ben Noordhuis <[email protected]>
Reviewed-By: Santiago Gimeno <[email protected]>
MylesBorins pushed a commit that referenced this issue Jul 12, 2016
Currently, if an IPC event handler throws an error, it can
cause the message to not be consumed, leading to messages piling
up. This commit causes IPC events to be emitted on the next tick,
allowing the channel's processing logic to move forward as
normal.

Fixes: #6561
PR-URL: #6909
Reviewed-By: Ben Noordhuis <[email protected]>
Reviewed-By: James M Snell <[email protected]>
Reviewed-By: Santiago Gimeno <[email protected]>
MylesBorins pushed a commit that referenced this issue Jul 12, 2016
The test in this commit runs correctly if IPC messages are
properly consumed and emitted. Otherwise, the test times out.

Fixes: #6561
PR-URL: #6909
Reviewed-By: Ben Noordhuis <[email protected]>
Reviewed-By: James M Snell <[email protected]>
Reviewed-By: Colin Ihrig <[email protected]>
MylesBorins pushed a commit that referenced this issue Jul 12, 2016
When a worker is disconnecting, it shuts down all of the handles
it is waiting on. It is possible that a handle does not have an
owner, which causes a crash. This commit closes such handles
without accessing the missing owner.

Fixes: #6561
PR-URL: #6909
Reviewed-By: Ben Noordhuis <[email protected]>
Reviewed-By: James M Snell <[email protected]>
Reviewed-By: Santiago Gimeno <[email protected]>
MylesBorins pushed a commit that referenced this issue Jul 12, 2016
This test checks that ownerless cluster worker handles are closed
correctly on disconnection.

Fixes: #6561
PR-URL: #6909
Reviewed-By: Ben Noordhuis <[email protected]>
Reviewed-By: James M Snell <[email protected]>
Reviewed-By: Colin Ihrig <[email protected]>
MylesBorins pushed a commit that referenced this issue Jul 12, 2016
cluster's internal message handling includes a cache of callback
functions. Once the message for that callback is received, it is
removed from the cache. If, for any reason, the same message ID
is processed twice, the callback will be missing from the cache
and cluster will try to call undefined as a function. This commit
guards against this scenario.

Refs: #6561
PR-URL: #6902
Reviewed-By: Ben Noordhuis <[email protected]>
Reviewed-By: Santiago Gimeno <[email protected]>
MylesBorins pushed a commit that referenced this issue Jul 14, 2016
cluster's internal message handling includes a cache of callback
functions. Once the message for that callback is received, it is
removed from the cache. If, for any reason, the same message ID
is processed twice, the callback will be missing from the cache
and cluster will try to call undefined as a function. This commit
guards against this scenario.

Refs: #6561
PR-URL: #6902
Reviewed-By: Ben Noordhuis <[email protected]>
Reviewed-By: Santiago Gimeno <[email protected]>
MylesBorins pushed a commit that referenced this issue Jul 14, 2016
cluster's internal message handling includes a cache of callback
functions. Once the message for that callback is received, it is
removed from the cache. If, for any reason, the same message ID
is processed twice, the callback will be missing from the cache
and cluster will try to call undefined as a function. This commit
guards against this scenario.

Refs: #6561
PR-URL: #6902
Reviewed-By: Ben Noordhuis <[email protected]>
Reviewed-By: Santiago Gimeno <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cluster Issues and PRs related to the cluster subsystem.
Projects
None yet
Development

No branches or pull requests

3 participants