cluster: worker.disconnect() goes into forever loop though worker.isConnected() === true #6561

viliusl · 2016-05-04T09:26:20Z

System info:

$ node -v
v4.4.3

$ uname -a
Darwin localhost 15.2.0 Darwin Kernel Version 15.2.0: Fri Nov 13 19:56:56 PST 2015; root:xnu-3248.20.55~2/RELEASE_X86_64 x86_64

verified for node@5 and node@6.

code:

'use strict';
const http = require('http'),
  cluster = require('cluster');

if (cluster.isMaster) {
  http.createServer().listen(3000);
  cluster.fork();
} else {
  process.on('uncaughtException', e => {
    console.log('uncaught', e);
    if (cluster.worker.isConnected() && !cluster.worker.isDead()) {
      cluster.worker.disconnect();
    }
  });
  http.createServer().listen(3000);
}

output:

$ node test
uncaught { Error: bind EADDRINUSE null:3000
    at Object.exports._errnoException (util.js:896:11)
    at exports._exceptionWithHostPort (util.js:919:20)
    at cb (net.js:1311:16)
    at rr (cluster.js:620:14)
    at Worker.<anonymous> (cluster.js:590:9)
    at process.<anonymous> (cluster.js:750:8)
    at emitTwo (events.js:111:20)
    at process.emit (events.js:191:7)
    at handleMessage (internal/child_process.js:718:10)
    at Pipe.channel.onread (internal/child_process.js:444:11)
  code: 'EADDRINUSE',
  errno: 'EADDRINUSE',
  syscall: 'bind',
  address: null,
  port: 3000 }
uncaught TypeError: Cannot read property 'apply' of undefined
    at process.<anonymous> (cluster.js:750:7)
    at emitTwo (events.js:111:20)
    at process.emit (events.js:191:7)
    at handleMessage (internal/child_process.js:718:10)
    at Pipe.channel.onread (internal/child_process.js:444:11)
uncaught TypeError: Cannot read property 'apply' of undefined
    at process.<anonymous> (cluster.js:750:7)
    at emitTwo (events.js:111:20)
    at process.emit (events.js:191:7)
    at handleMessage (internal/child_process.js:718:10)
    at Pipe.channel.onread (internal/child_process.js:444:11)

and it keeps going forever.

My expectations in this case:

given worker says it's connected, worker.disconnect() should be safe.

Or maybe I'm just doing smth wrong here.

The text was updated successfully, but these errors were encountered:

sam-github · 2016-05-04T14:00:19Z

You are doing something wrong, do not make async calls in an uncaughtException handler.

You don't need to disconnect, rethrow the exception (or don't catch it at all), and let the worker exit. The disconnect will happen automatically by the system.

viliusl · 2016-05-04T18:47:18Z

@sam-github thanks for clarification. Still it's quite strange and non-uniform behaviour.

Say code:

'use strict';
const http = require('http'),
  cluster = require('cluster');

if (cluster.isMaster) {
  cluster.on('disconnect', worker => {
    console.log('worker disconnected');
    worker.kill();
  });
  cluster.on('exit', worker => {
    console.log('worker exited');
    worker.kill();
  });

  const worker = cluster.fork();
} else {
  process.on('uncaughtException', e => {
    console.log('uncaught', e);
    if (cluster.worker.isConnected() && !cluster.worker.isDead()) {
      cluster.worker.disconnect();
    }
  });
  http.createServer().listen(3000, () => {throw new Error('qwe')});
}

or code:

'use strict';
const http = require('http'),
  cluster = require('cluster');

if (cluster.isMaster) {
  cluster.on('disconnect', worker => {
    console.log('worker disconnected');
    worker.kill();
  });
  cluster.on('exit', worker => {
    console.log('worker exited');
    worker.kill();
  });

  const worker = cluster.fork();
} else {
  process.on('uncaughtException', e => {
    console.log('uncaught', e);
    if (cluster.worker.isConnected() && !cluster.worker.isDead()) {
      cluster.worker.disconnect();
    }
  });
  process.nextTick(() => {throw new Error('qwe')});
}

yields output:

uncaught [Error: qwe]
worker disconnected
worker exited

Seems like same case (uncaught exception), but behaviour is different. I'm not sure if EADDRINUSE is somehow special error with specific handling, but from my (user) perspective both are uncaughtException's.

If you see this behaviour as correct and considering I'm doing a no-no thing (async in uncaughtException handler) you can close this issue.

Cheers,
V

sam-github · 2016-05-06T20:04:26Z

Sorry, should be more clear. I do think this is a bug, we should fix it (if we can, I assume we can).

However, you should also not be doing the thing you are doing, even if we did fix it.

The test in this commit runs correctly if IPC messages are properly consumed and emitted. Otherwise, the test times out. Fixes: nodejs#6561 PR-URL: nodejs#6909 Reviewed-By: Ben Noordhuis <[email protected]> Reviewed-By: James M Snell <[email protected]> Reviewed-By: Colin Ihrig <[email protected]>

When a worker is disconnecting, it shuts down all of the handles it is waiting on. It is possible that a handle does not have an owner, which causes a crash. This commit closes such handles without accessing the missing owner. Fixes: nodejs#6561 PR-URL: nodejs#6909 Reviewed-By: Ben Noordhuis <[email protected]> Reviewed-By: James M Snell <[email protected]> Reviewed-By: Santiago Gimeno <[email protected]>

This test checks that ownerless cluster worker handles are closed correctly on disconnection. Fixes: nodejs#6561 PR-URL: nodejs#6909 Reviewed-By: Ben Noordhuis <[email protected]> Reviewed-By: James M Snell <[email protected]> Reviewed-By: Colin Ihrig <[email protected]>

viliusl · 2016-05-25T05:30:26Z

Thanks alot!

cluster's internal message handling includes a cache of callback functions. Once the message for that callback is received, it is removed from the cache. If, for any reason, the same message ID is processed twice, the callback will be missing from the cache and cluster will try to call undefined as a function. This commit guards against this scenario. Refs: nodejs#6561 PR-URL: nodejs#6902 Reviewed-By: Ben Noordhuis <[email protected]> Reviewed-By: Santiago Gimeno <[email protected]>

Currently, if an IPC event handler throws an error, it can cause the message to not be consumed, leading to messages piling up. This commit causes IPC events to be emitted on the next tick, allowing the channel's processing logic to move forward as normal. Fixes: nodejs#6561 PR-URL: nodejs#6909 Reviewed-By: Ben Noordhuis <[email protected]> Reviewed-By: James M Snell <[email protected]> Reviewed-By: Santiago Gimeno <[email protected]>

The test in this commit runs correctly if IPC messages are properly consumed and emitted. Otherwise, the test times out. Fixes: nodejs#6561 PR-URL: nodejs#6909 Reviewed-By: Ben Noordhuis <[email protected]> Reviewed-By: James M Snell <[email protected]> Reviewed-By: Colin Ihrig <[email protected]>

When a worker is disconnecting, it shuts down all of the handles it is waiting on. It is possible that a handle does not have an owner, which causes a crash. This commit closes such handles without accessing the missing owner. Fixes: nodejs#6561 PR-URL: nodejs#6909 Reviewed-By: Ben Noordhuis <[email protected]> Reviewed-By: James M Snell <[email protected]> Reviewed-By: Santiago Gimeno <[email protected]>

This test checks that ownerless cluster worker handles are closed correctly on disconnection. Fixes: nodejs#6561 PR-URL: nodejs#6909 Reviewed-By: Ben Noordhuis <[email protected]> Reviewed-By: James M Snell <[email protected]> Reviewed-By: Colin Ihrig <[email protected]>

cluster's internal message handling includes a cache of callback functions. Once the message for that callback is received, it is removed from the cache. If, for any reason, the same message ID is processed twice, the callback will be missing from the cache and cluster will try to call undefined as a function. This commit guards against this scenario. Refs: nodejs#6561 PR-URL: nodejs#6902 Reviewed-By: Ben Noordhuis <[email protected]> Reviewed-By: Santiago Gimeno <[email protected]>

Currently, if an IPC event handler throws an error, it can cause the message to not be consumed, leading to messages piling up. This commit causes IPC events to be emitted on the next tick, allowing the channel's processing logic to move forward as normal. Fixes: #6561 PR-URL: #6909 Reviewed-By: Ben Noordhuis <[email protected]> Reviewed-By: James M Snell <[email protected]> Reviewed-By: Santiago Gimeno <[email protected]>

The test in this commit runs correctly if IPC messages are properly consumed and emitted. Otherwise, the test times out. Fixes: #6561 PR-URL: #6909 Reviewed-By: Ben Noordhuis <[email protected]> Reviewed-By: James M Snell <[email protected]> Reviewed-By: Colin Ihrig <[email protected]>

When a worker is disconnecting, it shuts down all of the handles it is waiting on. It is possible that a handle does not have an owner, which causes a crash. This commit closes such handles without accessing the missing owner. Fixes: #6561 PR-URL: #6909 Reviewed-By: Ben Noordhuis <[email protected]> Reviewed-By: James M Snell <[email protected]> Reviewed-By: Santiago Gimeno <[email protected]>

This test checks that ownerless cluster worker handles are closed correctly on disconnection. Fixes: #6561 PR-URL: #6909 Reviewed-By: Ben Noordhuis <[email protected]> Reviewed-By: James M Snell <[email protected]> Reviewed-By: Colin Ihrig <[email protected]>

cluster's internal message handling includes a cache of callback functions. Once the message for that callback is received, it is removed from the cache. If, for any reason, the same message ID is processed twice, the callback will be missing from the cache and cluster will try to call undefined as a function. This commit guards against this scenario. Refs: #6561 PR-URL: #6902 Reviewed-By: Ben Noordhuis <[email protected]> Reviewed-By: Santiago Gimeno <[email protected]>