
[cluster.js] removeHandlesForWorker() on both 'exit' and 'disconnect' events. #9418

Closed
wants to merge 2 commits

Conversation

jshkurti

[cluster.js][worker] Calls removeHandlesForWorker(worker) on both the 'exit' and 'disconnect' events instead of just 'disconnect', because sometimes the 'disconnect' event isn't emitted at all, in which case some handles are left behind and, once all workers are dead, an "AssertionError: Resource leak detected" exception is raised.

Hello, I have previously posted this issue: #9409
I finally came up with this piece of code, which intentionally triggers the exception:

var http    = require('http');
var cluster = require('cluster');

if (cluster.isMaster) {

  cluster.on('disconnect', function(worker, code, signal) {
    console.log('worker ' + worker.process.pid + ' disconnected');
  });
  cluster.on('exit', function(worker, code, signal) {
    console.log('worker ' + worker.process.pid + ' died');
  });

  var worker = cluster.fork();
  worker.process.removeAllListeners('disconnect');

  setInterval(function keepAlive() {}, 1000);

} else {
  console.log('Hello, I\'m the worker with pid', process.pid);

  setTimeout(function() {
    process.exit(0);
  }, 2000);

  http.createServer(function(req, res) {
    res.writeHead(200);
    res.end("hello world\n");
  }).listen(8000);
}

And finally I found the solution, which is to double-check that the handles are removed when a worker dies, even if it emits only the 'exit' event and not 'disconnect', as sometimes happens.
Calling removeHandlesForWorker() twice isn't much of a problem, because it does nothing if the worker has already been removed from that handle.
Thanks.
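
For context, here is a minimal sketch of what the first commit amounts to inside lib/cluster.js (the surrounding code and exact call sites are assumptions based on the description above, not the verbatim patch):

// Sketch: master-side handlers attached when a worker is forked.
worker.process.once('disconnect', function() {
  removeHandlesForWorker(worker);   // normal cleanup path
});

worker.process.once('exit', function(exitCode, signalCode) {
  // If 'disconnect' was never emitted, this is the only chance to
  // release the handles. A second call is harmless because
  // removeHandlesForWorker() does nothing for an already-removed worker.
  removeHandlesForWorker(worker);
});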

@jshkurti
Author

@misterdjules @sam-github @tjfontaine: Any thoughts on this?

@misterdjules

@jshkurti Thank you for the pull request! For now, most of the team is focused on releasing v0.12.1, which is why it's going to take some time to review it. It's at the top of my review list, though, so I'll get there soon.

@sam-github

A couple comments.

  1. this isn't a reproduction... it's deliberately damaging the cluster module by removing all listeners for 'disconnect'... I could add a line to remove all 'exit' listeners, too, and then things would still be "broken"
  2. is it possible that you actually have code that is calling removeAllListeners, and that you are accidentally provoking this scenario?
  3. when child processes exit, the OS closes their IPC pipes, unquestionably. if 'disconnect' is not firing, I can only think of a few reasons why:
    • exit before the IPC channel was made
    • damage to the event system (what you are demoing above)
    • .. some other bug in node

Figuring out why disconnect isn't occurring should be the focus, I think, but I'll look carefully at your patch to see if it is safe.
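
As a standalone illustration of the first bullet (hypothetical; whether 'disconnect' actually fires here depends on timing and the node version), a worker can be made to exit as early as possible:

var cluster = require('cluster');

if (cluster.isMaster) {
  var worker = cluster.fork();
  worker.on('disconnect', function() { console.log('disconnect fired'); });
  worker.on('exit', function(code) { console.log('exit fired:', code); });
} else {
  // Exit immediately, before the IPC channel has necessarily been set up.
  process.exit(0);
}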

@jshkurti
Author

Thank you for the quick response :)

1. this isn't a reproduction... it's deliberately damaging the cluster module by removing all listeners for 'disconnect'... I could add a line to remove all 'exit' listeners, too, and then things would still be "broken"

I agree, it was the only way to trigger the exception on purpose though.
In production it happens quite randomly when a worker dies.

2. is it possible that you actually have code that is calling removeAllListeners, and that you are accidentally provoking this scenario?

No, I double-checked everything. As I said above, it happens quite randomly. Let's say once in 50 restarts. For some reason the worker dies without firing 'disconnect', or maybe 'disconnect' is fired after 'exit'.

I updated the code:

worker.process.once('exit', function(exitCode, signalCode) {
  /*
   * Remove the handles associated with this worker
   * a second time, in case the 'disconnect' event
   * was never emitted.
   */
  if (worker.state !== 'disconnected')
    removeHandlesForWorker(worker);
});

This avoids redundant calls to removeHandlesForWorker(): on 'exit' it is called only if the 'exit' event fires before 'disconnect'.
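
To see how that guard behaves, here is a small self-contained simulation (hypothetical names; removeHandlesForWorker() is stubbed out, and worker.state stands in for the field cluster.js maintains):

var EventEmitter = require('events').EventEmitter;

function removeHandlesForWorker(worker) {
  console.log('cleanup for worker', worker.id);
}

function attach(worker, proc) {
  proc.once('disconnect', function() {
    worker.state = 'disconnected';
    removeHandlesForWorker(worker);
  });
  proc.once('exit', function() {
    // Clean up only if 'disconnect' never ran first.
    if (worker.state !== 'disconnected')
      removeHandlesForWorker(worker);
  });
}

// Buggy ordering: 'exit' arrives without a preceding 'disconnect'.
var proc = new EventEmitter();
var worker = { id: 1, state: 'online' };
attach(worker, proc);
proc.emit('exit', 0, null); // cleanup still runs, exactly once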

@gzurbach

gzurbach commented Jun 9, 2015

Any update on this? We also use clusters, and every few hours or so all our Node processes are down because of this bug. We did not have this issue last week with 0.10.

@jshkurti
Author

jshkurti commented Jun 9, 2015

+1

You can use this workaround for now:
#9409 (comment)

@gzurbach

gzurbach commented Jun 9, 2015

Thanks, I will look at this today. I was also considering using PM2 and no longer managing my own clusters, but I'd rather not add another dependency on my servers.

@jasnell
Member

jasnell commented Aug 15, 2015

@jshkurti @gzurbach ... how would you like to proceed on this one? If this is something that needs to be fixed in v0.12, then opening a new PR targeted there would be best. Otherwise, this may need to be moved over to https://github.com/nodejs/node master.

@nullivex

nullivex commented Oct 7, 2015

I am still seeing this on 4.1.2

Stopping mock cluster...

assert.js:89
  throw new assert.AssertionError({
  ^
AssertionError: Resource leak detected.
    at removeWorker (cluster.js:328:9)
    at ChildProcess.<anonymous> (cluster.js:348:34)
    at ChildProcess.g (events.js:260:16)
    at emitTwo (events.js:87:13)
    at ChildProcess.emit (events.js:172:7)
    at Process.ChildProcess._handle.onexit (internal/child_process.js:200:12)
      2) "after all" hook

@alsotang

os: Linux VM_56_101_centos 3.10.0-123.el7.x86_64 #1 SMP Mon Jun 30 12:09:22 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
node version: 4.2.1

Each time I kill a worker, the master exits together with "AssertionError: Resource leak detected".

The cluster file is simple:

var cluster = require('cluster');
var http = require('http');

var app = require('../app');

//var numCPUs = require('os').cpus().length;
var numCPUs = 1; // raise the core count back up once offline statistics are split out


var port = process.env.PORT || '3000';

if (cluster.isMaster) {
    // Fork workers.
    for (var i = 0; i < numCPUs; i++) {
        cluster.fork();
    }

    cluster.on('fork', function (worker) {
        console.log('[%s] [worker:%d] new worker start', Date(), worker.process.pid);
    });

    cluster.on('exit', function (worker) {
        console.error('[%s] [master:%s] worker:%s exited, suicide: %s, state: %s.',
            Date(), process.pid, worker.process.pid, worker.suicide, worker.state);
        setTimeout(function () {
            console.log('restart worker...');
            cluster.fork();
        }, 1000);
    });
} else {
    http.createServer(app).listen(port);
}

@jshkurti
Author

nodejs/node#3510

jshkurti closed this Oct 30, 2015