Socket timeout - race condition #57
Did you ever get code that demonstrates this? Is this a confirmed issue or not?
@tony-gutierrez I haven't got code that reproduces this. It happens sporadically in our production environment, so it is hard to add debugging there.
So I played with the code, trying to trigger this race condition, but could not. Is there any Node documentation on the socket "free" event? I do not see it in the docs for net.Socket: https://nodejs.org/api/net.html#net_class_net_socket
See this new implementation code: a reused free socket will be reset.
I can confirm that we have an issue that sounds very similar to this: sporadic connection resets right at the start of requests. We had to stop using agentkeepalive for now. Node 10.11, agentkeepalive 4.0.0.
@ronag May I ask what you are using now?
@admirkadriu We are not using keep-alive anymore. The risk outweighs the gains at the moment.
I can confirm the issue.
@loris Can you show me the error message and stack?
@fengmk2 Sure, here it is:
@loris Do you handle the error event when using http-timer (https://github.com/szmarczak/http-timer#usage)? You need to handle the request's error event yourself:
const https = require('https');
const timer = require('@szmarczak/http-timer');

const request = https.get('https://httpbin.org/anything');
const timings = timer(request);
request.on('response', response => {
  response.on('data', () => {}); // Consume the data somehow
  response.on('end', () => {
    console.log(timings);
  });
});
request.on('error', err => {
  // handle request error here
  console.error(err);
});
@fengmk2 Actually I'm not using
I'm also getting the same error when using agentkeepalive with got.
I think we have a similar issue with … I can reproduce the problem using our integration tests (it does not happen every time, but very regularly). Switching to Node 8.x silences the error. Because the integration tests use several emulators and Redis, I cannot easily create a small code sample. I tried to see what really happens with the library's debug output enabled, but I was not able to get anywhere; I can share the logs if needed. I also tried to make a fix, without success. I was able to stop the Node.js crash caused by the unhandled exception, but then there are 'socket hang up' errors. There is an else part of
After this change there are no unhandled errors. I also tried to find which Node 10.x change may be the cause, and I found that … I can help test any other idea anyone has.
I used this module on a project years ago, but with recent Node updates I started getting socket timeout errors. I switched over to the standard Node agents:
const http = require('http');
const https = require('https');

const agentOptions = {
  keepAlive: true,
  timeout: 30000
};
const httpAgent = new http.Agent(agentOptions);
const httpsAgent = new https.Agent(agentOptions);
If I understand correctly, the problem with going native is that Node fires an event on timeout but doesn't actually close the socket (see https://nodejs.org/api/net.html#net_event_timeout). This is why a lib like agentkeepalive makes a difference in environments like Azure, which have a very low number of available ports (~140 per app service), correct?
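The Node docs linked above describe `'timeout'` as a notification that the socket has been idle; closing the connection is left to the user. A minimal sketch of doing that by hand on a plain `net.Socket` (the helper name `closeOnIdle` is hypothetical, not part of Node or agentkeepalive):

```javascript
const net = require('net');

// Hypothetical helper: give a socket an idle timeout that actually closes it.
function closeOnIdle(socket, ms) {
  // Arms Node's inactivity timer; on its own it only emits 'timeout'.
  socket.setTimeout(ms);
  socket.on('timeout', () => {
    // Without this call the idle socket would stay open.
    socket.destroy();
  });
  return socket;
}

// Usage: an unconnected socket is enough to show the wiring.
const socket = closeOnIdle(new net.Socket(), 30000);
console.log(socket.timeout); // 30000
socket.destroy();
```

With a keep-alive agent, such a listener would need to be attached once per socket (for example when the socket is created), not once per request, or reused sockets would collect duplicate `'timeout'` listeners.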
The original Agent does not behave the same way as this lib, AFAIK. If there are no waiting requests, the connection is closed even if keepAlive = true. This lib waits for the configured time even if there are no new requests, so there is no problem with available ports. Of course, it all depends on the traffic: if there is always some request waiting to be sent, both agents work the same way, except perhaps for ECONNRESET errors.
I see a few potential issues here:
It also seems as if
Atomics are only relevant for worker threads... not sure how that is relevant here?
@gluwer This seems to indicate that the native agent does keep unused sockets around?
@tony-gutierrez Yes. This is also likely the reason that everything works fine in most situations, but because of the way free sockets are handled, there is a "socket hang up" (the other side is not closing the socket normally). The native keep-alive agent does not handle that automatically, but I suppose agentkeepalive retries in this case.
I can reproduce this issue 100% of the time with the following configuration:
const axios = require('axios');
const HttpAgent = require('agentkeepalive');
const { HttpsAgent } = HttpAgent;

// Our sleep wrapper: setTimeout as a promise
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

const options = {
  maxSockets: 100,
  maxFreeSockets: 10,
  timeout: 60000, // active socket keepalive for 60 seconds
  freeSocketTimeout: 30000, // free socket keepalive for 30 seconds
};
const httpAgent = new HttpAgent(options);
const httpsAgent = new HttpsAgent(options);
const client = axios.create({ httpAgent, httpsAgent });

(async () => {
  await client.get('..');
  await sleep(5000); // wait for exactly 5 seconds
  await client.get('..');
})();
With the sleep time changed to 4 or 6 seconds, it runs without issue; at 5 seconds, however, it runs into a connection reset error 100% of the time. Version 4.1.3.
Update: I tried the same code using Node's http.Agent instead of the agentkeepalive wrapper, and it has exactly the same issue. I tried both Node 12 and Node 14 with the same result, so this bug, at least the one I observed, is an underlying Node.js bug.
See the `request.reusedSocket` docs (https://nodejs.org/api/http.html#http_request_reusedsocket). Is your server a Node server using the default timeout?
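The `request.reusedSocket` docs describe retrying a request that fails on a reused keep-alive socket. A minimal sketch of that pattern with Node's built-in `http.Agent`; the helper names `shouldRetry` and `getWithRetry` and the one-retry limit are assumptions for illustration, and `reusedSocket` only exists on newer Node releases (the linked page lists the version that added it):

```javascript
const http = require('http');

// Shared keep-alive agent (Node's built-in http.Agent).
const agent = new http.Agent({ keepAlive: true });

// Retry only if the request failed on a *reused* socket with ECONNRESET,
// i.e. the server most likely closed the idle socket under us.
// The default of one retry is an assumption, not taken from the thread.
function shouldRetry(req, err, attempt, maxRetries = 1) {
  return Boolean(req.reusedSocket) && err.code === 'ECONNRESET' && attempt < maxRetries;
}

// Hypothetical helper: GET a URL, retrying on a stale keep-alive socket.
function getWithRetry(url, callback, attempt = 0) {
  const req = http.get(url, { agent }, (res) => callback(null, res));
  req.on('error', (err) => {
    if (shouldRetry(req, err, attempt)) {
      getWithRetry(url, callback, attempt + 1);
    } else {
      callback(err);
    }
  });
}
```

Only idempotent requests such as GET should be retried this way; replaying a POST after a reset could apply it twice.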
Hey, I got more logs (with debug mode activated):
It seems that a request was in flight when the socket timeout fired.
For those of you still struggling with this, I can recommend trying https://www.npmjs.com/package/undici, which does not suffer from the same problem.
Thanks, but we can't refactor all our code to use another HTTP client, and we are using this keep-alive agent with http-proxy and the got client.
@cschockaert: This has been open for almost 3 years now. I don't think it's likely to get fixed.
Well, I'm trying anyway, seeing that there is still some activity :) @SimenB ;)
Whoops, I was on Node 10, not Node 12... retrying with Node 12.19.0 :)
Update: our pod has been running for 17 hours with still no errors or crashes at all. It seems that bumping Node from 10.x to 12.19.0 resolved my issue.
Not sure why I'm tagged? 😅
I still appear to be having this in Node 16. My situation is somewhat unique, though: I'm creating a connection over a Unix socket rather than a URL.
Regarding the 5 seconds in your situation, is it possible it's due to the same issue described in this post (https://medium.com/ssense-tech/reduce-networking-errors-in-nodejs-23b4eb9f2d83), where Node.js's default keepAliveTimeout is 5 seconds?
The default server-side timeout is 5000 milliseconds; to avoid ECONNRESET exceptions, we set the default value to `4000` milliseconds. See the 'race condition' details on #57.
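The reasoning behind the new default can be written as a small, simplified timeline model. It assumes (this is not stated in the thread) that the client notices a server-side close within some margin, so the danger zone is a reuse close to the server's keep-alive deadline, where the server's FIN and the client's next request can cross. `reuseRisk`, `isSafeFreeSocketTimeout`, and the 500 ms margin are hypothetical:

```javascript
// Simplified model of the keep-alive race (hypothetical helpers, not part of
// agentkeepalive). All values are in milliseconds.

// Rule behind the fix: the client should drop a free socket comfortably
// before the server's keep-alive timeout closes it.
function isSafeFreeSocketTimeout(clientFreeMs, serverKeepAliveMs, marginMs = 500) {
  return clientFreeMs + marginMs <= serverKeepAliveMs;
}

// What happens to a request sent on a socket that has been idle for `idleMs`.
function reuseRisk(idleMs, clientFreeMs, serverKeepAliveMs, marginMs = 500) {
  if (idleMs >= clientFreeMs) return 'new socket'; // client already discarded it
  if (Math.abs(idleMs - serverKeepAliveMs) < marginMs) return 'race'; // FIN and request may cross
  return 'ok'; // well before the deadline, or the server's close was already seen
}

const SERVER_KEEP_ALIVE = 5000; // Node's default server.keepAliveTimeout

// The 5-second reproduction earlier in the thread: freeSocketTimeout 30000,
// sleeping 4 s, 5 s, or 6 s between requests.
console.log([4000, 5000, 6000].map((idle) => reuseRisk(idle, 30000, SERVER_KEEP_ALIVE)));
// [ 'ok', 'race', 'ok' ]

// With the 4000 ms default, the 5 s case gets a fresh socket instead.
console.log(reuseRisk(5000, 4000, SERVER_KEEP_ALIVE)); // new socket
console.log(isSafeFreeSocketTimeout(30000, SERVER_KEEP_ALIVE)); // false
console.log(isSafeFreeSocketTimeout(4000, SERVER_KEEP_ALIVE)); // true
```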
Fixed on [email protected]
Thanks all.
While using the library in my code, I get sporadic connection resets caused by the agent.
The problem is with the onTimeout() function. Let’s say we have a socket that has been idle for 4.99 seconds and freeSocketTimeout is 5 seconds. The agent gives me this specific free socket for my next request. Once the socket is handed to the application, it does what it needs while the timeout expires, but because there is no preemption, onTimeout only fires once my request is in flight, so the agent destroys the socket while my app is using it.
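The "no preemption" point can be seen without any sockets: a timer that is already overdue cannot interrupt running code, so its callback runs only after the current synchronous work (such as handing a free socket to a new request) has finished. A minimal sketch, with a 5 ms timer standing in, as an assumption, for the 5 s freeSocketTimeout:

```javascript
const events = [];

// Stand-in for the agent's free-socket idle timer (5 ms instead of 5 s).
setTimeout(() => events.push('idle timeout fired'), 5);

// Stand-in for "the agent hands the free socket to a new request": keep the
// thread busy for 20 ms, so the 5 ms timer is overdue when we finish.
const start = Date.now();
while (Date.now() - start < 20) {
  // busy-wait
}
events.push('socket handed to request');

// The timer is overdue, but its callback has not run yet.
console.log(events); // [ 'socket handed to request' ]

// Once the event loop regains control, the overdue callback runs, which in
// the agent would destroy a socket the request is already using.
setTimeout(() => console.log(events), 10);
```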
I suggest adding an ‘inUse’ flag to the socket and, on a timeout event, destroying the socket only when inUse is false.
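The suggested `inUse` flag could look roughly like this. It is a hypothetical sketch of the reporter's proposal, not agentkeepalive's actual code: `FreeSocketPool`, `release`, `acquire`, `onTimeout`, and `fakeSocket` are made-up names, and a real agent would also track sockets per host.

```javascript
// Hypothetical sketch of the proposed fix: a timeout may only destroy a
// socket that is still sitting in the free pool, never one that is in use.
class FreeSocketPool {
  constructor(freeSocketTimeout) {
    this.freeSocketTimeout = freeSocketTimeout;
    this.free = [];
  }

  // A request finished: park the socket and start its idle timer.
  release(socket) {
    socket.inUse = false;
    this.free.push(socket);
    socket.idleTimer = setTimeout(() => this.onTimeout(socket), this.freeSocketTimeout);
    socket.idleTimer.unref(); // the idle timer alone should not keep the process alive
  }

  // A new request needs a socket: hand out a free one, if any.
  acquire() {
    const socket = this.free.pop();
    if (!socket) return undefined;
    socket.inUse = true; // flag first, so any late timeout sees it
    clearTimeout(socket.idleTimer);
    return socket;
  }

  // Idle timeout: the inUse guard is the suggested fix.
  onTimeout(socket) {
    if (socket.inUse) return; // already handed to a request: leave it alone
    this.free = this.free.filter((s) => s !== socket);
    socket.destroy();
  }
}

// Minimal stand-in for a net.Socket, enough for this sketch.
function fakeSocket() {
  return {
    inUse: false,
    destroyed: false,
    destroy() {
      this.destroyed = true;
    },
  };
}

// Usage: a late timeout for a socket that was just handed out is ignored.
const pool = new FreeSocketPool(5000);
const socket = fakeSocket();
pool.release(socket);
const busy = pool.acquire();
pool.onTimeout(busy); // e.g. a timeout callback that was already queued
console.log(busy.destroyed); // false
```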
I’ll try to create code that reproduces this 100% of the time and attach it here.
Latest version of the module, Node 8.9 (it happens on previous versions as well).