http.Agent tcp keepalive not working #41965
Comments
Currently, the agent is not setting |
@theanarkh, thanks for looking. I agree with your code analysis. I just got back to this late last week. I have completed some experimentation, so far focusing on Windows out of convenience, with Linux testing to follow. The following two snippets behave very differently. Again, the problem I am trying to solve is to engage keepalive for long-running HTTP requests. The code immediately below starts TCP keepalive, but only AFTER the first request completes (consistent with theanarkh's comment).

const http = require("http");

async function issueRequest() {
  const request = http.request({
    host : process.env.SERVER_HOST,
    port : process.env.SERVER_PORT,
    path : "/tcp-keepalive-test",
    method : "GET",
    keepAlive : true,
    agent : new http.Agent({
      keepAlive : true,
      keepAliveMsecs : 1000 * 10,
    }),
  });
  const start = new Date();
  const result = await new Promise(resolve => {
    request.on("response", response => {
      let responseBody = "";
      response.on("data", data => {
        responseBody += data;
      });
      response.on("end", () => {
        resolve(responseBody);
      });
    });
    request.end();
  });
  const stop = new Date();
  console.log(`start: ${start.toISOString()}, stop: ${stop.toISOString()}, result: ${result}`);
}

And the following code behaves as desired, initiating keepalive probes while waiting for the long-running request to complete. However, on Windows at least, the initialDelay value passed in to setKeepAlive appears to be ignored, with keepalive packets always starting 1 minute after the initial request was sent.

const http = require("http");

async function issueRequest() {
  const request = http.request({
    host : process.env.SERVER_HOST,
    port : process.env.SERVER_PORT,
    path : "/tcp-keepalive-test",
    method : "GET",
  });
  const start = new Date();
  const result = await new Promise(resolve => {
    request.on("socket", (sock) => {
      sock.setKeepAlive(true, /* 1000 * 10 */ 1);
    });
    request.on("response", response => {
      let responseBody = "";
      response.on("data", data => {
        responseBody += data;
      });
      response.on("end", () => {
        resolve(responseBody);
      });
    });
    request.end();
  });
  const stop = new Date();
  console.log(`start: ${start.toISOString()}, stop: ${stop.toISOString()}, result: ${result}`);
}

I'm hoping that the Linux implementation honors the initialDelay parameter. This portion of the documentation indicates keepalive won't support a 15 minute request if initialDelay is tied to 1 minute like Windows (https://nodejs.org/api/net.html#socketsetkeepaliveenable-initialdelay):
Next step for me is Linux experimentation. I'm guessing that it is best to set the Linux system defaults (e.g. https://tldp.org/HOWTO/TCP-Keepalive-HOWTO/usingkeepalive.html), as Node.js does not offer socket support for TCP_KEEPCNT or TCP_KEEPINTVL.
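For reference, the Linux system defaults mentioned above can be inspected from Node.js itself. The following is only a rough sketch: `readSysctl` and `secondsUntilTeardown` are hypothetical helper names, and the calculation assumes the usual interpretation that an unresponsive peer is dropped roughly `idle + interval * probes` seconds after the last traffic. On non-Linux systems the `/proc` files do not exist, so the helper returns `null`.

```javascript
const fs = require("fs");

// Hypothetical helper: read one Linux keepalive sysctl from procfs.
// Returns null when the file is unavailable (e.g. on Windows or macOS).
function readSysctl(name) {
  try {
    const raw = fs.readFileSync(`/proc/sys/net/ipv4/${name}`, "utf8");
    return Number.parseInt(raw.trim(), 10);
  } catch {
    return null;
  }
}

// Hypothetical helper: rough time (in seconds) before an unresponsive peer
// is torn down: idle time before the first probe, plus one interval per
// unanswered probe.
function secondsUntilTeardown({ idle, interval, probes }) {
  return idle + interval * probes;
}

const defaults = {
  idle: readSysctl("tcp_keepalive_time"),       // TCP_KEEPIDLE default
  interval: readSysctl("tcp_keepalive_intvl"),  // TCP_KEEPINTVL default
  probes: readSysctl("tcp_keepalive_probes"),   // TCP_KEEPCNT default
};
console.log("system keepalive defaults:", defaults);

// Stock Linux values (7200 s idle, 75 s interval, 9 probes): 7875 seconds.
console.log(secondsUntilTeardown({ idle: 7200, interval: 75, probes: 9 }));
```

With a 60 second idle time and the stock interval and probe count, the same calculation gives 735 seconds before a dead peer is detected. A healthy peer answers every probe, so the connection is never torn down for this reason.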
Some more progress. We have deployed v16.14.0 via AWS Elastic Beanstalk, which is the most recent version supported by Beanstalk. Node.js hard codes the delay in this version as per <https://github.com/nodejs/node/blob/b6b65101873c32655c8d71b4d73363d624f58770/deps/uv/src/unix/stream.c>:

if (stream->type == UV_TCP) {
  if ((stream->flags & UV_HANDLE_TCP_NODELAY) && uv__tcp_nodelay(fd, 1))
    return UV__ERR(errno);
  /* TODO Use delay the user passed in. */
  if ((stream->flags & UV_HANDLE_TCP_KEEPALIVE) &&
      uv__tcp_keepalive(fd, 1, 60)) {
    return UV__ERR(errno);
  }
}

So that would explain the hard coded 60 second delay I witnessed on both Linux and Windows. Further, the 16.14.0 version of node/deps/uv/src/unix/tcp.c leaves TCP_KEEPINTVL and TCP_KEEPCNT at system defaults. It seems that https://nodejs.org/docs/latest-v16.x/api/net.html#socketsetkeepaliveenable-initialdelay does not document setKeepAlive correctly. The code I shared works identically for Linux and Windows, and it does support 15 minute http requests because, contrary to the reference I was using, TCP_KEEPCNT is the number of consecutive failures that must accrue before socket teardown and NOT the total number of probes that will be sent. So I should be good as of now, though I still need to deploy to production and verify. That said, there is a bug preventing keepalive from working on the first request when passing keepalive parameters into new http.Agent(). Note: I ran across #38445, which appears to be related.
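A possible way to combine both observations (keepalive missing on the first request through an agent, and the hard coded 60 second delay when the option is set before the socket has a file descriptor) might be to enable keepalive from the socket's "connect" event. This is an untested sketch, not a verified fix for v16.14.0: `ConnectKeepAliveAgent` and `initialDelayMs` are hypothetical names, and it assumes that calling `setKeepAlive` on an already connected socket passes the requested delay straight through to the OS.

```javascript
const http = require("http");

// Hypothetical agent that turns on TCP keepalive as soon as each new
// socket connects, instead of waiting for the socket to be returned
// to the agent's free pool after the first request.
class ConnectKeepAliveAgent extends http.Agent {
  constructor(options = {}) {
    super(options);
    // Reuse the agent's keepAliveMsecs option as the initial delay;
    // 1000 ms mirrors the documented http.Agent default.
    this.initialDelayMs = options.keepAliveMsecs ?? 1000;
  }

  createConnection(options, callback) {
    const socket = super.createConnection(options, callback);
    // Assumption: once connected, the underlying handle has a file
    // descriptor, so the delay is applied directly rather than being
    // replaced by the hard coded 60 seconds.
    socket.once("connect", () => {
      socket.setKeepAlive(true, this.initialDelayMs);
    });
    return socket;
  }
}

const agent = new ConnectKeepAliveAgent({
  keepAlive: true,
  keepAliveMsecs: 10 * 1000,
});
```

The agent would then be passed as the `agent` option of `http.request`, in place of `new http.Agent({...})` in the first snippet. A packet capture is still needed to confirm that probes actually start after 10 seconds.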
The |
Version
v16.14.0
Platform
Linux ip-192-168-4-97.us-west-2.compute.internal 5.10.96-90.460.amzn2.x86_64 #1 SMP Fri Feb 4 17:12:04 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
Subsystem
http
What steps will reproduce the bug?
Client:
Server:
How often does it reproduce? Is there a required condition?
100% reproducible.
What is the expected behavior?
TCP keepalive packets should be seen in the TCP stream every 10 seconds. This shows all of the packets between client and server and the keepalives are absent:
This shows all captured keepalive packets, and the only keepalives present are associated with the ssh session used to invoke the client:
What do you see instead?
No keepalive packets are captured.
Additional information
The motivation for this is to solve a problem we have in AWS where long running transactions complete successfully, yet the result is not returned to the client. Packet captures in this environment show that there are no keepalives.