Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Set a default (http|https).Agent with keep-alive #37184

Closed
mcollina opened this issue Feb 2, 2021 · 18 comments
Closed

Set a default (http|https).Agent with keep-alive #37184

mcollina opened this issue Feb 2, 2021 · 18 comments
Labels
http Issues or PRs related to the http subsystem.

Comments

@mcollina
Copy link
Member

mcollina commented Feb 2, 2021

In almost every deployment of Node.js, my first configuration is to set an Agent whenever we are doing an http call.
The overhead of creating new sockets or even TLS context is really high, and we should tend to reuse those as much as possible. This mostly impact users of fetch, which mostly run with a global polyfill and they do not want to set custom Node.js options when calling fetch itself.

Historically Node.js was limited to 4 outgoing connections. This created many issues and the default agent was removed entirely.

One of the problems of keep alive connections is that the socket might be timed out without Node.js knowing about it, resulting in a timeout error to the end user. The new scheduling: 'lifo' option greatly reduced the number of timeouts.

@nodejs/http @nodejs/tsc wdyt?

@dnlup
Copy link
Contributor

dnlup commented Feb 2, 2021

I have been setting up keepAlive agents almost all the time too, so I tend to agree here.

@ronag
Copy link
Member

ronag commented Feb 2, 2021

I think we need to add support for keep-alive timeout hints (similar to undici) before we do this.

Also maybe only enable this behavior by default if the server provides this hint.

@ronag
Copy link
Member

ronag commented Feb 2, 2021

Just to clarify the problem with this is that there is a race between the client and the server and the only way to avoid this is by using this hint. scheduling: 'lifo' reduces the likelihood of encountering this race but it can still happen.

@trivikr
Copy link
Member

trivikr commented Feb 2, 2021

In AWS SDK for JavaScript (v3), we also enabled keep-alive by default after customers requested it.

As per the HTTP/1.1 spec explained in RFC 2616, the persistent connections are the default behavior. I agree that the HTTP client of Node.js should enable persistent connections by default.

@mcollina
Copy link
Member Author

mcollina commented Feb 2, 2021

In AWS SDK for JavaScript (v3), we also enabled keep-alive by default after customers requested it.

@trivikr I turn that on also for v2 :).

@mcollina
Copy link
Member Author

mcollina commented Feb 2, 2021

@ronag should we add a new option to mitigate this? Essentially consider connections without a 'hint' with an configurable hint of 5 seconds by default?

@PoojaDurgad PoojaDurgad added the http Issues or PRs related to the http subsystem. label Feb 9, 2021
@ShogunPanda
Copy link
Contributor

I'm currently working on this issue. Enabling was pretty easy (I just had to fix tests mostly)

@mcollina @ronag 2 questions for you:

  1. So far I've set a default timeout of 1 minute for the global agent. Should I change it to something less?
  2. Can you confirm at the moment the HTTP client or agent don't handle Keep-Alive hints from the server? If this is affermative, I'll implement this as well.

@ronag
Copy link
Member

ronag commented Jun 17, 2022

So far I've set a default timeout of 1 minute for the global agent. Should I change it to something less?

I think that might be a good idea. I believe we use 4s default for undici. Might also make sense to try and parse the keep-alive response hint.

Can you confirm at the moment the HTTP client or agent don't handle Keep-Alive hints from the server? If this is affermative, I'll implement this as well.

It doesn't.

@mcollina
Copy link
Member Author

I think we can ship this if we are able to support a couple of "shutdown test" without waiting.

http.request() against an http server on the same process, that respond and call server.close() in the handler or after the response has been received. In both those cases, Node.js should not wait 4s to close the process.

@ShogunPanda
Copy link
Contributor

I agree.
A second similar case is when the server hints a second while the agent uses the default. In that case Node should disconnect within a sec.
I will take care of both cases.

@ShogunPanda
Copy link
Contributor

ShogunPanda commented Jun 20, 2022

@mcollina I tried to implement the check you talked about.
I ended up with this:

'use strict';

const common = require('../common');
const assert = require('assert');
const http = require('http');

const server = http.createServer(function (req, res) {
  res.writeHead(200);
  res.end();
});

server.listen(0, common.mustCall(() => {
  const req = http.get({ port: server.address().port }, (res) => {
    assert.strictEqual(res.statusCode, 200);
    server.closeIdleConnections();
    server.close();
  });

  req.end();
}));

// This timer should never go off as the server will close the socket
setTimeout(common.mustNotCall(), 10000).unref();

Note that .closeIdleConnections is needed otherwise .close will not act on existing connections.
Do you agree?

@mcollina
Copy link
Member Author

I think we should change the behavior of httpServer.close() to closeIdleConnection() by default if we want to do this. There is quite a lot of code out there that is not using closeIdleConnections() and that will now wait 4s before closing the process.

In other term, the following should pass:

'use strict';

const common = require('../common');
const assert = require('assert');
const http = require('http');

const server = http.createServer(function (req, res) {
  res.writeHead(200);
  res.end();
});

server.listen(0, common.mustCall(() => {
  const req = http.get({ port: server.address().port }, (res) => {
    assert.strictEqual(res.statusCode, 200);

    res.resume();
    server.close();
  });

  req.end();
}));

// This timer should never go off as the server will close the socket
setTimeout(common.mustNotCall(), 1000).unref();

@ShogunPanda
Copy link
Contributor

I've just added the test above in the PR.

trentm added a commit to elastic/apm-agent-nodejs that referenced this issue Jul 4, 2022
… to use keep-alive

As of nodejs/node#37184 node v19 and later
defaults the http,https.globalAgent to use keep-alive:

  - globalAgent: new Agent()
  + globalAgent: new Agent({ keepAlive: true, scheduling: 'lifo', timeout: 5000 })

This change in default broke a couple tests. The agent itself is
unaffected because our http client already uses a custom http agent
using keep-alive.
trentm added a commit to elastic/apm-agent-nodejs that referenced this issue Jul 4, 2022
… to use keep-alive (#2803)

As of nodejs/node#37184 node v19 and later
defaults the http,https.globalAgent to use keep-alive:

  - globalAgent: new Agent()
  + globalAgent: new Agent({ keepAlive: true, scheduling: 'lifo', timeout: 5000 })

This change in default broke a couple tests. The agent itself is
unaffected because our http client already uses a custom http agent
using keep-alive.
mabaasit pushed a commit to mabaasit/node that referenced this issue Jul 6, 2022
PR-URL: nodejs#43522
Fixes: nodejs#37184
Reviewed-By: Matteo Collina <[email protected]>
Reviewed-By: Antoine du Hamel <[email protected]>
@nickserv
Copy link

Is this supported when using Node's experimental Unidic-based fetch?

@ShogunPanda
Copy link
Contributor

@nickmccurdy I think it should in the affected Node's version, which is 19.

@vvo
Copy link

vvo commented Apr 2, 2024

Hey there, I cannot find any mention in the Node.js documentation that this is now the default. Is this still on? Thanks!

@ShogunPanda
Copy link
Contributor

@vvo https://nodejs.org/dist/latest-v20.x/docs/api/http.html#httpglobalagent - If you expand the history you will see the change.

@alexkli
Copy link

alexkli commented Jun 6, 2024

There is an unfortunate consequence of this default change: if keep-alive is on, and client code making requests is not consuming the response body in all cases (e.g. including error responses), the nodejs process will hang (!) on HTTP/1.1 connections.

See adobe/fetch#472 and this comment for a simple reproducable test case with no extra dependencies.

This sounds a bit concerning to me because:

  • In my experience, it is fairly common that code might read response bodies in successful cases (e.g. 2xx), but not in error cases or 3xx redirects, where it might only care about headers returned.
  • Unit tests might not catch this if they mock responses.
  • Existing nodejs applications in production might only slowly roll out node 20+ now. I wouldn't be surprised if the majority of potentially affected code out there has no idea yet.

The way this might manifest is that you first don't see obvious issues but only if you get higher request error rates in production (for whatever reason) and you wonder why your servers or serverless functions start to hang / or throughput going down.

I don't know what the best remedy is. One could argue that this default change (for convenience reasons IIUC) might be a too risky breaking change - given the entire nodejs process will hang.

cc @ShogunPanda

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
http Issues or PRs related to the http subsystem.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

9 participants