agentkeepalive spurious failure #187

Open
ronag opened this issue Mar 31, 2019 · 8 comments
Labels: discussion (The decision process is still ongoing), question (Further information is requested)

Comments

@ronag
Member

ronag commented Mar 31, 2019

I would like to give some attention to a long-standing issue in a very popular ecosystem module, agentkeepalive (1.5M weekly downloads).

The issue has made no progress for the past year and probably causes a lot of weird, spurious and intermittent failures in the wild, where connections can be reset just at the start of a request.

We confirmed this ourselves a few months ago and, after being unable to fix it, we stopped using the module.

I think the community needs a little help on this one.

@tony-gutierrez

For the love of god please yes. Can recycling free sockets with a timeout option be added to the native agent? I hate depending on this library.

In environments with low and slow socket availability (Azure App Services being one of the worst), Node apps are basically unusable at any decent traffic level without proper socket recycling.

@sam-github

@tony-gutierrez

Can recycling free sockets with a timeout option be added to the native agent?

Node.js feature requests should be raised at https://github.com/nodejs/node/issues; I don't think raising it here is going to attract much attention.

@wesleytodd
Member

I think we can add this to the list for phase 2, as documented in #143.

@Eomm added the discussion and question labels on Aug 31, 2019
@phawxby

phawxby commented Sep 16, 2019

@ronag what did you replace it with? We're suffering the same thing.

@ronag
Member Author

ronag commented Sep 16, 2019

@phawxby we use the native keepalive agent.

@phawxby

phawxby commented Sep 16, 2019

Thanks. I'll give that a go and hope Azure doesn't explode.

@tony-gutierrez

Agentkeepalive has a TTL for each socket, measured from the moment of creation. This is useful because a lot of Microsoft servers close sockets after 2 minutes regardless of reuse, keep-alive, etc.; no socket can ever live longer than 2 minutes. The native agent does not have this kind of TTL.
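For anyone who wants to try something similar on top of the native agent, here is a rough sketch of a creation-time TTL. It relies on the documented `createConnection` and `keepSocketAlive` hooks of `http.Agent`; the class name `TtlAgent`, the `socketTtlMs` option and the 110 second default are made up for illustration (the default is just an assumption chosen to sit under the 2 minute limit mentioned above). This is not how agentkeepalive implements it, and the TTL is only checked when a socket is handed back to the pool, so a socket that expires while idle is not caught here.

```javascript
const http = require('http')

// Hypothetical helper: a native agent that refuses to pool sockets older than a TTL.
class TtlAgent extends http.Agent {
  constructor(options = {}) {
    // `socketTtlMs` is an illustrative option name, not part of the Node.js API.
    const { socketTtlMs = 110000, ...agentOptions } = options
    super({ keepAlive: true, ...agentOptions })
    this.socketTtlMs = socketTtlMs
    this.createdAt = new WeakMap() // socket -> creation timestamp
  }

  // Called by the agent whenever it needs a brand new socket.
  createConnection(options, callback) {
    const socket = super.createConnection(options, callback)
    if (socket) this.createdAt.set(socket, Date.now())
    return socket
  }

  // Called when a request finishes and the socket could go back to the pool.
  // Returning false makes the agent destroy the socket instead of reusing it.
  keepSocketAlive(socket) {
    const born = this.createdAt.get(socket)
    if (born !== undefined && Date.now() - born >= this.socketTtlMs) {
      return false
    }
    return super.keepSocketAlive(socket)
  }
}
```

It would be passed wherever a normal agent goes, e.g. `http.get(url, { agent: new TtlAgent({ socketTtlMs: 90000 }) }, cb)`. An `https.Agent` subclass would need the same two overrides.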

@MacL3an

MacL3an commented Mar 3, 2020

Not sure if this is the same problem, but I'll give it a go anyway:
We have a NodeJS app running in a Docker container (Node Alpine) in a Linux App Service. During high load we get super bad response times between the NodeJS app and our Java backend. According to the Azure portal we are suffering from SNAT Port Exhaustion, where I can see we are trying and failing to open >1000 simultaneous requests.

I have tried both agentkeepalive and the native Agent with HTTP keep-alive and maxSockets set to 160, which I would have thought would limit the number of requests to 160, not >1000.
This is our code (using the Got HTTP library):

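  const http = require('http')
  const https = require('https')
  const got = require('got')

  // Note: maxSockets is applied per host, and each agent enforces it separately.
  const agentOptions = {
    maxSockets: 160,
    maxFreeSockets: 10,
    keepAlive: true,
    timeout: 30000 // socket timeout in milliseconds
  }
  const httpAgent = new http.Agent(agentOptions)
  const httpsAgent = new https.Agent(agentOptions)

  // `headers` is assumed to be defined elsewhere in the app.
  const gotClient = got.extend({
    headers: headers,
    agent: {
      http: httpAgent,
      https: httpsAgent
    }
  })

  const client = {
    get: async (url = '') => {
      const response = await gotClient.get(url, {
        headers,
        json: true
      })
      return response.body
    }
   ...
}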

Is there anything I'm missing? Why aren't the connections being reused, and why are we trying to open >1000 new requests? I've also looked for a way to log the number of outgoing requests but haven't found one.
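One possible way to get at those numbers, as a minimal sketch: the native agent exposes `sockets`, `freeSockets` and `requests`, each an object keyed by origin whose values are arrays, so summing the array lengths gives a rough count of in-use sockets, idle pooled sockets and requests still waiting for a socket. The helper names `countEntries` and `agentStats` are made up for this example.

```javascript
const http = require('http')

// Sum the lengths of the per-origin arrays the agent keeps.
function countEntries(map) {
  return Object.values(map).reduce((total, list) => total + list.length, 0)
}

// Hypothetical helper: snapshot of what an agent is doing right now.
function agentStats(agent) {
  return {
    active: countEntries(agent.sockets),   // sockets currently serving a request
    idle: countEntries(agent.freeSockets), // kept-alive sockets waiting for reuse
    queued: countEntries(agent.requests)   // requests waiting for a free socket
  }
}

const agent = new http.Agent({ keepAlive: true, maxSockets: 160 })
console.log(agentStats(agent))
```

Calling `agentStats` from a `setInterval(...).unref()` timer for both the http and the https agent should show whether `active` really stays within the `maxSockets` limit for each host, and whether `idle` ever rises above zero, which would indicate that connections are actually being kept for reuse.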

@tony-gutierrez, is this similar to your problem? How did you end up solving it?
