Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] Npm opens many connections when installing #7072

Closed
2 tasks done
Larsjep opened this issue Dec 11, 2023 · 40 comments · Fixed by #7324
Closed
2 tasks done

[BUG] Npm opens many connections when installing #7072

Larsjep opened this issue Dec 11, 2023 · 40 comments · Fixed by #7324
Assignees
Labels
Bug thing that needs fixing Priority 0 will get attention right away Release 10.x

Comments

@Larsjep
Copy link

Larsjep commented Dec 11, 2023

Is there an existing issue for this?

  • I have searched the existing issues

This issue exists in the latest npm version

  • I am using the latest npm

Current Behavior

My project needs around 2000 packages. When running "npm install" it starts out by opening 2000 connections and then when all the connections are open it downloads 1 package on each connection.

Is this intended?, because it causes a quite high load on our package proxy. Is there a way to limit the number of connections?

With npm 9.9 it only uses 18/20 connections.

Expected Behavior

"npm install" only opens a few connetions to the package server.

Steps To Reproduce

  1. Clear cache
  2. npm install or npm ci

Environment

  • npm: 10.2.5
  • Node.js: 18.19.0
  • OS Name: Windows 11
  • System Model Name:
  • npm config:
registry = "http://fosspackages/npm/default-npm/"
save-exact = true
save-prefix = ""

; node bin location = C:\Users\lj\.nvm\versions\node\v18.19.0\bin\node.exe
; node version = v18.19.0
; npm local prefix = C:\work\Dexter\web-client
; npm version = 10.2.5
; cwd = C:\work\Dexter\web-client
; HOME = C:\Users\lj
; Run `npm config ls -l` to show all defaults.
@Larsjep Larsjep added Bug thing that needs fixing Needs Triage needs review for next steps Release 10.x labels Dec 11, 2023
@Larsjep
Copy link
Author

Larsjep commented Dec 12, 2023

I noticed that there is a config option called maxsockets. But the default is 15 and
npm config get maxsockets also returns 15.
So if this should limit the number of connections it doesn't seem to work in npm 10.x.

@MartinFalatic
Copy link

@Larsjep Just curious what npm config get agent shows. Supposedly if that's set, maxsockets will be ignored in npm-registry-fetch.

Could also be that's something that's buried in NPM and is somehow overridden in some other way, but it caught my eye when I searched the org for maxsockets.... might also help narrow down the problem at least.

@Larsjep
Copy link
Author

Larsjep commented Dec 20, 2023

@MartinFalatic npm config get agent returns undefined

I have also looked in the code base, but I can't find anywhere it refers to the maxsockets settings.
I can find it in the 9.x codebase, but the download part seem to radically changed in 10.x

@rosen-dimitrov
Copy link

rosen-dimitrov commented Dec 20, 2023

+1 here.

We are using self-hosted runners(ARC) on our own cluster and when using npm 10 we noticed a lot of dropped packets due to NAT ports exhaustion all coming from the arc runner executing npm install with npm 10. We have dynamic port allocation, which means that npm creates a lot of connections in a burst at the same time before the new ports can be allocated to the worker node running the runner pod.

Npm 9 does not seem to have this issue. Looks an issue with the new @npmcli/agent http client introduced in npm 10.

What is also interesting is that the workflow does not fail. npm seems to use whatever number of connections it managed to establish to install all packages.

@mirayashi
Copy link

Hello there,

I would like to add that this is actually a bigger issue for very large projects (3000+ dependencies):

  • CPU and RAM usage is a lot higher than with npm 9
  • It is a lot slower than npm 9
  • npm install randomly fails with ECONNRESET and ETIMEDOUT errors (even more consistently on Windows), and seems to ignore most of .npmrc options (maxsockets, fetch-retries, etc)

@Th3S4mur41
Copy link

We're affected by the same issue...
Our Jenkins CI is constantly failing to install private dependencies from GitHub (through our Proxy).

Two potential workarounds are disabling the proxy (can't do that in corporate environment) or downgrading to npm 9

@silverwind
Copy link

silverwind commented Feb 5, 2024

It seems the maxsockets option (default value 15) has no effect at all in npm@10, at least in situations where there is no proxy involved.

When I run a npm install with around 1k dependencies inside a local VMWare VM, I'm consistently hitting a bug in VMWare where npm exhausts the CPU resources of vmnet-natd, which leads to npm hanging for minutes and finally terminating with random network errors. Downgrading to npm@9 completely resolves it.

@ajones513
Copy link

+1 here
My company is having to hold back NPM 10 because of this issue when connecting to a private self-hosted registry. To help with our decision making, is there any indication when this might be triaged?

@greg-md
Copy link

greg-md commented Feb 21, 2024

Same issue encountered on our vm lima. After upgrading to node v20 and npm v10, npm install freezes the entire vm. Downgrading to v9 fixed the issue.

@llimllib
Copy link

llimllib commented Feb 23, 2024

We are seeing about half of our npm ci runs with npm version 10 on github actions fail; downgrading to npm version 9 speeds up install by about 45 seconds (2m45 -> 2m) and doesn't seem to fail at all.

I can reproduce easily so please let me know if I can help debug this somehow

@dharmeshgordhan
Copy link

Potentially being looked at under #7272.

@melroy89
Copy link

melroy89 commented Mar 10, 2024

My gitlab ci/cd is also very unstable due to this behavior. Often causing time outs due to the large amount of connections. Especially since I have some jobs running in parallel as well.

@neelance
Copy link

neelance commented Mar 16, 2024

So... I went into the rabbit hole... It went very very deep... But I finally found the root cause!

  • npm-reqistry-fetch uses make-fetch-happen, see here.
  • make-fetch-happen uses minipass-fetch, see here.
  • minipass-fetch uses Node's builtin http/https, see here.
  • The maxSockets parameter gets passed through all these layers just fine, no issue here.
  • However, the implementation of http.Agent in Node.js is suboptimal: It checks for maxSockets here, but does not immediately increase the checked value. Instead it calls this.createSocket and assumes the function pushes to this.sockets synchronously. Typical risk for a race condition.
  • Now back up the stack again: make-fetch-happen uses @npmcli/agent, see here
  • @npmcli/agent uses agent-base, see here
  • agent-base subclasses http.Agent and overwrites createSocket with an implementation that is asynchronous, see here.
  • 🫠

@melroy89
Copy link

melroy89 commented Mar 16, 2024

* However, the implementation of `http.Agent` in Node.js is suboptimal: It checks for `maxSockets` [here](https://github.com/nodejs/node/blob/fec7e505fc6f256522eb0031515757f2e58daa61/lib/_http_agent.js#L284), but does not immediately increase the checked value. Instead it calls `this.createSocket` and assumes the function pushes to `this.sockets` **synchronously**. Typical risk for a race condition.

I didn't look very long at the code.

But it seems that this.freeSockets[name].length + freeSockets.length is used to determine the current length of sockets validated again the this.maxSockets option.

This array is increased via: ArrayPrototypePush(this.sockets[name], s); a bit down in the code.

Then there is also something like "max socket count" as well. Which has an easier implementation, the code uses: this.totalSocketCount to keep track of that value, which is validated against the this.maxTotalSocket option.

Anyway there is a lot of complex state keeping going on in the _http_agent.js file. Keep in mind that http_agent.js file is using a local createSocket method, defined below.

@silverwind
Copy link

silverwind commented Mar 16, 2024

I recall similar issue npm/make-fetch-happen#59, but in that case it was only for proxied connections, while the current issue affects definitely affects non-proxied connections.

@neelance
Copy link

neelance commented Mar 17, 2024

Here is a workaround I just applied to our CI system. I am using the request as a placeholder in the this.sockets array.

agent-base.patch

--- a/dist/index.js
+++ b/dist/index.js
@@ -65,6 +65,8 @@
             l.indexOf('node:https:') !== -1);
     }
     createSocket(req, options, cb) {
+        const sockets = this.sockets[this.getName(options)];
+        sockets.push(req);
         const connectOpts = {
             ...options,
             secureEndpoint: this.isSecureEndpoint(options),
\ No newline at end of file
@@ -77,6 +79,7 @@
                 return socket.addRequest(req, connectOpts);
             }
             this[INTERNAL].currentSocket = socket;
+            sockets.splice(sockets.indexOf(req), 1);
             // @ts-expect-error `createSocket()` isn't defined in `@types/node`
             super.createSocket(req, options, cb);
         }, cb);
\ No newline at end of file

We're using Earthly, so I added this to our Earthfile:

  # patch for https://github.com/npm/cli/issues/7072
  COPY agent-base.patch ./
  RUN patch -p 1 -d /usr/local/lib/node_modules/npm/node_modules/agent-base/ < agent-base.patch

@ipikiiskinen
Copy link

What is the status of this issue?
In practice npm, especially in CI systems, is currently creating dos attacks to repository services. In universal repository services such as Artifactory, this affects to all repositories because the service is busy serving floods from npm clients.

I see issue #7272 has been closed as duplicate. Not sure how labels work in this project, but I see #7272 has "Priority 1" label, and if that makes any difference and adds priority, could you please add that label in this issue?

@m0ps
Copy link

m0ps commented Mar 21, 2024

@lukekarrys. Cuz #7272 is closed now (as a duplicate of the current issue), could you please take a look at this?

@melroy89
Copy link

I believe this a big issue in NPM for both individuals, small and large companies alike. Please, consider increasing the priority of this ticket (eg. label as as prio 1: Priority 1 high priority issue or higher).

@lukekarrys lukekarrys added Priority 0 will get attention right away and removed Needs Triage needs review for next steps labels Mar 25, 2024
@wraithgar
Copy link
Member

Luke has created an issue in the upstream package TooTallNate/proxy-agents#299. If this can't be fixed quickly we may also look into mitigating it somehow from the npm codebase.

@dominicfraser
Copy link

Thank you for the help ❤️

@melroy89
Copy link

melroy89 commented Apr 2, 2024

Indeed. Thank you so much! ❤️ Appreciate the prio shift, I know you are very busy.

@dominicfraser
Copy link

Released in https://github.com/npm/cli/releases/tag/v10.5.1

@dominicfraser
Copy link

Updating in Node in nodejs/node#52351

@berkon
Copy link

berkon commented Apr 11, 2024

For all having issues running the workflows in Docker on the GitHub CI, adding these lines to the workflow (which uses Docker) installs the latest NPM version into the Docker container:

      - name: 'Manually install NPM version >= v10.5.1 (fix for ETIMEDOUT)'
        env:
          HUSKY: 0
        run: curl -qL https://www.npmjs.com/install.sh | sh

@melroy89
Copy link

melroy89 commented Apr 12, 2024

I don't understand the down vote of the above workaround, as long as there is no new nodejs version with the latest npm version. There is no other solution to pull in the latest npm package yourself until a new nodejs version is related that fixed this npm issue.

Edit: another reason again to split npm from node. But I digress.

@ljharb
Copy link
Contributor

ljharb commented Apr 12, 2024

You’re supposed to update npm, not just use the version installed with node. Unbundling it would make this problem worse, not better.

@melroy89
Copy link

Wait, so each time I run my CICD jobs in docker (github/gitlab) I need to first update npm?? That doesn't make sense.

@ljharb
Copy link
Contributor

ljharb commented Apr 12, 2024

Yep. Or, you do it in your docker image periodically so your jobs don’t have to also do it.

@melroy89
Copy link

melroy89 commented Apr 12, 2024

We got a bit off-topic now with this discussion. But it's important nonetheless.

But isn't that whole point of docker tags that you version already your docker image to a specific version, the whole point of tagging a version of Docker images. Updating npm each time you run a workflow still doesn't make sense to me. Sorry.

Yes, you could create a new docker image based on the official node image, instead of directly using the official node image. Where you manually periodically update your npm version :. But nobody is really gonna create a separate Dockerfile just for that... Let's be honest here. Plus this will create additional maintenance of your local Dockerfile and keep that also up to date.

Maybe making npm part of your package.json (and package-lock.json) makes much more sense, so it becomes just a dependency, something I still believe it should have been (so not shipped by default). Which allows you to update npm also much easier.
Another benefit would then be do to the lock file and the potential CI/CD NPM cache you have configured, to install npm from the local npm cache (aka .npm) depending again on your CI/CD configuration/solution. Meaning a faster workflow and less strain on the npm infrastructure.

@ljharb
Copy link
Contributor

ljharb commented Apr 12, 2024

I maintain 500+ open source projects, each that have 200+ CI jobs, and they all upgrade npm in every CI run. It's not that big a deal.

@melroy89
Copy link

melroy89 commented Apr 12, 2024

I see. OK, I think it's a bad practice to do it like that. But I understand now fully how it should be done, thanks for sharing your insides. Also don't get me wrong, I love all your open-source projects ;), including Node & npm, qs and more.

I'm just thinking out loud here and understand the full context, I hope you didn't mind. Hopefully others will follow and now update npm as well in all their CI/CD runs as their first step.

EDIT: I checked your CI jobs. I noticed you run nvm install-latest-npm however you are not using Docker I believe. But you use plain GitHub Ubuntu runners, without leveraging docker.

EDIT EDIT: npm install -g npm@latest will use npm which still has this socket bug, so updating npm still fails due to network issues of too many open connections. chicken and egg problem

@ljharb
Copy link
Contributor

ljharb commented Apr 12, 2024

Yes, that's true, you need npm to install npm, but I wouldn't expect it to fail installing npm itself. If it does, then you are indeed stuck using the install.sh script upthread.

@mirayashi
Copy link

Wouldn't just installing the project with npx npm@latest install instead of npm install work as well? It has the advantage of not requiring any extra preparation command or image modification

ermshiperete added a commit to keymanapp/keyman that referenced this issue May 8, 2024
This updates the minimum required node version to Node 20. The latest
version of Node 18 still contains the npm bug
(npm/cli#7072) whereas Node 20 got updated to
a npm version that contains a fix. Node 20 is known to work with our
current code, so this change updates to that as an intermediate step
before we investigate if we can update to Node 22 as discussed for
Keyman 18.
ermshiperete added a commit to keymanapp/keyman that referenced this issue Jun 14, 2024
This updates the minimum required node version to Node 20. The latest
version of Node 18 still contains the npm bug
(npm/cli#7072) whereas Node 20 got updated to
a npm version that contains a fix. Node 20 is known to work with our
current code, so this change updates to that as an intermediate step
before we investigate if we can update to Node 22 as discussed for
Keyman 18.
ermshiperete added a commit to keymanapp/keyman that referenced this issue Jul 4, 2024
This updates the minimum required node version to Node 20. The latest
version of Node 18 still contains the npm bug
(npm/cli#7072) whereas Node 20 got updated to
a npm version that contains a fix. Node 20 is known to work with our
current code, so this change updates to that as an intermediate step
before we investigate if we can update to Node 22 as discussed for
Keyman 18.
sjd78 added a commit to sjd78/tackle2-ui that referenced this issue Aug 22, 2024
Upgrade the project to use nodejs-20 in the `Dockerfile`,
and package.json engine block.  More specific entries
for `.dockerignore` will help keep the `COPY` step quick.

Reasoning:
  - nodejs-18 is in LTS until May 2025, but nodejs-20 will
    be in LTS until May 2026. [^1]

  - node and npm are distributed together, and the
    `ubi9/nodejs-20` container has a newer version of
    npm included (10.7.0) than the current `ubi9/nodejs-18`
    container (10.5.0)

  - Given github issues [^2] and [^3], we can assume that
    the only versions of npm that will be stable in CI
    during the `npm install` phase are `npm@9` and `npm@>=10.5.2`.
    Any npm version <10.5.2 is at severe risk of encountering
    network errors when installing packages, especially when
    running in a qemu environment for non-native architecture
    builds.

  - Being very specific in the `package.json` `engines` block
    will help keep everyone on working versions of npm.

[1]: https://nodejs.org/en/about/previous-releases
[2]: npm/cli#7231
[3]: npm/cli#7072

Signed-off-by: Scott J Dickerson <[email protected]>
sjd78 added a commit to sjd78/tackle2-ui that referenced this issue Aug 22, 2024
Upgrade the project to use nodejs-20 in the `Dockerfile`,
and package.json engine block.  More specific entries
for `.dockerignore` will help keep the `COPY` step quick.

Reasoning:
  - nodejs-18 is in LTS until May 2025, but nodejs-20 will
    be in LTS until May 2026. [^1]

  - node and npm are distributed together, and the
    `ubi9/nodejs-20` container has a newer version of
    npm included (10.7.0) than the current `ubi9/nodejs-18`
    container (10.5.0)

  - Given github issues [^2] and [^3], we can assume that
    the only versions of npm that will be stable in CI
    during the `npm install` phase are `npm@9` and `npm@>=10.5.2`.
    Any npm version <10.5.2 is at severe risk of encountering
    network errors when installing packages, especially when
    running in a qemu environment for non-native architecture
    builds.

  - Being very specific in the `package.json` `engines` block
    will help keep everyone on working versions of npm.

[^1]: https://nodejs.org/en/about/previous-releases
[^2]: npm/cli#7231
[^3]: npm/cli#7072

Signed-off-by: Scott J Dickerson <[email protected]>
sjd78 added a commit to sjd78/tackle2-ui that referenced this issue Aug 27, 2024
Upgrade the project to use nodejs-20 in the `Dockerfile`,
and package.json engine block.  More specific entries
for `.dockerignore` will help keep the `COPY` step quick.

Reasoning:
  - nodejs-18 is in LTS until May 2025, but nodejs-20 will
    be in LTS until May 2026. [^1]

  - node and npm are distributed together, and the
    `ubi9/nodejs-20` container has a newer version of
    npm included (10.7.0) than the current `ubi9/nodejs-18`
    container (10.5.0)

  - Given github issues [^2] and [^3], we can assume that
    the only versions of npm that will be stable in CI
    during the `npm install` phase are `npm@9` and `npm@>=10.5.2`.
    Any npm version <10.5.2 is at severe risk of encountering
    network errors when installing packages, especially when
    running in a qemu environment for non-native architecture
    builds.

  - Being very specific in the `package.json` `engines` block
    will help keep everyone on working versions of npm.

[^1]: https://nodejs.org/en/about/previous-releases
[^2]: npm/cli#7231
[^3]: npm/cli#7072

Signed-off-by: Scott J Dickerson <[email protected]>
sjd78 added a commit to konveyor/tackle2-ui that referenced this issue Aug 27, 2024
Upgrade the project to use nodejs-20 in the `Dockerfile`, and
package.json engine block. More specific entries for `.dockerignore`
will help keep the `COPY` step quick (especially for local dev image
builds).

Reasoning:
  - nodejs-18 is in LTS until May 2025, but nodejs-20 will be in LTS until
    May 2026. [^1]

  - node and npm are distributed together, and the `ubi9/nodejs-20`
    container has a newer version of npm included (10.7.0) than the current
    `ubi9/nodejs-18` container (10.5.0)

  - Given github issues [^2] and [^3], we can assume that the only
    versions of npm that will be stable in CI during the `npm install` phase
    are `npm@9` and `npm@>=10.5.2`. Any npm version <10.5.2 is at severe
    risk of encountering network errors when installing packages, especially
    when running in a qemu environment for non-native architecture builds.

  - Being very specific in the `package.json` `engines` block will help
    keep everyone on working versions of npm.

[^1]: https://nodejs.org/en/about/previous-releases
[^2]: npm/cli#7231
[^3]: npm/cli#7072

Signed-off-by: Scott J Dickerson <[email protected]>
sjd78 added a commit to sjd78/tackle2-ui that referenced this issue Aug 27, 2024
Upgrade the project to use nodejs-20 in the `Dockerfile`, and
package.json engine block. More specific entries for `.dockerignore`
will help keep the `COPY` step quick (especially for local dev image
builds).

Reasoning:
  - nodejs-18 is in LTS until May 2025, but nodejs-20 will be in LTS until
    May 2026. [^1]

  - node and npm are distributed together, and the `ubi9/nodejs-20`
    container has a newer version of npm included (10.7.0) than the current
    `ubi9/nodejs-18` container (10.5.0)

  - Given github issues [^2] and [^3], we can assume that the only
    versions of npm that will be stable in CI during the `npm install` phase
    are `npm@9` and `npm@>=10.5.2`. Any npm version <10.5.2 is at severe
    risk of encountering network errors when installing packages, especially
    when running in a qemu environment for non-native architecture builds.

  - Being very specific in the `package.json` `engines` block will help
    keep everyone on working versions of npm.

[^1]: https://nodejs.org/en/about/previous-releases
[^2]: npm/cli#7231
[^3]: npm/cli#7072

Backport-of: konveyor#2062
Signed-off-by: Scott J Dickerson <[email protected]>
sjd78 added a commit to konveyor/tackle2-ui that referenced this issue Aug 28, 2024
Upgrade the project to use nodejs-20 in the `Dockerfile`, and
package.json engine block. More specific entries for `.dockerignore`
will help keep the `COPY` step quick (especially for local dev image
builds).

Reasoning:
  - nodejs-18 is in LTS until May 2025, but nodejs-20 will be in LTS until
    May 2026. [^1]

  - node and npm are distributed together, and the `ubi9/nodejs-20`
    container has a newer version of npm included (10.7.0) than the current
    `ubi9/nodejs-18` container (10.5.0)

  - Given github issues [^2] and [^3], we can assume that the only
    versions of npm that will be stable in CI during the `npm install` phase
    are `npm@9` and `npm@>=10.5.2`. Any npm version <10.5.2 is at severe
    risk of encountering network errors when installing packages, especially
    when running in a qemu environment for non-native architecture builds.

  - Being very specific in the `package.json` `engines` block will help
    keep everyone on working versions of npm.

[^1]: https://nodejs.org/en/about/previous-releases
[^2]: npm/cli#7231
[^3]: npm/cli#7072

Backport-of: #2062
Signed-off-by: Scott J Dickerson <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug thing that needs fixing Priority 0 will get attention right away Release 10.x
Projects
None yet
Development

Successfully merging a pull request may close this issue.