Suggestion: Centralized package repository that support multiple versions of a package #25581

Closed
KSXGitHub opened this issue Jan 19, 2019 · 28 comments
Labels
feature request Issues that request new features to be added to Node.js. module Issues and PRs related to the module subsystem.

Comments

@KSXGitHub

KSXGitHub commented Jan 19, 2019

Problem

Projects that use Node.js likely also use npm packages, and thus contain a node_modules folder. Having multiple projects like this leads to having multiple node_modules folders, which likely contain duplicated packages. This not only wastes disk space, it also wastes bandwidth and time to install these packages.

(Other platforms, such as Haskell, Rust, and Java, avoid this by having a centralized package repository, so I was quite surprised by Node's design decision.)

Description

  • When a user executes require("pkg-name"), if "pkg-name" is not found in module.paths, Node.js should proceed to search in a fixed location (let's call it $PREFIX/.node_package_store until we find a better name) for a "pkg-name" that matches the criteria specified in a manifest file within the project (preferably but not necessarily package.json).
  • $PREFIX/.node_package_store is not a node_modules and $NODE_PATH (i.e. module.paths) does not affect it.
  • $PREFIX/.node_package_store should support multiple runtime versions (es5, es6, nodejs versions, bundlers, etc.), multiple package versions and multiple registries.

Example structure of a .node_package_store

When using npm to install React from registry.npmjs.org:

$PREFIX/.node_package_store
└── [email protected]
    └── react
        └── 16.0.0
            ├── content
            └── metadata
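
The lookup into such a store could be sketched roughly as follows. This is only an illustration of the idea, not part of Node.js: `storeLocation`, `resolveFromStore`, and the `registryDir` parameter are hypothetical names, the store layout is assumed to be `<store>/<registryDir>/<name>/<version>/content`, and only exact versions in the manifest are handled (version ranges would need a semver matcher).

```javascript
const fs = require("node:fs");
const path = require("node:path");

// Hypothetical layout: <store>/<registryDir>/<name>/<version>/content
function storeLocation(store, registryDir, name, version) {
  return path.join(store, registryDir, name, version, "content");
}

// Read the version pinned in the project's package.json and map it to a
// directory in the store. Returns null when the package is not listed in
// the manifest or is not present in the store.
function resolveFromStore(store, registryDir, projectDir, name) {
  const manifest = JSON.parse(
    fs.readFileSync(path.join(projectDir, "package.json"), "utf8")
  );
  const version = (manifest.dependencies || {})[name];
  if (!version) return null;
  const dir = storeLocation(store, registryDir, name, version);
  return fs.existsSync(dir) ? dir : null;
}
```

A loader would only call something like `resolveFromStore` after the normal module.paths search has failed, which matches the fallback order described above.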

Alternatives I've considered

  • /node_modules, $HOME/node_modules and the like: these do not support multiple versions, so different projects still require separate node_modules folders.
  • I can create a loader myself, but package managers (npm, yarn) don't support it, so it is useless.
  • I can use pnpm — a package manager that uses hardlinks and symlinks to solve this problem, but it also makes things more complicated.
@bnoordhuis bnoordhuis added module Issues and PRs related to the module subsystem. feature request Issues that request new features to be added to Node.js. labels Jan 21, 2019
@bnoordhuis
Member

This not only wastes disk space, it also wastes bandwidth and time to install these packages.

That's up to the package manager (npm, yarn, etc), that's not really within node's remit. The popular ones cache packages locally however (e.g. $HOME/.npm)

The disk space argument isn't that strong in this age of multi-TB hard drives. (It's come up before.)

@KSXGitHub
Author

KSXGitHub commented Jan 21, 2019

@bnoordhuis

That's up to the package manager (npm, yarn, etc),

What could a package manager do if node does not load modules that the package manager installed? (I'm referring to a centralized package repository.)

The popular ones cache packages locally however (e.g. $HOME/.npm)

The popular ones copy files from cache to local node_modules.

The disk space argument isn't that strong in this age of multi-TB hard drives. (It's come up before.)

Why handicap your users? Do you assume/demand that every user has terabytes of disk space? Even if they do, why prevent them from using that space more efficiently? Efficiency is a feature; your counter-argument is not as strong.

(BTW, there's a recurring meme portraying node_modules to be more massive than a black hole, so you know how strong of an argument this is)

(It's come up before.)

So I guess it has been there for ages. It's good to know that I'm not the only one who wants this.

@sam-github
Contributor

npm is experimenting with approaches to this in some of their recent tooling, IIRC, so I'm not sure this is an area that package managers can't innovate in.

Most languages only allow a single version of a package to be used at a time, process wide. Node's approach of allowing multiple versions of a dependency, possibly multiple incompatible versions, to all exist and be used in the same process is not so common, and makes the search you describe much more complex.

Have you taken a shot at implementing this, or are you hoping to motivate someone else to do it?

@KSXGitHub
Author

KSXGitHub commented Jan 21, 2019

npm is experimenting with approaches to this in some of their recent tooling

That approach still wastes disk space and bandwidth.

Most languages only allow a single version of a package to be used at a time, process wide.

This isn't true with Rust (Cargo) and Haskell (Hackage). Rust uses semantic versioning (just like npm packages), in which the major version indicates backward incompatibility. Haskell also has its own versioning system that supports backward incompatibility. What this means is that their compilers have to pick the correct version specified by manifest files.

to all exist and be used in the same process is not so common

What would be the problem?

@KSXGitHub
Author

Another benefit of this feature is that implementing a package manager should become far easier.

@sam-github
Contributor

Have you taken a shot at implementing this, or are you hoping to motivate someone else to do it?

@richardlau
Member

* When user executes `require("module")`, if `"module"` is not found in `module.paths`, Node.js should proceed to search in a fixed location (let's call it `~/.node_package_store` until we find a better name) for a `"module"` that matches criteria specified in a manifest file within the project (preferably but not necessary `package.json`).

* `~/.node_package_store` is **not** a `node_modules` and `$NODE_PATH` does not affect it.

* `~/.node_package_store` should support multiple runtime versions (es5, es6, nodejs versions, bundlers, etc.), multiple package versions and multiple registries.

Some notes:
module.paths already contains three global fallback locations at the end ($HOME/.node_modules, $HOME/.node_libraries and $PREFIX/lib/node) from which it will attempt to load modules (i.e. if not found earlier in module.paths):
https://nodejs.org/dist/latest-v11.x/docs/api/modules.html#modules_loading_from_the_global_folders
In practice I'm not aware of any current package managers that would actually install into those locations (e.g. npm is hard-coded to assume the folder is called node_modules). Lack of package manager support is going to be the biggest issue with any changes to the module resolving algorithm.

Packages and modules are not the same, and Node.js currently doesn't interpret any package metadata other than the main field (e.g. it doesn't even look at the version field in package.json). It has no idea whether a module (e.g. an addon) is compatible before attempting to load it.
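
The search order mentioned in these notes can be inspected from any CommonJS module. The following is a minimal sketch using the documented `require.resolve.paths()`; `searchPathsFor` is a hypothetical wrapper name and `"pkg-name"` is a placeholder that does not need to be installed.

```javascript
// Return every directory Node.js would search for a bare specifier,
// in order: the chain of node_modules folders from the current file
// upward, followed by the global fallback folders.
function searchPathsFor(request) {
  return require.resolve.paths(request);
}

const searched = searchPathsFor("pkg-name");
console.log(searched);
```

For a core module such as `"fs"` the call returns `null`, because no directory search takes place.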

@KSXGitHub
Author

KSXGitHub commented Jan 22, 2019

module.paths already contains three global fallback locations at the end ($HOME/.node_modules, $HOME/.node_libraries and $PREFIX/lib/node) from which it will attempt to load modules (i.e. if not found earlier in module.paths):
https://nodejs.org/dist/latest-v11.x/docs/api/modules.html#modules_loading_from_the_global_folders

They are basically node_modules with different names.

Lack of package manager support is going to be the biggest issue with any changes to the module resolving algorithm.

Once node adds this feature (behind an experimental flag), I'm pretty sure package managers will start supporting it.

Packages and modules are not the same

I've updated my comment, thanks for pointing this out.

Node.js currently doesn't interpret any package metadata other than the main field (e.g. it doesn't even look at the version field in package.json).

Node.js doesn't have to read the version field in package.json; the version can be encoded in the package path. However, Node.js still has to read the dependency list from a manifest file (which might be package.json or a lock file).

It has no idea if a module (e.g. an addon) is compatible before attempting to load it.

I've removed "multiple runtime versions (es5, es6, nodejs versions, bundlers, etc.)" part from the first comment.

@bnoordhuis
Member

I'd say the general consensus is that even reading "main" was, in retrospect, a mistake. It's unlikely Node.js will start parsing more of package.json, or any other file.

The current system works well enough; your proposal is at best a marginal improvement, worst case it's a regression because it slows down the common case. A lot of effort has been sunk in making the module loader fast.

@KSXGitHub
Author

KSXGitHub commented Jan 22, 2019

worst case it's a regression because it slows down the common case.

Node.js can hide this feature behind a flag. Even when enabled, Node.js will always try to load from module.paths first. Besides, it's not as if anyone in their right mind would call require() 1000 times in a for loop.

@bnoordhuis
Member

I think you might be underestimating how many modules some apps require. :)

One I work on loads over 1,400 modules at startup and it's not even that big and enterprise-y.

@KSXGitHub
Author

@bnoordhuis

I think you might be underestimating how many modules some apps require. :)

One I work on loads over 1,400 modules at startup and it's not even that big and enterprise-y.

Loading a module comprises three steps: (1) resolving the module path, (2) reading the module file into memory, and (3) "eval"-ing the content of the file. How significant can step (1) be compared to the rest?
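
One rough way to look at that question is to time the steps separately. The sketch below is an assumption-laden illustration, not how Node.js loads modules internally: `timeLoadSteps` is a hypothetical helper, step (3) only compiles the wrapped source with `vm.Script` rather than executing it, and Node's own resolution caching means repeated calls for the same request will report a cheaper step (1).

```javascript
const fs = require("node:fs");
const vm = require("node:vm");

// Time the three steps for a single module request.
// Durations are returned in nanoseconds as BigInt values.
function timeLoadSteps(request) {
  const t0 = process.hrtime.bigint();
  const filename = require.resolve(request); // (1) resolve the path
  const t1 = process.hrtime.bigint();
  const source = fs.readFileSync(filename, "utf8"); // (2) read the file
  const t2 = process.hrtime.bigint();
  // (3) compile the source inside a CommonJS-style wrapper function;
  // the script is compiled here but never run.
  const wrapped =
    "(function (exports, require, module, __filename, __dirname) {" +
    source +
    "\n})";
  new vm.Script(wrapped, { filename });
  const t3 = process.hrtime.bigint();
  return { resolve: t1 - t0, read: t2 - t1, compile: t3 - t2 };
}
```

Calling `timeLoadSteps` over every module an application loads, in a fresh process, would give a first impression of how much of startup is spent in resolution on a given file system.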

@KSXGitHub
Author

@sam-github

Have you taken a shot at implementing this, or are you hoping to motivate someone else to do it?

I will try creating a loader in the form of an npm package as a proof of concept when I have time. I will not touch the Node.js repo itself.

@edmorley

@KSXGitHub you may be interested in Yarn's 'Plug and Play' feature (RFC) which effectively turns Yarn's package cache into the central package store, via use of a custom resolver.

@KSXGitHub
Author

KSXGitHub commented Jan 22, 2019

@edmorley It is Yarn-specific, and it requires changing the require.resolve algorithm, either by adding code to enable PnP (which is limited) or by changing require.resolve itself (which is what this issue is about).

@ljharb
Member

ljharb commented Jan 22, 2019

what about the use case for npm link, or being able to edit files locally on disk without affecting other projects? Certainly for the simple cases you could use the package manager to install a copy in node_modules, but when the package you wish to edit is a singleton or part of a plugin ecosystem (like that of React, eslint, babel, etc), what happens?

@KSXGitHub
Author

KSXGitHub commented Jan 22, 2019

what about the use case for npm link

npm link is for linking local packages that aren't in the npm registry, and thus it only works on one machine. Share your repo over GitHub and it won't work on someone else's.

npm install does not invoke npm link.

or being able to edit files locally on disk without affecting other projects?

This feature does not prevent users from using good ol' node_modules.

@ljharb
Member

ljharb commented Jan 22, 2019

again, though, those all seem like things the package manager, not the platform, needs to address. node already has a mechanism to require from a central place, if the package manager installs there.

@KSXGitHub
Author

KSXGitHub commented Jan 22, 2019

@ljharb

node already has a mechanism to require from a central place, if the package manager installs there

That central place does not support multiple versions of the same package.

@ljharb
Member

ljharb commented Jan 22, 2019

ah, that’s a fair point.

@KSXGitHub KSXGitHub changed the title Suggestion: Centralized package repository Suggestion: Centralized package repository that support multiple versions of a package Jan 22, 2019
@devsnek
Member

devsnek commented Jan 22, 2019

That's still just a packaging thing... if i npm i thing@1 then it can create ~/.npm/packages/thing/1.0.0/... and link it locally. then if i install thing@2 somewhere else it could create ~/.npm/packages/thing/2.0.0/... and link it.
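
A minimal sketch of that "store plus link" approach, for illustration only: `linkFromStore` is a hypothetical helper, the store is assumed to be laid out as `<store>/<name>/<version>`, and scoped package names (`@scope/name`) are not handled. No change to Node's resolver is needed, because the linked package is found through the project's ordinary node_modules folder.

```javascript
const fs = require("node:fs");
const path = require("node:path");

// Expose <store>/<name>/<version> as <projectDir>/node_modules/<name>
// by creating a symlink. Returns the path of the created link.
function linkFromStore(store, projectDir, name, version) {
  const target = path.join(store, name, version);
  const nodeModules = path.join(projectDir, "node_modules");
  fs.mkdirSync(nodeModules, { recursive: true });
  const link = path.join(nodeModules, name);
  // The "junction" type only matters on Windows; other platforms ignore it.
  fs.symlinkSync(target, link, "junction");
  return link;
}
```

Two projects can link different versions of the same package from one store, so each version exists on disk only once.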

The key here is that versions are tied to distribution, not loading. Node itself doesn't know whether the thing it's loading is version 1 or version 2, nor should it need to. These systems always end up being tied to the package manager, not to the runtime.

@KSXGitHub
Author

KSXGitHub commented Jan 22, 2019

@devsnek

The key here is that versions are tied to distribution, not loading. Node itself doesn't know whether the thing it's loading is version 1 or version 2

Versions in Cargo (Rust) and Cabal (Haskell) are tied to loading. It is Node that is being unconventional. Node can read manifest files to learn about versions.

nor should it need to.

...unless there's gain in doing it.

@devsnek
Member

devsnek commented Jan 22, 2019

Versions in Cargo (Rust) and Cabal (Haskell) are tied to loading. It is Node that is being unconventional. Node can read manifest files to learn about versions.

Cargo and Cabal are not the language, they are package managers. the rust compiler doesn't know or care about the version being used, it just uses whatever linking information cargo gives it when you use cargo build. it's the same story with ghc (haskell), which just grabs whatever module happens to be there, while cabal actually deals with the versioning of it. This is the same story with java, python, c++, c, etc.

@seishun
Contributor

seishun commented Jan 23, 2019

wastes disk space

When I first used Node.js, I really liked how once I'm done with a project, I can just delete the whole directory and get rid of all the modules I installed for it in one swoop. It felt like a huge improvement compared to Python, where you can easily end up with dozens of globally installed packages that you can't remove because you don't know if some script somewhere depends on them.

So your argument about disk space is a double-edged sword. Sure, you will have multiple copies of the same module if you have many projects depending on it. But I'd rather have many copies of a module that are actually used, rather than a pile of globally installed packages that might be long obsolete.

@bnoordhuis
Member

It doesn't seem like there is broad acceptance for this proposal. I'm leaving it open for now but unless something significant happens, I'll close it out in a few days.

Loading a module comprises three steps: (1) resolving the module path, (2) reading the module file into memory, and (3) "eval"-ing the content of the file. How significant can step (1) be compared to the rest?

@KSXGitHub Enough that several people (including yours truly) invested plenty of time in making it faster. Take a look at the history of lib/module.js and src/node_file.cc; the commit logs are informative.

@KSXGitHub
Author

@seishun

  1. You are not obligated to use the global store. You can use node_modules if you want to.

  2. If you use the latest version of npm, chances are you have forgotten about the npm cache. The same goes for yarn. If you don't mind caches, why do you mind a global repository? If you can delete caches, what prevents you from doing the same to a global repository?

@ljharb
Member

ljharb commented Jan 24, 2019

Because you’re not requiring from the cache (also, the cache has existed for many years; it’s not new in either npm or yarn) - the cache just makes installs faster. This makes the cache always safe to delete, since it can’t possibly have any negative effect except making the next installs take longer.

@bnoordhuis
Member

Per my previous comment, I'll go ahead and close this out.
