Zero install can lead to runaway repo size, doing permanent damage #4845
Replies: 7 comments 13 replies
-
Post discussion thread (GitHub Discussions does not allow comments on the original post).
-
Note that I've expanded #4839 to include the remediations I could think of (partial clone, sparse checkout, history pruning, LFS), with their pros and cons. I might have missed some 🤔
-
I wonder whether anyone has had success making the Yarn cache a submodule. That way you wouldn't need to rewrite the main repo's history, only the cache repo's. A downside is that submodules can be annoying to use, and a user would have to remember to do things like
-
Since nobody is saying it, one great option to avoid this problem (assuming you haven't already created it) is simply not checking in the cache! You could do this either by adding

The only significant drawback of this approach is "the left-pad problem": npm authors can delete their packages. Few people do this, but if it happens you may be left without a copy of a package you depend on. For this reason, it is advisable for the most risk-averse developers, such as major corporations, to configure their package registry as a caching proxy of the main npm registry. Doing this may require you to run
-
An alternative that I'm personally working on making viable is storing your transpiled code in Git. This is one step short of zero-install, and while you could do both, I'd recommend against it. While checking in transpiled code may make your repo some constant factor larger than it would otherwise have been, it will not cause the runaway size growth that zero-install can, and it still has significant benefits:
The biggest drawback to the approach is that it requires some tooling that doesn't exist yet in order to avoid making a mess of your repo. I am working on building that tooling as macrome.
-
Does committing uncompressed files behave differently over time? Right now, a small change in a package results in a whole new archive file; would that still be the case if package files were committed uncompressed? Git also does some compression out of the box; perhaps that's enough? Perhaps the additional storage at the start would pay off over time through smaller diffs on package upgrades? I guess the main problem in this case would be minified files, which I don't think Git can diff well because, if I remember correctly, it compares line by line. It's probably smarter than that, though; I really don't know 😁 Another option might be to make Git aware of how to diff the archives: https://stackoverflow.com/a/8001900/1918818
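On the line-by-line question: Git stores objects zlib-compressed and, inside packfiles, encodes similar objects as deltas built from byte-range copy and insert instructions rather than line diffs, so the line-oriented concern applies to `git diff` output more than to storage. The sketch below illustrates that distinction under simplifying assumptions: `difflib` gives the line-oriented view, and a common-prefix/suffix comparison (the hypothetical `prefix_suffix_rewritten`) stands in crudely for a binary delta; real Git deltas are considerably more capable. The sample "minified" content is made up. This only concerns uncompressed content; a small change inside a zip archive can alter much of its compressed bytes, which limits what any delta can reuse.

```python
import difflib


def line_diff_rewritten(old: str, new: str) -> int:
    """Characters of `new` on lines that a line-oriented diff reports as changed."""
    old_lines = old.splitlines(keepends=True)
    new_lines = new.splitlines(keepends=True)
    matcher = difflib.SequenceMatcher(None, old_lines, new_lines, autojunk=False)
    # Sum the lengths of the lines in `new` that the matcher found unchanged.
    kept = sum(
        len(line)
        for block in matcher.get_matching_blocks()
        for line in new_lines[block.b:block.b + block.size]
    )
    return len(new) - kept


def prefix_suffix_rewritten(old: str, new: str) -> int:
    """Characters of `new` outside its common prefix and suffix with `old`.

    A crude, hypothetical stand-in for a binary delta, which can copy unchanged
    byte ranges from the previous version instead of storing them again.
    """
    limit = min(len(old), len(new))
    prefix = 0
    while prefix < limit and old[prefix] == new[prefix]:
        prefix += 1
    suffix = 0
    # Stop before the suffix would overlap the already-matched prefix.
    while suffix < limit - prefix and old[-1 - suffix] == new[-1 - suffix]:
        suffix += 1
    return len(new) - prefix - suffix


# Hypothetical ASCII sample, so one character equals one byte.
statements = [f"var v{i}={i};" for i in range(300)]
minified_old = "".join(statements)          # everything on a single line
minified_new = minified_old.replace("var v150=150;", "var v150=151;")
pretty_old = "\n".join(statements) + "\n"   # one statement per line
pretty_new = pretty_old.replace("var v150=150;", "var v150=151;")

print("minified:", line_diff_rewritten(minified_old, minified_new),
      "vs", prefix_suffix_rewritten(minified_old, minified_new))
print("one per line:", line_diff_rewritten(pretty_old, pretty_new),
      "vs", prefix_suffix_rewritten(pretty_old, pretty_new))
```

For the one-line file, the line view counts the entire file as rewritten, while the byte view counts a single character; with one statement per line, both views stay small.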
-
While fiber-optic connections are on the rise, upload speed over cable connections is going to be a bottleneck for many years to come. By uploading our own copies of package zip files via Git, we've committed a horrendous sin when it comes to optimal networking. Not only can that upload be 100x slower than the download, it also forces everyone else to download the pseudo-unique copy we uploaded. I think having a caching npm proxy that never clears the cache would be a fair solution.
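The never-evicting caching proxy suggested above can be sketched as a read-through cache: each package version is fetched from upstream at most once and then kept forever, so a later upstream deletion (the left-pad scenario) does not affect versions already cached. This is a minimal in-memory sketch; `NeverEvictingCache`, the `fetch_upstream` callback and the sample tarball bytes are all hypothetical, and a real registry proxy would also need persistent storage, metadata handling and integrity checks.

```python
from typing import Callable, Dict, List, Optional, Tuple

PackageKey = Tuple[str, str]  # (package name, version)


class NeverEvictingCache:
    """Hypothetical read-through cache in front of an upstream registry."""

    def __init__(self, fetch_upstream: Callable[[str, str], Optional[bytes]]):
        self._fetch_upstream = fetch_upstream
        self._store: Dict[PackageKey, bytes] = {}

    def get(self, name: str, version: str) -> Optional[bytes]:
        key = (name, version)
        if key not in self._store:
            tarball = self._fetch_upstream(name, version)
            if tarball is None:
                return None  # not available upstream and never cached
            self._store[key] = tarball  # kept forever: no eviction, by design
        return self._store[key]


# Simulated upstream registry; contents are placeholders.
upstream: Dict[PackageKey, bytes] = {("left-pad", "1.3.0"): b"tarball bytes"}
calls: List[PackageKey] = []


def fetch(name: str, version: str) -> Optional[bytes]:
    calls.append((name, version))
    return upstream.get((name, version))


proxy = NeverEvictingCache(fetch)
first = proxy.get("left-pad", "1.3.0")
del upstream[("left-pad", "1.3.0")]  # the author deletes the package upstream
second = proxy.get("left-pad", "1.3.0")  # still served from the cache
```

Only the first request reaches upstream; the second is answered from the cache even though the package no longer exists upstream. Unlike committed zip files, the cached copy is downloaded from the proxy on demand rather than uploaded through every contributor's connection.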
-
Yarn recommends zero-installs. The zero-installs page does not mention runaway repository size as a drawback, but some people have encountered it after opting into zero-installs for repos that meet certain conditions, such as having large dependencies and/or numerous dependencies that change regularly. If such a repo opts into zero-installs, its size may become orders of magnitude larger than it would otherwise be, making full clones inconvenient or even prohibitively costly, depending on the available network and storage resources.
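A back-of-the-envelope model makes the growth concrete. It assumes, as a simplification, that every upgrade commits a brand-new archive and that Git stores each version in full because compressed archives delta poorly against each other. `CachedPackage`, `cache_footprint` and the package count, archive size and upgrade count below are hypothetical inputs, not measurements from any real repository.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class CachedPackage:
    name: str
    # Size in bytes of every archive version ever committed, oldest first.
    archive_sizes: List[int]


def cache_footprint(packages: List[CachedPackage]) -> Tuple[int, int]:
    """Return (bytes in the current checkout, bytes accumulated in history).

    Assumption: no useful delta between successive archives, so every
    committed version stays in the history at full size.
    """
    current = sum(p.archive_sizes[-1] for p in packages)
    history = sum(sum(p.archive_sizes) for p in packages)
    return current, history


# Hypothetical repo: 1,000 packages of 200 kB each, each upgraded 50 times
# (51 committed versions per package).
packages = [CachedPackage(f"pkg-{i}", [200_000] * 51) for i in range(1000)]
current, history = cache_footprint(packages)
print(f"checkout: {current / 1e6:.0f} MB, history: {history / 1e9:.1f} GB")
```

Under these assumptions, the checkout holds 200 MB of cache while the history holds 10.2 GB. The ratio is simply the number of versions committed per package, so it keeps growing with every upgrade, and removing the cache later does not shrink the history.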
This problem cannot be truly fixed without eliminating or rewriting the entire history of the repository, both of which are options that cause significant disruptions to development, e.g. by destroying the bases for existing branches.
Does yarn have any responsibility to help people avoid this pitfall or help them recover if they encounter it?
Note: Existing discussion of the topic has occurred on the now-locked #180.