-
-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fetchGit/fetchTree: add support for shallow cloning #9376
base: master
Are you sure you want to change the base?
Conversation
This pull request has been mentioned on NixOS Discourse. There might be relevant details there: https://discourse.nixos.org/t/stabilising-the-new-nix-command-line-interface/35531/6 |
I believe Another alternative is to fetch all the commits without fetching the corresponding tree. See #5313 (comment) Also we might not need to remove it, if we add support for laziness in |
OK, but what's the use case for it? Just being friendly doesn't seem to be a strong reason to have it.
|
I need to correct myself. I actually did find a solution to get the full commit history but only blobs of the given revision without breaking
|
revCount increases in a predictable way, characters from rev are random. |
This pull request has been mentioned on NixOS Discourse. There might be relevant details there: https://discourse.nixos.org/t/2023-11-20-nix-team-meeting-minutes-105/35902/1 |
Discussed in the Nix maintenance meeting. The tentative plan forward is written down in #9402 |
This PR should mainly serve as a POC and a basis for discussions.
It implements a significantly more efficient way of fetching git sources (100x in many cases), hidden behind a flag. But rather than implementing this as an optional feature it would be much better if this became the default as part of the stabilization work on fetchTree. If I get green light by the nix team I'm happy to adapt these changes and make it the default for fetchTree.
fixes #5119
fixes #4455 (This was closed, but not fixed, probably due to a misunderstanding, see the
'shallow = true' option confusion
paragraph below)Motivation
TL;DR;
In lang2nix scenarios it is common to have lock files with entries pointing to git repos. Usually git revisions sha1 hashes are used to pin down such sources and verify their integrity. Fetching these lock file closures via nix is currently rather expensive, because the fetchGit/fetchTree builtins do not support shallow fetching for most git repos.
For example fetching a single revision of the php static analyzer
phpstan
currently requires fetching 4.17 GiB from the network, while with this PR, it only requires fetching 13.40 MiB, which is a reduction by ~99.7%.The 'shallow = true' option confusion
The current fetchGit/fetchTree builtin already offers a
shallow
flag, but this only allows fetching remote shallow repos. It does not support shallow fetching from non-shallow remote repos. This has been confusing to a number of users, because they expected theshallow
flag to behave similar to anix clone https://example.com --branch $ref --depth 1
. It's not obvious at first glance, but this behavior wouldn't make much sense. It would not be reproducible, because the selected$ref
is not static and might progress over time on the remote. The number for--depth {number}
would have to be increased over time in order to still reach the same desired revision.Fixing the problem
In order to really do a shallow fetch from a non-shallow remote, we have to fetch shallowly by revision ignoring any
ref
. Git supports this since version 2.5, but the nix builtin doesn't support it yet, as it always enforces fetching viaref
(nix will automatically select aref
if the user didn't specify one). This PR fixes that by introducing a new optionshallowRev = true
, which enforcesref
being unset and makes git fetch only the specified revision usingfetch --depth 1
on the specific revision sha1 hash.The way forward
Shallow cloning results in the exact same output in the nix store compared to full cloning. The contents can still be verified either via the revision sha1 hash, or a narHash. The only problem why shallow fetching can not become the default right now is because it cannot reliably return a
revCount
. As a result, it would break the API.Therefore, my question to the nix team: What was the original motivation behind having a
revCount
? It doesn't seem to provide any guarantees. It is not needed for verifying the file contents either. The simplest fix seems to deprecaterevCount
infetchTree
, or make it optional. Is this change feasible? This way, shallow cloning can be made the default.Considerations
There is no guarantee if a remote supports or allows shallow cloning by revision. Therefore if shallow cloning became the default, a fallback to full cloning should be implemented which triggers whenever shallow cloning fails.
Also, the current POC implementation of this PR does not cache the git repos, which doesn't hurt mach, as the amount of data transmitted is already much lower than before. But caching could be added here as well. Even if all fetching is done shallowly, caching can still be beneficial, because two different revisions can still contain identical files, in which case re-downloading these git blobs could be omitted. Though, I'm not sure how smart the git protocol really is in omitting unnecessary objects during transfer. Maybe it's not worth it. I can look into this more.
Priorities
Add 👍 to pull requests you find important.