-
-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Shallowness of git fetching #9402
Comments
This issue has been mentioned on NixOS Discourse. There might be relevant details there: https://discourse.nixos.org/t/2023-11-20-nix-team-meeting-minutes-105/35902/1 |
I'll start working on the benchmarks and post here as soon as I have some results. |
Git shallow fetching benchmarkGoalsCollect data to decide if fetching the git history can be optimized so that it becomes viable to have this as the only strategy for fetching git repos, or if it is better to make fetching history conditional via a flag. The two variants that are raced against each other in this benchmark:
Using the git cli tool:
Setup of the benchmarkRepo: nixpkgs Remotes:
Tested operations (measured in seconds):
Test Machines:
Measured properties
Computed properties
Resultshttps://docs.google.com/spreadsheets/d/1yLo4XMCKj-4xZVHBz2z4BwxBWI76xjVe7gtNgaNG7co/edit?usp=sharing Reproducenix build github:DavHau/nix-git-research#benchmark-fod -L ObservationsSupport for commit fetching variesFetching the commit history in an optimized way requires using a Overhead of Variant 2 is significantEven in the best cases, fetching the commit history slows down the fetching by at least a factor of 3.7. For example, from a cloud VM fetching a nixpkgs revision from github takes 8 seconds, while fetching the commit history and computing the revCount takes an additional 24 seconds, which is a slowdown by factor of 4. In the case of nixpkgs, the number of objects that need to be fetched for the history is far larger than the number of tree objects for a single revision. It's ~500k objects for the history vs. ~60k for the tree. Inconsistent overhead of Variant 2While fetching the tree gives relatively consistent results, fetching the commit history is fast on some runs, but much slower on others, even for the same provider. This was only experienced with github on the 5G connection, and not from within the cloud. It might be an outlier, but it might be possible that github doesn't serve all kinds of requests with the same priority. OpinionMy opinion after these observations is that we should go with approach 2/ii from above, meaning the history fetching becomes optional and with it the revCount attribute as well. Always fetching the history eagerly as described in 2/i would impose:
Roadmap Proposal
After the nix team adopts and/or confirms the plan, I can start working on it. EDIT: an alternative plan could be to first optimize the current case, which means adding filtering support for the commit history right away. If libgit2 supports this, then this might be easy to do and it already provides quite a significant performance gain in many cases. |
Thanks for the detailed analysis @DavHau ! Pursuing 2.ii seems to be the right call indeed given your benchmarks (I didn't know about The roadmap makes sense too, although I wouldn't be too strict about the ordering of the steps (different people will probably have different priorities here). I'm not sure about the |
I was on holiday and will pick this up now. |
This issue has been mentioned on NixOS Discourse. There might be relevant details there: https://discourse.nixos.org/t/2023-12-15-nix-team-meeting-minutes-112/38155/1 |
Thanks to @DavHau for bringing up these issues in #9240 & #9376
TODO write problem
Tentative solution (two alternatives):
treeless
route if reasonably fast and supportedrevCount
supported because have commit historytreeless
clones locally rather than traditionalshallow
to not hit this restriction.revCount
)fetchTree
is norevCount
, but forfetchGit
for backwards-compat we have to keeprevCount
by default.--depth 1
fetchFromGithub
and friends if the performance is good enoughWith either version of (2) we never need to fetch any trees of other commits. The only question is whether we fetch all the commits and support
revCount
.The text was updated successfully, but these errors were encountered: