-
-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Unsmudge fetchGit for reproducibility and security #4635
Conversation
446ca7c
to
f8fb7f8
Compare
@edolstra This is ready for review now. |
f8fb7f8
to
387f324
Compare
Rebased and updated the docs and test of #4676 to reflect the actual behavior when the |
387f324
to
e9e14d8
Compare
Given this proposal ignores git filters, how might Git LFS be supported I am quite new to Nix and not sure if I understand you correctly, so I
Therefore, you propose never to include LFS files at
The big advantage of For instance, I have some cases where LFS is used during the build How would you imagine { stdenv }:
stdenv.mkDerivation {
name = "myPackage";
src = builtins.fetchGitLfs {
src = builtins.fetchGit {
url = "git+ssh://[email protected]/my-secret/repository.git";
ref = "master";
rev = "adab8b916a45068c044658c4158d81878f9ed1c3";
};
};
} How would a |
It seems that we can choose between two directions. @edolstra which direction do you think we should take? In this PR I haven't chosen a specific direction. In the simple case, it's like (a), but for submodules it behaves more like (b). @panicgh Regardless of which direction, it seems more consistent to add an |
Understood. So there would be an optional control flow within Maybe an implementation could wrap the default LFS smudge/process hooks (modifying repo/.git/config?) and replace the LFS files with sym-/hardlinks to /nix/store? It may be necessary to disable the global/user-wide LFS config when
Maybe submodules could also be refactored out to separate /nix/store entries: instead of relying on git to clone submodules, |
auto ss = std::stringstream{result.second}; | ||
StringSet filters; | ||
|
||
for (std::string line; std::getline(ss, line, '\n');) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
git config values can have multiple lines (try git -c $'foo=two\nlines' config -l
to see for yourself), so splitting lines here can cause incorrect results. I suggest to use git config --name-only --get-regexp '^filter\..*\.(smudge|process)$'
to avoid this issue (config keys cannot contain newlines so splitting on newlines is now acceptable) and, as a bonus, only return relevant names.
Additionally, it isn't necessary to set the smudge filter to cat
. If the filter is set to an empty string then no filtering is done.
With these two changes, noSmudgeOptions
can be constructed by just appending =
to the names returned from the git config
invocation.
I think a pathway to explicit git-lfs support can be found by avoiding the smudge filters as done here and then, when selected, running |
This pull request has been mentioned on NixOS Discourse. There might be relevant details there: https://discourse.nixos.org/t/using-git-crypt-with-flakes/15372/5 |
I marked this as stale due to inactivity. → More info |
Discussed on the Nix team meeting 2023-01-16:
|
This pull request has been mentioned on NixOS Discourse. There might be relevant details there: https://discourse.nixos.org/t/2023-01-13-nix-team-meeting-minutes-23/24644/1 |
This pull request has been mentioned on NixOS Discourse. There might be relevant details there: https://discourse.nixos.org/t/2021-01-30-nix-team-meeting-minutes-25/25094/1 |
Closing since we'd rather use |
Discussed in Nix team meeting:
|
This pull request has been mentioned on NixOS Discourse. There might be relevant details there: https://discourse.nixos.org/t/2023-04-14-nix-team-meeting-minutes-48/27358/1 |
Motivation
This PR makes
builtins.fetchGit
more secure and reproducible by using the clean version of the files.Previously, only remotely fetched files would be in their clean state. With this change, files from local repositories will also be clean instead of smudged.
This also fixes the documentation which was wrong about the default behavior for
ref
(not HEAD when local).Background
Git filters allow the user to change how files are represented
in the worktree.
On checkout, the configured smudge converts blobs to files in the worktree.
On checkin, the configured clean command converts files back into blobs.
Notable uses include
See also https://git-scm.com/docs/gitattributes#_filter
Why ignore filters
To quote the git docs
So the feature was designed to be optional. This confirms that we have a
choice. Let's look at the individual use cases.
Allow the user to work with a platform-specific representation
While this might seem convenient, any such processing can also be done in
postUnpack
, so it isn't necessary here.Tarballs from GitHub and such don't apply the smudge filter either, so if
the project is going to be packaged in Nixpkgs, it will have to process its
files like this anyway.
The real kicker here is that running the smudge filter creates an
unreproducible dependency, because the filter does not come from a pinned
immutable source and it could inject information from arbitrary sources.
Git-crypt
The nix store can be read by any process on the system, or in some cases,
when using a cache, literally world-readable.
Running the filters in fetchGit would essentially make impossible the use of
git-crypt and Nix flakes in the same repository.
Even without flakes (or with changes to the flakes feature for that matter),
the software you want to build generally does not depend on credentials, so
not decrypting is not only a secure default, but a good one.
In a rare case where a build does not to decrypt a git-crypted file, one could
still pass the decrypted file or the git-crypt key explicitly (at the cost of
exposing it in the store, which is inevitable for nix-built paths).
Git LFS
Git LFS was designed to prevent excessive bloat in a repository, so the
"smudged" versions of these files will be huge.
If we were to include these directly in the
fetchGit
output, this createscopies of all the large files for each commit we check out, or even for
each uncommitted but built local change (with fetchGit ./.).
In many cases, those files may not even be used in the build process. If
they are required, it seems feasible to fetch them explicitly with a
fetcher that fetches from LFS based on the sha256 in the unsmudged files.
It is more fine grained than downloading all LFS files and it does not even
require IFD because it happens after fetchGit, which runs at evaluation time.
If for some reason LFS support can not be achieved in Nix expressions, we
should add support for LFS itself, without running any other filters.
Conclusion
Not running the filters is more reproducible, secure and potentially more
efficient than running them.