-
-
Notifications
You must be signed in to change notification settings - Fork 14.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Generate tiny compiled binary for wrapping executables #124556
Conversation
see #95569 for prior art |
… style of other setup-hooks. Add compilation step with gcc. Embed the entire generated source code into the binary for troubleshooting.
I love this! It took a moment of fiddling about but I used the default
so while this is significantly heavier than the shell-script version (unsurprisingly — the debug string literal is going to be around the size of the script) it's significantly smaller than the one that depends on On the other hand, that one has much more features. Looking at the current uses of |
@hrhino Thanks for the feedback! I turned on -Os (optimize for small size) to keep the binary size small as well. Although the compiler will automatically throw away functions that are not being used - the debug string won't - so in order to keep the debug string clean - I only include the C-functions used by --prefix and --suffix if they are actually used. |
…al to improve purity of makeCWrapper. Refactor
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks promising.
Could you add tests?
- some examples to test the behavior
- some means of ensuring memory safety
- running the examples in
valgrind
; might not catch everything - running a static analysis on the generated code for the examples (don't know which tool to use. Any suggestions?)
- adding some golden tests, comparing the generated code to known-correct generated code. This way memory safety of the generated code can be checked by hand, at least for some inputs
- running the examples in
You can add a derivation that runs these tests in the same directory as the implementation and callPackage
it from pkgs/test/default.nix
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please consider using echo instead of printf with a new line at the end.
In terms of memory leak risksAs long as the wrapper is executed as its own process, the OS will take care of freeing any remaining memory when the process exits. So if there are memory leaks in the wrapper, it will simply add an additional constant overhead to the process that has been wrapped while it is running. It is not possible to free things like the We can only safely free memory that is not used by the wrapped program. That essentially means only "temporary" variables can be freed. From the code I've written so far, the only temporary variable that is not consumed by the wrapped process is the result of The reason In contrast to setenv, putenv does not create its own internal copy - which might mean it has slightly less memory overhead. This is why I've used |
I'm not concerned about memory leaks either, but I am concerned about undefined behavior now and in the future. Seemingly innocent mistakes can lead to security problems, amplified by the fact that such a simple core component would be assumed to be trustworthy by many. Even just the "golden test" idea goes a long way when it comes to this responsibility, because it will let maintainers check the generated programs manually and help us check them again when the code generation inevitably needs to change. |
… Switch to strncpy in concat3. Use multiline strings to print C functions. Switch from if/elif to case.
This looks so great :) A few comments, first of all: diff --git i/pkgs/top-level/all-packages.nix w/pkgs/top-level/all-packages.nix
index c53485f5dea..30b38909952 100644
--- i/pkgs/top-level/all-packages.nix
+++ w/pkgs/top-level/all-packages.nix
@@ -642,6 +642,8 @@ with pkgs;
makeWrapper = makeSetupHook { deps = [ dieHook ]; substitutions = { shell = targetPackages.runtimeShell; }; }
../build-support/setup-hooks/make-wrapper.sh;
+ makeBinaryWrapper = makeSetupHook { } ../build-support/setup-hooks/make-binary-wrapper.sh;
+
makeModulesClosure = { kernel, firmware, rootModules, allowMissing ? false }:
callPackage ../build-support/kernel/modules-closure.nix {
inherit kernel firmware rootModules allowMissing; And then it's easy to test this with:
Or even without the need to clone and checkout your branch:
It'd be nice to put that info in the first comment of the thread :). Other then that, When giving a try to |
I've created a new PR to fix the Darwin issues we discussed now: #150079 |
makeSetupHook { deps = [ dieHook ]; } script; | ||
in | ||
lib.makeOverridable f { | ||
cc = stdenv.cc.cc; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This fails because stdenv.cc.cc
is unwrapped and has no /bin/cc
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess this is addressed by #150079, but note that it is also a problem on Linux
Alternatively, you could prove correctness using a sound static analyzer. Also, perhaps the functions could form a memory-safe interface, such that the generated code is as simple as possible. That would make it feasible to have a high assurance without performance impact; without even having to run the analyzer as a build dependency in practice (because the functions have already been checked and that check has been cached). |
Yeah, static analyzers are an attractive alternative to using sanitizers - since they move the checking from run-time to build time. This means problems are also detected earlier. A static analyzer can never prove safety in the general case, since this is an undecidable/uncomputable problem. So I guess sanitizers might provide a somewhat stronger protection against undefined behaviour (although at a high cost to performance). It seems like it is common to combine multiple static analyzers - since they tend to specialize in different areas/detect different kinds of problems. When I tested infer static analyzer and the clang static analyzer on the binary wrapper code - there was for the most part no overlap in the problems they would detect. infer seemed a lot better than clang at detecting buffer overruns and potential NULL dereferencing for example: |
I don't think we need the general case.
Only if the analysis is unsound. Undecidability does not exclude soundness and our programs are very simple here, so checking with an analyzer that is sound (but incomplete, as per undecidability) seems feasible. CodeHawk-C is sound but I don't have experience with it and it hasn't been packaged yet. Maybe other analyzers exist that have the soundness property. |
Yes, this is a good point. By having an analyzer that marks everything as potentially unsafe - it would be trivially sound (although not particularly useful). Rather than adding suspicious cases, a sound analyzer would remove things that are provably safe from the pool of potential problems (where everything starts out as potential problems). Undecidability just prevents it from removing everything that is safe - meaning it will tag some safe things as potentially unsafe. |
It seems like the compilation/tests of makeBinaryWrapper fails on aarch64-darwin, due to some codesign issues. Anyone here with an M1 macbook/that know what this might be about? See the OfBorg logs and #150079 for more info. To run the tests locally: $ git clone https://github.com/bergkvist/nixpkgs bergkvist-nixpkgs
$ cd bergkvist-nixpkgs
$ git checkout darwin-binary-wrapper-fixes
$ nix-build -A makeBinaryWrapper.passthru.tests |
Very cool to see this make it into Nixpkgs in some form. Thanks for getting this first version in and for continuing to work to improve this feature! Also thanks to all the reviewers, since this seems like one of the harder changes to review in Nixpkgs and has gotten a lot of work from reviewers. You guys are awesome! |
First, thanks for this PR. But since it's time to change the way we handle wrappers, I created a new issue to mention another problem that can be problematic with wrappers (namely that Note that as it, this PR does not solve it (nor does #150079). Would be great if both issues could be solved at once. |
This pull request has been mentioned on NixOS Discourse. There might be relevant details there: |
I haven't had a chance to poke at these wrappers yet, but this seems like a fair place to register that I may need a programmatic way to associate them with their wrapped executables. I noted this tentatively back in NixOS/rfcs#75, and at the time I was imagining a metadata dotfile alongside the wrapper. I didn't realize there was work under-way here, and the feature I needed this for has been in nixpkgs for a few months. Luckily I noticed toonn mention this on discourse. 😊 Basically:
|
https://hydra.nixos.org/build/240805256/nixlog/1 https://hydra.nixos.org/build/240805170/nixlog/2 Failure is a bit obscured but long story short, a script within bazel gets custom nixpkgs shebang which in turn makes shell run in POSIX-compatible mode. Bazel expects bash in non-POSIX mode and osx-specific script starts to fail due to `set -e` and subshell interaction differences in those modes (sub-shells and functions suddently start inheriting `set -e` and fail to produce desired output). More debug info is available in NixOS#267670 Shell scripts aren't guaranteed to work as interpreters in shebang. In particular thin shell wrappers aren't shebang-ready on MacOS. It may work sometimes depending on what exactly would try to execute a script with such shebang, but generally it's not guaranteed to work. See NixOS#124556 Bash wrapper was introduced in NixOS#266847 and so far seems like the issue only affects darwin builds: hydra failure is in osx-specific script, also shebang issue is usually darwin-specific. Let's wrap it as a native binary to make it shebang-compatible. The wrapper is only currently added to `bazel_6` so no need for changes in other versions. ZHF: NixOS#265948
https://hydra.nixos.org/build/240805256/nixlog/1 https://hydra.nixos.org/build/240805170/nixlog/2 Failure is a bit obscured but long story short, a script within bazel gets custom nixpkgs shebang which in turn makes shell run in POSIX-compatible mode. Bazel expects bash in non-POSIX mode and osx-specific script starts to fail due to `set -e` and subshell interaction differences in those modes (sub-shells and functions suddently start inheriting `set -e` and fail to produce desired output). More debug info is available in #267670 Shell scripts aren't guaranteed to work as interpreters in shebang. In particular thin shell wrappers aren't shebang-ready on MacOS. It may work sometimes depending on what exactly would try to execute a script with such shebang, but generally it's not guaranteed to work. See #124556 Bash wrapper was introduced in #266847 and so far seems like the issue only affects darwin builds: hydra failure is in osx-specific script, also shebang issue is usually darwin-specific. Let's wrap it as a native binary to make it shebang-compatible. The wrapper is only currently added to `bazel_6` so no need for changes in other versions. ZHF: #265948 (cherry picked from commit 7377bba)
https://hydra.nixos.org/build/240805256/nixlog/1 https://hydra.nixos.org/build/240805170/nixlog/2 Failure is a bit obscured but long story short, a script within bazel gets custom nixpkgs shebang which in turn makes shell run in POSIX-compatible mode. Bazel expects bash in non-POSIX mode and osx-specific script starts to fail due to `set -e` and subshell interaction differences in those modes (sub-shells and functions suddently start inheriting `set -e` and fail to produce desired output). More debug info is available in #267670 Shell scripts aren't guaranteed to work as interpreters in shebang. In particular thin shell wrappers aren't shebang-ready on MacOS. It may work sometimes depending on what exactly would try to execute a script with such shebang, but generally it's not guaranteed to work. See #124556 Bash wrapper was introduced in #266847 and so far seems like the issue only affects darwin builds: hydra failure is in osx-specific script, also shebang issue is usually darwin-specific. Let's wrap it as a native binary to make it shebang-compatible. The wrapper is only currently added to `bazel_6` so no need for changes in other versions. ZHF: #265948 (cherry picked from commit 7377bba)
Generate tiny compiled binary for wrapping executables
Try out the latest version
env NIX_PATH=nixpkgs=https://github.com/bergkvist/nixpkgs/archive/bergkvist/make-c-wrapper.tar.gz \ nix-shell -p makeBinaryWrapper --run "makeWrapper [EXECUTABLE] [OUT_PATH] [ARGS]"
Supported ARGS:
--argv0 NAME
: set name of executed process to NAME (defaults to EXECUTABLE)--inherit-argv0
: the executable inherits argv0 from the wrapper. (use instead of--argv0 '$0'
)--set VAR VAL
: add VAR with value VAL to the executable’s environment--set-default VAR VAL
: like --set, but only adds VAR if not already set in the environment--unset VAR
: remove VAR from the environment--chdir DIR
: change working directory (use instead of--run "cd DIR"
)--add-flags FLAGS
: add FLAGS to invocation of executable--prefix ENV SEP VAL
: prefix ENV with VAL, separated by SEP--suffix ENV SEP VAL
: suffix ENV with VAL, separated by SEPExplore generated code
If you are curious about the generated code, and not interested in actually creating a wrapper, you can use the following, which will simply print out the C-code:
env NIX_PATH=nixpkgs=https://github.com/bergkvist/nixpkgs/archive/bergkvist/make-c-wrapper.tar.gz \ nix-shell -p makeBinaryWrapper --run "makeCWrapper [EXECUTABLE] [ARGS]"
Motivation for this change
Nix uses bash-wrappers to inject custom environment variables into executables. In the cases where this executable is an interpreter (like Python or Perl), it can be placed in the shebang line of a script. On MacOS, you can't put a script (with its own shebang) in the shebang line of an executable due to a limitation with the
execve
-syscall.If we had some way of generating tiny compiled binaries instead of bash-scripts as wrappers then this would work on both Linux and MacOS.
See #23018 for more discussion.
A focus in this PR has been to minimize the number of dependencies - and also keep the implementation itself as minimal as possible.
Dependencies
bash
awk
gcc
on Linux orclang
on MacOSunistd
+stdlib
+assert
+stdio
(C POSIX libraries)asprintf
(GNU extension for stdio which is available on macOS/OpenBSD 2.3+)Consider the following wrapper shell script:
Putting this script in a shebang works fine on Linux, but doesn't work on MacOS. If we want to write C-code that replaces it (and works on MacOS+Linux), we can do it like this in C:
This PR has creates a simple bash function that generates C-code for such a tiny compiled binary (and compiles it), with an interface similar to the existing makeWrapper. There are some features of the original makeWrapper which is not yet implemented here.
Debuggability
A big concern with using a binary wrapper is that people can't just open up the file to see what it is doing when debugging their own problems. This is fixed by embedding the arguments used to create the wrapper as a string variable into the source code itself. The result is that the binary file will contain the wrapper arguments in human readable format when opening the file in a plain text editor or using the
strings
command on MacOS or Linux.Example of how it looks right now:
makeBinaryWrapper /usr/bin/python3 ./wrapper \ --set HELLO WORLD --set-default X $'Y\n"' --unset Z --argv0 python3 cat ./wrapper
C String literals in the generated code (including the documentation) are properly escaped. I got some help with how to do this properly on StackOverflow: https://stackoverflow.com/questions/67710149/how-can-i-sanitize-user-input-into-valid-c-string-literals.
Generated source code example
makeDocumentedCWrapper /usr/bin/python3 --set HELLO WORLD --set-default X $'Y\n"' --unset Z --argv0 python3
Things done
sandbox
innix.conf
on non-NixOS linux)nix-shell -p nixpkgs-review --run "nixpkgs-review wip"
./result/bin/
)Issues that this PR would likely impact: