Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bazel_6, bazel_5, bazel_4: add cpp darwin bash set -e workaround #267670

Closed

Conversation

boltzmannrain
Copy link
Contributor

@boltzmannrain boltzmannrain commented Nov 15, 2023

Description of changes

Submitting upstream here
bazelbuild/bazel#20212

Looks like with some bash versions and under some conditions sub-shell would propagate errors to parent shell under set -e even though it shouldn't.

https://hydra.nixos.org/build/240805256/nixlog/1
https://hydra.nixos.org/build/240805170/nixlog/2

Was able to reproduce the Hydra issue on x86_64-darwin with bazel_6, but not with aarch64-darwin having same numeric bash version.

ZHF: #265948

Things done

  • Built on platform(s)
    • x86_64-linux
    • aarch64-linux
    • x86_64-darwin
    • aarch64-darwin
  • For non-Linux: Is sandboxing enabled in nix.conf? (See Nix manual)
    • sandbox = relaxed
    • sandbox = true
  • Tested, as applicable:
  • Tested compilation of all packages that depend on this change using nix-shell -p nixpkgs-review --run "nixpkgs-review rev HEAD". Note: all changes have to be committed, also see nixpkgs-review usage
  • Tested basic functionality of all binary files (usually in ./result/bin/)
  • 23.11 Release Notes (or backporting 23.05 Release notes)
    • (Package updates) Added a release notes entry if the change is major or breaking
    • (Module updates) Added a release notes entry if the change is significant
    • (Module addition) Added a release notes entry if adding a new NixOS module
  • Fits CONTRIBUTING.md.

Submitting upstream here
bazelbuild/bazel#20212

Looks like with some bash versions and under some conditions
sub-shell would propagate errors to parent shell under `set -e`
even though it shouldn't.

https://hydra.nixos.org/build/240805256/nixlog/1
https://hydra.nixos.org/build/240805170/nixlog/2

Was able to reproduce the Hydra issue on `x86_64-darwin` with `bazel_6`,
but not with `aarch64-darwin` having same numeric bash version.

ZHF: NixOS#265948
@boltzmannrain
Copy link
Contributor Author

@ofborg build bazel_4
@ofborg build bazel_5
@ofborg build bazel_6

@ofborg ofborg bot added the 6.topic: darwin Running or building packages on Darwin label Nov 15, 2023
@boltzmannrain
Copy link
Contributor Author

@ofborg build bazel_4
@ofborg build bazel_5
@ofborg build bazel_6

@boltzmannrain
Copy link
Contributor Author

boltzmannrain commented Nov 15, 2023

bazel_5 on aarch64-darwin failure looks weird

       … while evaluating the attribute 'patches' of the derivation 'bazel-5.4.1'
       at /private/var/lib/ofborg/checkout/repo/38dca4e3aa6bca43ea96d2fcc04e8229/builder/macstadium-m1-44911104/pkgs/stdenv/generic/make-derivation.nix:348:7:
          347|     // (optionalAttrs (attrs ? name || (attrs ? pname && attrs ? version)) {
          348|       name =
             |       ^
          349|         let
       error: getting status of '/private/var/lib/ofborg/checkout/repo/38dca4e3aa6bca43ea96d2fcc04e8229/builder/macstadium-m1-44911104/pkgs/development/tools/build-managers/bazel/bazel_cpp_darwin_bash_set_e.patch': No such file or directory

bazel_6 on aarch64-darwin probably hits some other issue (maybe happens when xcode isn't installed?)

       > ERROR: /private/tmp/nix-build-bazel-6.4.0.drv-7/bazel_src/src/tools/execlog/BUILD:12:12: Target '//src/tools/execlog:parser' depends on toolchain '@local_config_cc//:cc-compiler-darwin', which cannot be found: no such target '@local_config_cc//:cc-compiler-darwin': target 'cc-compiler-darwin' not declared in package '' defined by /private/var/tmp/_bazel__nixbld2/1e0a3ebf7161247fa1d28ebf971ba1ff/external/local_config_cc/BUILD (Tip: use `query "@local_config_cc//:*"` to see all the targets in that package)'

Both did compile locally.

bazel-watcher tests also fail for unrelated reasons and it may make sense to remove bazel-watcher from passthru.tests like in https://github.com/NixOS/nixpkgs/pull/231839/files because it will still be using bazel_5, not the bazel_4 or bazel_6 and also bazel-watcher doesn't have the highest level of support

@uri-canva
Copy link
Contributor

Is this a difference between gnu readlink and bsd readlink?

@uri-canva
Copy link
Contributor

Also does it matter? get_realpath is only called in a pipe, and I don't see the script setting pipefail.

@uri-canva
Copy link
Contributor

I can't repro the failure locally (nix-build -A bazel_6.tests.cpp), so I'm not sure about the fix. Also the tests still fail on ofborg, though with a different error message, that happens before the failure in hydra, so it's hard to tell whether this will fix it on hydra or not.

@boltzmannrain
Copy link
Contributor Author

I can't repro the failure locally (nix-build -A bazel_6.tests.cpp), so I'm not sure about the fix.

Saw it on hydra, and by chance it was reproducible on my x86-64 machine (and fixed with the change)

% sw_vers 
ProductName:	macOS
ProductVersion:	12.7.1
BuildVersion:	21G920

% bash --version
GNU bash, version 3.2.57(1)-release (x86_64-apple-darwin21)
Copyright (C) 2007 Free Software Foundation, Inc.

% man readlink | tail
     file(1), ls(1), lstat(2), readlink(2), stat(2), printf(3), strftime(3)

HISTORY
     The stat utility appeared in NetBSD 1.6 and FreeBSD 4.10.

AUTHORS
     The stat utility was written by Andrew Brown <[email protected]>.  This man
     page was written by Jan Schaumann <[email protected]>.

macOS 12.7                        June 22, 2017                       macOS 12.7

It also started to appear on hydra without any obvious related changes to bazel build expressions, but may potentially be related to recent upgrades like

12.7.1 (21G920)
(October 25, 2023) 

Though agree that it would be even better with more detailed investigation. It should still be reproducible for me, so can run extra tests if you have any suggestions for those

@boltzmannrain
Copy link
Contributor Author

Also does it matter? get_realpath is only called in a pipe, and I don't see the script setting pipefail.

The top-level script doesn't fail here when issue is triggered.
Manifestation is that get_realpath returns empty string and no error for a path that is resolvable and all hops are accessible.
It looks like it does so when next= as expected but then instead of ignoring readlink exit code, checking for empty string and returning previous the get_realpath decides to fail on sub-shell failure so prints nothing, and then as you mention with lack of pipefail the failure is ignored and output is treated as empty string further

@boltzmannrain
Copy link
Contributor Author

Converting to draft after more debugging seemingly invalidating the hypothesis on why the patch helps bazelbuild/bazel#20212 (comment)

@boltzmannrain boltzmannrain marked this pull request as draft November 17, 2023 01:15
@boltzmannrain
Copy link
Contributor Author

Looks like #266847 might be the triggering change.
Before in the cpp test bash 5.2.15 was running the test, after it switched to 3.2.57
Though still don't see the full picture about why did 5.2 work and why 3.2 doesn't

@boltzmannrain
Copy link
Contributor Author

Next clue is that bash 5.2 runs in non-posix mode but bash 3.2 runs in posix mode and also propagates errexit to subshells. That's the difference. But not yet clear why bash 3.2 decides to run in posix mode (it'd be the case if it is invoked via /bin/sh but cc_wrapper.sh has bin/bash shebang and bazel code shouldn't be calling it through bin/sh either)

@boltzmannrain
Copy link
Contributor Author

Does get invoked via $somebash/bin/sh 🤔

ERROR: /private/tmp/nix-build-bazel-6.4.0.drv-4/bazel_src/examples/cpp/BUILD:13:8: Linking examples/cpp/hello-success_test failed: (Exit 1): cc_wrapper.sh failed: error executing command (from target //examples/cpp:hello-success_test) 
  (cd /private/tmp/nix-build-bazel-6.4.0.drv-4/_bazel__nixbld1/28750aefb4827f6009387cd4b2525548/sandbox/processwrapper-sandbox/16/execroot/io_bazel && \
  exec env - \
    PATH=/nix/store/6hy9npmilhl8aydbsrs849p7b819sv4v-python3-3.11.5/bin:/nix/store/c6y6v83alrb8hg2p14z5zgmhdk0pvymw-unzip-6.0/bin:/nix/store/ngm27lv1ck9az307yi4rhg267im4lg3p-which-2.21/bin:/nix/store/gy1p16mqpw5flsr65jdm932hqjzqy7yb-zip-3.0/bin:/nix/store/y71hc5fjg3ax4dn6lxh6sqixrbjk96ap-cctools-port-973.0.1/bin:/nix/store/6q4hlgbdjzp5lzr3b41wmaj036zggmd6-clang-wrapper-11.1.0/bin:/nix/store/m9ss04fbwfd8544pdk6aan7k8sk85zgm-clang-11.1.0/bin:/nix/store/hcil3fgcjav0y458ff4m98zgcqky00gk-coreutils-9.3/bin:/nix/store/96vvg322k5lvrl2d0sp5brdsxyl6l7f6-cctools-binutils-darwin-wrapper-11.1.0-973.0.1/bin:/nix/store/nshpjjndkd8x0nkjpl8p2pljjfplz1h7-cctools-binutils-darwin-11.1.0-973.0.1/bin:/nix/store/h3x4dj4xb84qmpd57zvyd7a369ysazbh-zulu11.66.15-ca-jdk-11.0.20/bin:/nix/store/ppkzx8va3lb1kbnxlr276kv6vkmnadk9-bash/bin:/nix/store/v2s6lw54klm8qvff4209125bhf99yifk-bash-5.2-p15/bin:/nix/store/x9h23ixd3x35n0sm244qc3b64ppg67dp-diffutils-3.10/bin:/nix/store/162rzwan1c3s1czwdjf8mxb4hafr1wih-file-5.45/bin:/nix/store/xyaf13n6x0q7m5wvvlqjvsqyqjpw10b0-findutils-4.9.0/bin:/nix/store/4pvhzm0pd1skad8wkszdc0g77zy70vh7-gawk-5.2.2/bin:/nix/store/hfjfxp1x0rc6d7vizwrhsx4y6dhf9pda-gnugrep-3.11/bin:/nix/store/pb8fw4s6fhcffm127dpif2dmihp78rn2-patch-2.7.6/bin:/nix/store/4qm1gw0w2zpdg16jya88r82f8bxl33np-gnused-4.9/bin:/nix/store/4ifdvs7fjczlvyssp1h91139n4yb57d3-gnutar-1.35/bin:/nix/store/7mq4n3fjyg42qsaxd8yk4w0z4x8yrhms-gzip-1.13/bin:/nix/store/hcil3fgcjav0y458ff4m98zgcqky00gk-coreutils-9.3/bin:/nix/store/xyaf13n6x0q7m5wvvlqjvsqyqjpw10b0-findutils-4.9.0/bin:/nix/store/x9h23ixd3x35n0sm244qc3b64ppg67dp-diffutils-3.10/bin:/nix/store/4qm1gw0w2zpdg16jya88r82f8bxl33np-gnused-4.9/bin:/nix/store/hfjfxp1x0rc6d7vizwrhsx4y6dhf9pda-gnugrep-3.11/bin:/nix/store/4pvhzm0pd1skad8wkszdc0g77zy70vh7-gawk-5.2.2/bin:/nix/store/4ifdvs7fjczlvyssp1h91139n4yb57d3-gnutar-1.35/bin:/nix/store/7mq4n3fjyg42qsaxd8yk4w0z4x8yrhms-gzip-1.13/bin:/nix/store/n66a4pzpv5lqrvd3xi6v2gy947bcgb9v-bzip2-1.0.8-bin/bin:/nix/store/x5yxhfzpj4vwb42pxbvyxvijwz88gfiz-gnumake-4.4.1/bin:/nix/store/v2s6lw54klm8qvff4209125bhf99yifk-bash-5.2-p15/bin:/nix/store/z81njmbg6db5011g4hgi8m5r0yh81lpl-patch-2.7.6/bin:/nix/store/1595k4ndj99y757yf4rfssds357nypqz-xz-5.4.4-bin/bin:/nix/store/162rzwan1c3s1czwdjf8mxb4hafr1wih-file-5.45/bin \
    PWD=/proc/self/cwd \
  external/local_config_cc/cc_wrapper.sh @bazel-out/darwin-fastbuild/bin/examples/cpp/hello-success_test-2.params)
# Configuration: 124cd91dd0a1c71a0614efb9086912de1dc3d2ed63973b5d8bcadb010e0107ff
# Execution platform: //:default_host_platform

Use --sandbox_debug to see verbose messages from the sandbox and retain the sandbox build root for debugging
BASH=/nix/store/v2s6lw54klm8qvff4209125bhf99yifk-bash-5.2-p15/bin/sh

@boltzmannrain
Copy link
Contributor Author

Looks like somehow on MacOS ./some/script.sh can ignore the shebang depending on current shell and/or if env is command prepended (either MacOS one or from nixpkgs). Which in turn can cause using /bin/sh rather than bash (in some surroundings it'd even be nixpkgs version of bash/bin/sh 🤔 ). And looks this hijack only happens for scripts (#!/bin/echo 123 seems to be always respected)

So it does seem like #266847 is the trigger. But maybe it is MacOS or bazel to blame on breaking/relying that shebang is respected or maybe that /bin/sh and env would respect shebang on ./some/script.sh.

Looks like this behavior doesn't happen with NixOS /bin/sh&env, but didn't test on other distros yet.

@boltzmannrain
Copy link
Contributor Author

boltzmannrain commented Nov 19, 2023

MacOS behavior for following shells

% /bin/zsh --version
zsh 5.8.1 (x86_64-apple-darwin21.0)
% /bin/bash --version
GNU bash, version 3.2.57(1)-release (x86_64-apple-darwin21)
Copyright (C) 2007 Free Software Foundation, Inc.
% /bin/sh --version 
GNU bash, version 3.2.57(1)-release (x86_64-apple-darwin21)
Copyright (C) 2007 Free Software Foundation, Inc.
  1. Missing path in shebang - all reject
% cat test.sh
#!/missing

echo "I run"
% for x in sh bash zsh; do /bin/${x} -c './test.sh'; done
/bin/sh: ./test.sh: /missing: bad interpreter: No such file or directory
/bin/bash: ./test.sh: /missing: bad interpreter: No such file or directory
zsh:1: ./test.sh: bad interpreter: /missing: no such file or directory
  1. File without executable flag in shebang - all reject
% touch /tmp/unexec
% cat test.sh
#!/tmp/unexec

echo "I run"
% for x in sh bash zsh; do /bin/${x} -c './test.sh'; done
/bin/sh: ./test.sh: /tmp/unexec: bad interpreter: Permission denied
/bin/bash: ./test.sh: /tmp/unexec: bad interpreter: Permission denied
zsh:1: permission denied: ./test.sh
  1. /bin/echo in shebang - all respect (also works with cp /bin/echo /tmp/echo and /tmp/echo in shebang)
% cat test.sh 
#!/bin/echo echo

echo "I run"
% for x in sh bash zsh; do /bin/${x} -c './test.sh'; done
echo ./test.sh
echo ./test.sh
echo ./test.sh
  1. non-binary file in shebang - only zsh respects (both bad and good case, other shells just use themselves again, zsh respects; same for -c 'exec ./test.sh')
% touch /tmp/somefile
% chmod +x /tmp/somefile
% for x in sh bash zsh; do /bin/${x} -c './test.sh'; done
I run
I run
zsh:1: exec format error: ./test.sh
  1. env A=B ./script.sh - runs a posix-shell frankenstein (both bash and zsh shell parameters are set, seems to be zsh masking as others based on SHLVL), also ignores the shebang
% cat test.sh 
#!/tmp/somefile

/bin/echo "set:"
set | /usr/bin/grep "SH[^_]*="
/bin/echo "env:"
/usr/bin/env | /usr/bin/grep "SH[^_]*="

echo "I run"
% for x in sh bash zsh; do env - /bin/${x} -c 'env A=B ./test.sh'; done
set:
BASH=/bin/sh
SHELL=/bin/zsh
SHELLOPTS=braceexpand:hashall:interactive-comments:posix
SHLVL=2
env:
SHLVL=2
I run
set:
BASH=/bin/sh
SHELL=/bin/zsh
SHELLOPTS=braceexpand:hashall:interactive-comments:posix
SHLVL=2
env:
SHLVL=2
I run
set:
BASH=/bin/sh
SHELL=/bin/zsh
SHELLOPTS=braceexpand:hashall:interactive-comments:posix
SHLVL=1
env:
SHLVL=1
I run

@uri-canva
Copy link
Contributor

bazelbuild/bazel#20212 (comment)

Turns out not exactly bazel or bash fault here. The problem is in NixOS shebang replacement that used non-binary interpreter and that isn't guaranteed to work (XNU kernel can deny execution of scripts and calling shell or env don't necessarily have code to handle interpreter that is a script itself)

Whoops, I even was aware of the fact, we have some binaries in nixpkgs for that express reason, but I forgot about it when reviewing #266847.

@uri-canva
Copy link
Contributor

See #23018.

@boltzmannrain
Copy link
Contributor Author

Yep, already preparing a PR with fix via makeBinaryWrapper

boltzmannrain added a commit to boltzmannrain/nixpkgs that referenced this pull request Nov 20, 2023
https://hydra.nixos.org/build/240805256/nixlog/1
https://hydra.nixos.org/build/240805170/nixlog/2
Failure is a bit obscured but long story short, a script within
bazel gets custom nixpkgs shebang which in turn makes shell run
in POSIX-compatible mode. Bazel expects bash in non-POSIX mode
and osx-specific script starts to fail due to `set -e` and subshell
interaction differences in those modes (sub-shells and functions
suddently start inheriting `set -e` and fail to produce desired
output). More debug info is available in NixOS#267670

Shell scripts aren't guaranteed to work as interpreters in shebang.
In particular thin shell wrappers aren't shebang-ready on MacOS.
It may work sometimes depending on what exactly would try to execute
a script with such shebang, but generally it's not guaranteed to work.
See NixOS#124556

Bash wrapper was introduced in NixOS#266847 and so far seems like the
issue only affects darwin builds: hydra failure is in osx-specific
script, also shebang issue is usually darwin-specific.

Let's wrap it as a native binary to make it shebang-compatible.

The wrapper is only currently added to `bazel_6` so no need for
changes in other versions.

ZHF: NixOS#265948
github-actions bot pushed a commit that referenced this pull request Dec 7, 2023
https://hydra.nixos.org/build/240805256/nixlog/1
https://hydra.nixos.org/build/240805170/nixlog/2
Failure is a bit obscured but long story short, a script within
bazel gets custom nixpkgs shebang which in turn makes shell run
in POSIX-compatible mode. Bazel expects bash in non-POSIX mode
and osx-specific script starts to fail due to `set -e` and subshell
interaction differences in those modes (sub-shells and functions
suddently start inheriting `set -e` and fail to produce desired
output). More debug info is available in #267670

Shell scripts aren't guaranteed to work as interpreters in shebang.
In particular thin shell wrappers aren't shebang-ready on MacOS.
It may work sometimes depending on what exactly would try to execute
a script with such shebang, but generally it's not guaranteed to work.
See #124556

Bash wrapper was introduced in #266847 and so far seems like the
issue only affects darwin builds: hydra failure is in osx-specific
script, also shebang issue is usually darwin-specific.

Let's wrap it as a native binary to make it shebang-compatible.

The wrapper is only currently added to `bazel_6` so no need for
changes in other versions.

ZHF: #265948
(cherry picked from commit 7377bba)
bjornfor pushed a commit that referenced this pull request Dec 7, 2023
https://hydra.nixos.org/build/240805256/nixlog/1
https://hydra.nixos.org/build/240805170/nixlog/2
Failure is a bit obscured but long story short, a script within
bazel gets custom nixpkgs shebang which in turn makes shell run
in POSIX-compatible mode. Bazel expects bash in non-POSIX mode
and osx-specific script starts to fail due to `set -e` and subshell
interaction differences in those modes (sub-shells and functions
suddently start inheriting `set -e` and fail to produce desired
output). More debug info is available in #267670

Shell scripts aren't guaranteed to work as interpreters in shebang.
In particular thin shell wrappers aren't shebang-ready on MacOS.
It may work sometimes depending on what exactly would try to execute
a script with such shebang, but generally it's not guaranteed to work.
See #124556

Bash wrapper was introduced in #266847 and so far seems like the
issue only affects darwin builds: hydra failure is in osx-specific
script, also shebang issue is usually darwin-specific.

Let's wrap it as a native binary to make it shebang-compatible.

The wrapper is only currently added to `bazel_6` so no need for
changes in other versions.

ZHF: #265948
(cherry picked from commit 7377bba)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants