NixOS 23.11 — Zero Hydra Failures #265948

Closed
figsoda opened this issue Nov 6, 2023 · 19 comments
Labels
0.kind: ZHF Fixes (Fixes during the ZHF campaign)

Comments

@figsoda
Member

figsoda commented Nov 6, 2023

Hi, we are figsoda & Ryan Lahfa, the release managers for NixOS 23.11 ("Tapir").

Today we want to invite everyone to participate in the Zero Hydra Failures Project, wherein we prepare the package set for the upcoming release, up until its public release at the end of November.

There are only two more upcoming dates that we'd like to mention in that context:

  • 2023-11-20: Branch-off
  • 2023-11-29: 23.11 Release

The complete timeline can be found in the

The mission

Every time we plan to do a release, we take time to stabilize the master branch and later on the release branch.
Our goal here is to reduce the number of failing jobs on the nixpkgs:trunk and nixos:trunk-combined jobsets as much as possible before branch-off. We call this the "Zero Hydra Failure" campaign.

Besides aiming for zero failed jobs, we also strive to again provide all packages that were available in the previous release.

Changes always need to target the master branch. Take note that the branch-off will occur on 2023-11-20, after which ZHF changes will need to be tagged with the backport: release-23.11 label to land in the stable release.

Jobsets

The relevant jobsets to check for failing jobs are:

Workflow

Finding broken packages

Eval reports

Evaluation reports provide a structural overview of the most impactful failing builds. They originated at https://github.com/nix-community/nix-review-tools and were automated over at https://github.com/malob/nix-review-tools-reports.

  1. Navigate to https://malob.github.io/nix-review-tools-reports/
  2. Open the relevant jobset: see the previous section on which jobset to select based on what you are looking at (Linux, Darwin or NixOS tests).
  3. Browse the latest reports for build failures
  4. Follow the links to the build failure on hydra

ZERO Hydra Failures

The platform automatically crawls Hydra, lists failing packages by maintainer, and highlights the most important dependencies (failing packages with the most dependants). It also graphs the general trend per platform.

  1. Navigate to https://zh.fail

For the record, we started ZHF here:

Latest Linux evaluation (completely built):	[1801356](https://hydra.nixos.org/eval/1801356) on 2023-11-04 14:18:21 (UTC)
Latest Darwin evaluation (completely built):	[1801390](https://hydra.nixos.org/eval/1801390) on 2023-11-05 12:54:55 (UTC)
Failing builds on aarch64-darwin:	915
Failing builds on aarch64-linux:	1240
Failing builds on i686-linux:	586
Failing builds on x86_64-darwin:	1142
Failing builds on x86_64-linux:	1491
Total failed builds	5374

For comparison, the previous release's ZHF started here:

Latest Linux evaluation (completely built):	[1794693](https://hydra.nixos.org/eval/1794693) on 2023-05-07 15:51:32 (UTC)
Latest Darwin evaluation (completely built):	[1794694](https://hydra.nixos.org/eval/1794694) on 2023-05-07 16:12:27 (UTC)
Failing builds on aarch64-darwin:	739
Failing builds on aarch64-linux:	1781
Failing builds on i686-linux:	612
Failing builds on x86_64-darwin:	825
Failing builds on x86_64-linux:	1909
Total failed builds	5866
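
The totals in the two tables above can be re-derived from the per-platform counts. The snippet below is only a small arithmetic sketch; the function name `sum_failures` and the variable names are made up for illustration, and the label "23.05" for the earlier campaign is inferred from the May 2023 evaluation dates.

```sh
# Sum the per-platform failure counts passed as arguments.
sum_failures() {
  total=0
  for n in "$@"; do
    total=$((total + n))
  done
  echo "$total"
}

# Counts copied from the tables above, in the order:
# aarch64-darwin, aarch64-linux, i686-linux, x86_64-darwin, x86_64-linux
start_2311=$(sum_failures 915 1240 586 1142 1491)
start_2305=$(sum_failures 739 1781 612 825 1909)

echo "23.11 campaign start: $start_2311"
echo "23.05 campaign start: $start_2305"
echo "difference:           $((start_2305 - start_2311))"
```

Note that the overall number is lower than last time, but both Darwin platforms started with more failures than they did in the previous campaign.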

Check on packages you maintain

  1. Clone nixpkgs and checkout the master branch
  2. Run
    nix-build maintainers/scripts/build.nix --argstr maintainer <name>
    

Alternatively, you can check https://zh.fail/failed/overview.html.

Hydra

Hydra is the nixpkgs CI platform, where all active branches are built and the results are pushed into the cache, after which channels can originate from its build results.

  1. Open the nixpkgs:trunk jobset
  2. Select the latest evaluation
  3. Directly failing jobs are marked with a red cross, while transitively failing ones are greyed out.
  4. Use the search form to scope the package list to things relevant to you and that you can test.

Submit fixes

  1. Search through PRs to make sure none provided a fix yet. If there is one, please take the time and help review the change.

  2. If there is no open PR, troubleshoot why it's failing and fix it.

  3. Open a pull request with the fix against the master branch and wait for potential reviews and change requests

    • Add the 0.kind: ZHF Fixes label, so people can better browse these fixes
    • If your PR causes more than ~500 rebuilds, it is generally preferred to target staging to avoid compute churn for users on master.
    • If no reviewer is automatically added to your PR, check the Git history or the maintainers and ping them (in the pull request) or add them (if you have the rights) as reviewers
    • If, after a while, no one reviewed the PR, you can post it in https://discourse.nixos.org/t/prs-ready-for-review/3032/2183 to get more attention
    • If, after an (extra) while, nothing really happened, you can drop a line in the NixOS development channel or mention @NixOS/nixos-release-managers on the PR
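
The branch choice in the list above can be written down as a small rule. This is a sketch only: the function name `target_branch` is hypothetical, and the hard cutoff of 500 stands in for the "~500 rebuilds" guideline, which is a rule of thumb rather than an exact limit.

```sh
# Pick the branch a fix should target, given an estimated rebuild count.
# Assumption: the "~500 rebuilds" guideline is treated as a strict ">500" cutoff.
target_branch() {
  if [ "$1" -gt 500 ]; then
    echo staging
  else
    echo master
  fi
}

target_branch 120    # prints: master
target_branch 2500   # prints: staging
```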

Backporting

After 2023-11-20

  1. Apply the relevant backport label to land the fix in the release branch

    • Changes to master get backported into release-23.11
    • Changes to staging get backported into staging-23.11
  2. If the backport action fails, follow the manual backporting steps. Make sure to use git cherry-pick -x <rev> on all commits intended for backport.
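
As a rough sketch of the manual path, the mapping between branches from step 1 and the cherry-pick from step 2 could look like this. The function name `backport_branch`, the remote name `origin` and the local branch name `backport-my-fix` are assumptions for illustration; `<rev>` is left as a placeholder exactly as in the text above.

```sh
# Map the branch a fix was merged into to its 23.11 backport branch.
backport_branch() {
  case "$1" in
    master)  echo release-23.11 ;;
    staging) echo staging-23.11 ;;
    *)       echo "no backport branch known for '$1'" >&2; return 1 ;;
  esac
}

# Manual backport, to be run inside a nixpkgs checkout
# (commented out because it needs a real repository and revision):
#   git fetch origin
#   git switch -c backport-my-fix "origin/$(backport_branch master)"
#   git cherry-pick -x <rev>    # repeat for every commit of the fix
```

The `-x` flag appends a "(cherry picked from commit ...)" line to the commit message, which makes it possible to trace a backported commit to its original.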


Always link back to this issue by mentioning the issue number in the description of your pull request:

ZHF: #265948

If your PR receives no reviews or does not get merged, feel free to

  • add the 0.kind: ZHF Fixes label, so people can better browse these fixes
  • request or mention @NixOS/nixos-release-managers on the PR

Broken packages

Everything we cannot fix in time will need to be marked broken on the respective platforms, so that Hydra will not retry builds over and over, thereby wasting compute resources.

Set meta.broken and add a reference and/or explanation, like this:

meta = {
  # ref to issue/explanation
  broken = stdenv.isDarwin; # only broken on darwin
  # broken = true; # broken on all platforms.
};

Orphaned packages

You can read about failing packages without a maintainer here: https://zh.fail/failed/by-maintainer/_.html (orphaned packages).

If you're new to NixOS, adopting an orphaned package is a great way to get involved and contribute to the community. By doing so, you'll not only help improve the overall quality of the NixOS ecosystem, but you'll also gain valuable experience working with Nix, the language and tool that powers the package management system.

By adopting an orphaned package, you'll be taking on a responsibility that can be both challenging and rewarding. You'll need to understand the package's code and dependencies, make sure it builds and works correctly, and respond to any issues or pull requests that come up. This process can be a great learning experience, as you'll be exposed to a wide variety of programming languages and libraries.

Moreover, by adopting an orphaned package, you'll be making a tangible impact on the NixOS community. Your contributions will be greatly appreciated by users who depend on that package, and you'll be helping to ensure that NixOS releases are as stable and up-to-date as possible.

Closing

This is a great way to help NixOS, and it is a great time for new contributors to start their nixpkgs adventure. 🥳

As with the feature freeze issue, please keep the discussion here to a minimum so we don't ping all maintainers (although relevant comments can of course be added here if they are directly ZHF-related) and ping one of the release managers (@figsoda, @RaitoBezarius) in the respective issues.

@nixos-discourse

This issue has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/zero-hydra-failure-23-11-edition/35103/1

@paul-jewell

paul-jewell commented Nov 7, 2023

@figsoda - You have a typo in line 3 - you mention release at end of "may" instead of "November". I guess the end of November release is also 23.11 a few lines later. (oh the joy of cut and paste - I feel for you!!)
I'm not sure I know enough to support this, but I won't know without trying, so I will follow the instructions later and get started.

@figsoda
Member Author

figsoda commented Nov 7, 2023

@paul-jewell Good catch! thanks

@tu-maurice
Contributor

I can't seem to find a way to add the ZHF Fixes label to my PRs. Can someone give me a hint?

@mfrischknecht
Contributor

@infinisil

I still have a follow-up question to this latest ZHF :)

I decided to keep on looking if I can maybe create PRs for another couple of builds when/if I have the time to do so (with some success: #270233). Since ZHF is a thing, I assume such drive-by-PRs are still beneficial on the whole.

Since I'm still new to this: Is there some protocol to such fixes?
Do I simply tag the listed maintainers? Or will they be notified by some automation anyway?
Should I specify something along the lines of "I don't really care about this package, I'm simply trying to reduce Hydra failures"?

@infinisil
Member

> I decided to keep on looking if I can maybe create PRs for another couple of builds when/if I have the time to do so (with some success: #270233). Since ZHF is a thing, I assume such drive-by-PRs are still beneficial on the whole.

Awesome work! Yeah these contributions are always welcome. Generally I think we should prefer fixing more bugs over adding features. Though also some work can get rather tedious, so we should also invest some time in trying to improve automation (this generally requires rather deep knowledge though).

> Since I'm still new to this: Is there some protocol to such fixes? Do I simply tag the listed maintainers? Or will they be notified by some automation anyway? Should I specify something along the lines of "I don't really care about this package, I'm simply trying to reduce Hydra failures"?

The listed maintainers should automatically be requested for a review if you use the <pkgname>: ... commit convention. But sometimes people effectively maintaining a package aren't listed in the maintainers field, so feel free to check the Git history to figure that out and ping other relevant people.

It can totally happen that PRs aren't getting reviewed though. It's not great, but it's because Nixpkgs has far fewer reviewers than committers. So to combat that, I recommend reviewing others' PRs, so that committers have an easier time merging them.
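
For illustration, a commit message that follows the `<pkgname>: ...` convention mentioned above and links back to this issue could be put together like this. The helper name `zhf_commit_message` and the example package and summary are hypothetical; only the `<pkgname>: ...` prefix and the `ZHF: #265948` line come from this thread.

```sh
# Build a commit message: "<pkgname>: <summary>", a blank line,
# then the ZHF reference used throughout this issue.
zhf_commit_message() {
  printf '%s: %s\n\nZHF: #265948\n' "$1" "$2"
}

zhf_commit_message bazel_6 'fix build on darwin'

# Possible use inside a nixpkgs checkout (commented out, needs staged changes):
#   git commit -m "$(zhf_commit_message bazel_6 'fix build on darwin')"
```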

gepbird pushed a commit to gepbird/nixpkgs that referenced this issue Nov 27, 2023
https://hydra.nixos.org/build/240805256/nixlog/1
https://hydra.nixos.org/build/240805170/nixlog/2
Failure is a bit obscured but long story short, a script within
bazel gets custom nixpkgs shebang which in turn makes shell run
in POSIX-compatible mode. Bazel expects bash in non-POSIX mode
and osx-specific script starts to fail due to `set -e` and subshell
interaction differences in those modes (sub-shells and functions
suddenly start inheriting `set -e` and fail to produce desired
output). More debug info is available in NixOS#267670

Shell scripts aren't guaranteed to work as interpreters in shebang.
In particular thin shell wrappers aren't shebang-ready on MacOS.
It may work sometimes depending on what exactly would try to execute
a script with such shebang, but generally it's not guaranteed to work.
See NixOS#124556

Bash wrapper was introduced in NixOS#266847 and so far seems like the
issue only affects darwin builds: hydra failure is in osx-specific
script, also shebang issue is usually darwin-specific.

Let's wrap it as a native binary to make it shebang-compatible.

The wrapper is only currently added to `bazel_6` so no need for
changes in other versions.

ZHF: NixOS#265948
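
The mode difference described in the commit message above can be reproduced in isolation. The sketch below is not taken from the bazel sources. It assumes a `bash` (4.4 or newer) on `PATH` and only shows that a command substitution inherits `set -e` in POSIX mode but not in bash's default mode; the variable names are made up for illustration.

```sh
# Default bash mode: `set -e` is cleared inside $(...), so the failing
# `false` does not stop the subshell and "reached" is still printed.
default_mode=$(bash -c 'set -e; echo "$(false; echo reached)"')

# POSIX mode: the subshell inherits `set -e`, exits at `false`,
# and the command substitution produces no output.
posix_mode=$(bash --posix -c 'set -e; echo "$(false; echo reached)"')

echo "default mode output: '$default_mode'"
echo "posix mode output:   '$posix_mode'"
```

A script written for the default behaviour can therefore silently lose output when the same interpreter ends up running in POSIX mode, which matches the symptom described above.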
gepbird pushed a commit to gepbird/nixpkgs that referenced this issue Nov 27, 2023
bazel_6 https://hydra.nixos.org/build/241090720/nixlog/1
```
external/upb/upb/upb.c:228:25: error: defining a type within '__builtin_offsetof' is a Clang extension [-Werror,-Wgnu-offsetof-extensions]
  n = UPB_ALIGN_DOWN(n, UPB_ALIGN_OF(upb_Arena));
                        ^~~~~~~~~~~~~~~~~~~~~~~
```
bazel_6 https://hydra.nixos.org/build/241127779/nixlog/1
```
In file included from external/com_google_absl/absl/algorithm/container.h:55:
external/com_google_absl/absl/meta/type_traits.h:560:8: error: builtin __has_trivial_assign is deprecated; use __is_trivially_assignable instead [-Werror,-Wdeprecated-builtins]
      (__has_trivial_assign(ExtentsRemoved) || !kIsCopyOrMoveAssignable) &&
       ^
```

Note: `bazel_5` and `bazel_4` require more work: for some reason an extra
`-Wall` in combination with `-Werror` sneaks in and overrides the `-Wno-`
settings. I haven't yet managed to debug where exactly the last
flags (last one wins) come from.

ZHF: NixOS#265948
@azahi azahi mentioned this issue Nov 27, 2023
13 tasks
github-actions bot pushed a commit that referenced this issue Nov 30, 2023
(cherry picked from commit ed175a6)
@figsoda
Member Author

figsoda commented Dec 1, 2023

NixOS 23.11 has been released on 2023-11-29, and ZHF has ended. We still greatly appreciate any fixes for broken packages, but I will go ahead and close this issue now.

@figsoda figsoda closed this as completed Dec 1, 2023
@figsoda figsoda unpinned this issue Dec 1, 2023
github-actions bot pushed a commit that referenced this issue Dec 7, 2023
(cherry picked from commit 7377bba)
bjornfor pushed a commit that referenced this issue Dec 7, 2023
(cherry picked from commit 7377bba)
@nixos-discourse

This issue has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/nixos-is-not-dying-please-dont-spread-fear-actively/44310/3
