Minimal/bootstrappable Linux stdenv #123095

Open
siraben opened this issue May 15, 2021 · 13 comments
Labels
0.kind: enhancement Add something new 1.severity: mass-rebuild This PR causes a large number of packages to rebuild 6.topic: reproducible builds 6.topic: stdenv Standard environment

Comments

@siraben
Member

siraben commented May 15, 2021

Motivation

Currently, NixOS relies on a 130 MB (uncompressed) bootstrap for x86_64-linux. Thus, there is quite a big trusted computing base. With years of effort accumulated in projects such as live-bootstrap, it appears feasible to replace the Linux stdenv with a far smaller base of around 1 KB, while retaining the latest versions of autotools, bash, gcc and so on to bootstrap the rest of Nixpkgs.

See also the bootstrap seed reduction carried out in Guix[0].

References

[0] https://guix.gnu.org/blog/2020/guix-further-reduces-bootstrap-seed-to-25/

@siraben siraben added 0.kind: enhancement Add something new 1.severity: mass-rebuild This PR causes a large number of packages to rebuild 6.topic: reproducible builds 6.topic: stdenv Standard environment labels May 15, 2021
@siraben
Member Author

siraben commented May 15, 2021

This is by no means complete, but I have been able to build several of the base tools for live-bootstrap using Nix, see https://github.com/siraben/mes-overlay/tree/master/pkgs

@delroth
Contributor

delroth commented May 15, 2021

A good reason to do this too is that a smaller bootstrap seed could be stored in nixpkgs itself instead of requiring someone (presumably with special permissions) to host a new binary bootstrap tarball.

@gytis-ivaskevicius
Contributor

Here are a few risks:

  • Support for other architectures (RISC-V is probably not supported, not sure about ARM)
  • For more exotic stuff like osx/windows/bsd we would still rely on the same tarballs?
  • There is a lot of work involved to bootstrap something, reproducibility probably would be a pain

@siraben
Member Author

siraben commented May 26, 2021

Support for other architectures (RISC-V is probably not supported, not sure about ARM)

The same project also has the ARM bootstrap WIP, see for instance the seeds.

For more exotic stuff like osx/windows/bsd we would still rely on the same tarballs?

Indeed, this is just for the Linux bootstrap, but it shouldn't affect the others much, since we separate stdenvs anyway.

There is a lot of work involved to bootstrap something, reproducibility probably would be a pain

The bootstrappable people put a lot of emphasis on reproducibility for their stages (up to and not including GCC), so I think this would help.

EDIT: for now, only the x86-linux bootstrap is demonstrably mature enough to be a viable replacement for our current tarball

@stikonas

stikonas commented May 26, 2021

Right now even the x86-linux bootstrap in the live-bootstrap project is probably not mature enough, but it's getting there.
live-bootstrap can reproducibly bootstrap GCC 4.0.4 (C only) without using any pre-generated stuff (bison parsers, configure scripts from autotools, etc...). But we don't yet have newer GCC with C++.

Other arches are indeed less advanced. But that's probably not important, since we can do each arch separately.

And there are two choices:

  1. Integrate it inside nixpkgs.
  2. Build a bit of scripting on top of live-bootstrap to reproducibly create the stdenv tarball.
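
To make the second option a little more concrete, here is a minimal Python sketch of what "reproducibly create a tarball" could mean in practice. It is not taken from live-bootstrap or nixpkgs; the function names and the example paths are hypothetical. The idea is only that member order, timestamps and ownership are pinned, so two independent runs over the same inputs give byte-identical archives that can be compared by hash.

```python
import hashlib
import io
import tarfile


def deterministic_tarball(files: dict[str, bytes]) -> bytes:
    """Pack files (path -> contents) into a tar whose bytes depend only on the inputs.

    Hypothetical helper for illustration; not part of live-bootstrap or nixpkgs.
    """
    buf = io.BytesIO()
    # Plain, uncompressed tar: no compressor header that could embed a timestamp.
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tar:
        for path in sorted(files):  # fixed member order
            data = files[path]
            info = tarfile.TarInfo(name=path)
            info.size = len(data)
            info.mtime = 0              # fixed timestamp
            info.uid = info.gid = 0     # fixed ownership
            info.uname = info.gname = ""
            info.mode = 0o755
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def sha256_hex(blob: bytes) -> str:
    """Hex digest used to compare two independently produced archives."""
    return hashlib.sha256(blob).hexdigest()


if __name__ == "__main__":
    # Same inputs in a different insertion order must give the same archive.
    first = deterministic_tarball({"bin/sh": b"shell", "bin/cc": b"compiler"})
    second = deterministic_tarball({"bin/cc": b"compiler", "bin/sh": b"shell"})
    print(sha256_hex(first) == sha256_hex(second))
```

A real script would also have to pin everything that goes into the files themselves; this only covers the packaging step.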

@melg8

melg8 commented May 26, 2021

If anybody is interested, I had my own take on doing this (this is not production-ready code, just an experiment): we can build at least up to M2-planet without using stdenv/bash from the host, just by using kaem-optional-seed as the builder, generating the script for it with nix, and going from there.

See results and raw derivation

Problems are:

  • nix wants a derivation to have an output, but the initial kaem seed can't read the value of $out. It can, however, pass the environment to its children, so we build until the kaem built from C is available, call it, and then use it to copy the outputs to $out.
  • we need mkdir so we don't repeat the build for each of the binaries
  • we need cp, or catm + chmod_x (because catm can't preserve the executable bit for files), so I added them as builtins to the kaem source. They could instead be added as separate tools from mescc-tools-extra.

From there, we have a derivation which produces executables that can:

  • read env
  • set their own env variables
  • save to $out
  • copy stuff around on a per-file basis
  • build tcc (and build a small subset of C code)

nix can be used to generate different flavors of kaem scripts for each arch.

For me, the questions are:

  • how do we improve the nix bootstrapping story without making users/hydra unhappy with build times?
  • should bootstrapping efforts strive to use as little as possible (e.g. no stdenv/bash/coreutils from the host system inside derivations), or just be an "emulation" of the bootstrap process?
  • should it be split into nix packages, or should we use an adjusted variant of what's already done in the live-bootstrap project?
  • should it be applied to nixpkgs on the side of make-bootstrap-tools.nix, or on the side of stageFun, where stdenv is created from the bootstrap tools?
  • should we design the process so that we could create a no-nix version which does the same steps using just scripts, so it can be verified on a trusted computing base that doesn't include nix at all?
  • what about licenses? Almost all the code involved is GPL-3.0, and the patches/kaem scripts from live-bootstrap are also under the GPL, but nixpkgs has an MIT license.
  • why didn't this approach of using tcc in the bootstrap by @edolstra make it to master?
  • should we instead take the guix approach/path to bootstrapping?

Reading materials:

@davidak
Member

davidak commented May 19, 2022

A cheap way for us to achieve a verifiable bootstrap seed would be to use GUIX to generate it. That might be less work than implementing full source bootstrap in nixpkgs. It can be replaced later.

This seems to be used to generate the bootstrap-tools in bootstrap-files/x86_64.nix.

Do you think that would make sense or should we just implement full source bootstrap as GUIX does?

It would be great to see some progress here!

@siraben
Member Author

siraben commented Apr 26, 2023

Guix SD has achieved full source bootstrapping: https://guix.gnu.org/en/blog/2023/the-full-source-bootstrap-building-from-source-all-the-way-down/

@siraben
Member Author

siraben commented Jun 26, 2023

For those reading this issue, check out the PRs linked in #227914 to see what packages have been added via the minimal bootstrap. Great progress so far, with expansion to other platforms planned!

@nixos-discourse

This issue has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/aux-foundational-packages/46707/4

@gytis-ivaskevicius
Contributor

Has anyone looked into bootstrapping nixpkgs using zig? Sure, the tarball would not be small enough to be committed into the repository, but I feel like that would be a good solution all around, and it would not involve creating so many derivations to get a basic compiler.

@06kellyjac
Member

Well the initial post of the issue covers that. We could live with a large bootstrap tarball containing a relatively modern copy of gcc/clang/zig but that's then what we have to trust completely.

The objective is to have an auditable bootstrap from the smallest and most understandable binary blob.

The bootstrap process from hex0 is a bit arduous but that's also simply what it takes to rebuild the world from scratch.
That's not to say the current process can not be simplified further but there will always be a long process to get from a tiny mostly readable binary to a modern gcc/clang suite.
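
To give a feel for what a "tiny mostly readable binary" looks like, here is a rough Python sketch of the hex0 idea: a text file of hex digit pairs plus comments is turned directly into bytes. This is an illustration only, not the real hex0 (which is itself a small hand-written binary); the function name is made up, and the comment syntax ('#' or ';' to end of line) is an assumption about the format.

```python
def hex0_to_bytes(source: str) -> bytes:
    """Translate hex0-style text into raw bytes.

    Sketch of the idea only: hex digit pairs become bytes, '#' and ';'
    (assumed comment markers) start a comment that runs to end of line,
    and every other character is ignored.
    """
    digits = []
    for line in source.splitlines():
        # Cut the line at the first comment marker, if any.
        for marker in ("#", ";"):
            line = line.split(marker, 1)[0]
        digits.extend(ch for ch in line if ch in "0123456789abcdefABCDEF")
    if len(digits) % 2:
        raise ValueError("odd number of hex digits")
    return bytes.fromhex("".join(digits))


if __name__ == "__main__":
    # The first bytes of an ELF header, written the way a human can audit them.
    text = "7F 45 4C 46  # ELF magic\n02 01        ; 64-bit, little endian\n"
    print(hex0_to_bytes(text))
```

Because the input is just annotated hex, every byte of the seed can be reviewed by hand, which is the property the long chain of later stages is built on.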


8 participants