Nuke the C++ implementation of Zig from orbit using WASI #13560

Merged
andrewrk merged 68 commits into master from wasi-bootstrap
Dec 6, 2022

Conversation

andrewrk
Member

@andrewrk andrewrk commented Nov 16, 2022

The idea here is to use a small WASI binary as a stage1 kernel that is committed to source control and therefore can be used to build any commit from source. We provide a minimal WASI interpreter implementation that is built from C source, and then used to translate the Zig self-hosted compiler source code into C code. The C code is then compiled & linked, again by the system C compiler, into a stage2 binary. The stage2 binary can then be used repeatedly with zig build to build from source from that point on.
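
The chain described above can be sketched as a small dependency check. This is only an illustration: the artifact labels and the `check_chain` helper are invented for this sketch and do not correspond to real file names or build targets. It shows that every step consumes only things that are committed to the repository, provided by the system, or produced by an earlier step, so no pre-existing Zig binary is needed.

```python
# Hypothetical labels for what exists before any build step runs.
AVAILABLE_AT_START = {
    "system C compiler",
    "WASI interpreter C source",
    "zig1 WASI blob (committed)",
    "Zig compiler source",
}

# (step name, required inputs, produced output), in the order described above.
STAGES = [
    ("build WASI interpreter",
     {"system C compiler", "WASI interpreter C source"},
     "WASI interpreter"),
    ("translate compiler to C",
     {"WASI interpreter", "zig1 WASI blob (committed)", "Zig compiler source"},
     "compiler C code"),
    ("compile and link stage2",
     {"system C compiler", "compiler C code"},
     "stage2 binary"),
    ("zig build",
     {"stage2 binary", "Zig compiler source"},
     "zig built from source"),
]


def check_chain(available, stages):
    """Return the outputs in build order, or raise if a step lacks an input."""
    have = set(available)
    produced = []
    for name, inputs, output in stages:
        missing = inputs - have
        if missing:
            raise ValueError(f"step {name!r} is missing {sorted(missing)}")
        have.add(output)
        produced.append(output)
    return produced


if __name__ == "__main__":
    for artifact in check_chain(AVAILABLE_AT_START, STAGES):
        print("produced:", artifact)
```

Removing the committed blob from the starting set makes the second step fail, which is the sense in which the blob acts as the stage1 kernel.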

The WASI stage1 blob only needs to be updated when a breaking change or new feature affects the self-hosted compiler when building itself. For example, a bug fix that the self-hosted compiler does not trigger when building itself can be ignored. However, if the bug fix is required for zig to build itself, then the stage1 WASI blob needs to be updated. Similarly, when the language is changed and the compiler wants to use the changes to build itself, the blob needs to be updated.

The WASI blob is produced with zig build update-zig1, which uses the LLVM backend to produce a ReleaseSmall binary that targets wasm32-wasi with a CPU of generic+bulk_memory. This produces a 2.6 MiB file. It is then optimized with wasm-opt -Oz --enable-bulk-memory, bringing the total down to 2.4 MiB. Finally, it is compressed with zstd, bringing the total down to 655 KB. This is offset by the size of the zstd decoder implementation in C; however, it is worth it because the zstd implementation will change rarely, if ever, saving a total of 1.8 MiB every time the blob is updated.
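
As a quick check of the 1.8 MiB figure, here is the arithmetic using the sizes quoted above. Whether "655 KB" means 1000-byte or 1024-byte units is not stated, so this sketch tries both; the variable names are just for illustration.

```python
MIB = 1024 * 1024

optimized_bytes = 2.4 * MIB   # size after wasm-opt, as quoted above
compressed_kb = 655           # size after zstd, as quoted above

# The unit behind "KB" is an assumption, so compute the saving both ways.
savings_mib = {}
for unit_name, unit_bytes in (("KB = 1000 bytes", 1000), ("KB = 1024 bytes", 1024)):
    saved = optimized_bytes - compressed_kb * unit_bytes
    savings_mib[unit_name] = saved / MIB

if __name__ == "__main__":
    for unit_name, saved in savings_mib.items():
        print(f"{unit_name}: saves about {saved:.2f} MiB per blob update")
```

Either reading rounds to the 1.8 MiB stated above.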

I built this branch and master branch from source at the same time and got these results:

compiling from source with `ninja install`,
configured with `-DCMAKE_BUILD_TYPE=Release -DZIG_NO_LIB=ON`:

        master branch: 13m20s with 10.3 GiB peak RSS
wasi-bootstrap branch: 10m53s with  3.2 GiB peak RSS
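
For a sense of scale, the relative improvement can be computed directly from the two result lines above. This is plain arithmetic on the reported numbers, not a new measurement.

```python
def to_seconds(minutes, seconds):
    """Convert a wall-clock time given as minutes and seconds to seconds."""
    return minutes * 60 + seconds


master_time = to_seconds(13, 20)   # master branch: 13m20s
branch_time = to_seconds(10, 53)   # wasi-bootstrap branch: 10m53s
master_rss_gib = 10.3              # master branch peak RSS
branch_rss_gib = 3.2               # wasi-bootstrap branch peak RSS

time_saved = 1 - branch_time / master_time
rss_saved = 1 - branch_rss_gib / master_rss_gib

if __name__ == "__main__":
    print(f"build time reduced by {time_saved:.1%}")
    print(f"peak RSS reduced by {rss_saved:.1%}")
```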

Big thanks to @jacobly0 who has done significant work on the C backend to enable this possibility, as well as helping write the WASI interpreter and make it go fast, over in the external zig-wasi repo. In fact, after rewriting the interpreter a few times he figured out how to make it run even faster by translating the wasm code to C instead of interpreting it directly.

Closes #5246
Closes #6378
Closes #6485

Prerequisites:

These are already merged inside this branch:

Enhancements Needed

Merge blockers:

  • make zig1.c detect the host -target and pass correct flags to WASI argv
  • avoid hard-coding --color on in CMake
  • fix the -target parameter computation in CMake

Nice to have:

@mlvzk

mlvzk commented Nov 16, 2022

A further enhancement may introduce zstd compression, bringing the total down to 655 KB

Idea: you could create and store a dictionary from builds for different commits, so then you can store say a 400 KB dictionary and a 255 KB (likely a lot less) compressed file that uses the dictionary. Only the 255 KB file would change between commits, new binaries would be compressed using the same dictionary.
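
A rough illustration of the shared-dictionary idea. Note the assumptions: this uses the preset-dictionary feature of `zlib` from the Python standard library as a stand-in for zstd dictionaries, and the "builds" are synthetic bytes rather than real blobs, so the sizes say nothing about what zstd would achieve on the actual file.

```python
import random
import zlib


def compress(data, dictionary=None):
    """Deflate `data`, optionally primed with a preset dictionary."""
    if dictionary is None:
        comp = zlib.compressobj(level=9)
    else:
        comp = zlib.compressobj(level=9, zdict=dictionary)
    return comp.compress(data) + comp.flush()


def decompress(blob, dictionary=None):
    """Inverse of compress(); must be given the same dictionary."""
    if dictionary is None:
        decomp = zlib.decompressobj()
    else:
        decomp = zlib.decompressobj(zdict=dictionary)
    return decomp.decompress(blob) + decomp.flush()


# Synthetic stand-ins: an "old build" and a "new build" that differs slightly.
rng = random.Random(0)
old_build = bytes(rng.getrandbits(8) for _ in range(20000))
new_build = old_build[:10000] + b"patched code" + old_build[10000:]

without_dict = compress(new_build)
with_dict = compress(new_build, dictionary=old_build)

if __name__ == "__main__":
    print("without dictionary:", len(without_dict), "bytes")
    print("with dictionary:   ", len(with_dict), "bytes")
```

The dictionary would be stored once, and only the small dictionary-compressed file would change per update. The decoder needs the same dictionary to reverse the process.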

@andrewrk andrewrk force-pushed the wasi-bootstrap branch 2 times, most recently from d6d0246 to 851a7a1 Compare November 16, 2022 06:06
@misanthrop

Anyone who wants to use it needs to download a C compiler. Why not download a "stable" version of the Zig compiler instead?
C compilers have tons of dependencies, but Zig is a static binary, so it's more convenient for bootstrapping.
Currently Zig is available for Linux, FreeBSD, Windows, and macOS. Theoretically, WASI might unlock compiling from other OSes/CPUs. But what are these exotic OSes/CPUs, and who is going to use them for development instead of cross-compiling from one of the supported platforms?

@squeek502
Collaborator

squeek502 commented Nov 16, 2022

@misanthrop see #853 (comment) (and related issues like #5246, #6378)

@misanthrop

@misanthrop see #853 (comment) (and related issues like #5246, #6378)

How does the huge WASM blob differ from just a binary blob for package maintainers? Do you think it will be easy to spot a virus injection in the 2.4 MiB of WASM?

@AntonioNoack

AntonioNoack commented Nov 16, 2022

@misanthrop see #853 (comment) (and related issues like #5246, #6378)

Do you think it will be easy to spot a virus injection in the 2.4 MiB of WASM?

No, WASM is (virtual) machine code. Reading it needs decoding first, and then it's just assembly. Checking > 2.4 MB of assembly sounds hard.

@pfgithub
Contributor

How does the huge WASM blob differ from just a binary blob for package maintainers?

It does not. The plan for now is to temporarily make it harder for package maintainers to bootstrap zig: #6378

Using wasm instead of a binary blob was chosen because wasm is platform-independent, so only one wasm blob is needed to build zig on any platform from source.

@lambdadog

lambdadog commented Nov 20, 2022

To clarify: #13383 is still blocked by #6025 but is already merged into this branch. Is #6025 considered a blocker for this PR as a result, or is the current stage1 planned to be removed before async is in stage2?

andrewrk added a commit that referenced this pull request Nov 22, 2022
This is to work around OOM on the CI server. Once #13560 is complete,
we can avoid having to replace the tarballs so often.
@andrewrk andrewrk force-pushed the wasi-bootstrap branch 2 times, most recently from 5917f81 to 0f165f0 Compare November 23, 2022 01:59
@andrewrk andrewrk force-pushed the wasi-bootstrap branch 16 times, most recently from 1e55972 to e69e2eb Compare December 4, 2022 23:52
andrewrk and others added 15 commits December 6, 2022 12:27
In the CI system, I copied the old tarball and then applied
05c21a2 to its compiler_rt
implementation.

After this is verified we can drop this commit and regenerate the
tarballs from a master branch commit.
In particular, these two changes are relevant:

 * zig cc: support -stack in addition to --stack for linker arg
   - Fixes stack overflow when running zig2 on aarch64-macos.
 * compiler_rt: avoid using weak aliases
   - Fixes duplicate symbol when linking zig2 on aarch64-linux.
I messed up the spelling of '-stack_size' making it '-stack' instead.
Will need to fix on master branch. But let's test this here before
making another master branch commit.
On windows we get:

    lld-link: error: undefined symbol: __stack_chk_fail
    >>> referenced by CMakeFiles/zig2.dir/zig2.c.obj:(main)
    >>> referenced by CMakeFiles/zig2.dir/zig2.c.obj:(main_main)
    >>> referenced by CMakeFiles/zig2.dir/zig2.c.obj:(log_scoped_28_default_29_err__anon_2764)
    >>> referenced 36192 more times

    lld-link: error: undefined symbol: __stack_chk_guard
    >>> referenced by CMakeFiles/zig2.dir/zig2.c.obj:(.refptr.__stack_chk_guard)
    >>> referenced by CMakeFiles/zig2.dir/compiler_rt.c.obj
This commit chickens out and reverts
02456a3, leaving it for a future
enhancement.
This commit adds a 637 KB binary file to the source repository. This
commit does nothing else, so it should be replaced with a different
commit before this branch is merged to avoid bloating the git
repository.
@andrewrk andrewrk merged commit e7d2834 into master Dec 6, 2022
@andrewrk andrewrk deleted the wasi-bootstrap branch December 6, 2022 23:52
@kangalio

kangalio commented Dec 7, 2022

If the goal of the multi-stage, C-based bootstrap process is to prevent Trusting Trust attacks, how is it OK to use an opaque WASM blob as stage1? That blob can't be verified manually, and to recreate it, you already need a working Zig installation. How is this cycle resolved?

@ghost

ghost commented Dec 7, 2022

If the goal of the multi-stage, C-based bootstrap process is to prevent Trusting Trust attacks,

It's a goal. There are others. The full rationale is explained here.

@ghost

ghost commented Dec 8, 2022

Good to go. Continue to improve it.

jacobly0 pushed a commit to jacobly0/zig that referenced this pull request Dec 10, 2022
Before, --color on would affect colored compile error printing but not
affect terminal progress bar printing. It was intended for this option
to affect both; now it does.

This causes a failure when building the language reference, which
contains code for parsing terminal output and rendering HTML. Now it
must be expanded to handle 'K' and 'D' codes to simulate a terminal
cursor moving, and the CI will fail until that capability is added in a
later commit of this branch.

I extracted this change from ziglang#13560 so that the idea is not lost but we
can solve this issue separately.
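
To make the 'K' and 'D' requirement concrete, here is a minimal sketch of simulating a single terminal line. It assumes the standard ANSI meanings (`ESC [ n D` moves the cursor back n columns, `ESC [ n K` erases part of the line) and is not the language reference's actual parser; `render_line` is a hypothetical helper name.

```python
import re

# Matches CSI sequences of the form ESC [ <digits> <final letter>.
CSI = re.compile(r"\x1b\[(\d*)([A-Za-z])")


def render_line(data):
    """Return the visible text of one terminal line after applying D and K."""
    cells = []    # characters currently shown on the line
    cursor = 0    # column where the next character will be written
    pos = 0
    while pos < len(data):
        match = CSI.match(data, pos)
        if match:
            count, final = match.group(1), match.group(2)
            if final == "D":
                # Cursor back N columns (default 1), never past column 0.
                cursor = max(0, cursor - int(count or "1"))
            elif final == "K":
                mode = int(count or "0")
                if mode == 0:      # erase from cursor to end of line
                    del cells[cursor:]
                elif mode == 1:    # erase from start of line through cursor
                    for i in range(min(cursor + 1, len(cells))):
                        cells[i] = " "
                elif mode == 2:    # erase the whole line
                    cells.clear()
            # Any other CSI sequence (colors, etc.) is skipped in this sketch.
            pos = match.end()
            continue
        ch = data[pos]
        pos += 1
        if ch == "\r":
            cursor = 0
            continue
        while len(cells) < cursor:   # pad if writing past the erased region
            cells.append(" ")
        if cursor < len(cells):
            cells[cursor] = ch       # overwrite an existing cell
        else:
            cells.append(ch)
        cursor += 1
    return "".join(cells)


if __name__ == "__main__":
    print(render_line("Compiling...\x1b[12D\x1b[0Kdone"))
```

In the usage line, the progress text is written, the cursor moves back over it, the line is erased from the cursor onward, and the final text replaces it.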
Vexu pushed a commit that referenced this pull request Jan 3, 2023
Before, --color on would affect colored compile error printing but not
affect terminal progress bar printing. It was intended for this option
to affect both; now it does.

This causes a failure when building the language reference, which
contains code for parsing terminal output and rendering HTML. Now it
must be expanded to handle 'K' and 'D' codes to simulate a terminal
cursor moving, and the CI will fail until that capability is added in a
later commit of this branch.

I extracted this change from #13560 so that the idea is not lost but we
can solve this issue separately.
andrewrk added a commit that referenced this pull request Jan 9, 2023
This is to work around OOM on the CI server. Once #13560 is complete,
we can avoid having to replace the tarballs so often.