Nuke the C++ implementation of Zig from orbit using WASI #13560
Idea: you could create and store a dictionary from builds for different commits, so then you can store say a 400 KB dictionary and a 255 KB (likely a lot less) compressed file that uses the dictionary. Only the 255 KB file would change between commits; new binaries would be compressed using the same dictionary.
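A minimal sketch of the shared-dictionary idea. It uses zlib's preset-dictionary support from the Python standard library as a stand-in for zstd dictionaries, so the numbers it prints say nothing about the real blob. The helper names and the synthetic "build" data are assumptions made for illustration.

```python
import hashlib
import zlib


def compress_with_dict(data: bytes, dictionary: bytes) -> bytes:
    """Compress data against a preset dictionary shared by all builds."""
    comp = zlib.compressobj(level=9, zdict=dictionary)
    return comp.compress(data) + comp.flush()


def decompress_with_dict(blob: bytes, dictionary: bytes) -> bytes:
    """Decompress; the reader must hold the exact same dictionary."""
    decomp = zlib.decompressobj(zdict=dictionary)
    return decomp.decompress(blob) + decomp.flush()


# Synthetic stand-in for content that is common to every build.
shared = b"".join(
    hashlib.sha256(str(i).encode()).hexdigest().encode() for i in range(300)
)
# A "new build": mostly the shared content plus a small change.
new_build = shared + b"code added in a newer commit"

plain = zlib.compress(new_build, 9)
with_dict = compress_with_dict(new_build, shared)

# The round trip only works with the same dictionary on both sides.
assert decompress_with_dict(with_dict, shared) == new_build
print(len(plain), len(with_dict))
```

The trade-off is the one described in the comment: the dictionary is stored once, and only the small dictionary-compressed file changes from commit to commit.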
Anyone who wants to use it needs to download a C compiler. Why not download a "stable" version of the Zig compiler instead?
@misanthrop see #853 (comment) (and related issues like #5246, #6378)
How does the huge WASM blob differ from a plain binary blob for package maintainers? Do you think it will be easy to spot a virus injection in 2.4 MiB of WASM?
No, WASM is (virtual) machine code. Reading it requires decoding it first, and then it's just assembly. Checking more than 2.4 MB of assembly sounds hard.
It does not. The plan for now is to temporarily make it harder for package maintainers to bootstrap zig: #6378. Using wasm instead of a binary blob was chosen because wasm is platform-independent, so only one wasm blob is needed to build zig on any platform from source.
This is to work around OOM on the CI server. Once #13560 is complete, we can avoid having to replace the tarballs so often.
Missing change from bcd4ea9
In the CI system, I copied the old tarball and then applied 05c21a2 to its compiler_rt implementation. After this is verified we can drop this commit and regenerate the tarballs from a master branch commit.
In particular, these two changes are relevant:
* zig cc: support -stack in addition to --stack for linker arg
  - Fixes stack overflow when running zig2 on aarch64-macos.
* compiler_rt: avoid using weak aliases
  - Fixes duplicate symbol when linking zig2 on aarch64-linux.
I messed up the spelling of '-stack_size' making it '-stack' instead. Will need to fix on master branch. But let's test this here before making another master branch commit.
On windows we get:
lld-link: error: undefined symbol: __stack_chk_fail
>>> referenced by CMakeFiles/zig2.dir/zig2.c.obj:(main)
>>> referenced by CMakeFiles/zig2.dir/zig2.c.obj:(main_main)
>>> referenced by CMakeFiles/zig2.dir/zig2.c.obj:(log_scoped_28_default_29_err__anon_2764)
>>> referenced 36192 more times
lld-link: error: undefined symbol: __stack_chk_guard
>>> referenced by CMakeFiles/zig2.dir/zig2.c.obj:(.refptr.__stack_chk_guard)
>>> referenced by CMakeFiles/zig2.dir/compiler_rt.c.obj
This commit chickens out and reverts 02456a3, leaving it for a future enhancement.
This commit adds a 637 KB binary file to the source repository. This commit does nothing else, so it should be replaced with a different commit before this branch is merged to avoid bloating the git repository.
If the goal of the multi-stage, C-based bootstrap process is to prevent Trusting Trust attacks, how is it ok to use an opaque WASM blob as stage1? That blob can't be verified manually, and to recreate it, you already need a working Zig installation. How is this cycle resolved?
It's a goal. There are others. The full rationale is explained here.
Good to go. Continue to improve it.
Before, --color on would affect colored compile error printing but not affect terminal progress bar printing. It was intended for this option to affect both; now it does. This causes a failure when building the language reference, which contains code for parsing terminal output and rendering HTML. Now it must be expanded to handle 'K' and 'D' codes to simulate a terminal cursor moving, and the CI will fail until that capability is added in a later commit of this branch. I extracted this change from ziglang#13560 so that the idea is not lost but we can solve this issue separately.
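To illustrate what handling the 'K' and 'D' codes involves, here is a minimal sketch of a one-line terminal simulator. It is not the language reference generator's actual code (that lives in the Zig repository); the function name `render_line` is hypothetical, and the sketch assumes the standard ANSI meanings: `ESC [ n D` moves the cursor back n columns and `ESC [ n K` erases part of the current line.

```python
import re

# CSI sequence: ESC [ <parameter digits and ';'> <final letter>
CSI = re.compile(r"\x1b\[([0-9;]*)([A-Za-z])")


def render_line(raw: str) -> str:
    """Return the text left on a single terminal line after replaying raw."""
    cells: list[str] = []  # one entry per column
    cursor = 0
    pos = 0
    while pos < len(raw):
        m = CSI.match(raw, pos)
        if m:
            first = m.group(1).split(";")[0]
            final = m.group(2)
            if final == "D":  # cursor back, default 1 column
                cursor = max(0, cursor - (int(first) if first else 1))
            elif final == "K":  # erase in line, default mode 0
                mode = int(first) if first else 0
                if mode == 0:  # cursor to end of line
                    del cells[cursor:]
                elif mode == 1:  # start of line through cursor
                    for i in range(min(cursor + 1, len(cells))):
                        cells[i] = " "
                elif mode == 2:  # whole line
                    cells.clear()
            # Any other escape sequence (e.g. colors) is skipped in this sketch.
            pos = m.end()
            continue
        ch = raw[pos]
        if ch == "\r":
            cursor = 0
        else:
            if cursor >= len(cells):
                # Pad with blanks if the cursor sits past the current end.
                cells.extend(" " * (cursor - len(cells)))
                cells.append(ch)
            else:
                cells[cursor] = ch
            cursor += 1
        pos += 1
    # Trailing blanks are dropped for readability.
    return "".join(cells).rstrip()


# A progress line that is rewound and overwritten by the next one.
print(render_line("[1/3] parse\x1b[11D\x1b[0K[2/3] link"))
```

Only the last state of the line survives, which is what an HTML rendering of captured terminal output would need to show.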
The idea here is to use a small WASI binary as a stage1 kernel that is committed to source control and therefore can be used to build any commit from source. We provide a minimal WASI interpreter implementation that is built from C source, and then used to translate the Zig self-hosted compiler source code into C code. The C code is then compiled & linked, again by the system C compiler, into a stage2 binary. The stage2 binary can then be used repeatedly with `zig build` to build from source from that point on.

The WASI stage1 blob only needs to be updated when a breaking change or new feature affects the self-hosted compiler when building itself. For example, a bug fix that the self-hosted compiler does not trigger when building itself can be ignored. However, if the bug fix is required for zig to build itself, then the stage1 WASI blob needs to be updated. Similarly when the language is changed and the compiler wants to use the changes to build itself, the blob needs to be updated.
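The bootstrap chain described above can be sketched as a small dependency model. This is only an illustration: the artifact names are informal labels taken from the description, not file names or build targets from the repository, and the `Step` type and `external_inputs` helper are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    needs: tuple[str, ...]  # artifacts that must already exist
    produces: str           # artifact created by this step


# The four stages, in order, as described in the text.
STEPS = (
    Step(("system C compiler", "WASI interpreter C source"), "WASI interpreter"),
    Step(("WASI interpreter", "stage1 WASI blob", "Zig compiler source"),
         "compiler C code"),
    Step(("system C compiler", "compiler C code"), "stage2 binary"),
    Step(("stage2 binary", "Zig compiler source"), "zig built from source"),
)


def external_inputs(steps: tuple[Step, ...]) -> list[str]:
    """List everything the chain needs that no earlier step produces."""
    produced: set[str] = set()
    external: list[str] = []
    for step in steps:
        for need in step.needs:
            if need not in produced and need not in external:
                external.append(need)
        produced.add(step.produces)
    return external


print(external_inputs(STEPS))
```

Under these assumptions, the only inputs from outside the chain are a system C compiler and three things kept in source control: the interpreter's C source, the stage1 blob, and the compiler's Zig source.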
The WASI blob is produced with `zig build update-zig1`, which uses the LLVM backend to produce a `ReleaseSmall` binary that targets `wasm32-wasi` with a CPU of `generic+bulk_memory`. This produces a 2.6 MiB file. It is then optimized with `wasm-opt -Oz --enable-bulk-memory`, bringing the total down to 2.4 MiB. Finally, it is compressed with zstd, bringing the total down to 655 KB. This is offset by the size of the zstd decoder implementation in C, however it is worth it because the zstd implementation will change rarely if ever, saving a total of 1.8 MiB every time the blob is updated.

I built this branch and master branch from source at the same time and got these results:
Big thanks to @jacobly0 who has done significant work on the C backend to enable this possibility, as well as helping write the WASI interpreter and make it go fast, over in the external zig-wasi repo. In fact, after rewriting the interpreter a few times he figured out how to make it run even faster by translating the wasm code to C instead of interpreting it directly.
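As a rough check of the blob sizes quoted in the description (2.6 MiB, 2.4 MiB, 655 KB), the arithmetic can be written out as follows. The sketch assumes "KB" means 1024 bytes; the variable names are illustrative only.

```python
MIB = 1024 * 1024
KIB = 1024  # assumption: the "655 KB" figure uses 1024-byte units

unoptimized = 2.6 * MIB  # ReleaseSmall output
optimized = 2.4 * MIB    # after wasm-opt
compressed = 655 * KIB   # after zstd

# Bytes not committed on each blob update thanks to compression.
saved_per_update = optimized - compressed

print(f"wasm-opt removes {(unoptimized - optimized) / MIB:.1f} MiB")
print(f"zstd saves {saved_per_update / MIB:.1f} MiB per blob update")
print(f"compressed blob is {compressed / optimized:.0%} of the optimized size")
```

The saving rounds to the 1.8 MiB stated in the description; the one-time cost of the zstd decoder source is not modeled here.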
Closes #5246
Closes #6378
Closes #6485
Prerequisites:
These are already merged inside this branch:
`-fstage1` option #13383

Enhancements Needed
Merge blockers:
`-target` and pass correct flags to WASI argv
`--color on` in CMake
Nice to have: