caching strategy for stage1 #1416

Closed
andrewrk opened this issue Aug 26, 2018 · 6 comments
Labels: accepted, proposal, stage1

@andrewrk
Member

andrewrk commented Aug 26, 2018

Caching Strategy

This strategy is used for builds. This will cause compiler_rt.o,
builtin.o, zig build scripts, and zig run targets to be cached.

Compilations have input parameters, usually given on the command line. These are:

  • Root source file (SHA-256 hash digest of the source text and the file path goes into the hash)
  • build-exe, build-lib, or build-obj
  • -isystem, --name, --release-fast, -L, etc
  • All parameters that affect the output artifacts

Based on the SHA-256 hash of these parameters, this directory exists
(replace ~/.local/share/zig with the appropriate OS-specific directory):

~/.local/share/zig/stage1/p30HdtbYqWz6O4RVkv99fxAupondYDfqK6VXjEacuSs/
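
As a rough illustration of how such a directory name could be derived, here is a minimal Python sketch. It is not the stage1 implementation. The function name `cache_dir_name`, the length-prefixing of each parameter, and the URL-safe base64 encoding without padding are all assumptions; the 43-character example name above is merely consistent with that encoding of a 32-byte digest.

```python
import base64
import hashlib
from typing import Sequence


def cache_dir_name(params: Sequence[str]) -> str:
    """Hash the compilation parameters into a directory name (hypothetical helper)."""
    h = hashlib.sha256()
    for p in params:
        data = p.encode("utf-8")
        # Assumption: length-prefix each parameter so that
        # ["ab", "c"] and ["a", "bc"] produce different hashes.
        h.update(len(data).to_bytes(8, "little"))
        h.update(data)
    # Assumption: URL-safe base64 without padding (32 bytes -> 43 characters).
    return base64.urlsafe_b64encode(h.digest()).rstrip(b"=").decode("ascii")
```

A caller would pass every parameter that affects the output, for example `cache_dir_name(["build-exe", "--name", "foo", "--release-fast"])`, along with the digest and path of the root source file.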

Output artifacts (one or more of these):

  • ./out/foo.o
  • ./out/foo.obj
  • ./out/foo.h
  • ./out/foo.a
  • ./out/foo.lib
  • ./out/libfoo.so
  • ./out/foo
  • ./out/foo.exe
  • ./out/advapi32.def
  • ./out/builtin.zig
  • ./out/foo.s

Manifest file: ./manifest:

1535254248689 KkOxuk3PGQnmWYWyNq0_XtPLcY5O8G_qQk9CPWSI_DA std/index.zig
1535254254033 1gBYXMIgSfehmA6NaInk15eyr1aShyov0+FoeemmnOY std/os.zig
1535254248689 YgMaKTAxvDE-FPK06Yixf6F_lFIGJbakWZWkw_QXSo0 a_file_that_was_embedded.png
1535254254257 aHFdyXgeIdOpVF7eWOOFrOcv4jj12SV0KbA6akqQBzA ~/.local/share/zig/stage1/.../builtin.o
1535254254257 uyVIWREGOgrsLRSWOZB_xnmfFflX5Avxb0b4eiTqvss /usr/include/stdlib.h
1535254248689 KkOxuk3PGQnmWYWyNq0_XtPLcY5O8G_qQk9CPWSI_DA linker_script.ld
  • The lines are all the files that this source file depends on, recursively.
  • The SHA-256 is computed from the SHA-256 digests of the components.
    This is so that if one of the files is modified, recomputing the hash
    involves only processing the bytes of the modified file.
  • When you get a cache hit using a manifest file, bump its mtime so that we can use LRU eviction.
  • The paths are relative to the .zig source file path that this cache file represents.
  • The first part is an mtime, the second part is the SHA-256. If the mtime on disk matches, then
    calculating the SHA-256 of that file can be skipped.
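
The manifest rules above can be sketched in Python as follows. This is an illustration under stated assumptions, not the actual stage1 code: the names `Entry`, `parse_manifest`, `refresh`, and `combined_digest` are hypothetical, the mtime column is assumed to be milliseconds since the epoch, and digests are assumed to be URL-safe base64 without padding.

```python
import base64
import hashlib
import os
from typing import List, NamedTuple


class Entry(NamedTuple):
    mtime_ms: int  # first column (assumed: milliseconds since the epoch)
    digest: str    # second column: SHA-256 of the file contents
    path: str      # rest of the line


def encode(raw: bytes) -> str:
    # Assumption: URL-safe base64 without padding.
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def parse_manifest(text: str) -> List[Entry]:
    entries = []
    for line in text.splitlines():
        if not line.strip():
            continue
        mtime, digest, path = line.split(" ", 2)  # the path may contain spaces
        entries.append(Entry(int(mtime), digest, path))
    return entries


def refresh(entry: Entry, base_dir: str) -> Entry:
    """Return an up-to-date entry, hashing the file only if its mtime changed."""
    full = os.path.join(base_dir, os.path.expanduser(entry.path))
    mtime_ms = os.stat(full).st_mtime_ns // 1_000_000
    if mtime_ms == entry.mtime_ms:
        return entry  # mtime matches: reuse the recorded digest, skip hashing
    with open(full, "rb") as f:
        return Entry(mtime_ms, encode(hashlib.sha256(f.read()).digest()), entry.path)


def combined_digest(entries: List[Entry]) -> str:
    """Hash the per-file digests rather than the file contents."""
    h = hashlib.sha256()
    for e in entries:
        h.update(e.digest.encode("ascii") + b"\n")
    return encode(h.digest())
```

Because `combined_digest` consumes only the stored digests, a change to one file costs one re-hash of that file in `refresh` plus a cheap pass over the digest list.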

Cache Eviction

After adding files to the cache, calculate how many bytes were added.
Use file locking and use ~/.local/share/zig/stage1/size as the storage for the byte count.
If the byte count is greater than the configured cache size, look at all the directories
in ~/.local/share/zig/stage1/ and sort them by Least Recently Used on the mtime of the manifest
files. Delete entire directories until the byte count gets lower than the configured cache size.
Delete files from ~/.local/share/zig/stage1/ that are older than the oldest manifest mtime left.

Default cache size: 10GB, user-configurable.
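
A minimal Python sketch of the eviction pass described above, under several assumptions: the names `dir_size`, `mark_hit`, and `evict` are hypothetical, every subdirectory containing a `manifest` file is treated as an evictable entry, and the caller supplies the current byte count. File locking, the `size` file, and the final sweep of stale loose files are left out for brevity.

```python
import os
import shutil


def dir_size(path: str) -> int:
    """Total size in bytes of all files below path."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            total += os.path.getsize(os.path.join(root, name))
    return total


def mark_hit(entry_dir: str) -> None:
    """On a cache hit, bump the manifest mtime to now so the entry counts as recently used."""
    os.utime(os.path.join(entry_dir, "manifest"))


def evict(cache_root: str, total_bytes: int, max_bytes: int) -> int:
    """Delete least recently used entries until total_bytes <= max_bytes.

    Returns the byte count after eviction.
    """
    entries = []
    for name in os.listdir(cache_root):
        entry_dir = os.path.join(cache_root, name)
        manifest = os.path.join(entry_dir, "manifest")
        if os.path.isfile(manifest):
            entries.append((os.stat(manifest).st_mtime_ns, entry_dir))
    entries.sort()  # oldest manifest mtime first
    for _mtime, entry_dir in entries:
        if total_bytes <= max_bytes:
            break
        total_bytes -= dir_size(entry_dir)
        shutil.rmtree(entry_dir)
    return total_bytes
```

A real implementation would hold the lock on the `size` file while reading the byte count, evicting, and writing the new count back.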

Compiler Id Cache

Contents of ~/.local/share/zig/stage1/zig/manifest

1535254248689 RLoSvwnmxxFm5LKJ5fND1UJ6uVOG1M-kgY7IWlNAS1w /usr/bin/zig
1535254254257 uyVIWREGOgrsLRSWOZB_xnmfFflX5Avxb0b4eiTqvss /home/andy/dev/zig/build/zig
1535254248689 KkOxuk3PGQnmWYWyNq0_XtPLcY5O8G_qQk9CPWSI_DA /home/andy/dev/zig/build/bin/zig

Each line has an mtime, a SHA-256 digest, and a file path. The SHA-256 is the id for that compiler, and refers
to, e.g., ~/.local/share/zig/stage1/zig/RLoSvwnmxxFm5LKJ5fND1UJ6uVOG1M-kgY7IWlNAS1w
Use file locking when modifying this file.

Contents of ~/.local/share/zig/stage1/zig/RLoSvwnmxxFm5LKJ5fND1UJ6uVOG1M-kgY7IWlNAS1w

1535254248689 KkOxuk3PGQnmWYWyNq0_XtPLcY5O8G_qQk9CPWSI_DA /usr/lib/libLLVM-6.0.so
1535254254033 1gBYXMIgSfehmA6NaInk15eyr1aShyov0+FoeemmnOY /usr/lib/libpthread.so.0
1535254254257 uyVIWREGOgrsLRSWOZB_xnmfFflX5Avxb0b4eiTqvss /usr/lib/libstdc++.so.6

These are, recursively, the dynamic libraries that the zig executable links against.
Bump the mtime on this file as well, after bumping the manifest file mtime, on a cache hit.

Benefits

  • This strategy has no false negatives
  • This will allow us to turn builtin.o into zig libc (See implement libc in zig #514) and allow it to grow as large as it needs to without compromising compilation performance of general code, because it can sit in the cache unmodified.
  • zig build and zig run get much faster
  • Generally, compilation gets faster because compiler_rt.o and builtin.o no longer have to be built every time. Running the zig tests will be much faster. (although we will have to add new tests to test the caching behavior). Building stage2 takes upwards of 10 seconds, and involves unnecessarily building compiler_rt.o twice, builtin.o twice, and build.zig once.
andrewrk added the proposal and stage1 labels Aug 26, 2018
andrewrk added this to the 0.4.0 milestone Aug 26, 2018
@kristate
Contributor

I think that we should use Blake2b over sha256 -- it should be more performant.

@isaachier
Contributor

If you use this strategy for compilation targets in general, won't you have issues when you hit the cache limit and delete the oldest, but potentially most relied-upon libraries? Like if I link all my targets to library A, but rarely recompile A from source, won't my project break depending on how cache eviction works?

andrewrk added a commit that referenced this issue Sep 5, 2018
@kristate
Contributor

kristate commented Sep 5, 2018

It looks like work has begun in stage1 -- I am excited.

andrewrk mentioned this issue Sep 6, 2018
andrewrk added a commit that referenced this issue Sep 6, 2018
@bmeh

bmeh commented Sep 8, 2018

That's funny, I was just thinking about asking why not replace SHA-256 with BLAKE2b BEFORE reading the comments. :)

Big thumbs up!

Regarding performance: https://ziglang.org/download/0.2.0/release-notes.html :D

Side-note: my favorites are ed25519, curve25519, poly1305, xsalsa20, blake2b, arc4random from OpenBSD, and of course constant-time functions including memzero, etc. I am not up-to-date with Zig's cryptography library, but I sure as hell will implement those in Zig when it's more mature!

@andrewrk
Member Author

[Screenshot: screenshot_2018-09-10_13-43-08]

andrewrk mentioned this issue Sep 10, 2018
andrewrk added the accepted label Sep 10, 2018
@andrewrk
Member Author

This landed a few weeks ago.
