perform AstGen on whole files at once (AST->ZIR) #8516
Labels: accepted, breaking, enhancement, frontend, proposal
This is a language proposal as well as a concrete plan for how to implement it. It solves #335 and goes a long way towards making the problematic issue #3028 unneeded. The implementation plan simplifies the compiler and yet opens up straightforward opportunities for parallelism and caching.
In stage2 we have a concept of "AstGen" which stands for Abstract Syntax Tree Generation. This is the part where we input an AST and output Zig Intermediate Representation code.
Currently, this is done lazily as-needed per Decl (top level declaration). This requires code to orchestrate per-Decl ZIR code and independently manage memory lifetimes. It also means each Decl uses independent arrays of ZIR tags, instruction lists, string tables, and auxiliary lists. When a file is modified, the compiler checks which Decl source bytes differ, and repeats AstGen for the changed Decls to generate updated ZIR code.
One key design strategy is to make ZIR code immutable, typeless, and depend only on AST. This ensures that it can be re-used for multiple generic instantiations, comptime function calls, and inlined function calls.
This proposal takes that design strategy, and observes that it is possible to generate ZIR for an entire file indiscriminately, for all Decls, depending on AST alone and not introducing any type checking. Furthermore, it observes that this allows implementing the following compile errors:
All of these compile errors are possible with AstGen alone, and do not require types. In fact, trying to implement these compile errors with types is problematic because of conditional compilation. But there is no conditional compilation with AstGen. Doing entire files at once would make it possible to have compile errors for unused private functions and globals.
With the way that ZIR is encoded, putting a whole file into one piece of ZIR code has less overhead than splitting it by Decl: less list capacity is wasted, and more strings in the string table are shared.
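To make the string table point concrete, here is a minimal sketch in Python (not the compiler's actual data structure). The `StringTable` class, the null-terminated layout, and the identifier lists are assumptions made up for illustration; the only thing it demonstrates is that one interning table per file stores each distinct string once, while one table per Decl stores shared strings repeatedly.

```python
class StringTable:
    """Append-only byte buffer with interning: each distinct string is stored once."""

    def __init__(self):
        self.bytes = bytearray()
        self.offsets = {}  # string -> offset of its first byte in self.bytes

    def intern(self, s):
        """Return the offset of s, appending it only if it is not stored yet."""
        if s not in self.offsets:
            self.offsets[s] = len(self.bytes)
            self.bytes += s.encode("utf-8") + b"\x00"  # assumed null-terminated layout
        return self.offsets[s]


def table_bytes_per_decl(decls):
    """One table per Decl: strings shared between Decls are stored repeatedly."""
    total = 0
    for names in decls:
        table = StringTable()
        for name in names:
            table.intern(name)
        total += len(table.bytes)
    return total


def table_bytes_per_file(decls):
    """One table for the whole file: each distinct string is stored once."""
    table = StringTable()
    for names in decls:
        for name in names:
            table.intern(name)
    return len(table.bytes)


# Hypothetical identifiers referenced by three Decls in the same file.
decls = [
    ["std", "mem", "Allocator"],
    ["std", "mem", "eql"],
    ["std", "debug", "print"],
]
per_decl = table_bytes_per_decl(decls)
per_file = table_bytes_per_file(decls)
```

In this toy input the per-file table is smaller because `std` and `mem` are stored once instead of once per Decl; the saving in a real file depends on how many identifiers its Decls have in common.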
This works great for caching. All source files independently need to be converted to ZIR, and once converted to ZIR, the original source, token list, and AST node list are all no longer needed. The relevant bytes will be stored directly in ZIR. So each .zig source file will have exactly one corresponding piece of ZIR bytecode. It's easy to imagine a caching strategy for this. Consider also that the transformation from .zig to ZIR does not depend on the target options, or on anything other than the AST. So cached ZIR for std lib files and commonly used packages can be re-used between unrelated projects.
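One way such a caching strategy could look is sketched below in Python. This is an assumption, not the planned design: `generate_zir` is a hypothetical stand-in for tokenize, parse, and AstGen, and keying the cache on a SHA-256 digest of the source bytes is just one plausible choice. The point it illustrates is that the key contains nothing but the file contents, so no target options or project settings are involved.

```python
import hashlib


def generate_zir(source):
    """Hypothetical stand-in for tokenize -> parse -> AstGen on one file."""
    return b"ZIR:" + source.upper()


class ZirCache:
    """Cache keyed only by source bytes, so entries are project-independent."""

    def __init__(self):
        self.entries = {}  # hex digest of source -> ZIR bytes
        self.misses = 0    # how many times ZIR had to be generated

    def get(self, source):
        key = hashlib.sha256(source).hexdigest()
        if key not in self.entries:
            self.misses += 1
            self.entries[key] = generate_zir(source)
        return self.entries[key]


# Two unrelated projects that import the same (made-up) library source.
lib_source = b"pub fn eql() bool { return true; }"
cache = ZirCache()
zir_for_project_a = cache.get(lib_source)
zir_for_project_b = cache.get(lib_source)
```

The second lookup is served from the cache without calling `generate_zir` again, which is the property that would let cached ZIR be shared between projects.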
Furthermore, thanks to #2206, the compiler can optimistically look for all .zig source files in a project, and parallelize each tokenize->parse->ZIR transformation. The caching system can notice when .zig source files are unchanged, and load the ZIR code directly instead of the source, skipping tokenization, parsing, and AstGen entirely, on a per-file basis. The AST would only need to be loaded in order to report compile errors.
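The per-file structure described here can be sketched as follows in Python. Everything in it is illustrative: `file_to_zir` is a hypothetical placeholder for the real pipeline, the in-memory `cache` dictionary stands in for whatever on-disk cache the compiler would use, and a thread pool is used only to show that each file is an independent unit of work (CPython threads do not give real CPU parallelism for this kind of task).

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


def file_to_zir(source):
    """Hypothetical stand-in for the per-file tokenize -> parse -> AstGen pipeline."""
    return b"ZIR:" + hashlib.sha256(source).digest()


def update(files, cache):
    """Regenerate ZIR only for changed files, treating each file independently.

    files: path -> current source bytes
    cache: path -> (source digest, ZIR bytes), updated in place
    Returns the sorted list of paths whose ZIR was regenerated.
    """
    stale = [
        path
        for path, source in files.items()
        if path not in cache or cache[path][0] != hashlib.sha256(source).digest()
    ]
    with ThreadPoolExecutor() as pool:
        # Each stale file is converted on its own; no file depends on another.
        results = pool.map(lambda p: file_to_zir(files[p]), stale)
        for path, zir in zip(stale, results):
            cache[path] = (hashlib.sha256(files[path]).digest(), zir)
    return sorted(stale)


# Made-up project contents.
files = {
    "main.zig": b"pub fn main() void {}",
    "util.zig": b"pub fn add() void {}",
}
cache = {}
first = update(files, cache)   # nothing cached yet: both files are processed
files["util.zig"] = b"pub fn add() u32 { return 0; }"
second = update(files, cache)  # only the edited file is processed again
```

After the edit to `util.zig`, only that file goes through the pipeline again; `main.zig` is served from the cache without being tokenized or parsed.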
Serialization of ZIR in binary form is straightforward. It consists only of:
Writing/reading this to/from a file is trivial.