-
-
Notifications
You must be signed in to change notification settings - Fork 2.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Proposal: namespace type equivalence based on AST node + captures #18816
Comments
Thanks for the writeup. Looking at this with fresh eyes, I'm confident to move forward with it. |
I don't see any real issue with this proposal, the only unfortunate side-effect I wanted to have pointed out somewhere is that by using the AST node as key, const E = enum{a, b};
fn foo(comptime e: E) type { //foo(.a) != foo(.b) will still hold under new semantics
return switch(e) {
.a => opaque {},
.b => opaque {},
};
}
fn bar(comptime e: E) type { //new semantics will make bar(.a) == bar(.b), unless e is used within it
return switch(e) {
inline else => opaque {},
};
} It would be possible to tweak the definition (inserting another synthetic key) to retain this equivalence, |
Yeah, if you need to generate distinct opaques without methods like that for some reason you could just do a |
Something like this I'm assuming? (for posterity) fn Bar(comptime e: E) type {
return opaque {
comptime {
_ = e;
}
};
} |
Yup, exactly. You don't need to really think about the technicalities of captures; you just ask yourself "would this struct have different semantics / a different meaning across these two instantiations?" |
I wonder if this will end up reducing binary size |
An open question heree: how does this interact with reification (
So, for instance, in the following, const A = struct { x: u32 };
const B = struct { y: u32 };
fn Foo(comptime T: type) type {
// Equivalent to `struct { x: usize = @sizeOf(T) }`, which would *not* be equivalent for distinct T
// However, `const sz = @sizeOf(T); return struct { x: usize = sz };` would be
return @Type(.{ .Struct = .{
.layout = .Auto,
.fields = &.{.{
.name = "x",
.type = usize,
.default_value = @sizeOf(T),
.is_comptime = false,
.alignment = 0,
}},
.decls = &.{},
.is_tuple = false,
} });
} This is what I'm going to go with for now, but of course, this is open to discussion (I'm interested to hear your thoughts @andrewrk). |
This changes the representation of closures in Zir and Sema. Rather than a pair of instructions `closure_capture` and `closure_get`, the system now works as follows: * Each ZIR type declaration (`struct_decl` etc) contains a list of captures in the form of ZIR indices (or, for efficiency, direct references to parent captures). This is an ordered list; indexes into it are used to refer to captured values. * The `extended(closure_get)` ZIR instruction refers to a value in this list via a 16-bit index (limiting this index to 16 bits allows us to store this in `extended`). * `Module.Namespace` has a new field `captures` which contains the list of values captured in a given namespace. This is initialized based on the ZIR capture list whenever a type declaration is analyzed. This change eliminates `CaptureScope` from semantic analysis, which is a nice simplification; but the main motivation here is that this change is a prerequisite for ziglang#18816.
These were previously associated with the type's namespace, but we need to store them directly in the InternPool for ziglang#18816.
This implements the accepted proposal ziglang#18816. Namespace-owning types (struct, enum, union, opaque) are no longer unique whenever analysed; instead, their identity is determined based on their AST node and the set of values they capture. Reified types (`@Type`) are deduplicated based on the structure of the type created. For instance, if two structs are created by the same reification with identical fields, layout, etc, they will be the same type. This commit does not produce a working compiler; the next commit, adding captures for decl references, is necessary. It felt appropriate to split this up. Resolves: ziglang#18816
…tures This fixes an issue with the implementation of ziglang#18816. Consider the following code: ```zig pub fn Wrap(comptime T: type) type { return struct { pub const T1 = T; inner: struct { x: T1 }, }; } ``` Previously, the type of `inner` was not considered to be "capturing" any value, as `T1` is a decl. However, since it is declared within a generic function, this decl reference depends on the context, and thus should be treated as a capture. AstGen has been augmented to tunnel references to decls through closure when the decl was declared in a potentially-generic context (i.e. within a function).
I changed an error messages and fixed a minor bug while implementing this proposal, which led to a few compile error cases failing.
This changes the representation of closures in Zir and Sema. Rather than a pair of instructions `closure_capture` and `closure_get`, the system now works as follows: * Each ZIR type declaration (`struct_decl` etc) contains a list of captures in the form of ZIR indices (or, for efficiency, direct references to parent captures). This is an ordered list; indexes into it are used to refer to captured values. * The `extended(closure_get)` ZIR instruction refers to a value in this list via a 16-bit index (limiting this index to 16 bits allows us to store this in `extended`). * `Module.Namespace` has a new field `captures` which contains the list of values captured in a given namespace. This is initialized based on the ZIR capture list whenever a type declaration is analyzed. This change eliminates `CaptureScope` from semantic analysis, which is a nice simplification; but the main motivation here is that this change is a prerequisite for ziglang#18816.
These were previously associated with the type's namespace, but we need to store them directly in the InternPool for ziglang#18816.
…tures This fixes an issue with the implementation of ziglang#18816. Consider the following code: ```zig pub fn Wrap(comptime T: type) type { return struct { pub const T1 = T; inner: struct { x: T1 }, }; } ``` Previously, the type of `inner` was not considered to be "capturing" any value, as `T1` is a decl. However, since it is declared within a generic function, this decl reference depends on the context, and thus should be treated as a capture. AstGen has been augmented to tunnel references to decls through closure when the decl was declared in a potentially-generic context (i.e. within a function).
I changed an error messages and fixed a minor bug while implementing this proposal, which led to a few compile error cases failing.
This changes the representation of closures in Zir and Sema. Rather than a pair of instructions `closure_capture` and `closure_get`, the system now works as follows: * Each ZIR type declaration (`struct_decl` etc) contains a list of captures in the form of ZIR indices (or, for efficiency, direct references to parent captures). This is an ordered list; indexes into it are used to refer to captured values. * The `extended(closure_get)` ZIR instruction refers to a value in this list via a 16-bit index (limiting this index to 16 bits allows us to store this in `extended`). * `Module.Namespace` has a new field `captures` which contains the list of values captured in a given namespace. This is initialized based on the ZIR capture list whenever a type declaration is analyzed. This change eliminates `CaptureScope` from semantic analysis, which is a nice simplification; but the main motivation here is that this change is a prerequisite for ziglang#18816.
These were previously associated with the type's namespace, but we need to store them directly in the InternPool for ziglang#18816.
This implements the accepted proposal ziglang#18816. Namespace-owning types (struct, enum, union, opaque) are no longer unique whenever analysed; instead, their identity is determined based on their AST node and the set of values they capture. Reified types (`@Type`) are deduplicated based on the structure of the type created. For instance, if two structs are created by the same reification with identical fields, layout, etc, they will be the same type. This commit does not produce a working compiler; the next commit, adding captures for decl references, is necessary. It felt appropriate to split this up. Resolves: ziglang#18816
…tures This fixes an issue with the implementation of ziglang#18816. Consider the following code: ```zig pub fn Wrap(comptime T: type) type { return struct { pub const T1 = T; inner: struct { x: T1 }, }; } ``` Previously, the type of `inner` was not considered to be "capturing" any value, as `T1` is a decl. However, since it is declared within a generic function, this decl reference depends on the context, and thus should be treated as a capture. AstGen has been augmented to tunnel references to decls through closure when the decl was declared in a potentially-generic context (i.e. within a function).
I changed an error messages and fixed a minor bug while implementing this proposal, which led to a few compile error cases failing.
I changed an error messages and fixed a minor bug while implementing this proposal, which led to a few compile error cases failing.
Background
In status quo Zig, namespace types are unique. Whenever a
struct { ... }
,union { ... }
,enum { ... }
, oropaque { ... }
is encountered, a new type - distinct from all others - is created. Broadly speaking, this is a reasonable design, but it has some unfortunate consequences.Firstly, it can result in overinstantiation. Consider the following code:
Here, we would naturally expect that there is only one instantion of
f
, and that it is called twice. However, sincef
is a generic function, all generic parameter types are evaluated at every call site. That means the parameter type for the first call tof
is a structurally equivalent but distinct type to the second call, because thestruct { y: T }
declaration is evaluated separately each time. This can result in significant over-instantiation if code is written like this.Secondly, this feature means that comptime call memoization has an impact on language semantics, and must therefore be codified into the language specification. This can be seen by considering the following trivial code:
It is important that this assertion passes, since this is fundamental to Zig's system of generics; and with memoization (status quo), it does pass. However, if calls were not memoized, the assertion would fail, as each call to
Empty
would create a distinct type. Having memoization impact language semantics complicates the specification, particularly since we can't memoize function calls whose arguments contain references to comptime-mutable memory (since such memory may be mutated by the function).Lastly, this design has unfortunate consequences for incremental compilation. Consider the following code:
When performing an incremental update of
g
, we wish to correlate eachstruct
declaration in this code with the declarations in the previous code. However, there isn't an easy way of doing that at the minute. We could do it based on AST node, but if these structs are created by a function call, the AST node might be the same, making the types distinct, and then there truly is no easy way for the compiler to figure out how to correlate these.Proposal
Define namespace types (
struct
etc) to be equivalent to any types defined at the same AST node with the same set of captures. Note that to do this, lazy capture values must be resolved, so this is a potentially breaking change.Captures are the only way a type with the same AST node can structurally differ; capturing values derived from generic arguments may allow field types / init values / etc to differ between evaluations of one source declaration. However, if we include these captures in the "key" for the equivalence relation, we get a consistent definition of equivalent struct types. This solves all of the problems listed above:
T
matches, so the same instantiation will be used.Empty
were not memoized, it would return the same type (there are no captures), so memoization is no longer required to be a part of the language specification.TrackedInst
abstraction) and captured values, and update it as needed.This proposal has been accepted by @andrewrk for now, but of course is open for reconsideration if it raises issues. However, I can't think of any potential problems; as far as I can tell, this proposal should be a strict win for language simplicity, binary sizes, and compilation speed.
The text was updated successfully, but these errors were encountered: