Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proposal: namespace type equivalence based on AST node + captures #18816

Closed
mlugg opened this issue Feb 4, 2024 · 7 comments
Closed

Proposal: namespace type equivalence based on AST node + captures #18816

mlugg opened this issue Feb 4, 2024 · 7 comments
Labels
accepted This proposal is planned. proposal This issue suggests modifications. If it also has the "accepted" label then it is planned.
Milestone

Comments

@mlugg
Copy link
Member

mlugg commented Feb 4, 2024

Background

In status quo Zig, namespace types are unique. Whenever a struct { ... }, union { ... }, enum { ... }, or opaque { ... } is encountered, a new type - distinct from all others - is created. Broadly speaking, this is a reasonable design, but it has some unfortunate consequences.

Firstly, it can result in overinstantiation. Consider the following code:

fn f(comptime T: type, x: struct { y: T }) void {
    _ = x;
}
test {
    f(u32, .{ .y = 1 });
    f(u32, .{ .y = 1 });
}

Here, we would naturally expect that there is only one instantion of f, and that it is called twice. However, since f is a generic function, all generic parameter types are evaluated at every call site. That means the parameter type for the first call to f is a structurally equivalent but distinct type to the second call, because the struct { y: T } declaration is evaluated separately each time. This can result in significant over-instantiation if code is written like this.

Secondly, this feature means that comptime call memoization has an impact on language semantics, and must therefore be codified into the language specification. This can be seen by considering the following trivial code:

fn Empty() type {
    return struct {};
}
comptime {
    if (Empty() != Empty()) unreachable; // assertion failure
}

It is important that this assertion passes, since this is fundamental to Zig's system of generics; and with memoization (status quo), it does pass. However, if calls were not memoized, the assertion would fail, as each call to Empty would create a distinct type. Having memoization impact language semantics complicates the specification, particularly since we can't memoize function calls whose arguments contain references to comptime-mutable memory (since such memory may be mutated by the function).

Lastly, this design has unfortunate consequences for incremental compilation. Consider the following code:

fn g() void {
    _ = struct {};
    _ = struct {};
}

When performing an incremental update of g, we wish to correlate each struct declaration in this code with the declarations in the previous code. However, there isn't an easy way of doing that at the minute. We could do it based on AST node, but if these structs are created by a function call, the AST node might be the same, making the types distinct, and then there truly is no easy way for the compiler to figure out how to correlate these.

Proposal

Define namespace types (struct etc) to be equivalent to any types defined at the same AST node with the same set of captures. Note that to do this, lazy capture values must be resolved, so this is a potentially breaking change.

Captures are the only way a type with the same AST node can structurally differ; capturing values derived from generic arguments may allow field types / init values / etc to differ between evaluations of one source declaration. However, if we include these captures in the "key" for the equivalence relation, we get a consistent definition of equivalent struct types. This solves all of the problems listed above:

  • Generic calls will still re-evaluate their parameter types, but they will resolve to the same type if the captured field type T matches, so the same instantiation will be used.
  • If Empty were not memoized, it would return the same type (there are no captures), so memoization is no longer required to be a part of the language specification.
  • During an incremental update, the compiler can find an existing instantiation based on ZIR instruction index (already tracked by the TrackedInst abstraction) and captured values, and update it as needed.

This proposal has been accepted by @andrewrk for now, but of course is open for reconsideration if it raises issues. However, I can't think of any potential problems; as far as I can tell, this proposal should be a strict win for language simplicity, binary sizes, and compilation speed.

@mlugg mlugg added proposal This issue suggests modifications. If it also has the "accepted" label then it is planned. accepted This proposal is planned. labels Feb 4, 2024
@andrewrk andrewrk added this to the 0.13.0 milestone Feb 4, 2024
@andrewrk
Copy link
Member

andrewrk commented Feb 4, 2024

Thanks for the writeup. Looking at this with fresh eyes, I'm confident to move forward with it.

@rohlem
Copy link
Contributor

rohlem commented Feb 4, 2024

I don't see any real issue with this proposal, the only unfortunate side-effect I wanted to have pointed out somewhere is that by using the AST node as key, inline expansions (mainly inline for loops, inline else cases and the bodies of inline fn), which only appear in AST once AFAIK, no longer act exactly the same as when unrolled by hand:

const E = enum{a, b};
fn foo(comptime e: E) type { //foo(.a) != foo(.b) will still hold under new semantics
 return switch(e) {
  .a => opaque {},
  .b => opaque {},
 };
}
fn bar(comptime e: E) type { //new semantics will make bar(.a) == bar(.b), unless e is used within it
 return switch(e) {
  inline else => opaque {},
 };
}

It would be possible to tweak the definition (inserting another synthetic key) to retain this equivalence,
but it's probably not worth complicating the implementation just for this regularity edge-case.

@mlugg
Copy link
Member Author

mlugg commented Feb 4, 2024

Yeah, if you need to generate distinct opaques without methods like that for some reason you could just do a comptime { _ = differentiating_val; } within them. Definitely a good point though!

@InKryption
Copy link
Contributor

Yeah, if you need to generate distinct opaques without methods like that for some reason you could just do a comptime { _ = differentiating_val; } within them. Definitely a good point though!

Something like this I'm assuming? (for posterity)

fn Bar(comptime e: E) type {
    return opaque {
        comptime {
            _ = e;
        }
    };
}

@mlugg
Copy link
Member Author

mlugg commented Feb 5, 2024

Yup, exactly. You don't need to really think about the technicalities of captures; you just ask yourself "would this struct have different semantics / a different meaning across these two instantiations?"

@Jarred-Sumner
Copy link
Contributor

I wonder if this will end up reducing binary size

@mlugg
Copy link
Member Author

mlugg commented Feb 9, 2024

An open question heree: how does this interact with reification (@Type)? In my mind, the obvious thing is that two reified types are to be equivalent if they:

  • Are created at the same AST node
  • Are the same "kind" of type (struct vs union vs etc)
  • Have matching attributes, field types, default values, etc
    • Note that this does require resolving lazy default field values, so is again a breaking change

So, for instance, in the following, Foo(A) == Foo(B) despite A != B:

const A = struct { x: u32 };
const B = struct { y: u32 };
fn Foo(comptime T: type) type {
    // Equivalent to `struct { x: usize = @sizeOf(T) }`, which would *not* be equivalent for distinct T
    // However, `const sz = @sizeOf(T); return struct { x: usize = sz };` would be
    return @Type(.{ .Struct = .{
        .layout = .Auto,
        .fields = &.{.{
            .name = "x",
            .type = usize,
            .default_value = @sizeOf(T),
            .is_comptime = false,
            .alignment = 0,
        }},
        .decls = &.{},
        .is_tuple = false,
    } });
}

This is what I'm going to go with for now, but of course, this is open to discussion (I'm interested to hear your thoughts @andrewrk).

mlugg added a commit to mlugg/zig that referenced this issue Mar 5, 2024
This changes the representation of closures in Zir and Sema. Rather than
a pair of instructions `closure_capture` and `closure_get`, the system
now works as follows:

* Each ZIR type declaration (`struct_decl` etc) contains a list of
  captures in the form of ZIR indices (or, for efficiency, direct
  references to parent captures). This is an ordered list; indexes into
  it are used to refer to captured values.
* The `extended(closure_get)` ZIR instruction refers to a value in this
  list via a 16-bit index (limiting this index to 16 bits allows us to
  store this in `extended`).
* `Module.Namespace` has a new field `captures` which contains the list
  of values captured in a given namespace. This is initialized based on
  the ZIR capture list whenever a type declaration is analyzed.

This change eliminates `CaptureScope` from semantic analysis, which is a
nice simplification; but the main motivation here is that this change is
a prerequisite for ziglang#18816.
mlugg added a commit to mlugg/zig that referenced this issue Mar 5, 2024
These were previously associated with the type's namespace, but we need
to store them directly in the InternPool for ziglang#18816.
mlugg added a commit to mlugg/zig that referenced this issue Mar 5, 2024
This implements the accepted proposal ziglang#18816. Namespace-owning types
(struct, enum, union, opaque) are no longer unique whenever analysed;
instead, their identity is determined based on their AST node and the
set of values they capture.

Reified types (`@Type`) are deduplicated based on the structure of the
type created. For instance, if two structs are created by the same
reification with identical fields, layout, etc, they will be the same
type.

This commit does not produce a working compiler; the next commit, adding
captures for decl references, is necessary. It felt appropriate to split
this up.

Resolves: ziglang#18816
mlugg added a commit to mlugg/zig that referenced this issue Mar 5, 2024
…tures

This fixes an issue with the implementation of ziglang#18816. Consider the
following code:

```zig
pub fn Wrap(comptime T: type) type {
    return struct {
        pub const T1 = T;
        inner: struct { x: T1 },
    };
}
```

Previously, the type of `inner` was not considered to be "capturing" any
value, as `T1` is a decl. However, since it is declared within a generic
function, this decl reference depends on the context, and thus should be
treated as a capture.

AstGen has been augmented to tunnel references to decls through closure
when the decl was declared in a potentially-generic context (i.e. within
a function).
mlugg added a commit to mlugg/zig that referenced this issue Mar 5, 2024
mlugg added a commit to mlugg/zig that referenced this issue Mar 5, 2024
mlugg added a commit to mlugg/zig that referenced this issue Mar 5, 2024
I changed an error messages and fixed a minor bug while implementing
this proposal, which led to a few compile error cases failing.
mlugg added a commit to mlugg/zig that referenced this issue Mar 6, 2024
This changes the representation of closures in Zir and Sema. Rather than
a pair of instructions `closure_capture` and `closure_get`, the system
now works as follows:

* Each ZIR type declaration (`struct_decl` etc) contains a list of
  captures in the form of ZIR indices (or, for efficiency, direct
  references to parent captures). This is an ordered list; indexes into
  it are used to refer to captured values.
* The `extended(closure_get)` ZIR instruction refers to a value in this
  list via a 16-bit index (limiting this index to 16 bits allows us to
  store this in `extended`).
* `Module.Namespace` has a new field `captures` which contains the list
  of values captured in a given namespace. This is initialized based on
  the ZIR capture list whenever a type declaration is analyzed.

This change eliminates `CaptureScope` from semantic analysis, which is a
nice simplification; but the main motivation here is that this change is
a prerequisite for ziglang#18816.
mlugg added a commit to mlugg/zig that referenced this issue Mar 6, 2024
These were previously associated with the type's namespace, but we need
to store them directly in the InternPool for ziglang#18816.
mlugg added a commit to mlugg/zig that referenced this issue Mar 6, 2024
…tures

This fixes an issue with the implementation of ziglang#18816. Consider the
following code:

```zig
pub fn Wrap(comptime T: type) type {
    return struct {
        pub const T1 = T;
        inner: struct { x: T1 },
    };
}
```

Previously, the type of `inner` was not considered to be "capturing" any
value, as `T1` is a decl. However, since it is declared within a generic
function, this decl reference depends on the context, and thus should be
treated as a capture.

AstGen has been augmented to tunnel references to decls through closure
when the decl was declared in a potentially-generic context (i.e. within
a function).
mlugg added a commit to mlugg/zig that referenced this issue Mar 6, 2024
mlugg added a commit to mlugg/zig that referenced this issue Mar 6, 2024
mlugg added a commit to mlugg/zig that referenced this issue Mar 6, 2024
I changed an error messages and fixed a minor bug while implementing
this proposal, which led to a few compile error cases failing.
@andrewrk andrewrk modified the milestones: 0.13.0, 0.12.0 Mar 8, 2024
RossComputerGuy pushed a commit to ExpidusOS-archive/zig that referenced this issue Mar 20, 2024
This changes the representation of closures in Zir and Sema. Rather than
a pair of instructions `closure_capture` and `closure_get`, the system
now works as follows:

* Each ZIR type declaration (`struct_decl` etc) contains a list of
  captures in the form of ZIR indices (or, for efficiency, direct
  references to parent captures). This is an ordered list; indexes into
  it are used to refer to captured values.
* The `extended(closure_get)` ZIR instruction refers to a value in this
  list via a 16-bit index (limiting this index to 16 bits allows us to
  store this in `extended`).
* `Module.Namespace` has a new field `captures` which contains the list
  of values captured in a given namespace. This is initialized based on
  the ZIR capture list whenever a type declaration is analyzed.

This change eliminates `CaptureScope` from semantic analysis, which is a
nice simplification; but the main motivation here is that this change is
a prerequisite for ziglang#18816.
RossComputerGuy pushed a commit to ExpidusOS-archive/zig that referenced this issue Mar 20, 2024
These were previously associated with the type's namespace, but we need
to store them directly in the InternPool for ziglang#18816.
RossComputerGuy pushed a commit to ExpidusOS-archive/zig that referenced this issue Mar 20, 2024
This implements the accepted proposal ziglang#18816. Namespace-owning types
(struct, enum, union, opaque) are no longer unique whenever analysed;
instead, their identity is determined based on their AST node and the
set of values they capture.

Reified types (`@Type`) are deduplicated based on the structure of the
type created. For instance, if two structs are created by the same
reification with identical fields, layout, etc, they will be the same
type.

This commit does not produce a working compiler; the next commit, adding
captures for decl references, is necessary. It felt appropriate to split
this up.

Resolves: ziglang#18816
RossComputerGuy pushed a commit to ExpidusOS-archive/zig that referenced this issue Mar 20, 2024
…tures

This fixes an issue with the implementation of ziglang#18816. Consider the
following code:

```zig
pub fn Wrap(comptime T: type) type {
    return struct {
        pub const T1 = T;
        inner: struct { x: T1 },
    };
}
```

Previously, the type of `inner` was not considered to be "capturing" any
value, as `T1` is a decl. However, since it is declared within a generic
function, this decl reference depends on the context, and thus should be
treated as a capture.

AstGen has been augmented to tunnel references to decls through closure
when the decl was declared in a potentially-generic context (i.e. within
a function).
RossComputerGuy pushed a commit to ExpidusOS-archive/zig that referenced this issue Mar 20, 2024
RossComputerGuy pushed a commit to ExpidusOS-archive/zig that referenced this issue Mar 20, 2024
RossComputerGuy pushed a commit to ExpidusOS-archive/zig that referenced this issue Mar 20, 2024
I changed an error messages and fixed a minor bug while implementing
this proposal, which led to a few compile error cases failing.
TUSF pushed a commit to TUSF/zig that referenced this issue May 9, 2024
I changed an error messages and fixed a minor bug while implementing
this proposal, which led to a few compile error cases failing.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
accepted This proposal is planned. proposal This issue suggests modifications. If it also has the "accepted" label then it is planned.
Projects
None yet
Development

No branches or pull requests

5 participants