Proposal: allow structs and unions to be const, similar to pointers #18964

Closed
Kiyoshi364 opened this issue Feb 16, 2024 · 10 comments
Labels
proposal This issue suggests modifications. If it also has the "accepted" label then it is planned.

Comments

@Kiyoshi364
Contributor

PS: Before explaining the proposal, I'm a bit confused. When creating a new issue, I get to choose which kind of issue it is. The choice for "Language Proposal", points to here, saying that ziglang is not accepting new language proposals. But there are a few recent issues tagged with "Proposal".

I wrote the entire proposal before finding this out, so I may just leave it here, right? If ziglang is really not accepting new language proposals, just ignore it.


What would it mean for a struct/union to be const?

First of all, it means that the struct's/union's pointer fields are const.

In the following example, the const Struct would mean the same as the ConstStruct.

const Struct = struct {
    a: u8,
    b: *u8,
    c: struct { sa: [3]u8, sb: [3]*u8 },
    d: union(enum) { ua: ?u8, ub: ?[*]u8 },
    e: error{A}!u8,
    f: error{A}!?*u8,
};

const ConstStruct = struct {
    a: u8,
    b: *const u8,
    c: struct { sa: [3]u8, sb: [3]*const u8 },
    d: union(enum) { ua: ?u8, ub: ?[*]const u8 },
    e: error{A}!u8,
    f: error{A}!?*const u8,
};

A hack around it is to use *const StructOrUnion in place of const StructOrUnion in function argument lists, i.e. fn foo(a: *const StructOrUnion) void instead of fn foo(a: const StructOrUnion) void.

This hack is not that good for 2 reasons. First, it requires adding a & at every call site (it is just annoying, but fine). And second, (please correct me if I'm wrong) it locks the compiler into actually passing a pointer to the function in the compiled code, instead of choosing the best between passing via pointer or via value (passing via value is good for small structs/unions).
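To make the call-site difference concrete, here is a minimal sketch. The names Counter, bumpByValue and bumpByPointer are made up for illustration and are not part of the proposal. It also shows that, with plain *const, writing through a pointer field still compiles, which is the gap the proposed const StructOrUnion is meant to close.

```zig
const std = @import("std");

// Hypothetical type for illustration only.
const Counter = struct {
    id: u8,
    hits: *u8,
};

// By-value parameter: the call site passes the struct directly.
fn bumpByValue(c: Counter) void {
    c.hits.* += 1;
}

// The `*const` workaround: the call site needs `&`.
// The struct's own fields are read-only through `c`, but writing
// through the `hits` pointer field is still allowed.
fn bumpByPointer(c: *const Counter) void {
    c.hits.* += 1;
}

test "by-value vs *const parameter" {
    var hits: u8 = 0;
    const c = Counter{ .id = 1, .hits = &hits };
    bumpByValue(c);
    bumpByPointer(&c);
    try std.testing.expectEqual(@as(u8, 2), hits);
}
```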

Here is another hack. It can also be used to explain the semantics of the proposal. The only drawback of this hack that I can think of is the extra verbosity of to_const, which the compiler could have inferred (and maybe some runtime cost of shuffling data around, though the compiler may be smart enough to avoid this).

const std = @import("std");
const Type = std.builtin.Type;

pub fn Const(comptime T: type) type {
    return switch (@typeInfo(T)) {
        // I expect that it is what the compiler does
        .Pointer => |info| blk: {
            var new_info = info;
            new_info.is_const = true;
            break :blk @Type(.{ .Pointer = new_info });
        },
        inline .Array, .Optional, .ErrorUnion => |info, tag| blk: {
            var new_info = info;
            if (tag == .ErrorUnion) {
                new_info.payload = Const(info.payload);
            } else {
                new_info.child = Const(info.child);
            }
            break :blk @Type(@unionInit(
                Type,
                @tagName(tag),
                new_info,
            ));
        },
        // I am proposing this
        .Struct => |info| blk: {
            var new_fields = @as([info.fields.len]Type.StructField, undefined);
            break :blk for (info.fields, 0..) |f, i| {
                new_fields[i] = .{
                    .name = f.name,
                    .type = Const(f.type),
                    .default_value = f.default_value,
                    .is_comptime = f.is_comptime,
                    .alignment = f.alignment,
                };
            } else @Type(.{ .Struct = .{
                .layout = info.layout,
                .backing_integer = info.backing_integer,
                .fields = &new_fields,
                .decls = info.decls,
                .is_tuple = info.is_tuple,
            } });
        },
        .Union => |info| blk: {
            var new_fields = @as([info.fields.len]Type.UnionField, undefined);
            break :blk for (info.fields, 0..) |f, i| {
                new_fields[i] = .{
                    .name = f.name,
                    .type = Const(f.type),
                    .alignment = f.alignment,
                };
            } else @Type(.{ .Union = .{
                .layout = info.layout,
                .tag_type = info.tag_type,
                .fields = &new_fields,
                .decls = info.decls,
            } });
        },
        // These types cannot hold a pointer type, so do nothing
        else => T,
    };
}

pub fn to_const(value: anytype) Const(@TypeOf(value)) {
    const Value = @TypeOf(value);
    const ConstValue = Const(Value);
    return switch (@typeInfo(Value)) {
        .Struct => |struct_info| blk: {
            var const_value = @as(ConstValue, undefined);
            inline for (struct_info.fields) |f| {
                @field(const_value, f.name) = to_const(@field(value, f.name));
            }
            break :blk const_value;
        },
        .Union => switch (value) {
            inline else => |v, t| @unionInit(
                ConstValue,
                @tagName(t),
                to_const(v),
            ),
        },
        else => value,
    };
}

The implementation (with some tests) for the hack is in const.zig.txt.

For the compiler/type-checking, I believe that when accessing a struct, union or pointer field, the field's type should also be const. I may be wrong about it, I don't fully understand all the type inference rules and stuff.

Why I think it is a good idea

The use case I first think of is writing a function that receives a struct/union argument with one or more pointer fields, where I (as the function writer) want to guarantee that it does not write through any of the argument's pointer fields.

The idea is "an extension" of receiving a *const T instead of *T. A note/reminder: arguments to functions are always semantically passed by value and they are "const identifiers" (see "const/var identifier vs const/not-const type" below).

An immediate example is query functions (look at this struct/union and get some information out). Here is a mock example of a generic game board. The board uses a pointer to store its tiles. There are some functions that write to the tiles (move_piece) and other functions that don't (is_tile_blue).

const Board = struct {
    width: usize,
    height: usize,
    tiles: [*]Tile,

    pub fn move_piece(self: Board, from: usize, to: usize) void {
        // with_piece returns the same Tile, but with another piece
        const new_to = self.tiles[to].with_piece(
            self.tiles[from].piece,
        );
        const new_from = self.tiles[from].with_piece(
            .no_piece,
        );
        // writing to self.tiles
        self.tiles[to] = new_to;
        self.tiles[from] = new_from;
    }

    pub fn is_tile_blue(self: const Board, pos: usize) bool {
        // No writing to self.tiles
        return self.tiles[pos].is_blue();
    }
};

Extra thoughts

const/var identifier vs const/not-const type

Most likely I'm just making the name of it up. This subsection is to remark the existence of 2 different uses of the const keyword.

An identifier (people might call it "variable") can be declared with either const or var. As far as I know, the only difference is that with const the programmer cannot mutate the underlying value (the value which the identifier identifies or refers/"points" to). While with var the programmer can mutate the underlying value.

In the type level/system, a pointer to T can be const (e.g. *const T) or not-const (e.g. *T). If it is const, the programmer cannot write to it.

This allows one to declare a pointer identifier as const and still change the value that the pointer points to:

const std = @import("std");

test {
    // num must be declared with var, otherwise it is a compiler error
    var num = @as(u8, 32);
    const num_ptr = &num;
    comptime std.debug.assert(@TypeOf(num_ptr) == *u8);
    num_ptr.* += 1;
    try std.testing.expectEqual(@as(u8, 33), num);
}

test {
    const num = @as(u8, 32);
    const num_ptr = &num;
    comptime std.debug.assert(@TypeOf(num_ptr) == *const u8);
    // This makes it not compile with
    // error: cannot assign to constant
    // num_ptr.* += 1;
    try std.testing.expectEqual(@as(u8, 32), num);
}

If this is confusing the article may help:
Pointers and constness in Zig and why it is confusing to a C programmer

Offtopic: maybe there should be two different keywords.
Example: let/var (for identifiers) and const/non-const (for types)

Type inference (Peer Type Resolution?)

The type *T can implicitly cast into *const T.
Similarly, StructOrUnion should implicitly cast into const StructOrUnion.
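For reference, a small sketch of the existing pointer coercion that this would mirror. The read function is a made-up helper; only the *u8 to *const u8 direction is implicit.

```zig
const std = @import("std");

// Hypothetical helper that only needs read access.
fn read(p: *const u8) u8 {
    return p.*;
}

test "*T coerces to *const T" {
    var x: u8 = 41;
    const p: *u8 = &x;
    p.* += 1;
    // `p` has type `*u8`; it coerces implicitly to `*const u8` here.
    try std.testing.expectEqual(@as(u8, 42), read(p));
    // The reverse direction is rejected by the compiler:
    // const q: *u8 = @as(*const u8, p); // error: cast discards const qualifier
}
```

Under the proposal, passing a StructOrUnion where a const StructOrUnion is expected would presumably be just as implicit.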

var identifier of type const StructOrUnion

A var identifier (a variable) of type const StructOrUnion should behave similarly to a var identifier of type *const T: the programmer can alter the identifier's underlying value, but cannot write anything after a dereference.

test "var const" {
    const S = struct {
        foo: u8,
        ptr: *u8,
    };

    const num = @as(u8, 20);
    const num2 = @as(u8, 50);
    var s = Const(S){
        .foo = 2,
        .ptr = &num,
    };
    s.foo += 3;
    s.ptr = &num2;
    try std.testing.expectEqual(@as(u8, 5), s.foo);
    try std.testing.expectEqual(&num2, s.ptr);
}

The zig grammar

Currently, const can only appear in type expressions after * (e.g. *const u8) (I didn't look at the grammar specs; I tried it and the compiler complained). As a consequence, const struct{} and const u8 are invalid type expressions.

I didn't look at the grammar (as stated previously), so I don't know how hard it is to allow const struct{} and similar expressions.

Assuming that it is possible, identifiers should also be allowed (e.g. const Board). If that is allowed, I expect that the grammar would also allow primitive types (e.g. const u8) and multiple consts (e.g. const const Board). If these cases are undesirable, the compiler should make extra semantic checks (if one asks my opinion: they are undesirable).

@nektro
Contributor

nektro commented Feb 16, 2024

In the following example, the const Struct would mean the same as the ConstStruct.

@Kiyoshi364
Contributor Author

Kiyoshi364 commented Feb 17, 2024

  • ConstStruct is creatable via comptime

Yes. If this is the reason to reject the proposal, I find it reasonable.

  • by-value parameters are already passed as const

Yes, but there is a difference between a "const variable" (I called it const identifier) and a "const type".
This fact does not disallow modification.

const std = @import("std");
const S = struct { ptr: *u8 };
fn inc(s: S) void {
    s.ptr.* += 1;
}
test {
    var num = @as(u8, 10);
    const s = S{ .ptr = &num };
    inc(s);
    try std.testing.expectEqual(@as(u8, 11), num);
}

I don't understand why this point is related to the proposal.

  • const only works on pointers because container fields inherit the const-ness of their initialization location

I did not understand this. The following code does not compile because *u8 cannot implicitly cast to *const u8.
(EDIT: I swapped out stuff *const u8 -> *u8 is not possible, but *u8 -> *const u8 is possible)

test {
    const S = struct { ptr: *u8 };
    const num = @as(u8, 10);
    const s = S{ .ptr = &num };
    _ = s;
}

Error:

a.zig:4:25: error: expected type '*u8', found '*const u8'
    const s = S{ .ptr = &num };
                        ^~~~
a.zig:4:25: note: cast discards const qualifier

Thanks, I did not know about it. They are related but (in my view) different. That one is about restricting the permission to write some fields in a struct outside of a namespace (focusing on encapsulation). This one focuses on marking a function argument or identifier/variable in a block as "write-through-pointers free".

@nektro
Copy link
Contributor

nektro commented Feb 17, 2024

I did not understand this. The following code does not compile because *u8 cannot implicitly cast to *const u8.

they do. that code is attempting to do the opposite as shown by the compile error.

@jacobly0 added the proposal label Feb 17, 2024
@rohlem
Contributor

rohlem commented Feb 17, 2024

I think I understand the use case for making a type's pointers transitively pointing-to-const for more explicit interfaces.
In status-quo I would suggest writing a function that reconstructs any given type to fit this criterion
(using @typeInfo and @Type), so const S would instead be TransitivelyPointingToConst(S),
plus another function for converting between this new type and the original
(either by @bitCast, which should always be safe, or field-by-field).

untested example code of how I would write it (EDIT: added .Optional, .ErrorUnion, fixed .Pointer type memoization)

const std = @import("std");
/// helper function for deduplicating created optional types via comptime memoization
fn TransitivelyPointingToConstOptional(comptime optional_info: std.builtin.Type.Optional, comptime Child: type) type {
 var o = optional_info;
 o.child = Child;
 return @Type(.{.Optional = o});
}
/// helper function for deduplicating created error union types via comptime memoization
fn TransitivelyPointingToConstErrorUnion(comptime error_union_info: std.builtin.Type.ErrorUnion, comptime Payload: type) type {
 var e = error_union_info;
 e.payload = Payload;
 return @Type(.{.ErrorUnion = e});
}
/// helper function for deduplicating created pointer types via comptime memoization
fn TransitivelyPointingToConstPointer(comptime pointer_info: std.builtin.Type.Pointer, comptime Child: type) type {
 var p = pointer_info;
 p.child = Child;
 return @Type(.{.Pointer = p});
}
/// helper function for deduplicating created struct types via comptime memoization
fn TransitivelyPointingToConstStruct(comptime struct_info: std.builtin.Type.Struct, comptime field_count: comptime_int, comptime fields: [field_count]std.builtin.Type.StructField) type {
 var s = struct_info;
 s.fields = &fields;
 return @Type(.{.Struct = s});
}
/// helper function for deduplicating created union types via comptime memoization
fn TransitivelyPointingToConstUnion(comptime union_info: std.builtin.Type.Union, comptime field_count: comptime_int, comptime fields: [field_count]std.builtin.Type.UnionField) type {
 var u = union_info;
 u.fields = &fields;
 return @Type(.{.Union = u});
}
pub fn TransitivelyPointingToConst(comptime T: type) type {
 return switch(@typeInfo(T)) {
  else => T,
  .Optional => |O| {
   var o = O;
   const Child = TransitivelyPointingToConst(o.child);
   o.child = undefined;
   return TransitivelyPointingToConstOptional(o, Child);
  },
  .ErrorUnion => |E| {
   var e = E;
   const Payload = TransitivelyPointingToConst(e.payload);
   e.payload = undefined;
   return TransitivelyPointingToConstErrorUnion(e, Payload);
  },
  .Pointer => |P| {
   var p = P;
   p.is_const = true;
   const Child = TransitivelyPointingToConst(p.child);
   p.child = undefined;
   return TransitivelyPointingToConstPointer(p, Child);
  },
  .Struct => |S| {
   var s = S;
   const fields_len = s.fields.len;
   var fields = s.fields[0..].*;
   s.fields = undefined;
   for (&fields) |*f| {
    f.type = TransitivelyPointingToConst(f.type);
   }
   return TransitivelyPointingToConstStruct(s, fields_len, fields);
  },
  .Union => |U| {
   var u = U;
   const fields_len = u.fields.len;
   var fields = u.fields[0..].*;
   u.fields = undefined;
   for (&fields) |*f| {
    f.type = TransitivelyPointingToConst(f.type);
   }
   return TransitivelyPointingToConstUnion(u, fields_len, fields);
  },
 };
}
pub fn asTransitivelyPointingToConst(value: anytype) TransitivelyPointingToConst(@TypeOf(value)) {
  return @bitCast(value);
}
// /// the other direction would look like this, but is generally unsafe,
// /// because it makes pointers that allow mutating what might be const
//pub fn fromTransitivelyPointingToConst(comptime Target: type, value: TransitivelyPointingToConst(Target)) Target {
//  return @bitCast(value);
//}

Have you considered/tried such a userspace solution?
Am I missing some use case where this wouldn't suffice?

Having a language feature would of course further reduce friction, which I'm not opposed to.
Note: As proposed, const S in * const S would require parentheses
to differentiate status-quo *const (S) from * (const S).
This could lead to user errors (although as I see it transitively-const-when-possible would be the safer default;),
maybe we can come up with different syntax for it.

@Kiyoshi364
Contributor Author

I did not understand this. The following code does not compile because *u8 cannot implicitly cast to *const u8.

they do. that code is attempting to do the opposite as shown by the compile error.

Sorry, I swapped stuff. *u8 -> *const u8 is fine, but *const u8 -> *u8 is not fine (and that is what I meant).
I still don't understand the initial point:

  • const only works on pointers because container fields inherit the const-ness of their initialization location

@Kiyoshi364
Contributor Author

Have you considered/tried such a userspace solution?

Yes. Your implementation looks the same as mine (I said it was a hack, and called Const and to_const).
As a note, @bitCast does not work because structs and unions don't have a guaranteed in-memory layout (unless they are packed). My to_const (the equivalent of your asTransitivelyPointingToConst) does some comptime magic to make it work.

One of the reasons I called it a hack is that I have no guarantees that the to_const is free at runtime. I have no guarantees because the in-memory layout is not guaranteed to be the same (between S and Const(S)). I could make S be packed, but I don't want to optimize the order of the fields myself. Maybe the compiler is smart enough to make it free at runtime, but I'm not sure.

Am I missing some use case where this wouldn't suffice?

I think that is all. As time goes on (and more answers come) I'm considering that putting it somewhere in std (if it is useful to more people) may be sufficient.

Having a language feature would of course further reduce friction, which I'm not opposed to.

I think this is the main (maybe only) good reason to make it a language feature, instead of a user-level comptime magic.

Note: As proposed, const S in * const S would require parentheses
to differentiate status-quo *const (S) from * (const S).
This could lead to user errors (although as I see it transitively-const-when-possible would be the safer default;),
maybe we can come up with different syntax for it.

Makes sense.

  • *const (S) is a read-only pointer to S. Similar to a const s = @as(S, undefined).
  • *(const S) is a read-write pointer to const S. I can write to S but cannot write through S's pointers. Similar to a var s = @as(Const(S), undefined)
  • *const (const S) is a read-only pointer to const S. Semantically the same as plain *const (S). Similar to a const s = @as(Const(S), undefined)

This is a point for keeping it at user level.
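The first two cases can be sketched in today's Zig. This is only an illustration: ConstS below is a hand-written stand-in for Const(S), used to keep the snippet self-contained, and the commented-out lines are the ones the compiler should reject.

```zig
const std = @import("std");

const S = struct { ptr: *u8 };
// Hand-written stand-in for Const(S); an assumption for this sketch.
const ConstS = struct { ptr: *const u8 };

test "*const (S): fields are read-only, pointees are writable" {
    var n: u8 = 1;
    const s = S{ .ptr = &n };
    const p: *const S = &s;
    // p.ptr = &n; // compile error: cannot assign to constant
    p.ptr.* += 1;
    try std.testing.expectEqual(@as(u8, 2), n);
}

test "*(const S): fields are writable, pointees are read-only" {
    const a: u8 = 1;
    const b: u8 = 2;
    var cs = ConstS{ .ptr = &a };
    const p: *ConstS = &cs;
    p.ptr = &b;
    // p.ptr.* += 1; // compile error: cannot assign to constant
    try std.testing.expectEqual(@as(u8, 2), cs.ptr.*);
}
```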

maybe we can come up with different syntax for it.

An easy change is to move all the pointer modifiers to before *. Examples:

  • *const T -> const *T
  • *volatile align(4) T -> volatile align(4) *T
  • *Const(T) -> *const T
  • *const Const(T) -> const *const T (syntactically valid, but I think it should be a compile error and the right type is const *T)

I'm not sure if I like it. Maybe I just got used to *const T. Maybe having const *T and *const T at the same time is too much complication.

@rohlem
Contributor

rohlem commented Feb 17, 2024

Have you considered/tried such a userspace solution?

Yes. Your implementation looks the same as mine (I said it was a hack, and called Const and to_const).

Oh, sorry, not sure how I missed that in the OP.
You're right, the code looks mostly the same, though I think due to current identity semantics
separate helper functions for reifying types are required (like in my implementation) to make sure the operation is idempotent.
Otherwise Const(T) != Const(Const(T)) because their inputs are distinct.
This would also be resolved by #18816 though, which would make both implementations equivalent.
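A small sketch of the identity rules this relies on, as I understand them. Box is a made-up generic used only for illustration: separately declared structs are distinct types even when they match field for field, while calling the same generic function with the same comptime arguments is memoized and yields the same type.

```zig
const std = @import("std");

// Hypothetical generic type constructor; calls with equal comptime
// arguments are memoized, so they return the same type.
fn Box(comptime T: type) type {
    return struct { value: T };
}

test "type identity and memoization" {
    const A = struct { a: *u8 };
    const B = struct { a: *u8 };
    // Two struct declarations are distinct types, even if they look the same.
    try std.testing.expect(A != B);
    // Pointer types are compared structurally.
    try std.testing.expect(*const A == *const A);
    // Same argument, same resulting type.
    try std.testing.expect(Box(u8) == Box(u8));
    // Distinct arguments, distinct results.
    try std.testing.expect(Box(A) != Box(B));
}
```

This is why routing type creation through helper functions with comptime parameters can give repeated calls a stable result.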

As a note @bitCast does not work because structs and unions don't have guaranteed in-memory layout

You're right, while I'm pretty sure the compiler currently has no reason to make the layouts incompatible,
the language doesn't guarantee it, and you're currently forced to work around it differently
(f.e. type-punning via @as(*Const(@TypeOf(value)), @ptrCast(&value)).* is an option).

maybe we can come up with different syntax for it.

An easy change is to move all the pointer modifiers to before *.

Type operators so far are prefix operators, meaning they are right-associative and read left-to-right.
This uniformity makes f.e. optional pointer ?*T vs *?T pointer-to-optional relatively easy to understand.
But with this as a basis, if const T already means transitively-const, why should const *T be the special case where it means non-transitive const?
Note that multi-pointers would also be affected: Status-quo understands *const *T as pointer-to-const-pointer-to-T, where const is non-transitive.
Breaking or changing this would have to be more regular than status-quo to be considered imo.
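A short sketch of the status-quo reading described above, using only existing syntax. The commented-out line is the one the compiler should reject.

```zig
const std = @import("std");

test "?*T is an optional pointer, *?T is a pointer to an optional" {
    var x: u8 = 1;
    const opt_ptr: ?*u8 = &x;
    opt_ptr.?.* += 1;
    try std.testing.expectEqual(@as(u8, 2), x);

    var maybe: ?u8 = 7;
    const ptr_opt: *?u8 = &maybe;
    ptr_opt.* = null;
    try std.testing.expect(maybe == null);
}

test "*const *T: const is not transitive" {
    var x: u8 = 1;
    const inner: *u8 = &x;
    const outer: *const *u8 = &inner;
    // outer.* = &x; // compile error: cannot assign to constant
    // Writing through the inner pointer is still allowed.
    outer.*.* += 1;
    try std.testing.expectEqual(@as(u8, 2), x);
}
```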

@Kiyoshi364
Contributor Author

You're right, the code looks mostly the same, though I think due to current identity semantics
separate helper functions for reifying types are required (like in my implementation) to make sure the operation is idempotent.
Otherwise Const(T) != Const(Const(T)) because their inputs are distinct.

Agree, Const being idempotent is important.
Oh, that's why you separated it into helper functions.
I ran some tests on my implementation and I cannot make it fully idempotent (it looks like it works for types other than struct and union; for those two it only works under my own definition of type equality, expectEqualTypes).

test "Const idempotent" {
    try expectEqual(Const(*u8), Const(Const(*u8)));
    try expectEqual(Const([]u8), Const(Const([]u8)));
    try expectEqual(Const([*]u8), Const(Const([*]u8)));
    try expectEqual(Const([3]*u8), Const(Const([3]*u8)));
    try expectEqual(Const(?*u8), Const(Const(?*u8)));
    try expectEqual(Const(????*u8), Const(Const(????*u8)));
    try expectEqual(Const(error{A}!*u8), Const(Const(error{A}!*u8)));
    try expectEqual(Const(error{A}!?*u8), Const(Const(error{A}!?*u8)));

    try expectEqualTypes(Const(Struct), Const(Const(Struct))); // My definition for equal types is fine
    try expectEqual(Const(Struct), Const(Const(Struct))); // This line fails
}

It would be nice to solve it, maybe it requires #18816 .
I tried to mix our code up, new source is in const.zig.txt.

You're right, while I'm pretty sure the compiler currently has no reason to make the layouts incompatible,
the language doesn't guarantee it, and you're currently forced to work around it differently
(f.e. type-punning via @as(*Const(@TypeOf(value)), @ptrCast(&value)).* is an option).

Similarly, I think that the language does not guarantee that type-punning works.
Note that I already have a conversion that works (to_const) but does not have any runtime guarantees (as far as I know).

Type operators so far are prefix operators, meaning they are right-associative and read left-to-right.
This uniformity makes f.e. optional pointer ?*T vs *?T pointer-to-optional relatively easy to understand.
But with this as a basis, if const T already means transitively-const, why should const *T be the special case where it means non-transitive const?
Note that multi-pointers would also be affected: Status-quo understands *const *T as pointer-to-const-pointer-to-T, where const is non-transitive.
Breaking or changing this would have to be more regular than status-quo to be considered imo.

Agree. My made-up syntax is bad.
Maybe a new keyword like readonly or immut. But I'm feeling it should be a user-comptime-hack or somewhere in stdlib.

@Vexu added this to the 0.13.0 milestone Feb 18, 2024
@rohlem
Contributor

rohlem commented Feb 18, 2024

@Kiyoshi364 The missing element for idempotency was only re-creating struct and union types when necessary -
otherwise the types always get a new identity and keep getting re-created each time.

New version with passing tests: const.zig.txt

EDIT: Looking at this again, maybe the separate helper functions aren't even necessary / what you want anymore,
because now I think Const(struct{a: u8}) == Const(struct{a: u8}) even though the original input types are distinct.
I believe passing the original T along with the field types to the ConstHelp* functions (just as identity for memoization) would fix this.
EDIT2: Actually, I just tested it and this already works in this current version -
though I admittedly don't know why.
(Hypothesis: Maybe the compiler currently doesn't deduplicate the internal addresses -> identities of field names across types?
In this case it might break and need a proper fix in a future version.)

Test case for distinct identities (EDIT3: Updated to include fields):

const expect = std.testing.expect;
test "Const identity" {
	const A = struct{a: *u8};
	const B = struct{a: *u8};
	const C = union{a: *u8};
	const D = union{a: *u8};
	try expect(A != B);
	try expect(Const(A) != Const(B));
	try expect(C != D);
	try expect(Const(C) != Const(D));
	const E = struct{a: *A};
	const F = struct{a: *B};
	const G = union{a: *C};
	const H = union{a: *D};
	try expect(E != F);
	try expect(Const(E) != Const(F));
	try expect(G != H);
	try expect(Const(G) != Const(H));
}

@Kiyoshi364
Contributor Author

EDIT: Looking at this again, maybe the separate helper functions aren't even necessary / what you want anymore,
because now I think Const(struct{a: u8}) == Const(struct{a: u8}) even though the original input types are distinct.
I believe passing the original T along with the field types to the ConstHelp* functions (just as identity for memoization) would fix this.

I inlined the helper functions, and it worked.
I'm on version 0.11.0

I have updated with all of that: const.zig.txt

I'm happy with the current state of Const.
It is not a new language feature.
It may be on stdlib (maybe std.meta?), but I will not push for it.

I'm closing the issue.
Thanks everyone : )

@Vexu removed this from the 0.13.0 milestone Feb 22, 2024
5 participants