Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

@hide: a way to manually/prematurely end the scope of identifiers. #9792

Closed
InKryption opened this issue Sep 17, 2021 · 9 comments
Closed

@hide: a way to manually/prematurely end the scope of identifiers. #9792

InKryption opened this issue Sep 17, 2021 · 9 comments
Labels
proposal This issue suggests modifications. If it also has the "accepted" label then it is planned.
Milestone

Comments

@InKryption
Copy link
Contributor

InKryption commented Sep 17, 2021

This idea was originally brought to my attention by @marler8997 here: #9696 (comment).
The basic premise would be that using @hide on an identifier would disallow further use of it, and return its value, like so:

// based off of the original comment
pub fn calculate(x: i32) i32 {
    var mutable_x = @hide(x);
    mutable_x += 50;
    // Compiler Error: Referenced a hidden identifier
    // mutable_x += x;
    return mutable_x;
}

I believe this could be of some use, as besides remedying the original concern of the issue it was mentioned in, it could also potentially solve a few current pain points, like the scope of temporary index variables in while loops:

var idx: usize = 0;
while (idx < N) : (idx += 1) {
    // ...
}
_ = @hide(idx);

which is a lot easier to write, and is much more explicit in stating that idx should specifically only be used within the while loop, which would be especially beneficial in longer sections of code that are being refactored, seeing as where many might omit wrapping the above in its own block out of negligence/laziness, writing _ = @hide(idx); is but a few keystrokes, and gives the same security that making the block would do.

Another example where this could remedy some pains would be when working with functions that use out-parameters, like this:

var mutable_device_features: c.VkPhysicalDeviceFeatures = undefined;
_ = c.vkGetPhysicalDeviceFeatures(device, &mutable_device_features);
const device_features = @hide(mutable_device_features);

Other considerations:

  • Now, the idea by itself is nice, and I think it could have its place alongside the usual method of using blocks to control scopes, but one thing that would make it more powerful is if it were to enable legal identifier shadowing, where in the previous example, instead of using "mutable_device_features" as the preliminary identifier to the final "device_features", we could simply use "device_features", e.g. const device_features = @hide(device_features);.
    For sanity's sake, I think that if this were to be allowed, it should be limited explicitly to cases where the hidden identifier's value is being assigned to an identifier of the same name, making it illegal to do something like:
var x: u1 = 1;
const y = @hide(x);
const x = 2; // This could lead to the same issues as in the original proposal where this idea was brought up.

That said, this could be a way to resolve the issues #498 attempts to address, without adding other special syntax; although to be fair, it would still technically be a special case.

EDIT: The following was addressed by @SpexGuy, and I now consider it to be irrelevant. Focus should remain on how this will affect the language and best practices.

  • Iirc, proving a variable is not referenced by pointer is usually undecidable, meaning this would be a pretty uncommon, but I will put this here anyway:
    the other, more secondary, thought I had about this was optimization. Now, I'm nowhere near knowledgeable enough about compilers to make any substantive arguments or statements regarding this, but in my layman's mind, it would make sense that by hiding an identifier, you would make it easier for the compiler to prove that a mutable variable is not modified after a certain point, meaning that when doing const new = @hide(old);, the compiler could then reuse the memory location of "old" for "new" instead of copying the memory.

  • On that note, being that I am no expert in compilers, I also have no idea how complex it would be to "remove" an identifier from scope, so feedback or explanation from anyone who understands more about how this might be done, or how it would be hard to do, would be appreciated.

@InKryption
Copy link
Contributor Author

This is also my first proposal here after having being an observer for more than a few months, so any pointers or critiques on anything I've done wrong here, or missed out on, would be much appreciated.

@SpexGuy
Copy link
Contributor

SpexGuy commented Sep 17, 2021

in my layman's mind, it would make sense that by hiding an identifier, you would make it easier for the compiler to prove that a mutable variable is not modified after a certain point, meaning that when doing const new = @hide(old);, the compiler could then reuse the memory location of "old" for "new" instead of copying the memory.

This isn't quite true, for a few reasons.

  • The need to disallow re-declaring a hidden variable means we can't get rid of the old entry in the name lookup. We have to instead store a sentinel value that indicates it's been renamed.
  • A complication of scoping is that if you declare a variable, push a scope, hide the variable, then pop the scope, the variable needs to come back. So we would either need to duplicate the name table into every scope, or push hide as a special entry into the inner scope. In practice we would probably do the second one.
  • The current implementation uses a chain of stack-allocated values to track scoped variables, so we couldn't really reclaim the memory even if we wanted to.

However, these are not really problems IMO. This feature would be resolved entirely within astgen, which is the most parallel part of the compilation process, and also highly cacheable. It's also really not a large cost, hides would be a simple piece of data. So in this case I think performance considerations can be mostly ignored and we should instead focus on the ways in which this proposal would affect the language and best practices.

@InKryption
Copy link
Contributor Author

@SpexGuy thank you, that's very insightful, and good to hear. I shall edit the proposal to reflect this.

@Vexu Vexu added the proposal This issue suggests modifications. If it also has the "accepted" label then it is planned. label Sep 17, 2021
@Vexu Vexu added this to the 0.9.0 milestone Sep 17, 2021
@marler8997
Copy link
Contributor

marler8997 commented Sep 17, 2021

One general use case that comes to mind is when you have a variable that is typed "generically" but you know it must be a more specific type. A perfect example of this is how zig handles inheritance, where you get a pointer to the "generic base type" but that pointer is only used to get a pointer to a more specific type, i.e.

const CustomAllocator = struct {
    allocator: Allocator,

    pub fn alloc(self: *Allocator) void {
          const self = @fieldParentPtr(CustomAllocator, "allocator", @hide(self));
    }
}

Another example, say you are looping through a json Array where you know all the items are strings. In this case each element of the json array is a "generic" json Node, but once you have the more "specific" string instance, you no longer need a symbol pointing to the generic node:

for (json_array.items) |*json_string| {
    const json_string = @hide(json_string).String;
}

Note that good error handling would still be possible:

for (json_array.items) |*json_string| {
    const json_string = verifyIsStringOrPrintGoodErrorAndExit(@hide(json_string));
}

Or another example is passing a generic callback that accepts a *void context.

pub fn myCallback(context: *void) {
    const context = @ptrCast(*MyContext, @hide(context));
}

I'm not sure this use case is as good as the "mutable" use case but it shows some ways a feature like this could be used.

@pfgithub
Copy link
Contributor

pfgithub commented Sep 17, 2021

This would be useful for one of my projects where consistent unique ids are needed frequently, which change inside of loops

fn someFunction(id: ID) {
    const id = @hide(id).pushFunction(@src());
    for(array) |item| {
        const id = @hide(id).pushPointer(@src(), item);
    }
}

Mutability can be used for this, but using constants feels much nicer and less error-prone

fn someFunction(id: *IDStack) {
    id.pushFunction(@src());
    defer id.pop();
    for(array) |item| {
        id.pushPointer(@src(), item);
        defer id.pop();
    }
}

Currently, using constants requires messier, more error-prone code like this

fn someFunction(id: ID) {
    const id_outer = id.pushFunction(@src);
    for(array) |item| {
        const id_inner = id_outer.pushPointer(@src(), item);
    }
}

@InKryption
Copy link
Contributor Author

@pfgithub I'm a little bit confused by your example, I can't quite imagine what the context for something like this would be. What does id.pushFunction(@src()) do in each version? Is the result itself, or is this supposed to be a linked-list, or something else? And can I presume that in the loop, there is more code working with "id_inner" that would necessitate storing its result? Sorry, just want to make sure we're on the same page here.

@pfgithub
Copy link
Contributor

@InKryption Here is a more realistic example, actually making use of ids:

Improved example (hopefully)

This is for generating unique ids for an imgui library, allowing storing data and computations across frames.

fn renderForm(id_arg: ID.Arg) Widget {
    const id_outer = id_arg.id;
    var v_layout = VerticalLayout{ .gap = 2 };
    v_layout.put(renderText(id_outer.push(@src()), "Contact Information:"));
    for (.{
        "Name", "Email Address", "Phone Number",
    }) |label, i| {
        const id_inner = id.pushIndex(@src(), i);
        v_layout.put(renderInput(id_inner.push(@src()), .{ .label = label }));
    }
    return v_layout.widget();
}

fn renderInput(id_arg:  ID.Arg, opts: InputOptions) Widget {
    const id = id_arg.id;

    const current_text: **[]const u8 = useState(id.push(@src()), *[]const u8, ...initializer);
    const input_key = textInputKey(id.push(@src()));
    
    if(input_key.on_commit) |added_text| {
        current_text.* = concatString(current_text.*, added_text);
    }

    var v_layout  = VerticalLayout{ .gap = 1 };
    v_layout.put(renderText(id.push(@src()), opts.label));
    v_layout.put(handleInput(input_key,
        renderBorder(
            .{.w = 1, .color = .black},
            renderText(id.push(@src()), current_text.*),
        ),
    ));
    return v_layout.widget();
}

Render functions are called every frame. Functions like useState() need to return the same pointer every frame, rather than recreating the content, so an id is used to key into a persistent hashmap. Inside of loops, the id has to be keyed so that the same item in an array will always make the same ids.

id.push is a pure function, it returns a new id containing [...parent id components, this component]

What does id.pushFunction(@src()) do in each version? Is the result itself, or is this supposed to be a linked-list, or something else?

pushFunction was a bad example, but pushIndex which is used in this example returns a new id containing [...parent components, @src, index]

And can I presume that in the loop, there is more code working with "id_inner" that would necessitate storing its result?

Yeah.

Essentially, any time I need to loop over anything, a new id is required that should shadow the parent id, as using the parent id is usually an error

@InKryption
Copy link
Contributor Author

@pfgithub Righy-ho, that all makes a lot more sense, thanks for taking the time to clear that up; that certainly is a use-case I would like for this proposal to cater to.

@andrewrk
Copy link
Member

This was proposed and rejected in #594, however, I think the conversation (in particular my own comments) did not adequately address the proposal, so I want to say thank you to @InKryption for putting effort into making this a high quality, concrete proposal.

However, I do think the decision to reject this is the best design decision for Zig, and I am confident enough in this assessment to close this issue. The value this provides is ability to prevent someone from accidentally using a variable that has been deleted. However, let's consider the use case in which this happens: it's when someone is looking at a long function, and may not be aware of the whole thing. So the person reading the code only sees subsets of the function's body while making edits.

In such case, declaring a variable with a conflicting name is already a compile error, so that can't be a problem. It would just be copy-pasting code or making a typo that this would catch. In practice I have found it better to name things properly, and then everything works out fine. I can address all 3 motivating use cases this way:

pub fn calculate(starting_x: i32) i32 {
    var x = starting_x;
    x += 50;
    return x;
}

With the parameter named this way, it's not really possible to make this mistake.

{
    var idx: usize = 0;
    while (idx < N) : (idx += 1) {
        // ...
    }
}

We already have scopes to solve this problem. Often there is more than just the index variable that should be limited in scope, and the programmer is encouraged to put all relevant declarations in the inner scope.

I can foresee @hide being a kind of C++-ism where it's considered "best practice" to hide everything but then people get stuck in this unproductive meta-work where they're spending all their time hiding and unhiding variables, and not working on their actual logic.

var mutable_device_features: c.VkPhysicalDeviceFeatures = undefined;
_ = c.vkGetPhysicalDeviceFeatures(device, &mutable_device_features);

In this example, I don't think anyone is going to type mutable_device_features by accident. Also, having references to it directly helps communicate that the memory for this is on the stack and should not escape into a longer-lived structure. Even if @hide was a language feature, using it here seems inadvisable to me.

    pub fn alloc(self: *Allocator) void {
          const self = @fieldParentPtr(CustomAllocator, "allocator", @hide(self));

If this can happen anywhere in the function then we would have the situation where you could be looking at a variable declaration and still not know what the type of that variable is when used elsewhere in the function.

Additionally, @hide would be an extra feature of the language, making it more complex. Language simplicity is zig's killer feature, and it must be fiercely guarded.

A generalization of this feature which retains language simplicity would be allowing redeclaration of the same name, like Rust does. However, I veto this on the basis that a person reading code has fewer guarantees about what names mean. The fact that when you read zig code right now, an identifier always means the same thing even if you haven't seen everywhere the identifier is used in the function, is invaluable.

I appreciate the discussion on this topic, however, I am confident in the direction of the language in this case.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
proposal This issue suggests modifications. If it also has the "accepted" label then it is planned.
Projects
None yet
Development

No branches or pull requests

6 participants