Syntax flaw: Block statements and terminating semicolon #1677
Comments
Consider this example:
test "" {
    if (true)
        if (true)
            if (true)
                if (true)
                    if (true)
                        if (true)
                            return;
}
|
Personally I prefer to enforce braces around block bodies. Perl does this and skips a lot of pain. C-like languages generally allow you to skip the braces if there is only one statement in the body. As Apple found out, it is not always a good idea to leave out the braces. |
I think the solution to this is related to #760 #114 (this comment) and maybe even #1676. We can change the syntax of the language so that syntactic constructs that will never pass semantic analysis will give a parse error. Consider this piece of code:
No matter what the LHS and RHS are for this operator, this code will never pass semantic analysis, because
@kyle-github We can then consider enforcing blocks, but it is not necessary to solve this issue (I think). |
I'm not an expert on formal language theory, so I don't understand the problem in this issue. Does the syntax flaw manifest in any way other than purely theoretical? |
@thejoshwolfe Well, if we wanna have a formal grammar, it should be unambiguous, so that differently implemented parsers don't vary in behavior. Otherwise, what is the point of having a grammar? We could ask the same question for #760. The stage1 compiler makes a choice and always follows it, so in the implementation there is no ambiguity. We still consider it a problem though, because ambiguities make code harder to read and reason about (even if the parser is consistent). |
Also, it makes using parser generators to parse the Zig language trivial, which helps with specifying and testing the grammar itself. If we have a grammar that can actually parse code by generating a parser, then when compilers disagree on what syntax they accept, you can point to the grammar and its generated parser and say "well, the parser that does the same as this generated one is correct". (This will help when new syntax has to be added to stage1 and stage2.) |
I've also heard that different C++ compilers disagree on what syntax is valid. It'd be nice if we could avoid such a mess. :) |
3rdly! Having a grammar we can generate a parser from allows for quick prototyping of syntax. We still have #114, and that is a big change. We should probably ensure we get it right before rewriting a 2000 line parser :) |
Here are some excerpts from the grammar at the bottom of the langref docs:
I'm not totally clear on what a context-free grammar is, but I'm assuming that this parametric
I don't think this is ambiguous. You're supposed to match the patterns in highest-to-lowest precedence as listed. So if |
We don't need the rules in #114 to be encoded in the grammar. Those rules can be enforced after parsing by examining the AST. I'm more optimistic about that approach generating more helpful error messages anyway. I understand that that will result in sloppy implementations failing to reject invalid Zig programs, but I think that's a minor concern compared to implementations failing to parse correct Zig programs. |
It sounds like you may be suggesting that an
There are 0 violations of such a rule for
If we required braces on statement-level
if (something) {
    return;
}
This isn't necessarily out of the question, but I don't want to go with this option just yet. There are a few other cases where braces imply semicolons, but these all have 1-token lookahead, so I don't think anyone's too concerned about these:
comptime foo();
comptime {
    foo();
}
defer foo();
defer {
    foo();
}
suspend; // not sure about this one.
suspend {
    resume @handle();
}
|
That's not true, there are a couple. Here's one from
|
Correct. I think we should stick to BNF for representing our grammar. No EBNF or other variations, as they are less supported in parser generators (and all variations are supersets, so if we keep it simple, we save everyone a lot of pain). I'm currently using bison + flex to validate my grammar work, and bison can report when there are conflicts (places where bison had to make a choice between two actions). These conflicts tend to be ambiguities, and bison reported one for the block expressions. This issue is my theory as to why bison had problems. I'm not 100% sure my theory is correct, as bison reports conflicts in the state machine and not in the grammar (which makes it hard to figure out the exact problem). Bison generates an LALR(1) parser, so I thought this was a lookahead issue. I'll look more into this later, when I can work with the grammar again. #1676 is also causing conflicts and those might be interfering with other rules.
I don't think we have to give up the shorthand versions of the block expressions to resolve this issue. Also, we have the dangling
|
A grammar is ambiguous as soon as there is one input that can be derived from it in two different ways. Some parsing techniques have no concept of "highest-to-lowest", so if the grammar is ambiguous they will parse it differently from parsers that do (LALR parsers expand the grammar into states, where a state can handle
EDIT: The
|
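To make the "one input, two ways of expanding the grammar" definition concrete, here is a minimal Python sketch that counts parse trees for a toy dangling-else grammar. The grammar, the token spelling and the function name `count_parses` are assumptions made for illustration; this is not Zig's grammar and not taken from the flex/bison work mentioned in this thread.

```python
from functools import lru_cache


def count_parses(tokens):
    """Count distinct parse trees of `tokens` under the toy grammar

        Stmt -> 's' | 'if' Stmt | 'if' Stmt 'else' Stmt

    (hypothetical grammar; conditions are left out to keep it small).
    """
    toks = tuple(tokens)

    @lru_cache(maxsize=None)
    def stmt(i, j):
        # Number of ways toks[i:j] can be derived from Stmt.
        if j - i == 1:
            return 1 if toks[i] == "s" else 0
        if j - i < 2 or toks[i] != "if":
            return 0
        total = stmt(i + 1, j)  # Stmt -> 'if' Stmt
        # Stmt -> 'if' Stmt 'else' Stmt: try every position of the 'else'.
        for k in range(i + 2, j - 1):
            if toks[k] == "else":
                total += stmt(i + 1, k) * stmt(k + 1, j)
        return total

    return stmt(0, len(toks))


if __name__ == "__main__":
    print(count_parses("if s else s".split()))
    print(count_parses("if if s else s".split()))
```

A count above 1 means the grammar alone does not say which tree is meant. A parser that always picks the first matching rule and a generated table-driven parser may then disagree unless a tie-break rule is written down somewhere.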
More fun grammar that requires
async<if (true) A else B> fn()void {}
async<comptime A> fn()void {}
|
@Hejsil is your work on the flex/bison parser online anywhere? That sounds like it might be a good starting point for overhauling the grammar specification. |
@thejoshwolfe I've pushed my work to here. It can parse most of Zig's std (except async calls and a few statements). Currently, I'm working on identifying what syntactic constructs bison emits conflict warnings for, and seeing if I can restructure the grammar to get rid of them. Edit: The grammar I'm working on reverts the changes made in #1628 and encodes the rules for #1047 (which makes the grammar simpler) |
I don't see how this requires N token lookahead. I think lookahead refers to reading extra tokens to decide what to do with the already read tokens, for example you must read more before you reduce "a + b". Lookahead will always be required in some cases, so I don't see what's so bad about lookahead. When you've seen |
@UniqueID1 Anyways, the real problem in this issue is this, and how to specify Zig's syntax rules in grammar:
I have a solution to this, but it required changing when |
Fixed: #1685 |
Currently, block expressions (for, if, while) are only terminated with ; if the ending block of that expression is not a block. This makes the grammar very hard (maybe even impossible) to make context-free. My best bet at formalizing a grammar for this has been something along these lines:
Sadly, this grammar requires N token lookahead at best, and at worst it is ambiguous because a Block is also an Expr.
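For comparison, a hand-written recursive descent parser can apply the "; unless the expression ends in a block" rule without encoding it in the grammar, by having each parse function report whether it ended on a block. The following is a minimal Python sketch over a toy subset (whitespace-separated tokens, only if/else, blocks and bare names). The names `Parser`, `ParseError` and `accepts` are hypothetical, and this is not how the stage1 parser is actually written.

```python
class ParseError(Exception):
    pass


class Parser:
    def __init__(self, tokens):
        self.toks = list(tokens)
        self.pos = 0

    def peek(self):
        return self.toks[self.pos] if self.pos < len(self.toks) else None

    def eat(self, expected):
        if self.peek() != expected:
            raise ParseError(f"expected {expected!r}, found {self.peek()!r}")
        self.pos += 1

    def name(self):
        tok = self.peek()
        if tok is None or not tok.isidentifier() or tok in ("if", "else"):
            raise ParseError(f"expected a name, found {tok!r}")
        self.pos += 1

    def block(self):
        self.eat("{")
        while self.peek() != "}":
            self.statement()
        self.eat("}")

    def expr(self):
        """Parse one expression; return True if it ended by closing a block."""
        tok = self.peek()
        if tok == "{":
            self.block()
            return True
        if tok == "if":
            self.eat("if")
            self.eat("(")
            self.name()
            self.eat(")")
            ends_with_block = self.expr()
            if self.peek() == "else":
                self.eat("else")
                ends_with_block = self.expr()
            return ends_with_block
        self.name()
        return False

    def statement(self):
        # The ';' is required only when the expression did not end in a block.
        if not self.expr():
            self.eat(";")


def accepts(source):
    parser = Parser(source.split())
    try:
        while parser.peek() is not None:
            parser.statement()
        return True
    except ParseError:
        return False
```

In this sketch, `accepts("if ( a ) if ( b ) { x ; }")` and `accepts("if ( a ) if ( b ) x ;")` both succeed, while `accepts("if ( a ) if ( b ) x")` fails. Whether the outermost if needs a ; is only known once the innermost body has been parsed, which is the part that is awkward to express in a plain BNF grammar.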