Proposal: Use line breaks instead of semicolons #7938

Closed
ghost opened this issue Feb 2, 2021 · 33 comments

@ghost

ghost commented Feb 2, 2021

Based on #483, I suggest making line feeds mandatory, unless semicolons are used. Moreover, braces/brackets/parentheses that span multiple lines should be matched:

const g = fn (_: usize) void {}

pub const main = fn () void {
    var a = 3 // ok

    var b = g(
        3 + 7 * 2 / 8 >> 7 // ok
    )

    var c = g
        (3 + 7 * 2 / 8 >> 7) // error

    var d = 4; var e = 9 // ok

}

Now that the "last statement without a semicolon returns" rule is gone (#629), there is no syntactic ambiguity.
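
A minimal sketch of the rule proposed above, written in Python for illustration only; it is not how the Zig parser works. It assumes a flat run of statements with no nested blocks, ignores string literals, and treats a newline or `;` as a terminator unless a `(` or `[` is still open. The names `split_statements` and `_flush` are hypothetical.

```python
def _flush(current, statements):
    """Move the buffered characters into the statement list, if non-empty."""
    text = " ".join("".join(current).split())  # collapse runs of whitespace
    if text:
        statements.append(text)
    current.clear()


def split_statements(source):
    """Split source into statements.

    A newline or ';' ends a statement unless it falls inside an open
    '(' or '[' (assumption: no nested blocks, no string literals).
    """
    statements = []
    current = []
    depth = 0  # number of currently open '(' or '['
    for line in source.splitlines():
        code = line.split("//", 1)[0]  # drop line comments
        for ch in code:
            if ch in "([":
                depth += 1
            elif ch in ")]":
                depth -= 1
            if ch == ";" and depth == 0:
                _flush(current, statements)
            else:
                current.append(ch)
        if depth == 0:
            _flush(current, statements)  # newline acts as a terminator
        else:
            current.append(" ")  # newline inside brackets is plain whitespace
    _flush(current, statements)
    return statements
```

Under this sketch, the `var c = g` example splits into two statements, `var c = g` and `(3 + 7 * 2 / 8 >> 7)`, which is the point where the proposal would report an error.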

@ghost
Author

ghost commented Feb 2, 2021

Six 👎s and still not a single response.

@nektro
Contributor

nektro commented Feb 2, 2021

semicolons should never not be necessary.

@g-w1
Contributor

g-w1 commented Feb 2, 2021

Besides less typing, what is your reasoning for this? IMO the readability of the code goes way down with this.
Zig prioritizes reading code over writing.

@ghost
Author

ghost commented Feb 2, 2021

semicolons should never not be necessary.

@nektro Good point.
A redundant semicolon should be an error, just as it currently is. The difference is

var a = 4

should be correct, and the semicolon at the end of

var a = 4;

should be redundant, and it shouldn't compile.

@ghost
Author

ghost commented Feb 2, 2021

Besides less typing, what is your reasoning for this? IMO the readability of the code goes way down with this.
Zig prioritizes reading code over writing.

@g-w1 Python does too. Yet, the behavior that I described is no different from Python.

@ikskuh
Contributor

ikskuh commented Feb 2, 2021

I am strongly against this proposal. There might be no technical reason not to remove the semicolon, but having a semicolon makes parsing code easier (for humans!).
If I can just skip-read to the next semicolon, I don't have to understand everything before it. With your changed proposal, I now have to parse the code in my head to build an AST. That makes reading the code much harder if I'm just searching for a specific thing.

@g-w1
Contributor

g-w1 commented Feb 2, 2021

Besides less typing, what is your reasoning for this? IMO the readability of the code goes way down with this.
Zig prioritizes reading code over writing.

@g-w1 Python does too. Yet, the behavior that I described is no different from Python.

You still didn't explain your reasoning for this.

@mikdusan
Member

mikdusan commented Feb 2, 2021

var d = 4; var e = 9 // ok

// how does this impact parser? pretend it's a really long line that has to be split or splits used for clarity:
var e = foo.isEmpty()
    or foo.isNull()
    or bar.doesNotExist()

@ghost
Author

ghost commented Feb 2, 2021

var d = 4; var e = 9 // ok

// how does this impact parser? pretend it's a really long line that has to be split or splits used for clarity:
var e = foo.isEmpty()
    or foo.isNull()
    or bar.doesNotExist()

I already mentioned that. The newline should be treated as a line terminator, and it should be a syntax error.
If the line is too long, part of it should be moved into a variable.

@data-man
Contributor

data-man commented Feb 2, 2021

@CoolRustacean

What's your point?

If you had read such articles, you would not have created this issue.

@ghost
Author

ghost commented Feb 2, 2021

Besides less typing, what is your reasoning for this? IMO the readability of the code goes way down with this.
Zig prioritizes reading code over writing.

@g-w1 Python does too. Yet, the behavior that I described is no different from Python.

You still didn't explain your reasoning for this.

@g-w1 Readability.
Currently, blocks should not be terminated by semicolons, but statements should:

pub fn main() void {
    var x: usize = 62;

    if (x != 0) {
        x = 1 / x;
    }
}

Treating linefeeds as line terminators should remove this inconsistency:

pub fn main() void {
    var x: usize = 62

    if (x != 0) {
        x = 1 / x
    }
}

Moreover, semicolons create an asymmetry:

a = 3;
a = 3

@kristoff-it
Member

kristoff-it commented Feb 2, 2021

If the line is too long, part of it should be moved into a variable

That's a very intimate change in the language that has non-trivial ramifications, all for not a lot of value, to be honest.
I hope you understand why this is not just an easy proposal to consider at this stage of the project.

@ghost
Author

ghost commented Feb 2, 2021

If the line is too long, part of it should be moved into a variable

That's a very intimate change in the language that has non-trivial ramifications, all for not a lot of value, to be honest.
I hope you understand why this is not just an easy proposal to consider at this stage of the project.

I understand, but named function declarations will be removed soon, and all functions will be anonymous (another breaking change that has not been implemented yet). Is a second breaking change just as bad as the first?

@g-w1
Contributor

g-w1 commented Feb 2, 2021

If the line is too long, part of it should be moved into a variable

That's a very intimate change in the language that has non-trivial ramifications, all for not a lot of value, to be honest.
I hope you understand why this is not just an easy proposal to consider at this stage of the project.

I understand, but named function calls will be removed soon, and all functions will be anonymous (another breaking change that has not been implemented yet). Is a second breaking change just as bad as the first?

This is not true. They will not be removed, just the way of declaring a function is different. There was a lot of discussion on #1717, and ultimately it was accepted.

@ghost
Author

ghost commented Feb 2, 2021

If the line is too long, part of it should be moved into a variable

That's a very intimate change in the language that has non-trivial ramifications, all for not a lot of value, to be honest.
I hope you understand why this is not just an easy proposal to consider at this stage of the project.

I understand, but named function calls will be removed soon, and all functions will be anonymous (another breaking change that has not been implemented yet). Is a second breaking change just as bad as the first?

This is not true. They will not be removed, just the way of declaring a function is different. There was a lot of discussion on #1717, and ultimately it was accepted.

I know it was accepted. My point is that it's a breaking change. The proposal explicitly states that all functions will be anonymous.

andrewrk closed this as completed Feb 2, 2021

@SpexGuy
Contributor

SpexGuy commented Feb 2, 2021

A large part of the problem with this proposal is that it doesn't provide any motivation. It suggests a change, but doesn't address why the change would be a good idea. This is an especially easy way to get thumbs down votes, and ultimately a closed issue. If you don't try to make a convincing argument, few people will be convinced 😉. At minimum, you need to examine the reasons that people would argue against removing semicolons from the language, and attempt to address those arguments in the context of your proposal. Why did other recent languages, like Rust and D, choose to require semicolons? Why don't those reasons apply to Zig? Go chose a different rule entirely for where semicolons are required. If we were going to remove semicolons, why is your rule better than theirs?

Breaking changes are not out of the question at this point in the project, but it does need to be made clear that the new way would be better than the old way. For many people here, this is not intuitively true about removing semicolons. In particular, Zig prioritizes reading code over writing, so part of this argument needs to be exploring the worst possible cases where this proposed form could be harder to read than the old form, and showing that those problems are either addressed by the proposal or don't occur in real code.

It's also not clear that you've thought through any detailed scenarios about how this interacts with other language features. Multiline strings come to mind, as well as disambiguation between if/while/for/switch expressions and if/while/for/switch statements. You haven't shown how this interacts with declarative scopes, or field declarations, or anything else that isn't an execution scope. In fact you haven't really addressed any cases that would be tricky to parse, nor have you given a strong argument for the nontrivial idea that there are no such cases. One of those two examinations is absolutely necessary here.

With a convincing argument, this proposal could be reconsidered. That argument will take real work to craft, but it is far less than the work required to change the language in this way. At this point in the project, where we are trying to stabilize the language and receiving many proposals, we cannot spend time building the devil's advocate argument for every low-effort proposal. We need you to show completely that the idea is possible without undue complexity and that the modified version of the language is conclusively better. If you just want to see how the community will react to an idea without putting in this much effort, please ask on a community location like the IRC or Discord channel instead of creating a full proposal.

Personally, I think there's a decent chance that removing semicolons might improve the language, especially after #1717 is added. But I'm worried that it might make certain constructs harder to read, or cause other sorts of unforeseen parsing problems. A proposal that addressed those concerns could be very convincing to me, especially if it contained large examples of real code that had been modified to the new style.

@ghost
Author

ghost commented Feb 2, 2021

@SpexGuy Thank you. That sounds complicated, but I understand.

I have not thought about the consequences of this proposal; rather, I was making the assumption that there wouldn't be any unforeseen parsing problems, provided that the parser doesn't try to infer that a linefeed should be interpreted as meaningless whitespace, as JavaScript and Scala parsers do.

What difference does it make if a program must be written as

statement1
statement2
statement3

or

statement1; statement2; statement3

rather than

statement1; statement2; statement3;

@SpexGuy
Contributor

SpexGuy commented Feb 2, 2021

There are some common cases where the compiler should interpret a line feed as meaningless whitespace. Here are a few from the standard library:
braced initialization
array initialization
multiline strings
if expressions
long expressions
field lists
long conditionals

I think these cases are probably solvable, but they need to be addressed, along with any other somewhat common patterns that may require multiple lines (e.g. chains of function calls).

You can use this regex to search through zig code for more examples: ^[^/]+[^;,{}]$
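
A minimal sketch of applying that regex line by line, written in Python for illustration; the helper name `find_candidates` and any sample input are hypothetical. The pattern matches lines that contain no `/` (so no comments) and do not end in `;`, `,`, `{`, or `}`, which makes it a rough filter for statements that continue onto the next line.

```python
import re

# The regex from the comment above, applied to one line at a time.
CANDIDATE = re.compile(r"^[^/]+[^;,{}]$")


def find_candidates(source):
    """Return (line_number, line) pairs for lines the regex matches."""
    return [
        (number, line)
        for number, line in enumerate(source.splitlines(), 1)
        if CANDIDATE.match(line)
    ]
```

Because the pattern needs at least two characters, a line holding only a closing brace is never reported, and any line containing a `/` (including division) is skipped as well.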

@SpexGuy
Contributor

SpexGuy commented Feb 4, 2021

We discussed this idea a bit more on the discord, and found the following problematic cases:

suspend
{
    // is this the suspend block or does it follow an empty suspend?
}

const x = fn () void
{
    // is this the body of function x or is x a function type and this is an empty block?
}

const x = T
{
    // Is this x = T{} or x = T followed by an empty block?
}

// when does this invoke some_function()?
if (some_long_condition()) return
    some_function()

// same
for (some_slice) |x| {
    if (some_long_condition()) break
    some_function()
}

These problems are still solvable, but you would have to make some language changes (require an expression on suspend, use separate keywords for fn declarations vs fn types, remove T{} syntax #5038). You would also need to use indentation to disambiguate the last two cases. I think that's probably unlikely to be accepted, though there is #35 so there might be an argument for it.

@ghost
Author

ghost commented Feb 4, 2021

Those don't look like problems. They look like bad programs. Allowing the first three would allow ignoring the coding convention.

suspend
{
    // is this the suspend block or does it follow an empty suspend?
}

const x = fn () void
{
    // is this the body of function x or is x a function type and this is an empty block?
}

const x = T
{
    // Is this x = T{} or x = T followed by an empty block?
}

Allowing the second two would be confusing. The linefeed implies ";".

// when does this invoke some_function()?
if (some_long_condition()) return
    some_function()

// same
for (some_slice) |x| {
    if (some_long_condition()) break
    some_function()
}

@jedisct1
Contributor

jedisct1 commented Feb 4, 2021

Having an explicit expression terminator reduces confusion.

Parsing is easier both for machines and for humans, so less context is needed to understand where an expression starts and where it ends. Even with badly or partially indented code.

I don't think there is a need to change the status quo. It is not going to make applications run faster, nor save development time.

@mikdusan
Member

mikdusan commented Feb 4, 2021

@CoolRustacean

Those don't look like problems. They look like bad programs. Allowing the first three would allow ignoring the coding convention.

It would be helpful if you provided translations. Example: SpexGuy showed clearly that this is ambiguous in at least 2 ways:

const x = T
{
    // Is this x = T{} or x = T followed by an empty block?
}

Please provide translations of the above code for each ambiguity. If a translation doesn't apply, state so and why. Rinse and repeat for each code segment shown.

Thank you.

@SpexGuy
Contributor

SpexGuy commented Feb 4, 2021

Allowing the first three would allow ignoring the coding convention.

This is true, but the language does allow ignoring the style conventions of the standard library. That's why it's a convention, and not the language syntax. We still compile programs with poor indentation or weird newlines, or if you open a brace on a new line.

The problem is not that these cases couldn't be resolved. It's that code that currently works (and that a quick survey of the discord showed that many people write), where braces are opened on the next line, still compiles with the new scheme but has a new meaning. There is a vast difference between whitespace deciding whether a program compiles and whitespace changing what a program does. If these two sequences both compile but do completely different things, that's really bad:

const x = fn () void {
    y = 4;
}

const x = fn () void
{
    y = 4;
}

Similarly, if there is no way to distinguish these cases, that's also bad:
(shown with semicolons, to clarify)

const x = fn () void;
{
    y = 4;
}

const x = fn () void
{
    y = 4;
};

@ghost
Author

ghost commented Feb 4, 2021

Allowing the first three would allow ignoring the coding convention.

This is true, but the language does allow ignoring the style conventions of the standard library. That's why it's a convention, and not the language syntax. We still compile programs with poor indentation or weird newlines, or if you open a brace on a new line.

The problem is not that these cases couldn't be resolved. It's that code that currently works (and that a quick survey of the discord showed that many people write), where braces are opened on the next line, still compiles with the new scheme but has a new meaning. There is a vast difference between whitespace deciding whether a program compiles and whitespace changing what a program does. If these two sequences both compile but do completely different things, that's really bad:

const x = fn () void {
    y = 4;
}

const x = fn () void
{
    y = 4;
}

Similarly, if there is no way to distinguish these cases, that's also bad:
(shown with semicolons, to clarify)

const x = fn () void;
{
    y = 4;
}

const x = fn () void
{
    y = 4;
};

I disagree.

const x = fn () void {
    y = 4
}

is clearly a function and

const x = fn () void

{
    y = 4
}

is clearly a variable declaration followed by a block.

@ghost
Author

ghost commented Feb 4, 2021

@SpexGuy

where braces are opened on the next line, still compiles with the new scheme but has a new meaning.

I don't know why you're so confident that the same program will compile and now do something different.

const x = fn () void
{
    // ...
};

will no longer compile because the semicolon is unnecessary.

Moreover, any code using that variable would break since it's supposed to have the fn type but instead has the type type.

const x = fn () void
{
    // ...
}

const z = x() // ERROR

@mikdusan
Member

mikdusan commented Feb 4, 2021

It's like pulling teeth to get this ambiguity resolved by this proposal; up until 3 comments ago this was not clearly specified anywhere.

// 1 statement: fn definition; open-brace is required on same line
const x = fn () void {
    y = 4
}

// 2 statements: a decl followed by a block
const x = fn () void
{
    y = 4
}

@ghost
Author

ghost commented Feb 4, 2021

It's like pulling teeth to get this ambiguity resolved by this proposal; up until 3 comments ago this was not clearly specified anywhere.

// 1 statement: fn definition; open-brace is required on same line
const x = fn () void {
    y = 4
}

// 2 statements: a decl followed by a block
const x = fn () void
{
    y = 4
}

That's not ambiguity. That's the expected behavior. There is no use for breaking the convention.

@ghost
Author

ghost commented Feb 4, 2021

@SpexGuy

That's why it's a convention, and not the language syntax. We still compile programs with poor indentation or weird newlines, or if you open a brace on a new line.

No. We should be writing robust, optimal, and maintainable code.

@SpexGuy
Contributor

SpexGuy commented Feb 4, 2021

If that's true, then a smaller prerequisite change is to make braces on a new line a compile error, without removing semicolons. I think that will also be a contentious decision, but if accepted it removes some of the ambiguity here and makes this one look more possible. If that proposal isn't accepted, then the same reasoning will rule out this proposal.

@mikdusan
Member

mikdusan commented Feb 4, 2021

That's not ambiguity. That's the expected behavior. There is no use for breaking the convention.

Crossed signals I think. My meaning was that you did in fact clear up that ambiguity. The single-comments are mine but it's cut-n-paste what you added clarity to. In my honest opinion, this is what this PR needs/needed more of from the very beginning.

@pjz
Contributor

pjz commented Mar 18, 2021

If we want to allow braces on the next line, we could do the Python thing and add a line-continuation character (I don't think I've seen this in Zig anywhere, since whitespace is currently insignificant). Then:

// 2 statements: a decl followed by a block
const x = fn () void
{
    y = 4
}

is easily 'fixed' via the trailing backslash on the first line:

// 1 statement: a fn definition
const x = fn () void \
{
    y = 4
}

I like it because extra linefeeds are allowed but extra semicolons aren't. I don't know why extra semicolons are illegal (while extra empty braces aren't), but I found it an extra corner to bump into while learning the language.
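
A minimal sketch of how a trailing-backslash continuation could be folded away before statements are split, written in Python for illustration. This is an assumption about one possible design, not existing Zig behavior; it ignores string literals and comments, and the name `join_continuations` is hypothetical.

```python
def join_continuations(source):
    """Merge each line ending in a backslash with the line that follows it."""
    joined = []
    pending = ""  # text carried over from lines that ended with a backslash
    for line in source.splitlines():
        stripped = line.rstrip()
        if stripped.endswith("\\"):
            # Drop the backslash and keep a single space as the separator.
            pending += stripped[:-1].rstrip() + " "
        else:
            # A continued line loses its leading indentation when merged.
            joined.append(pending + (line.lstrip() if pending else line))
            pending = ""
    if pending:
        joined.append(pending.rstrip())  # dangling backslash on the last line
    return joined
```

With this preprocessing, the backslash example above becomes the single logical line `const x = fn () void {`, so a newline-terminated grammar would read it as one function definition.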
