Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

require parentheses sometimes to disambiguate confusing operator precedence #114

Open
PavelVozenilek opened this issue Feb 9, 2016 · 27 comments
Labels
accepted This proposal is planned. breaking Implementing this issue could cause existing code to no longer compile or have different behavior. proposal This issue suggests modifications. If it also has the "accepted" label then it is planned.
Milestone

Comments

@PavelVozenilek
Copy link

Look at the cryptic table:

x() x[] x.y
!x -x ~x *x &x ?x %x %%x
x{}
* / %
+ - ++
<< >>
&
^
|
== != < > <= >=
&&
||
?? %%
= *= /= %= += -= <<= >>= &= ^= |= &&= ||=

Why it is all needed? Who will find the time to learn it? I had not internalized C precedence after two decades.

Language Pony does it differently:

Pony takes a different approach and outlaws infix precedence. Any expression where more than one infix operator is used must use parentheses to remove the ambiguity. If you fail to do this the compiler will complain.

@andrewrk
Copy link
Member

andrewrk commented Feb 9, 2016

Interesting idea. This is on the table since it adheres to the principle that being readable is more important than being writable.

Arguments against it are that it would be strange and cumbersome for people coming from a C background. People are used to 3 * 4 + 1 meaning that the multiplication comes first.

@emanuelpalm
Copy link

👍

@andrewrk
Copy link
Member

andrewrk commented Feb 9, 2016

Some precedence is necessary, such as assignment, function call invocation, array index. Too much parens and the language turns into lisp.

@PavelVozenilek
Copy link
Author

Assignment, function call, array index are not infix operators. Could there be any ambiguity with them?

I can imagine having default precedence for basic arithmetic (+-/*), this is what little kids get drilled in schools, but IMO it is not worth the effort. One never writes 3 * 4 + 1 in the code, one does some_variable * another_variable + something_completely_else, and all advantages of short infix syntax get lost there anyway. Brackets would help to separate it into visual chunks.

@PavelVozenilek
Copy link
Author

Also, this feature would make adding user defined operators into the language easier.

@thejoshwolfe
Copy link
Contributor

I think this would be a cool idea for some categories of operators, specifically the ones i can never remember the precedence for. e.g. a && b || c would be a compile error, or a & b | c. i don't think we need to go so far as to outlaw a + b * c though.

@andrewrk andrewrk added the enhancement Solving this issue will likely involve adding new logic or components to the codebase. label May 7, 2016
@andrewrk andrewrk added this to the 0.1.0 milestone Mar 26, 2017
@thejoshwolfe
Copy link
Contributor

see also #29

@phase
Copy link
Contributor

phase commented Apr 25, 2017

My favorite quote from the Pony page is

Most people will remember to do multiplication before addition, but what about left bit shifting versus bitwise and? Sometimes people misremember (or guess wrong) and that leads to bugs. Worse, those bugs are often very hard to spot.

I love the idea for specific parens. It makes the execution crystal clear.

@thejoshwolfe thejoshwolfe changed the title precedence table require parentheses sometimes to disambiguate confusing operator precedence Apr 25, 2017
@thejoshwolfe
Copy link
Contributor

The current operator precedence table is inspired by C. But I think Zig can do better than that. For example, & > ^ > | seems pretty silly. Even if you think & > | makes sense (because of the analogy to * and +), it's still pretty confusing that ^ is in the middle of them.

I wanted to write up a proposal here to outline a new precedence table for operators in Zig, but Zig's grammar is a bit more complicated than the current table in doc/langref.md shows, and it's not as simple as rearranging and combining some rows.

At the very least, I'd like to see required parentheses for: &, ^, |, <<, >> intermixed with each other, and for and and or intermixed with each other. There's more to the proposal, but I need to do a more careful analysis of Zig's grammar before posting something comprehensive.

@andrewrk andrewrk added the breaking Implementing this issue could cause existing code to no longer compile or have different behavior. label May 2, 2017
@thejoshwolfe
Copy link
Contributor

thejoshwolfe commented May 9, 2017

Here's the real story on Zig's expression grammar (this does not include grammars such as statements, case declarations, top level declarations, etc.). I'll write a follow up comment documenting what i think the expression grammar should be changed to.

There are three types of operator:

  • infix operators: always start with x and end with y, where x and y are some expressions
  • prefix operators: always end with x, where x is some expression
  • suffix operators: always start with x, where x is some expression
  • primary expressions: not really operators, because they are either a single token, or they start and end with unambiguous grouping tokens.

A chainable operator is one that you can use like a + b + c, !!x, or a(b)(c), and has a well-defined associativity. All infix chainable operators are left-to-right associative (e.g. a - b - c == (a - b) - c). Chainable prefix and suffix operators have an obvious associativity (right-to-left and left-to-right respectively). A not-chainable operator cannot be used repeatedly, e.g. inline inline a, a{}{}, and a = b = c are all not allowed.

The following is a list of operator groups. The groups are listed here in order of highest-to-lowest precedence. Each item in a group has equal precedence. If a group is chainable, the operators in the group can be intermixed while chaining (e.g. a.b[c]).

primary expressions:

  • 123 - and variations, where 123 is a number literal
  • "..." - where ... is the characters of the string literal
  • '...' - where ... is the characters of the char literal
  • true, false, null, continue, undefined, error, this, unreachable
  • id - where id is an identifier
  • @id(...) - where id is an identifier and ... is a parameter list
  • goto label - where label is a label
  • break, return - without an expression
  • (x)
  • {...} - where ... is a statement list
  • asm ("...") - and variations, where ... is inline assembly
  • struct { ... } - and variations, where ... is a container member list
  • if (a) {...} - and variations, where ... is a statement list[1]
  • if (a) b else {...} - and variations, where ... is a statement list[1]
  • while (a) {...} - and variations, where ... is a statement list[1]
  • while (a) b else {...} - and variations, where ... is a statement list[1]
  • for (a) {...} - and variations, where ... is a statement list[1]
  • for (a) b else {...} - and variations, where ... is a statement list
  • switch (a) {...} - where ... is a case list
  • comptime {...} - where ... is a statement list[1]

prefix, not chainable:

  • inline x

suffix, chainable:

  • x(...) - where ... is a parameter list
  • x[a] - and variations
  • x.id - where id is an identifier

prefix, chainable:

  • !x
  • -x, -%x
  • ~x
  • *x
  • &x - and variations
  • ??x
  • %%x
  • ?x
  • %x
  • []x - and variations
  • extern fn() -> x - and variations

suffix, not chainable:

  • x{...} - where ... is a container init body

infix, chainable:

  • x * y, x *% y
  • x / y
  • x % y
  • x ** y

infix, chainable:

  • x + y, x +% y
  • x - y, x -% y
  • x ++ y

infix, chainable:

  • x << y, x <<% y
  • x >> y

infix, chainable:

  • x & y

infix, chainable:

  • x ^ y

infix, chainable:

  • x | y

infix, not chainable:

  • x == y - and variations
  • x < y - and variations

infix, chainable:

  • x and y

infix, chainable:

  • x or y

infix, chainable:

  • x ?? y
  • x %% y - and variations

infix, not chainable:

  • x = y - and variations

prefix, chainable:

  • return x
  • break x
  • if (a) x - and variations[1]
  • if (a) b else x - and variations[1]
  • while (a) x - and variations[1]
  • while (a) b else x - and variations[1]
  • for (a) x - and variations[1]
  • for (a) b else x - and variations[1]
  • comptime x[1]

[1]: for the statement-like expressions if, for, while, and comptime, a special exception is made that if a body expression starts with a block (starts with a {), then the body expression ends at the end of that block (at the }). This disallows, for example, if (a) {b} + 1 else {c}.

@andrewrk
Copy link
Member

For what it's worth I just removed the inline prefix operator. (See #306)

@thejoshwolfe
Copy link
Contributor

Here's my proposal for the "operator precedence table". It's a lot more complicated than a table.

Before people automatically reject this grammar for being overly complicated, the goal here is to make Zig source code less complicated. My goal for grouping everything the way I did was so that any time a user would ask "Which of these two operators has precedence over the other.", the answer is that they're incompatible, and you need to use parentheses.

Now clearly it's difficult to predict when users will ask that, and so most of the proposal is what I think is a good idea to me, and it's wildly up for debate and discussion. For example, why did I make modulo not heterogeneously chainable with multiplication? Because it seemed like you wouldn't really need that, and it would be more clear with parentheses I guess. It's pretty subjective, so debate on all points of the proposal is welcome and requested, even if you're just giving your opinion.

What follows is a proposal (that has a lot in common with the existing situation), then a summary of changes from the existing grammar, and then some examples.

Proposal

There are three types of operator:

  • infix operators: always start with x and end with y.
  • prefix operators: always end with x.
  • suffix operators: always start with x.
  • primary expressions: not really operators, because they are either a single token, or they start and end with unambiguous grouping tokens.

The tree below has ordered lists and unordered lists. The ordered lists specify a precedence relation in order of highest to lowest (1 is the highest precedence.). An unordered list specifies a group of items that cannot be intermixed with each other, unless the header for the list specifies otherwise.

Chainability comes in three forms:

  • not chainable. There is no implicit associativity, so you must use parentheses to combine 3 or more operands. e.g. you can't do a == b == c.
  • chaininable somehow. This means for infix operators associativity is implicitly left-to-right when combining 3 or more operands with the operator in between them. e.g. a + b + c is implicitly equivalent to (a + b) + c. When a group (specified with an unordered list) of 2 or more items is chainiable, it will be one of these two subtypes:
    • heterogeneous chainable. This means all members of the group can be intermixed while chaining. e.g. a + b - c.
    • homogeneous chainable. Each member of the group can be chained only with itself. e.g. a & b & c is allowed, but a & b | c is not.
  1. primary expressions, unambiguous:
    • 123 - and variations, where 123 is a number literal
    • "..." - where ... is the characters of the string literal
    • '...' - where ... is the characters of the char literal
    • true, false, null, continue, undefined, error, this, unreachable
    • id - where id is an identifier
    • @id(...) - where id is an identifier and ... is a parameter list
    • goto label - where label is a label
    • break, return - without an expression
    • (x)
    • {...} - where ... is a statement list
    • asm ("...") - and variations, where ... is inline assembly
    • struct { ... } - and variations, where ... is a container member list
    • if (a) {...} - and variations, where ... is a statement list[1]
    • if (a) b else {...} - and variations, where ... is a statement list[1]
    • while (a) {...} - and variations, where ... is a statement list[1]
    • while (a) b else {...} - and variations, where ... is a statement list[1]
    • for (a) {...} - and variations, where ... is a statement list[1]
    • for (a) b else {...} - and variations, where ... is a statement list
    • switch (a) {...} - where ... is a case list
    • comptime {...} - where ... is a statement list[1]
  2. suffix access and call operators, heterogeneous chainable:
    • x(...) - where ... is a parameter list
    • x[a] - and variations
    • x.id - where id is an identifier
  3. prefix high-precedent operators, heterogeneous chainable:
    • !x
    • -x, -%x
    • ~x
    • *x
    • &x - and variations
    • ??x
    • %%x
    • ?x
    • %x
    • []x - and variations
    • extern fn() -> x - and variations
  4. suffix container initialization operator, not chainable:
    • x{...} - where ... is a container init body
  5. infix value manipulation operators:
    • math:
      1. multiplicative:
        • scalar multiplicative, heterogeneous chainable:
          • x * y
          • x *% y
          • x / y
        • modulo, not chainable:
          • x % y
        • repetition, not chainable:
          • x ** y
      2. additive:
        • scalar additive, heterogeneous chainable:
          • x + y
          • x +% y
          • x - y
          • x -% y
        • concatenation, chainable:
          • x ++ y
    • bit shifting, not chainable:
      • x << y
      • x <<% y
      • x >> y
    • bitwise algebra and monad coercion, homogeneous chainable:
      • x & y
      • x ^ y
      • x | y
      • x ?? y
      • x %% y - and variations
  6. infix comparison operators, not chainable:
    • x == y
    • x != y
    • x < y
    • x <= y
    • x > y
    • x >= y
  7. infix boolean algebra operators, homogeneous chainable:
    • x and y
    • x or y
  8. prefix control operators, heterogeneous chainable:
    • return x
    • break x
    • if (a) x - and variations[1]
    • if (a) b else x - and variations[1]
    • while (a) x - and variations[1]
    • while (a) b else x - and variations[1]
    • for (a) x - and variations[1]
    • for (a) b else x - and variations[1]
    • comptime x[1]

[1]: for the statement-like expressions if, for, while, and comptime, a special exception is made that if a body expression starts with a block (starts with a {), then the body expression ends at the end of that block (at the }). This disallows, for example, if (a) {b} + 1 else {c}.

Proposed Changes

  • Most of the "infix value manipulation operators" group is new.
    • Bitshifting cannot be chained.
    • ++ cannot be mixed with + and the like.
    • ** cannot be mixed with * and the like, and cannot be chained.
    • % cannot be mixed with other multiplicative operators, and is not chainable.
    • None of the bit manipulation operators can be intermixed with the math operators. Not even a << b + c.
    • ?? and %% infix operators cannot be intermixed.
  • and and or cannot be intermixed.
  • assignment is no longer an expression operator, but a statement construct. That's why it's not in the above tree.
  • inline is gone; see Proposal: possibility to inline a function on the call site #306.

Examples

  • valid: return a and b == c + d * e;
  • error: x << n + 1. fixed: x << (n + 1).
  • error: x << 1 << n. fixed: x << (1 << n) (or whatever).
  • error: " " ** spaces_per_indent ** indent_level. fixed: " " ** (spaces_per_indent * indent_level).
  • error: foo() ?? bar() %% 0. fixed: foo() ?? (bar() %% 0).

Here's some code that already follows this proposal from std/rand.zig:

  • prev_value = int(i) +% f *% (prev_value ^ (prev_value >> (int.bit_count - 2)));
  • mt.array[i] = mt.array[i + m - n] ^ (x >> 1) ^ mag01[x & 0x1];

Here's some code from std/base64.zig that's fine:

  • dest[out_index] = alphabet[((source[i] & 0x3) <<% 4) | ((source[i + 1] & 0xf0) >> 4)];

But this code from std/base64.zig needs parentheses:

  • dest[dest_index] = ascii6[source[src_index + 2]] <<% 6 | ascii6[source[src_index + 3]];

Here's a curious case from std/rand.zig, where there are too many parentheses:

  • return start + (rand_val % range);

Perhaps that last case is a clue that we should not consider % to be a multiplicative math operator, but rather be a value manipulation operator like & that doesn't mix with any other value manipulation operator.

I want ?? to be chainable because of this case:

  • valid: foo() ?? bar() ?? 0. equivalent to foo() ?? (bar() ?? 0) and (foo() ?? bar()) ?? 0.

But now that I think about that some more, it might not actually be correct. Consider if foo() returns a ??u32. This means (foo() ?? bar()) ?? 0 would unwrap both maybes, but foo() ?? (bar() ?? 0) would not. Hm. So either we need to define ?? as right-to-left associative, or make it not chainable. Similarly with %%.

@andrewrk
Copy link
Member

I agree with this proposal. Regarding the question at the end, let's just make it not chainable.

Regarding %, I think it's related enough to / that it should have the same chain properties as /.

@thejoshwolfe
Copy link
Contributor

can you give a usecase for chaining a % b % c? it seems like you'd only ever want at most one % in a chained expression​.

@andrewrk
Copy link
Member

andrewrk commented May 18, 2017

I meant that you would be able to do return start + rand_val % range

@andrewrk
Copy link
Member

I'll take a look at this before 0.2.0. If anyone wants to see this in 0.1.0 feel free to make a pull request.

@andrewrk andrewrk modified the milestones: 0.2.0, 0.1.0 Jun 17, 2017
@andrewrk andrewrk added the accepted This proposal is planned. label Dec 3, 2017
@andrewrk andrewrk modified the milestones: 0.2.0, 0.3.0 Jan 3, 2018
@andrewrk andrewrk removed this from the 0.3.0 milestone Feb 28, 2018
@tauoverpi
Copy link
Contributor

Having zig fmt automatically fix such expressions with the correct paren nesting zig sees would be incredibly helpful as then there's less of a need to place the parens manually and eliminates the possible learning curve to which expressions need them where. The writing case wouldn't be affected as much apart from having to remove extra parens which almost any editor handles with ease.

I wouldn't mind contributing it once I've gotten my head around the current formatting code and zig in general.

@Mouvedia
Copy link

Mouvedia commented Oct 11, 2020

Based on

I agree with this proposal.

I reckon there are 3 steps to this :

  1. introduce the changes proposed by @thejoshwolfe before 1.0.0
  2. have an experimental branch of fmt that automatically adds parentheses for a chosen subset of the operators
  3. evaluate whether we keep it as is or make parentheses a requirement (hence move the precedence logic to fmt)

@ryancsh
Copy link

ryancsh commented Jan 8, 2022

I think removing infix precedence would make things unambiguous, but also harder to read.

By that I mean: the order in which things happen in would be clear, but would also slow down reading considerably. The human mind prefers infix, because infix is how natural languages (English, French, ...) are structured.

To illustrate:

infix:       2         +         3      
            boy       take      book 


prefix:      +         2         3
            take      boy       book 


postfix:     2         3         +
            boy       book      take 

A more complicated example:

infix:   2 + 3 + 4 + 5 + 6 + 7 + 8 + 9

prefix:  + + + + + + + 2 3 4 5 6 7 8 9

postfix: 2 3 4 5 6 7 8 9 + + + + + + +

With the prefix and postfix notation, I can't even visually make sure the number of operators is correct. Not only is it hard to write, but also hard to read.


However, I do agree that the entire precedence table is difficult to remember and I find myself either looking it up or adding extra parentheses anyway to make things clear.

Thus, one radical way to solve this might be to entirely get rid of operator precedence for all binary operators (operators that take in two operands).

For example, the following two lines would be equivalent:

a * b + c / d % e % f + g << h & i | j and k or f

(((((((((((a * b) + c) / d) % e) % f) + g) << h) & i) | j) and k) or f)

Simply read from left to right. It's still somewhat awkward because of math classes hammering in the idea that multiplication happens before addition, but I think it's a reasonable compromise between readability and unambiguity. At the very least, I prefer this over the prefix / postfix notation.

However, getting rid of the entire precedence table and going left to right would lead to the following:

// consider this:
x += 3 * 5    

// it converts to
(x += 3) * 5

// to get original meaning we would need to do this
x += (3 * 5)

It's somewhat arguable if this really helps or not. Imagine reading through a 3-d math function full of this:

x += ( x - (y * z));
y += ( a + (b / 8));
z -= ( c | (d & e));

I feel like this might be more readable, but I can see somebody disliking it.


TLDR:

  1. Removing infix notation is not worth it. Slows down reading too much.
  2. As an alternative, consider flattening all binary operators precedence, and instead having the leftmost operator acting first (assuming there are no parentheses).
  3. Simple rules over complex ones.

@Srekel
Copy link

Srekel commented Mar 28, 2022

A perhaps minor thing, but I feel like division should need parenthesis.

20 / 2 / 2, is it 1 or 5?
20 / 2 * 2, is it 20 or 5?

Maybe I should be 100% certain after programming for 20 years but... I'm not.

@ominitay
Copy link
Contributor

I think it's fair enough to assume basic mathematical knowledge, with division and multiplication first, left-to-right, and then addition and subtraction, also left-to-right. So 20 / 2 / 2 is the same as (20 / 2) / 2, and 20 / 2 * 2 is (20 / 2) * 2

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
accepted This proposal is planned. breaking Implementing this issue could cause existing code to no longer compile or have different behavior. proposal This issue suggests modifications. If it also has the "accepted" label then it is planned.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

14 participants