should we use whitespace-sensitive operator fixity? #520
Comments
Making whitespace significant around operators seems like a big cost to me:
Since I view these rules as a cost, I have to ask "what is the benefit and is the benefit worth the cost?" And to answer that I feel like I need to know what the alternatives are. Is this only for
I think phrasing the question this way makes it much clearer what the costs and alternatives are.
Another drawback of making whitespace significant in this way: it seems like it would make Carbon substantially more difficult to parse using tools like lex, yacc, and their derivatives. The best option I've come up with so far is for the lexer to have separate tokens for "
There may also be accessibility issues; for example, will screen readers reliably surface the difference between
I've moved that question out into #523. I'd like to keep this issue focused on the specific approach of using whitespace for this purpose, though I expect the arguments here to drive #523 and vice versa, because the
I really like the approach used by Swift here: it is clear and principled, and it gives us a strong evolutionary path. While we don't currently plan on user-defined operators, this leaves that door open in the future. It also aligns really well with the goals for ensuring language evolution by providing a large space to evolve into with operators in the future, even if they aren't user-defined. Whenever we choose an operator here, we can still be very cautious on a case-by-case basis that any collision with other uses of the same symbols doesn't become confusing. I really do agree that this is something with high cost and to be generally avoided. Maybe we want to deal with the different meanings of
All this said, I find the point raised by @geoffromer really concerning. Making Carbon intractably hard to parse with tools like yacc/bison seems like a pretty unfortunate consequence. I'd like to understand if there is any reasonable way to address this with these kinds of tools, because if not, I think that's a high cost.
FWIW, I also want to address the meta-level cost raised by @josh11b around giving whitespace this level of significance. I originally was reluctant to follow Swift's lead here for exactly these reasons. However, I changed my mind because it is specifically the presence of whitespace that is significant, and not the amount of whitespace. To me at least, that seems like a really important difference. Literals, identifiers, keywords, and other syntactic elements are defined by the presence of whitespace as well. And Swift has experimented with these rules, and I've not heard from anyone that they end up being a source of user confusion in practice. So while I originally shared the high-level concern of leaning on whitespace in this way, these factors have largely convinced me that this would be fine for humans.
Regarding divergence from C/C++, changes to encode widespread practice and gain greater evolutionary freedom seem reasonable to me. Especially early on, I would expect us to have a strong ability to diagnose common mistakes here.
Regarding edge cases around repeated unary operators: I feel like this is a somewhat orthogonal issue to choosing fixity of symbols based on whitespace. Either approach provides similar challenges around
My several cents here...
From the limited digging around I've done, my guess is that it's closer to "ugly and annoying" than "intractable". I'd recommend that we treat this concern as non-blocking for now, but ask that any concrete proposals along these lines include a prototype implementation in
I think that's a reasonable request. I've put together an example change showing how this can be done for a
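To make the "separate tokens" idea from earlier in the thread concrete, here is a minimal sketch in Python. It is not the prototype referenced above and is not Flex/Bison code; the token names (`PREFIX_OP`, `POSTFIX_OP`, `INFIX_OP`) and the restriction to `*` and `+` are assumptions made for illustration. The point is only that the whitespace context can be folded into the lexer's patterns, so that a grammar downstream sees fixity as an ordinary token kind.

```python
import re

# Hypothetical token kinds; the thread does not name them.
# Whitespace context is checked with lookbehind/lookahead, which plays the
# role that trailing context or start conditions might play in a Flex lexer.
LEXER = re.compile(
    r"""
    (?P<WS>\s+)
  | (?P<IDENT>[A-Za-z_]\w*)
  | (?P<PREFIX_OP>(?:(?<=\s)|^)[*+](?=\S))   # whitespace before, none after
  | (?P<POSTFIX_OP>(?<=\S)[*+](?=\s|$))      # none before, whitespace after
  | (?P<INFIX_OP>[*+])                       # both sides or neither side
    """,
    re.VERBOSE,
)


def lex(src: str) -> list[tuple[str, str]]:
    """Split src into (kind, text) tokens, dropping whitespace tokens."""
    tokens = []
    pos = 0
    while pos < len(src):
        # Passing pos (rather than slicing) lets the lookbehind see the
        # character before the operator.
        m = LEXER.match(src, pos)
        if m is None:
            raise ValueError(f"unexpected character {src[pos]!r} at {pos}")
        if m.lastgroup != "WS":
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens


if __name__ == "__main__":
    print(lex("a* + b"))
    print(lex("a * +b"))
```

Treating an operator with whitespace on neither side as infix is an assumption borrowed from Swift's rule, not something this thread has settled.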
There was a bunch of discussion of this (in the context of #523 where we want postfix
The open discussion minutes have some more details, but the suggested initial rules are:
We talked a bunch about whether we can recover well and correct common errors here, and there don't seem to be big problems there for catching the common mistakes. Having some real world experience will also be good for recovery. We may eventually discover enough pain points and need to move toward a set of rules closer to what Swift uses so that we accept more different formatting patterns. But it seems reasonable to wait for those pain points to emerge before we adopt the more complex rules. This also seems to match what Richard has prototyped w/ Flex and Bison. Last but not least, the goal is still to be very cautious in the use of this flexibility. It looks useful for
So, what do folks think? Is this a reasonable place to start?
Summary of some discussion from open discussion sessions follows. The proposal described in the previous comment received push-back in two directions:
However, the desire to use
For the accessibility concern, we observe that the rule we are considering, for the specific case of pointer types and multiplication, will typically be resolving only the failure of the grammar to be LR(1) (or indeed LR(k) for any k), not an actual ambiguity, and to a human we expect the parse to typically be obvious without whitespace cues. In particular, we expect this to be the case because we don't expect types to appear as arbitrary subexpressions much, and instead to mostly appear in the constrained domain of an argument to a function call (where the type will always be followed by
For the readability concern, we agreed that this was a real concern, and noted that this is in fact a pre-existing problem with automatic formatters for other languages, often handled by turning off the automatic formatter for the code in question. That outcome seems far from ideal. We further noted that in the motivating cases, the characters / tokens immediately adjacent to the operator directly indicate the intended interpretation:
We suggest revising the rule as follows:
This is somewhat closer to the Swift rule than we were previously, but still rejects cases that the Swift rule might accept. Note that the "shall not be whitespace" rule for unary operators is not essential, but we would like to try the more-constraining rule first and only consider relaxing it if we discover it to be a source of friction. Some additional considerations (not discussed in the open discussion session):
What do people think of this revised approach?
BTW, I checked with folks in the C++ #include group (they have a dedicated accessibility forum) to understand how much constructs like
Lots of existing ways to handle this. They're not perfect, but also not a large or even medium problem.
Overall, it doesn't seem to be a pressing problem in need of solving. But (similarly to the visual and parsing side) it also isn't something we would want happening all over the place. So the direction of trying to minimize and/or avoid code having patterns where this might be confusing is basically the right direction. Seems unlikely that we need to stress about the edge cases here given the tools available, provided they really are edge cases.
Closing this with the decision in @zygoloid's recent comment: #520 (comment)
#582) The presence or absence of whitespace is used to determine which operator is in use, following the rules described in #520. Support for prefix * dereference operator follows #523. Co-authored-by: Geoff Romer <[email protected]>
For full details, see #168, and in particular this section.
It would be useful to be able to use `*` as all of:
... but there are problems with the same operator being both the second and third kind. For example, if we also allow (say) `+` as both an infix operator (for addition / type-type composition) and a prefix operator (like C++'s unary `operator+`), then the expression `a * + b` is ambiguous: it could be either `(a *) + b`, or `a * (+ b)`.
There are a few ways to handle this, as detailed in #168. In this issue, I'd like to determine whether we're happy with Swift's answer to this: the fixity of an operator depends on its surrounding whitespace. That is:
- `a* + b` is `(a*) + b`
- `a * +b` is `a * (+b)`
- `a * + b` (or `a* +b`) is an error
More generally:
We would treat `a.foo` and `a->foo` and `a[i]` and `a(args)` as postfix. For non-symbolic unary operators (e.g. `not`), we can't avoid whitespace in general, but I don't think we anticipate having the same non-symbolic operator with multiple fixities, so that seems unproblematic.
Are those rules acceptable?
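As a rough illustration of the examples listed above, the following Python sketch derives each operator's fixity from the whitespace on either side and then checks that operands and operators alternate sensibly. The function names and the restriction to identifiers, `*`, and `+` are assumptions made for this sketch only. Treating "whitespace on neither side" as infix follows Swift's rule and is likewise an assumption here.

```python
import re

# Identifiers and the two single-character operators used in the examples.
TOKEN = re.compile(r"[A-Za-z_]\w*|[*+]")


def fixity(ws_before: bool, ws_after: bool) -> str:
    """Map the whitespace on each side of an operator to a fixity."""
    if ws_before == ws_after:
        # Whitespace on both sides, or (by assumption) on neither side.
        return "infix"
    return "prefix" if ws_before else "postfix"


def tag(expr: str) -> list[tuple[str, str]]:
    """Label each token as 'operand', 'prefix', 'postfix', or 'infix'."""
    tagged = []
    for m in TOKEN.finditer(expr):
        text = m.group()
        if text in ("*", "+"):
            # The start and end of the expression count as whitespace.
            before = m.start() == 0 or expr[m.start() - 1].isspace()
            after = m.end() == len(expr) or expr[m.end()].isspace()
            tagged.append((text, fixity(before, after)))
        else:
            tagged.append((text, "operand"))
    return tagged


def is_well_formed(expr: str) -> bool:
    """Check that the tagged tokens form a valid operand/operator sequence."""
    have_operand = False
    for _, kind in tag(expr):
        if kind in ("operand", "prefix"):
            if have_operand:
                return False  # two operands side by side, e.g. `a* +b`
            have_operand = kind == "operand"
        elif not have_operand:
            return False  # postfix or infix with nothing to its left
        elif kind == "infix":
            have_operand = False
    return have_operand


if __name__ == "__main__":
    for example in ["a* + b", "a * +b", "a * + b", "a* +b"]:
        print(repr(example), tag(example), is_well_formed(example))
```

Under these assumptions, `a* + b` and `a * +b` are accepted with the groupings given in the list, while `a * + b` (two infix operators in a row) and `a* +b` (a complete operand followed by a prefix operator) are rejected.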