Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Amend RFC 550 with misc. follow set corrections #1494

Merged
merged 3 commits into from
Apr 8, 2016

Conversation

durka
Copy link
Contributor

@durka durka commented Feb 8, 2016

RFC 550 introduced follow sets for future-proofing macro definitions. They have been tweaked since then to reflect the realities of extant Rust syntax, or to bail out crates that were too-big-to-fail.

The idea (insofar as I understand it) is that the follow set for a fragment specifier X contains any tokens or symbols that could follow a complete X in current (or planned) valid Rust syntax. In particular, FOLLOW(X) does not contain anything that could be part of X (silly example: + is not in FOLLOW(expr) because obviously + may be in the middle of an expression). A macro may not specify syntax that has an X followed by something not in FOLLOW(X). That way, if in the future we extend Rust syntax so that X can encompass more things (that were previously not in the follow set), the macro doesn't break.

The upshot is that if there is something that can currently follow X in valid Rust, but it is not in FOLLOW(X) and this is unlikely to change in the future, it is simply an unnecessary roadblock to macro writing, because a change that would break a macro would also break regular syntax. This RFC amendment proposes to remove two one of those roadblocks. (If there are more that I missed, please suggest them.)

Specifically:

  • Allow ty (and path) fragments to be followed by block fragments. Precedent is function and closure declarations, i.e. fn foo() -> TYPE BLOCK. Indeed you can already write $t:ty { $($foo:tt)* } in a macro rule, so $t:ty $b:block is natural. And FOLLOW(path) = FOLLOW(ty) because a path can name a type.
  • Allow pat fragments to be followed by :. Precedent is let-bindings and function/closure arguments, i.e. let PAT: TYPE = EXPR.

RFC 550 introduced follow sets for future-proofing macro definitions. They have been tweaked since then to reflect the realities of extant Rust syntax, or to bail out crates that were too-big-to-fail.

The idea (insofar as I understand it) is that the follow set for a fragment specifier `X` contains any tokens or symbols that could follow `X` in current (or planned) valid Rust syntax. In particular, `FOLLOW(X)` does _not_ contain anything that could be part of `X` (silly example: `+` is not in `FOLLOW(expr)` because obviously `+` may be in the middle of an expression). A macro may not accept syntax that has an `X` followed by something not in `FOLLOW(X)`. That way, if in the future we extend Rust syntax so that `X` can encompass more things (that were previously not in the follow set), the macro doesn't break.

The upshot is that if there is something that can _currently_ follow `X` in valid Rust, but it is not in `FOLLOW(X)`, it is simply an unnecessary roadblock to macro writing, because a change that would break a macro would also break regular syntax. This RFC amendment proposes to remove two of those roadblocks. (If there are more that I missed, please suggest them.)

Specifically:

- Allow `ty` (and `path`) fragments to be followed by `block` fragments. Precedent is function and closure declarations, i.e. `fn foo() -> TYPE BLOCK`. Indeed you can already write `$t:ty { $($foo:tt)* }` in a macro rule, so `$t:ty $b:block` is natural. And `FOLLOW(path) = FOLLOW(ty)` because a path can name a type.
- Allow `pat` fragments to be followed by `:`. Precedent is let-bindings and function/closure arguments, i.e. `let PAT: TYPE = EXPR`.
@alexcrichton alexcrichton added the T-lang Relevant to the language team, which will review and decide on the RFC. label Feb 8, 2016
@durka
Copy link
Contributor Author

durka commented Feb 9, 2016

Implemented here. I found out that my notation Interpolated(NtBlock(_)) is not quite correct, but I don't know exactly how it should be written in the RFC. (edit: changed to fictional Nonterminal(Block) syntax)

@pnkfelix
Copy link
Member

pnkfelix commented Feb 9, 2016

I'd have to re-read the type ascription RFC to be sure, but I think the logic given in the description to justify adding colon to follow of pat does not quite hold up,

In particular, we could change the grammar to say:

pat ::= pat ':' type | ...
stmt ::= 'let' pat '=' expr ';'  | ...

which would invalidate the logic given here to justify putting colon in pat's follow set.

(This also demonstrates why the follow sets are not simply mechanically derivable from the grammar)


Update: such a change to the grammar would also take it very far afield from being LL(1); I don't think that is as much of a concern as it used to be, but I would love to be proven wrong on that point. )

@durka
Copy link
Contributor Author

durka commented Feb 9, 2016

Good point. Though type ascription turns expr ':' expr into an expr. And with the above change you'd have to make function/closure arguments pat ',' pat ',' .... And I wonder what a "type-ascripted pattern" means in match.

@durka
Copy link
Contributor Author

durka commented Feb 9, 2016

I read a bit of the context on type ascription, including the fact that type ascription on patterns was considered before 1.0 but postponed (and is still currently postponed). So it seems like Colon should be kept out of FOLLOW(pat). I'll revise the PRs accordingly.

@durka
Copy link
Contributor Author

durka commented Feb 17, 2016

While I continue to understand the motivation for not adding : to FOLLOW(pat), it means the current situation is that it's impossible to parse closure/function arguments in a macro. Concretely, the best parser I have for closures is

$(move)* |$($p:ident $(: $t:ty)*),*| $(-> $rt:ty)* { $($closure:tt)* }

which simply disallows pattern arguments. Another alternative is

$(move)* |$($p:pat $(=> $t:ty)*),*| $(-> $rt:ty)* { $($closure:tt)* }

which allows patterns but bizarrely uses => for type annotation instead of : because it happens to be in the follow set.

I briefly thought the upcoming #![feature(type_ascription)] might make this better, as $e:expr could be used instead of $p:pat : $t:ty, but alas | is not in FOLLOW(expr) so |$($e:expr),*| is not a valid matcher.

Do you have any ideas for improving this situation?

(cc @talchas @tikue we were talking about this on IRC)

@dgrunwald
Copy link
Contributor

I just ran into the $t:ty $b:block problem.

Is there some workaround for that? I tried $t:ty { $( $b:stmt )* }, but that doesn't seem to work properly accept blocks with multiple statements. I also tried $t:ty { $( $b:tt )* } but that doesn't work when there are macro invocations in the body.

Also, I think this RFC should be uncontroversial now that the pat change was removed; can we get this back on track?

@durka
Copy link
Contributor Author

durka commented Mar 6, 2016

@dgrunwald yeah we should move this forward, but { $($b:tt)* } should work in all situations, no? You might need a as_something! auxiliary macro as explained here.

@dgrunwald
Copy link
Contributor

@durka: Thanks, AST coercion does the trick.

I now got a Python class in my Rust program. 😄 🎉 🎈

@pnkfelix
Copy link
Member

pnkfelix commented Mar 7, 2016

Nominating for discussion at Lang team mtg

@pnkfelix
Copy link
Member

Just noting mostly to track progress/discussion: lang team currently thinks that the goal is good, but it may have concerns about whether the manner in which it is specified is quite right.

Namely, is putting a construct like Nonterminal(Block) into the follow set what we want, or should it instead be doing something saying that when $b:block, FIRST($b) is { '{' } rather than the simple NT { $b }.

(I think the manner of specification here is okay, and properly matches the way that RFC 550 is specified, but the situation is worth reviewing carefully.)

@durka
Copy link
Contributor Author

durka commented Mar 14, 2016

@pnkfelix the tests that I wrote pass with this way of specifying it, if you want I can try it the other way as well...

@durka
Copy link
Contributor Author

durka commented Mar 29, 2016

What's the next step here? The report from the lang team above is frustratingly ambiguous.

@pnkfelix
Copy link
Member

@durka I'm actually going to re-nominate the RFC for consideration at the next lang team mtg, using the basic argument as follows:

The lang team has pointed out that a more ambitious generalization of the future-proofing rules could be applied here.

  • In particular, instead of adding NonTerminal(block) to the follow-set for ty, we could instead leave the primitive follow-set definition unchanged, and instead revise the future-proofing specification to say that the codomain of FIRST and LAST are sets of non-NT tokens, and then define things such that FIRST(NonTerminal(block)) = = { OpenDelim(Brace) }
  • With that infrastructure in place, we would immediately deduce that FOLLOW(ty) ⊇ FIRST(NonTerminal(block)) (because OpenDelim(Brace) ∈ FOLLOW(ty)), achieving the primary goal of this RFC, as well as allowing a number of other macro definitions through (probably).

The above idealized goal may be desirable; however, I argue: There is no actual proposed amendment to extend the future-proofing rules in the manner hypothesized above. Any such amendment that I can imagine would solely generalize the effects of this RFC amendment (#1494). Therefore, we can continue to consider making such a change in the future, but we can and should merge amendment #1494 in the meantime.

@durka
Copy link
Contributor Author

durka commented Mar 30, 2016

Er, since when can a block start with a semicolon?

@pnkfelix
Copy link
Member

Er, since when can a block start with a semicolon?

Oops, fixed. Thanks!

@durka
Copy link
Contributor Author

durka commented Mar 31, 2016

Your reasoning sounds good to me. I'm also interested in the lang team's ideas, if any, on how to make it possible to parse closure syntax.

@pnkfelix
Copy link
Member

Note for Lang team: one problem with the hypothesized generalization is that we would have to define a peimitive FIRST(nt) to set of non-NT tokens (the same way this RFC defines FOLLOW(nt), and future language changes would have to be compatible with (all previous definitions of) FIRST. This is likely to be more restrictive to future grammar extensions than the current setup as used in the amendment as written

@aturon aturon added final-comment-period Will be merged/postponed/closed in ~10 calendar days unless new substational objections are raised. and removed I-nominated labels Apr 1, 2016
@aturon
Copy link
Member

aturon commented Apr 1, 2016

💬 This RFC is entering its Final Comment Period! 💬

@nikomatsakis nikomatsakis merged commit 449c6a6 into rust-lang:master Apr 8, 2016
@nikomatsakis
Copy link
Contributor

It's official! The @rust-lang/lang team has decided to accept this RFC.

Manishearth added a commit to Manishearth/rust that referenced this pull request Apr 15, 2016
implement RFC amendment 1494

Adds `:block` to the follow set for `:ty` and `:path`. See rust-lang/rfcs#1494.
@Centril Centril added A-macros Macro related proposals and issues A-syntax Syntax related proposals & ideas labels Nov 23, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-macros Macro related proposals and issues A-syntax Syntax related proposals & ideas final-comment-period Will be merged/postponed/closed in ~10 calendar days unless new substational objections are raised. T-lang Relevant to the language team, which will review and decide on the RFC.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants