Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rustc_parser incorrectly parses groups with Delimiter::None #67062

Open
petrochenkov opened this issue Dec 5, 2019 · 10 comments
Open

rustc_parser incorrectly parses groups with Delimiter::None #67062

petrochenkov opened this issue Dec 5, 2019 · 10 comments
Labels
A-macros Area: All kinds of macros (custom derive, macro_rules!, proc macros, ..) A-parser Area: The parsing of Rust source code to an AST. C-bug Category: This is a bug. T-lang Relevant to the language team, which will review and decide on the PR/issue.

Comments

@petrochenkov
Copy link
Contributor

petrochenkov commented Dec 5, 2019

Such groups can currently be created by proc macros.

Reproduction, proc macro crate:

#[proc_macro]
pub fn add_mul(_: TokenStream) -> TokenStream {
    // ⟪ 2 + 2 ⟫ * 2
    vec![
        TokenTree::from(Group::new(Delimiter::None, "2 + 2".parse().unwrap())),
        TokenTree::from(Punct::new('*', Spacing::Alone)),
        TokenTree::from(Literal::u8_unsuffixed(2)),
    ].into_iter().collect()
}

User crate:

fn main() {
    let x = add_mul!();
    assert_eq!(x, 8); // FAIL: the result is 6 != 8
}

This is not a huge issue right now because proc macros have no reasons to created Delimiter::None delimiters themselves (but they can still get them from input and return them if they return their input partially or fully, see the example below).

However, this will became a huge issue when interpolated tokens like $e: expr migrate from AST pieces to token streams, because in the token stream model they are represented exactly like groups with Delimiter::None delimiters (proc macros already see them in this way).

@petrochenkov petrochenkov added T-lang Relevant to the language team, which will review and decide on the PR/issue. A-parser Area: The parsing of Rust source code to an AST. C-bug Category: This is a bug. WG-compiler-front A-macros Area: All kinds of macros (custom derive, macro_rules!, proc macros, ..) labels Dec 5, 2019
@petrochenkov
Copy link
Contributor Author

How to reproduce this without constructing Delimiter::None by yourself.

Proc macro crate:

#[proc_macro]
pub fn add_mul(input: TokenStream) -> TokenStream {
    let mul_2 = vec![
        TokenTree::from(Punct::new('*', Spacing::Alone)),
        TokenTree::from(Literal::u8_unsuffixed(2)),
    ];
    input.into_iter().chain(mul_2.into_iter()).collect()
}

User crate:

macro_rules! mbe {
    ($e: expr) => ( add_mul!($e) )
}

fn main() {
    let x = mbe!(2 + 2);
    assert_eq!(x, 8); // FAIL: the result is 6 != 8
}

@petrochenkov
Copy link
Contributor Author

petrochenkov commented Dec 5, 2019

What it means to parse Delimiter::None correctly is a big language design question in general.

However, it's clear that in pieces of grammar with expression-like priorities they should behave more or less like parentheses to keep the "operator priority hygiene" property of our macros.

"How would an interpolated token behave here?" should probably be the primary intuition here.

@petrochenkov
Copy link
Contributor Author

cc @dtolnay @alexcrichton
To make you aware of this issue, and perhaps proc-macro2 and syn could attempt hiding this bug somehow.

@petrochenkov
Copy link
Contributor Author

Encountered in practice in #74036.

nnethercote added a commit to nnethercote/rust that referenced this issue Apr 29, 2022
`make_tokenstream` has three commented hacks, and a comment at the top
referring to rust-lang#67062. These hacks have no observable effect, at least as judged
by running the test suite. The hacks were added in rust-lang#82608, with an explanation
[here](rust-lang#82608 (comment)). It
appears that one of the following is true: (a) they never did anything useful,
(b) they do something useful but we have no test coverage for them, or (c)
something has changed in the meantime that means they are no longer necessary.

This commit removes the hacks and the comments, in the hope that (b) is not
true.
JohnTitor added a commit to JohnTitor/rust that referenced this issue May 11, 2022
…cks, r=Aaron1011

Remove hacks in `make_token_stream`.

`make_tokenstream` has three commented hacks, and a comment at the top
referring to rust-lang#67062. These hacks have no observable effect, at least as judged
by running the test suite. The hacks were added in rust-lang#82608, with an explanation
[here](rust-lang#82608 (comment)). It
appears that one of the following is true: (a) they never did anything useful,
(b) they do something useful but we have no test coverage for them, or (c)
something has changed in the meantime that means they are no longer necessary.

This commit removes the hacks and the comments, in the hope that (b) is not
true.

r? `@Aaron1011`
calebcartwright pushed a commit to calebcartwright/rust that referenced this issue Jun 13, 2022
…cks, r=Aaron1011

Remove hacks in `make_token_stream`.

`make_tokenstream` has three commented hacks, and a comment at the top
referring to rust-lang#67062. These hacks have no observable effect, at least as judged
by running the test suite. The hacks were added in rust-lang#82608, with an explanation
[here](rust-lang#82608 (comment)). It
appears that one of the following is true: (a) they never did anything useful,
(b) they do something useful but we have no test coverage for them, or (c)
something has changed in the meantime that means they are no longer necessary.

This commit removes the hacks and the comments, in the hope that (b) is not
true.

r? `@Aaron1011`
@nnethercote
Copy link
Contributor

I have made a serious attempt to fix this issue in #114647. The commits there eliminate Nonterminal and Token::Interpolate, and stop the stripping of invisible delimiters in the parser. This basically works for declarative macros, where pasted code fragments have a known kind, such as expr, stmt, item, etc. But it's harder for proc macros, where the produced code fragments have unknown kinds. A lot of the time the kind is clear from context, but there are certain cases where it's very difficult to know how a proc-macro-produced fragment should be parsed. See this comment for details. This seems hard to work around, and could be a potential showstopper.

bors added a commit to rust-lang-ci/rust that referenced this issue Feb 18, 2024
…ors,cjgillot

macro_rules: Preserve all metavariable spans in a global side table

This PR preserves spans of `tt` metavariables used to pass tokens to declarative macros.
Such metavariable spans can then be used in span combination operations like `Span::to` to improve all kinds of diagnostics.

Spans of non-`tt` metavariables are currently kept in nonterminal tokens, but the long term plan is remove all nonterminal tokens from rustc parser and rely on the proc macro model with invisible delimiters (rust-lang#114647, rust-lang#67062).
In particular, `NtIdent` nonterminal (corresponding to `ident` metavariables) becomes easy to remove when this PR lands (rust-lang#119412 does it).

The metavariable spans are kept in a global side table keyed by `Span`s of original tokens.
The alternative to the side table is keeping them in `SpanData` instead, but the performance regressions would be large because any spans from tokens passed to declarative macros would stop being inline and would work through span interner instead, and the penalty would be paid even if we never use the metavar span for the given original span.
(But also see the comment on `fn maybe_use_metavar_location` describing the map collision issues with the side table approach.)

There are also other alternatives - keeping the metavar span in `Token` or `TokenTree`, but associating it with `Span` itsel is the most natural choice because metavar spans are used in span combining operations, and those operations are not necessarily tied to tokens.
bors added a commit to rust-lang-ci/rust that referenced this issue Feb 18, 2024
…ors,cjgillot

macro_rules: Preserve all metavariable spans in a global side table

This PR preserves spans of `tt` metavariables used to pass tokens to declarative macros.
Such metavariable spans can then be used in span combination operations like `Span::to` to improve all kinds of diagnostics.

Spans of non-`tt` metavariables are currently kept in nonterminal tokens, but the long term plan is remove all nonterminal tokens from rustc parser and rely on the proc macro model with invisible delimiters (rust-lang#114647, rust-lang#67062).
In particular, `NtIdent` nonterminal (corresponding to `ident` metavariables) becomes easy to remove when this PR lands (rust-lang#119412 does it).

The metavariable spans are kept in a global side table keyed by `Span`s of original tokens.
The alternative to the side table is keeping them in `SpanData` instead, but the performance regressions would be large because any spans from tokens passed to declarative macros would stop being inline and would work through span interner instead, and the penalty would be paid even if we never use the metavar span for the given original span.
(But also see the comment on `fn maybe_use_metavar_location` describing the map collision issues with the side table approach.)

There are also other alternatives - keeping the metavar span in `Token` or `TokenTree`, but associating it with `Span` itsel is the most natural choice because metavar spans are used in span combining operations, and those operations are not necessarily tied to tokens.
@ijackson
Copy link
Contributor

This is not a huge issue right now because proc macros have no reasons to created Delimiter::None delimiters themselves (but they can still get them from input and return them if they return their input partially or fully, see the example below).

#124974 demonstrates how this can produce wrong answers when a macro_rules macro is combined with a proc_macro that manipulates a TokenStream. The two macros might be fine in isolation, and the combining program need not even have thought about this potential bug. This seems a troublesome hazard to me.

Imagine a proc macro that accepts an attribute whose argument is supposed to be an expression. Ideally, the macro would use a group when emitting the output, and it might use a None-delimited group for prettier output. Such a careful macro's efforts would be undermined by this bug.

derive-deftly is trying to be such a macro, and derive-deftly's users are potentially affected by this bug. (derive-deftly#68)

@ijackson
Copy link
Contributor

In the meantime I will make an MR that at least notes in the docs that this doesn't work reliably.

@petrochenkov
Copy link
Contributor Author

In the meantime I will make an MR that at least notes in the docs that this doesn't work reliably.

Like this one - #124389 ?

@ijackson
Copy link
Contributor

Like this one - #124389 ?

Err, yes :-). Thanks for the pointer!

fmease added a commit to fmease/rust that referenced this issue May 23, 2024
…enkov

Add a warning to proc_macro::Delimiter::None that rustc currently does not respect it.

It does not provide the behaviour it is indicated to provide when used in a proc_macro context.

This seems to be a bug, (rust-lang#67062), but it is a long standing one, and hard to discover.

This pull request adds a warning to inform users of this issue, with a link to the relevant issue, and a version number of the last known affected rustc version.
rust-timer added a commit to rust-lang-ci/rust that referenced this issue May 23, 2024
Rollup merge of rust-lang#124389 - CensoredUsername:master, r=petrochenkov

Add a warning to proc_macro::Delimiter::None that rustc currently does not respect it.

It does not provide the behaviour it is indicated to provide when used in a proc_macro context.

This seems to be a bug, (rust-lang#67062), but it is a long standing one, and hard to discover.

This pull request adds a warning to inform users of this issue, with a link to the relevant issue, and a version number of the last known affected rustc version.
@CensoredUsername
Copy link
Contributor

Added a warning in the documentation for this in the documentation to stop people from getting confused by it in #124389.

Documenting this so it can be removed when this issue is fixed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-macros Area: All kinds of macros (custom derive, macro_rules!, proc macros, ..) A-parser Area: The parsing of Rust source code to an AST. C-bug Category: This is a bug. T-lang Relevant to the language team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests

5 participants