Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce proc_macro::Span::source_text #55780

Merged
merged 1 commit into from
Mar 27, 2019
Merged

Introduce proc_macro::Span::source_text #55780

merged 1 commit into from
Mar 27, 2019

Conversation

ogoffart
Copy link
Contributor

@ogoffart ogoffart commented Nov 8, 2018

A function to extract the actual source behind a Span.

Background: I would like to use syn in a build.rs script to parse the rust code, and extract part of the source code. However, syn only gives access to proc_macro2::Span, and i would like to get the source code behind that.
I opened an issue on proc_macro2 bug tracker for this feature dtolnay/proc-macro2#110 and @alexcrichton said the feature should first go upstream in proc_macro. So there it is!

Since most of the Span API is unstable anyway, this is guarded by the same proc_macro_span feature as everything else.

@rust-highfive
Copy link
Collaborator

Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @nikomatsakis (or someone else) soon.

If any changes to this PR are deemed necessary, please add them as extra commits. This ensures that the reviewer can see what has changed since they last reviewed the code. Due to the way GitHub handles out-of-date commits, this should also make it reasonably obvious what issues have or haven't been addressed. Large or tricky changes may require several passes of review and changes.

Please see the contribution instructions for more information.

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Nov 8, 2018
@nikomatsakis
Copy link
Contributor

So this code seems fine, but I'm not sure from a procedural and stability point of view what is the best way to handle this.

@nikomatsakis
Copy link
Contributor

cc @dtolnay @petrochenkov @alexcrichton -- thoughts?

@ogoffart
Copy link
Contributor Author

One doubt i had was if we should return None , instead of the macro call inside for span belonging to the call site. (reemit! example in the test)

@alexcrichton
Copy link
Member

This seems like a reasonable API edition to me and one that we'll want in the long haul. If any procedural macro has whitespace-sensitive parsing associated with it then accessing the source text via means like this is intended to be the main way to actually do the parsing.

I don't think we're on track to stabilize this in the near term, but in terms of a long-term addition I think we'll want this which to me means it's fine to land unstable for now in proc_macro

@dtolnay
Copy link
Member

dtolnay commented Nov 12, 2018

We might want to strip comments. What do others think? I can get on board with whitespace-sensitive macro DSLs such as languages that differentiate between a-b and a - b. But I would like macros to be forced to use /// and /** */ for any assignment of meaning to text within comments, with // and /* */ guaranteed to be meaningless.

@alexcrichton
Copy link
Member

I could go either way on comments personally, but one aspect about omitting comments that may be a bit odd is if the difference of byte positions of a span is very different from the length of the source text due to comment removal

@dtolnay
Copy link
Member

dtolnay commented Nov 12, 2018

Good call. We could sub out the comment with spaces.

@alexcrichton
Copy link
Member

Seems plausible to me!

@nikomatsakis
Copy link
Contributor

nikomatsakis commented Nov 14, 2018

I ... I don't know. If we're going to give the source text, I'm inclined to just give the source text, and let macros do weird things with comments. Let the market decide. =)

e.g., sometimes people add "pre and post conditions" in the form of specially formatted comments. That seems not terrible to me.

@ogoffart
Copy link
Contributor Author

I think we should keep preserve the comment.

As an usecase, the main reason I'm doing this change is for the cpp crate which extract C++ code. And people use comments in C++ to annotate things for static analyzers. (For example, gcc's -Wimplicit-fallthrough warning understands the /* falls through */ comments in the code.)
(I know that Rust and C++ have different lexing rules regarding comments, but I assume developers can cope with that)

Another usecase would be to print snippets of the code while compiling for better diagnostics. We wants the comments in this case.

@nikomatsakis
Copy link
Contributor

@ogoffart interesting. Makes sense to me.

@ogoffart
Copy link
Contributor Author

What should I do now?

@ogoffart
Copy link
Contributor Author

@nikomatsakis ping?

src/libproc_macro/lib.rs Outdated Show resolved Hide resolved
src/libproc_macro/lib.rs Outdated Show resolved Hide resolved
@Centril Centril dismissed their stale review November 22, 2018 18:01

Requested changes done.

@Centril
Copy link
Contributor

Centril commented Nov 22, 2018

I'm worried about giving guarantees to users about whitespace and comments because that forces alternative Rust compiler implementations into preserving such things rather than just throwing such things away permanently during lexing. In other words, should we give a guarantee, this effectively forces all Rust compilers to use a certain compilation model and makes that part of the specification.

If this was not a guarantee but rather "at the compilers option, you may get whitespace and comments..." then I'd be less worried.

@ogoffart
Copy link
Contributor Author

That's why it returns an Optional. If the compiler do not have access to the actual source code, it can return None.

@Centril
Copy link
Contributor

Centril commented Nov 23, 2018

@ogoffart Ah; I thought

It only returns a result if the span corresponds to real source code.

referred only to getting None when the code was produced by macros and such...

Can we clarify this in the documentation somehow that compilers are not required to give you the actual source code even in cases where it's not produced by macros?

@petrochenkov
Copy link
Contributor

It would be good to somehow document this as unstable, "best effort" and restricted to "for diagnostics only".
If the macro succeds then the observable result should only rely on tokens, but not on this text.

@Centril
Copy link
Contributor

Centril commented Nov 23, 2018

@petrochenkov Yeah; "best effort" / "for diagnostics only" sounds like appropriate wording; thank you <3.

@roblabla
Copy link
Contributor

My specific use-case is a power_assert macro. I want an assertion macro that has the following output:

thread '<main>' panicked at 'assertion failed: bar.val == bar.foo.val
power_assert!(bar.val == bar.foo.val)
              |   |   |  |   |   |
              |   3   |  |   |   2
              |       |  |   Foo { val: 2 }
              |       |  Bar { val: 3, foo: Foo { val: 2 } }
              |       false
              Bar { val: 3, foo: Foo { val: 2 } }
', examples/normal.rs:26

In order to do this, I get the span of the full expression (bar.val == bar.foo.val), and then the span of each internal component. By looking at the Span::start(), I am able to place the labels at the correct position (basically, component.start().column - full.start().column will give me the column the expression starts at within the full expression).

For this to work, Span::start() and the string I print out need to match.

If this was not a guarantee but rather "at the compilers option, you may get whitespace and comments..." then I'd be less worried.

If we don't get whitespace and comments, then we run the risk of having Span::start() become out of sync with the raw text, breaking the above functionality if a comment was put inside the assert macro.

@ogoffart
Copy link
Contributor Author

@roblabla: do you take in to account the fact that the column is in utf-8 bytes.

   /* 🐘 */  power_assert!(normalize("🐘") /* Éléphant emoji */ == "Éléphant" );

In order to do that, you indeed need to know what exactly is in the comments (how many byte, corresponds to how many code points) (I guess this should be computed with UnicodeWidthStr::width(...))

@Centril
Copy link
Contributor

Centril commented Nov 23, 2018

@roblabla

If we don't get whitespace and comments, then we run the risk of having Span::start() become out of sync with the raw text, breaking the above functionality if a comment was put inside the assert macro.

Can you not have some fallback such that the power_assert! macro just gives less good "diagnostics" when Span::start() returns None? It seems to me that you'll have to handle that anyway if power_assert! is done inside a macro (the macro wouldn't be very good if you couldn't...)? Is there some difference in terms of correctness if None is returned here?

@ogoffart
Copy link
Contributor Author

I added a note that this should not be relied upon, and is only there for diagnostics.

@bors

This comment has been minimized.

@nikomatsakis
Copy link
Contributor

Or maybe because it's just an unstable addition, we can "just do it"? If so, is there some place that needs to be updated (a tracking issue, etc?)

@nikomatsakis
Copy link
Contributor

The code seems fine to me, I just want to ensure that we don't lose track of this random thing.

@nikomatsakis
Copy link
Contributor

I guess that would be #54725

@nikomatsakis
Copy link
Contributor

nikomatsakis commented Jan 11, 2019

Thinking about this a bit more, I feel like there is quite a number of considerations and unknowns that came up on this thread (e.g., "comments or not?" etc), and I'm a bit reluctant to just r+ this without at least recording those. So I guess I would say, could someone produce a brief summary of the conversation and in particular the unknowns?

Then we can put that in the tracking issue and I would feel pretty good about an r+.

(I may have time to do that on monday, gotta run right now)

@Centril
Copy link
Contributor

Centril commented Jan 11, 2019

I feel like procedurally this probably requires an FCP. But what team? I guess technically this is a libs API? But it feels like compiler team should check off?

@nikomatsakis I believe most changes to the proc macro APIs are shared between T-Libs and T-Lang so both of those teams. :)

@nikomatsakis
Copy link
Contributor

@Centril seems sensible. Regardless, I think what we need most at this juncture is a kind of capsule summary of the conversation and in particular highlighting the alternative designs that were visited and the reasons for the current one.

@Centril Centril added I-nominated T-lang Relevant to the language team, which will review and decide on the PR/issue. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. labels Jan 15, 2019
@Centril
Copy link
Contributor

Centril commented Jan 15, 2019

@nikomatsakis Fair; I've nominated to discuss this a bit on Thursday. :)

@nikomatsakis
Copy link
Contributor

One thing we might want to note in any summary:

There are a variety of possible strings you might return here. For example, if you had foo!($a) from inside a macro-rules invocation, we might see the $a substituted -- or now. We should document the return value and check for feedback as to whether it feels like it is the "right" one. Or maybe return None in tricky cases.

@matklad
Copy link
Member

matklad commented Feb 1, 2019

unknowns

Another small unknown is line-endings. Ideally, the meaning of Rust program should be independent of line endings used (because, for example, gitconfig might change the line endings). I think currently this is more or less the case: for example, line endings in string literals seem to be normalized to \n. For this API, we might want to normalize newlines as well!

@Dylan-DPC-zz
Copy link

ping from triage @nikomatsakis what's the update on this?

@Mark-Simulacrum
Copy link
Member

@nikomatsakis Could you post a summary of the current state of this pull request?

(triage)

@Dylan-DPC-zz
Copy link

ping from triage anyone from @rust-lang/lang @rust-lang/libs can review this?

@Centril
Copy link
Contributor

Centril commented Mar 18, 2019

I'm going to r? @petrochenkov for now (please make a fresh issue number and tracking issue for it) so that this PR can be dealt with. We don't have to figure out everything just now.

@petrochenkov
Copy link
Contributor

I'm ok with this as long as this is unstable and documented as best effort.
I guess the use experience will show whether any normalization is necessary or not.

proc_macro_span is a pretty heterogeneous feature and is very unlikely to be stabilized in one (or even two) step(s), so it's probably ok to track this with #54725 as well.
#54725 also contains some discussion about incremental, which is relevant for this function as well.

@bors r+

@bors
Copy link
Contributor

bors commented Mar 26, 2019

📌 Commit e88b0d9 has been approved by petrochenkov

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Mar 26, 2019
@bors
Copy link
Contributor

bors commented Mar 27, 2019

⌛ Testing commit e88b0d9 with merge c5fb4d0...

bors added a commit that referenced this pull request Mar 27, 2019
Introduce proc_macro::Span::source_text

A function to extract the actual source behind a Span.

Background: I would like to use `syn` in a `build.rs` script to parse the rust code, and extract part of the source code. However, `syn` only gives access to proc_macro2::Span, and i would like to get the source code behind that.
I opened an issue on proc_macro2 bug tracker for this feature dtolnay/proc-macro2#110  and @alexcrichton said the feature should first go upstream in proc_macro.  So there it is!

Since most of the Span API is unstable anyway, this is guarded by the same `proc_macro_span` feature as everything else.
@bors
Copy link
Contributor

bors commented Mar 27, 2019

☀️ Test successful - checks-travis, status-appveyor
Approved by: petrochenkov
Pushing c5fb4d0 to master...

@bors bors added the merged-by-bors This PR was explicitly merged by bors. label Mar 27, 2019
@bors bors merged commit e88b0d9 into rust-lang:master Mar 27, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
merged-by-bors This PR was explicitly merged by bors. S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-lang Relevant to the language team, which will review and decide on the PR/issue. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.