Introduce proc_macro::Span::source_text #55780

ogoffart · 2018-11-08T09:45:36Z

A function to extract the actual source behind a Span.

Background: I would like to use syn in a build.rs script to parse Rust code and extract parts of the source code. However, syn only gives access to proc_macro2::Span, and I would like to get the source code behind that.
I opened an issue on proc_macro2 bug tracker for this feature dtolnay/proc-macro2#110 and @alexcrichton said the feature should first go upstream in proc_macro. So there it is!

Since most of the Span API is unstable anyway, this is guarded by the same proc_macro_span feature as everything else.
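To make the contract concrete: on nightly, behind #![feature(proc_macro_span)], a macro would call span.source_text() and get back an Option holding the text. Because proc_macro can only be exercised inside a running macro, the sketch below instead models the idea with a hypothetical ToySpan type (a byte range plus an expansion flag). It is an illustration under those assumptions, not rustc's implementation.

```rust
// Hypothetical model of the contract; not rustc's implementation.
// `ToySpan` stands in for `proc_macro::Span`: a byte range into one file,
// plus a flag marking spans that were produced by a macro expansion.
struct ToySpan {
    lo: usize,
    hi: usize,
    from_expansion: bool,
}

/// Returns the source text behind `span`, or `None` when there is no real
/// source to return (expansion-produced span, or a range outside the file).
fn source_text(file: &str, span: &ToySpan) -> Option<String> {
    if span.from_expansion {
        return None;
    }
    // `str::get` returns `None` for out-of-range or non-char-boundary ranges.
    file.get(span.lo..span.hi).map(str::to_owned)
}
```

A caller must always be prepared for None, which is what leaves room for "best effort" behaviour.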

rust-highfive · 2018-11-08T09:45:46Z

Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @nikomatsakis (or someone else) soon.

If any changes to this PR are deemed necessary, please add them as extra commits. This ensures that the reviewer can see what has changed since they last reviewed the code. Due to the way GitHub handles out-of-date commits, this should also make it reasonably obvious what issues have or haven't been addressed. Large or tricky changes may require several passes of review and changes.

Please see the contribution instructions for more information.

nikomatsakis · 2018-11-12T19:19:00Z

So this code seems fine, but I'm not sure from a procedural and stability point of view what is the best way to handle this.

nikomatsakis · 2018-11-12T19:24:00Z

cc @dtolnay @petrochenkov @alexcrichton -- thoughts?

ogoffart · 2018-11-12T21:25:11Z

One doubt I had was whether we should return None, rather than the text of the macro call, for spans belonging to the call site (see the reemit! example in the test).

alexcrichton · 2018-11-12T21:59:13Z

This seems like a reasonable API addition to me, and one that we'll want in the long haul. If any procedural macro has whitespace-sensitive parsing associated with it, then accessing the source text via means like this is intended to be the main way to actually do the parsing.

I don't think we're on track to stabilize this in the near term, but as a long-term addition I think we'll want this, which to me means it's fine to land it unstable in proc_macro for now.

dtolnay · 2018-11-12T22:10:28Z

We might want to strip comments. What do others think? I can get on board with whitespace-sensitive macro DSLs such as languages that differentiate between a-b and a - b. But I would like macros to be forced to use /// and /** */ for any assignment of meaning to text within comments, with // and /* */ guaranteed to be meaningless.
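As a hypothetical illustration of the kind of whitespace-sensitive DSL mentioned here, the sketch below gives a-b and a - b different meanings based only on the source text. Both inputs lex to essentially the same tokens (a, -, b), differing only in their spans, which is why such a macro would need something like source_text. The Expr type and parse_dsl function are invented for this example.

```rust
/// Hypothetical DSL rule: `a-b` (no spaces) is one hyphenated name,
/// while `a - b` is a subtraction of two names.
#[derive(Debug, PartialEq)]
enum Expr {
    Name(String),
    Sub(String, String),
}

fn parse_dsl(source_text: &str) -> Option<Expr> {
    let text = source_text.trim();
    // A spaced minus means subtraction.
    if let Some((lhs, rhs)) = text.split_once(" - ") {
        return Some(Expr::Sub(lhs.trim().to_string(), rhs.trim().to_string()));
    }
    // Anything else must be a single name with no interior whitespace.
    if text.is_empty() || text.contains(char::is_whitespace) {
        return None;
    }
    Some(Expr::Name(text.to_string()))
}
```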

alexcrichton · 2018-11-12T22:51:55Z

I could go either way on comments personally, but one aspect of omitting comments that may be a bit odd is that the difference between a span's byte positions could then be very different from the length of the source text, because of the removed comments.

dtolnay · 2018-11-12T22:56:03Z

Good call. We could sub out the comment with spaces.
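A minimal sketch of that idea, assuming a simplified lexer that ignores string and char literals and nested block comments: ordinary // and /* */ comments are overwritten with one space per byte (newlines kept), while /// and /** */ style doc comments are left intact, so byte offsets derived from spans still index the same positions. blank_comments and push_blanks are hypothetical helpers, not part of any API.

```rust
/// Appends one space per UTF-8 byte of `comment`, keeping newlines,
/// so the output has exactly the same byte length as the input.
fn push_blanks(out: &mut String, comment: &str) {
    for ch in comment.chars() {
        if ch == '\n' {
            out.push('\n');
        } else {
            for _ in 0..ch.len_utf8() {
                out.push(' ');
            }
        }
    }
}

/// Blanks ordinary comments and keeps doc comments verbatim.
/// Simplified: string/char literals and nested `/* */` are not handled.
fn blank_comments(src: &str) -> String {
    let mut out = String::with_capacity(src.len());
    let mut rest = src;
    while let Some(ch) = rest.chars().next() {
        if rest.starts_with("//") {
            // Line comment: runs up to (not including) the next newline.
            let end = rest.find('\n').unwrap_or(rest.len());
            let is_doc = rest.starts_with("///") || rest.starts_with("//!");
            if is_doc {
                out.push_str(&rest[..end]);
            } else {
                push_blanks(&mut out, &rest[..end]);
            }
            rest = &rest[end..];
        } else if rest.starts_with("/*") {
            // Block comment: runs through the first "*/" after the opener.
            let end = rest[2..].find("*/").map(|i| i + 4).unwrap_or(rest.len());
            let is_doc = rest.starts_with("/**") || rest.starts_with("/*!");
            if is_doc {
                out.push_str(&rest[..end]);
            } else {
                push_blanks(&mut out, &rest[..end]);
            }
            rest = &rest[end..];
        } else {
            out.push(ch);
            rest = &rest[ch.len_utf8()..];
        }
    }
    out
}
```

Counting spaces per byte rather than per char is what keeps multi-byte characters inside comments from shifting later offsets.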

alexcrichton · 2018-11-12T23:10:00Z

Seems plausible to me!

nikomatsakis · 2018-11-14T20:33:57Z

I ... I don't know. If we're going to give the source text, I'm inclined to just give the source text, and let macros do weird things with comments. Let the market decide. =)

e.g., sometimes people add "pre and post conditions" in the form of specially formatted comments. That seems not terrible to me.

ogoffart · 2018-11-15T11:05:56Z

I think we should preserve the comments.

As a use case, the main reason I'm making this change is the cpp crate, which extracts C++ code. People use comments in C++ to annotate things for static analyzers. (For example, gcc's -Wimplicit-fallthrough warning understands /* falls through */ comments in the code.)
(I know that Rust and C++ have different lexing rules regarding comments, but I assume developers can cope with that)

Another use case would be printing snippets of the code while compiling, for better diagnostics. We want the comments in that case.

nikomatsakis · 2018-11-16T20:12:25Z

@ogoffart interesting. Makes sense to me.

ogoffart · 2018-11-17T09:41:29Z

What should I do now?

ogoffart · 2018-11-21T12:29:40Z

@nikomatsakis ping?

src/libproc_macro/lib.rs

Requested changes done.

Centril · 2018-11-22T18:06:58Z

I'm worried about giving guarantees to users about whitespace and comments because that forces alternative Rust compiler implementations into preserving such things rather than just throwing such things away permanently during lexing. In other words, should we give a guarantee, this effectively forces all Rust compilers to use a certain compilation model and makes that part of the specification.

If this was not a guarantee but rather "at the compiler's option, you may get whitespace and comments..." then I'd be less worried.

ogoffart · 2018-11-23T08:30:05Z

That's why it returns an Option. If the compiler does not have access to the actual source code, it can return None.

Centril · 2018-11-23T08:37:55Z

@ogoffart Ah; I thought "It only returns a result if the span corresponds to real source code." referred only to getting None when the code was produced by macros and such...

Can we clarify this in the documentation somehow that compilers are not required to give you the actual source code even in cases where it's not produced by macros?

petrochenkov · 2018-11-23T10:17:52Z

It would be good to somehow document this as unstable, "best effort", and "for diagnostics only".
If the macro succeeds, then the observable result should rely only on tokens, not on this text.

Centril · 2018-11-23T11:55:58Z

@petrochenkov Yeah; "best effort" / "for diagnostics only" sounds like appropriate wording; thank you <3.

roblabla · 2018-11-23T13:15:18Z

My specific use-case is a power_assert macro. I want an assertion macro that has the following output:

thread '<main>' panicked at 'assertion failed: bar.val == bar.foo.val
power_assert!(bar.val == bar.foo.val)
              |   |   |  |   |   |
              |   3   |  |   |   2
              |       |  |   Foo { val: 2 }
              |       |  Bar { val: 3, foo: Foo { val: 2 } }
              |       false
              Bar { val: 3, foo: Foo { val: 2 } }
', examples/normal.rs:26

In order to do this, I get the span of the full expression (bar.val == bar.foo.val), and then the span of each internal component. By looking at Span::start(), I am able to place the labels at the correct positions (basically, component.start().column - full.start().column gives me the column at which the component starts within the full expression).

For this to work, Span::start() and the string I print out need to match.
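Roughly, the marker row in the output above could be computed as follows. This is a sketch, not power_assert's actual code: marker_line is hypothetical, its inputs stand in for full.start().column and component.start().column, and it assumes the whole expression sits on one line with columns counting printed characters.

```rust
/// Builds the row of `|` markers drawn under an expression, given the start
/// column of the whole expression and of each sub-expression.
fn marker_line(full_column: usize, component_columns: &[usize]) -> String {
    // Offset of each component within the printed expression; components
    // that start before the expression (which should not happen) are skipped.
    let offsets: Vec<usize> = component_columns
        .iter()
        .filter_map(|&c| c.checked_sub(full_column))
        .collect();
    let width = offsets.iter().max().map_or(0, |&m| m + 1);
    let mut row = vec![' '; width];
    for &o in &offsets {
        row[o] = '|';
    }
    row.into_iter().collect()
}
```

With the expression starting at column 14 (just after power_assert!( at column 0) and components at columns 14, 18, 22, 25, 29 and 33, this yields "|   |   |  |   |   |", the first marker row in the diagram once indented by 14 spaces. It only lines up if the printed string is exactly the source text.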

If this was not a guarantee but rather "at the compiler's option, you may get whitespace and comments..." then I'd be less worried.

If we don't get whitespace and comments, then we run the risk of having Span::start() become out of sync with the raw text, breaking the above functionality if a comment was put inside the assert macro.

ogoffart · 2018-11-23T14:04:35Z

@roblabla: do you take into account the fact that the column is in UTF-8 bytes?

   /* 🐘 */  power_assert!(normalize("🐘") /* Éléphant emoji */ == "Éléphant" );

In order to do that, you indeed need to know exactly what is in the comments (how many bytes correspond to how many code points). (I guess this should be computed with UnicodeWidthStr::width(...).)
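The mismatch can be seen with std alone. The hypothetical helper below converts a byte offset into a char (code point) column: on a line beginning with /* 🐘 */ and two spaces, as in the example, power_assert starts at byte 12 but char 9, because the elephant emoji takes four bytes. Terminal display width is a third measure (the emoji usually occupies two cells), which is what UnicodeWidthStr::width from the unicode-width crate reports; it is left out here to keep the sketch dependency-free.

```rust
/// Converts a byte offset within `line` into a column counted in `char`s
/// (Unicode scalar values). Returns `None` if the offset is past the end of
/// the line or falls inside a multi-byte character.
fn byte_to_char_column(line: &str, byte_offset: usize) -> Option<usize> {
    if !line.is_char_boundary(byte_offset) {
        return None;
    }
    Some(line[..byte_offset].chars().count())
}
```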

Centril · 2018-11-23T14:08:21Z

@roblabla

If we don't get whitespace and comments, then we run the risk of having Span::start() become out of sync with the raw text, breaking the above functionality if a comment was put inside the assert macro.

Can you not have some fallback such that the power_assert! macro just gives less good "diagnostics" when source_text() returns None? It seems to me that you'll have to handle that anyway if power_assert! is used inside a macro (the macro wouldn't be very good if you couldn't...)? Is there some difference in terms of correctness if None is returned here?
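A sketch of such a fallback, with hypothetical names: prefer the real source text when it is available, and otherwise print a token-based rendering (for instance what TokenStream::to_string() produces) without column markers, since its spacing need not match the spans.

```rust
/// Hypothetical result of deciding how to print an asserted expression.
#[derive(Debug, PartialEq)]
struct ExprDisplay {
    text: String,
    /// Whether span columns can be trusted to line up with `text`.
    draw_markers: bool,
}

/// Prefers the real source text and falls back to a token-based rendering
/// when it is `None`; in the fallback, markers are skipped.
fn expr_display(source_text: Option<String>, token_text: &str) -> ExprDisplay {
    match source_text {
        Some(text) => ExprDisplay { text, draw_markers: true },
        None => ExprDisplay { text: token_text.to_string(), draw_markers: false },
    }
}
```

The assertion itself behaves identically either way; only the quality of the message changes, which fits the "for diagnostics only" framing.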

ogoffart · 2018-11-23T15:03:38Z

I added a note that this should not be relied upon, and is only there for diagnostics.

nikomatsakis · 2019-01-11T22:30:45Z

Or maybe because it's just an unstable addition, we can "just do it"? If so, is there some place that needs to be updated (a tracking issue, etc?)

nikomatsakis · 2019-01-11T22:31:15Z

The code seems fine to me, I just want to ensure that we don't lose track of this random thing.

nikomatsakis · 2019-01-11T22:31:46Z

I guess that would be #54725

nikomatsakis · 2019-01-11T22:34:41Z

Thinking about this a bit more, I feel like there is quite a number of considerations and unknowns that came up on this thread (e.g., "comments or not?" etc), and I'm a bit reluctant to just r+ this without at least recording those. So I guess I would say, could someone produce a brief summary of the conversation and in particular the unknowns?

Then we can put that in the tracking issue and I would feel pretty good about an r+.

(I may have time to do that on monday, gotta run right now)

Centril · 2019-01-11T22:59:22Z

I feel like procedurally this probably requires an FCP. But what team? I guess technically this is a libs API? But it feels like compiler team should check off?

@nikomatsakis I believe most changes to the proc macro APIs are shared between T-Libs and T-Lang so both of those teams. :)

nikomatsakis · 2019-01-15T17:18:36Z

@Centril seems sensible. Regardless, I think what we need most at this juncture is a kind of capsule summary of the conversation and in particular highlighting the alternative designs that were visited and the reasons for the current one.

Centril · 2019-01-15T17:36:04Z

@nikomatsakis Fair; I've nominated to discuss this a bit on Thursday. :)

nikomatsakis · 2019-01-17T20:17:42Z

One thing we might want to note in any summary:

There are a variety of possible strings you might return here. For example, if you had foo!($a) from inside a macro_rules invocation, we might see $a substituted, or not. We should document the return value and check for feedback as to whether it feels like the "right" one. Or maybe return None in tricky cases.

matklad · 2019-02-01T13:59:15Z

unknowns

Another small unknown is line endings. Ideally, the meaning of a Rust program should be independent of the line endings used (because, for example, a gitconfig setting might change them). I think this is currently more or less the case: for example, line endings in string literals seem to be normalized to \n. For this API, we might want to normalize newlines as well!
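If normalization were adopted, a minimal version could look like this (hypothetical helper; it only rewrites \r\n and leaves a lone \r untouched):

```rust
/// Rewrites Windows-style `\r\n` line endings to `\n`.
fn normalize_newlines(text: &str) -> String {
    text.replace("\r\n", "\n")
}
```

One consequence worth weighing: each \r\n becomes one byte shorter, so byte offsets taken from spans would no longer index directly into the normalized text, which runs into the same offset concern raised about stripping comments.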

Dylan-DPC-zz · 2019-02-11T20:36:31Z

Ping from triage: @nikomatsakis, what's the update on this?

Mark-Simulacrum · 2019-02-27T00:57:13Z

@nikomatsakis Could you post a summary of the current state of this pull request?

(triage)

Dylan-DPC-zz · 2019-03-18T09:32:55Z

Ping from triage: can anyone from @rust-lang/lang or @rust-lang/libs review this?

Centril · 2019-03-18T17:17:07Z

I'm going to r? @petrochenkov for now (please make a fresh issue number and tracking issue for it) so that this PR can be dealt with. We don't have to figure out everything just now.

petrochenkov · 2019-03-26T20:57:53Z

I'm OK with this as long as it is unstable and documented as best effort.
I guess usage experience will show whether any normalization is necessary or not.

proc_macro_span is a pretty heterogeneous feature and is very unlikely to be stabilized in one (or even two) step(s), so it's probably ok to track this with #54725 as well.
#54725 also contains some discussion about incremental, which is relevant for this function as well.

@bors r+

bors · 2019-03-26T20:57:54Z

📌 Commit e88b0d9 has been approved by petrochenkov

bors · 2019-03-27T08:58:48Z

⌛ Testing commit e88b0d9 with merge c5fb4d0...


bors · 2019-03-27T12:29:15Z

☀️ Test successful - checks-travis, status-appveyor
Approved by: petrochenkov
Pushing c5fb4d0 to master...

rust-highfive assigned nikomatsakis Nov 8, 2018

rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Nov 8, 2018

nikomatsakis approved these changes Nov 12, 2018


roblabla mentioned this pull request Nov 20, 2018

Rewrite using proc-macro-hack gifnksm/power-assert-rs#17

Open

Centril previously requested changes Nov 22, 2018



Centril added I-nominated T-lang Relevant to the language team, which will review and decide on the PR/issue. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. labels Jan 15, 2019

Centril removed the I-nominated label Jan 17, 2019

nikomatsakis closed this Jan 17, 2019

nikomatsakis reopened this Jan 17, 2019

rust-highfive assigned petrochenkov and unassigned nikomatsakis Mar 18, 2019

bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Mar 26, 2019

bors added the merged-by-bors This PR was explicitly merged by bors. label Mar 27, 2019

bors merged commit e88b0d9 into rust-lang:master Mar 27, 2019
