Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixes #5260 #5262

Merged
merged 1 commit into from
May 8, 2022
Merged

Fixes #5260 #5262

merged 1 commit into from
May 8, 2022

Conversation

PSeitz
Copy link
Contributor

@PSeitz PSeitz commented Mar 12, 2022

Fixes #5260 by checking if it is part of a type '::'
Add possibility to run single test e.g. TEST_FILE=5250 cargo test system to run test files containing 5250

src/test/mod.rs Outdated Show resolved Hide resolved
@ytmimi
Copy link
Contributor

ytmimi commented Mar 12, 2022

Thanks for the PR!

As I mentioned in #5260 (comment), I'm wondering if we could take a similar approach to the referenced PR and add another regex check in has_url (linked below for reference):

rustfmt/src/comment.rs

Lines 928 to 936 in 83c1692

/// Returns `true` if the given string MAY include URLs or alike.
fn has_url(s: &str) -> bool {
// This function may return false positive, but should get its job done in most cases.
s.contains("https://")
|| s.contains("http://")
|| s.contains("ftp://")
|| s.contains("file://")
|| REFERENCE_LINK_URL.is_match(s)
}

Just wondering if you gave that a try and landed on this approach because the regex route didn't work?
I personally feel that the change would be a little cleaner if it was encapsulated in has_url instead of directly tweaking how we break strings.

@PSeitz
Copy link
Contributor Author

PSeitz commented Mar 12, 2022

I was considering disabling breaking lines completely if detected, but I think the solution in the PR is closer to what we actually want, since it will still try to split the line.

Currently there is one part missing to apply the result, because the call to
wrap_str(result, fmt.config.max_width(), fmt.shape) accepts only string rewrites if no line is exceeding the max_width limit, which is odd since it is a better formatting result.

I moved the test filtering to this PR #5265

@PSeitz PSeitz force-pushed the master branch 2 times, most recently from 812f4b2 to fbc3497 Compare March 12, 2022 07:56
@ytmimi
Copy link
Contributor

ytmimi commented Mar 12, 2022

Thanks for taking the time to move the test filtering code to a different PR!

Agreed that not splitting the line at all isn't ideal, but it's consistent with how we handle other links. I think trying to uphold that consistency would be best, but totally open to considering these changes!

break_string is a source of some other bugs and, that's mostly why I'm reluctant to change it directly to support not breaking on :: (aside from the earlier point of has_url better encapsulating the fix).

Right now I'm leaning towards the has_url solution because the surface for code changes is low and it provides an immediate fix for the problem of breaking qualified paths to Types in markdown URLs. In the future I think it would be cool to use a markdown parser to more intelligently determine line breaks in doc comments for other markdown constructs that we don't currently handle well!

One nit, but something I'd want changed before moving forward is that it looks like the commit history is a bit messed up and these changes were amended to an unrelated commit (or at least the commit message / author info from a previous commit was copied over). When you have a chance could you rebase these changes and update the commit message, Thanks!

@PSeitz PSeitz force-pushed the master branch 2 times, most recently from 594993a to cd19e80 Compare March 13, 2022 02:04
@PSeitz
Copy link
Contributor Author

PSeitz commented Mar 13, 2022

I understand the url approach, but I think this is a more complete solution.

The root cause of the issue is not markdown links, since (ABC)[CDE], does not contain any characters that are considered a line break. It's that : is considered a line break (punctuation), although it's part of a type.

This approach covers more aspects, e.g.

  • break_string is called from multiple positions, but the url check does not cover e.g. itemized blocks.

  • We can have very long types longer than max_width. I extended the test to illustrate.

I fixed the commit to not amend to another commit.

In general I think it would better, to handle tokens and not graphemes in break_string.

@calebcartwright
Copy link
Member

Thank you both for the PR/review and dialog! I've not had a chance to look at the code but wanted to chime in on a couple of the points of dialog.

Re: leading link/url approach vs. double colon search, you've both made some good points. There's an inherent friction in trying to split things between splitting something you don't want to split vs. failing to split something that could be split. I'd say historically the tact has been more aggressive in nature with subsequent attempts to curb or soften when/if breaks will happen, and we're at such a point again. Ultimately we'll have to decide (or expose as yet another config option) whether any :: is strictly part of a type (possibly reasonable/safe) or if there's valid cases where it could appear in some breakable context

In general I think it would better, to handle tokens and not graphemes in break_string

This is a short, subtle sentence but that speaks to a much larger topic that I don't really want to delve into here beyond saying that the utilization of graphemes is intentional to account for the full range of text rustfmt will encounter, including all sorts of unicode fun, rtl vs ltr languages, etc.

If you think there's a better alternative that accounts for everything then certainly feel free to make a separate proposal elsewhere, but it definitely would not be desirable to e.g. move from grapheme processing to more trivial code point/token based processing.

@PSeitz
Copy link
Contributor Author

PSeitz commented Mar 16, 2022

Ultimately we'll have to decide (or expose as yet another config option) whether any :: is strictly part of a type (possibly reasonable/safe) or if there's valid cases where it could appear in some breakable context

Good point. I'm not aware of any :: which we want to split. I don't think it occurs in natural language.
It does appear in URIs, which we also don't want to split.

If you think there's a better alternative that accounts for everything then certainly feel free to make a separate proposal elsewhere, but it definitely would not be desirable to e.g. move from grapheme processing to more trivial code point/token based processing.

What I meant with tokens is to group graphemes together to a token (e.g. a word).

@PSeitz
Copy link
Contributor Author

PSeitz commented Apr 11, 2022

@ytmimi @calebcartwright
Can we move forward on this? This fixes an issue I have in documenting and referencing types in tantivy.

@calebcartwright
Copy link
Member

@ytmimi @calebcartwright Can we move forward on this? This fixes an issue I have in documenting and referencing types in tantivy.

I appreciate you've got some vested interest in seeing the issue resolved, though please keep in mind that's also true for others that have reported and/or proposed changes for things they are encountering.

rustfmt is sadly underserved when it comes to the review/maintenance side of the equation, and as such the PR queue tends to back up. This one isn't too far out now, but we'll get to it when we can.

Thank you for your patience @PSeitz

@calebcartwright
Copy link
Member

This is up next on my list to review, though before diving in, @ytmimi do you have any remaining concerns with the approach?

I'm probably grossly oversimplifying things, but AIUI the question/approaches were really between whether we'd only refrain from splitting the type if it appeared at the start of a comment line, or alternatively, anywhere within the line?

@PSeitz - definitely no rush, but as this has been waiting on us for a while an we've had to sync some changes from upstream in the interim, would probably be good to do a rebase on master at your convenience

@ytmimi
Copy link
Contributor

ytmimi commented Apr 29, 2022

@calebcartwright I've gone ahead and given this a re-review. Initially I wanted to avoid going down this code path at all. After sitting on it for a little while and taking a look at the implementation again I think I'm warmed up to moving forward with the proposed solution.

The only thing I'm wondering now is if it would be better to generalize is_part_of_type to be more generic and disallow breaking on any consecutive punctuation, but taking a look at the Unicode Category "Other Punctuation" (which is what is_punctuation is implemented with), I don't know if we'll run into very many situations like this.

Maybe breaking at an exclamation point if it's on the comment width boundary e.g: ...(some comment text)!!!! -> ...(some comment text)!!\n!!

I don't think it's necessary to implement a generalized solution, just wondering if including this now would prevent other line break issues down the road. Overall I'm happy to move forward with this.

Fixes rust-lang#5260 by checking if it is part of a type '::'
@PSeitz
Copy link
Contributor Author

PSeitz commented May 3, 2022

I think this generalization to disallow breaking on any consecutive punctuation is a good a idea. I updated the PR and added a test for it.

Copy link
Contributor

@ytmimi ytmimi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your work on this! @calebcartwright I'm happy to move forward, but will give you a chance to review.

@calebcartwright
Copy link
Member

I'm open to discussing additional changes beyond what's really in scope for the types-related issue, but I don't want to keep increasing the scope and elongating the amount of time on this particular PR. Let's drop this back to strictly handling types, and open a separate issue for discussion of broader rules, like consecutive punctuation

@PSeitz
Copy link
Contributor Author

PSeitz commented May 5, 2022

I dropped the commit, and created a PR that includes the commit here #5331

Copy link
Contributor

@ytmimi ytmimi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@calebcartwright sorry about that. I didn't mean to extend the the scope of the PR with the suggestion that we disallow breaking on any punctuation. Agreed that it's probably best to continue the discussion in a separate issue.

@PSeitz Thanks for reverting the changes LGTM.

@PSeitz
Copy link
Contributor Author

PSeitz commented May 5, 2022

I can’t merge the PR, since I don’t have write access

@ytmimi
Copy link
Contributor

ytmimi commented May 5, 2022

I can’t merge the PR, since I don’t have write access

Yup, that's intentional. I think we're good to go here but will hold off on merging this until @calebcartwright gives another review as I'm not sure if he wants to see any other changes besides dropping the double punctuation logic.

Copy link
Member

@calebcartwright calebcartwright left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks so much! I had a couple minor thoughts but this is good enough as-is so would rather just move forward 👍

@calebcartwright calebcartwright merged commit c65ba14 into rust-lang:master May 8, 2022
@calebcartwright calebcartwright added release-notes Needs an associated changelog entry and removed pr-follow-up-review-pending labels May 8, 2022
@calebcartwright calebcartwright removed the release-notes Needs an associated changelog entry label Jun 17, 2022
PSeitz added a commit to quickwit-oss/tantivy that referenced this pull request Sep 8, 2022
fix link alias after rust-lang/rustfmt#5262 has been merged and released.
fix dead links
fulmicoton pushed a commit to quickwit-oss/tantivy that referenced this pull request Sep 8, 2022
fix link alias after rust-lang/rustfmt#5262 has been merged and released.
fix dead links
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

wrap_comments can break up type links
3 participants