Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix LF line-ending auto format bug #10802

Merged
merged 12 commits into from
Sep 4, 2024
Merged

Conversation

jordi1215
Copy link
Member

@jordi1215 jordi1215 commented Aug 27, 2024

This PR fixes #5589 #4349 #10466, the bug that adds extra lines when VS code users have LF line ending enabled for their Razor files and they format the document.

Root Cause

This bug happens for CSharp code in a razor file when VS coder users use LF line ending for a razor file. Based on my best understanding, the problem lies that the razor server is formatting the csharpSourceText (see here), which is generated with the default line ending of the operating system (LF for Unix and CRLF for Windows, hence this doesn't repro on Mac). Once the csharpSourceText is formatted it gets compared against CSharp portion of the original document which could have the LF line ending, and if so, the formatting service adds a \r before every \n.

Summary of the changes

  • I implemented a method that filters out unwanted TextEdits returned back to the client in RazorFormattingService.cs. The normalizing method implemented counts the occurrences of CRLF and LF line endings in the original text. If LF line endings are more prevalent, it removes any CR characters (\r) from the text edits to ensure consistency with the LF style.
  • I have added a second test to all the current razor formatting tests in which we flip the line ending of the files from the system default line ending (LF to CRLF and vice versa). By doing so, we found out that there are more underlying issues with LF line ending formatting and is being tracked in issue Formatting Test Failed for GenericComponentWithCascadingTypeParameter_Nested() in LF line ending #10836.
  • As a result of the added test, I tracked down one indentation bug where a temporarily stored indentation value got overwritten and only appeared in LF line ending files since the buffer space between lines was one character short (from \n\r to \n). I added an if statement that prevents from value getting overwritten.

The TextEdit normalization technique introduced in this PR can be inverted if a better solution is thought of. There are two main issues:

  1. Getting the line ending information from the client.
  2. Generate the csharpSourceText with the correct line ending or convert it to the correct one upon loading it.

…server from sending /r to LF line ending docs
@jordi1215 jordi1215 requested a review from a team as a code owner August 27, 2024 22:33
@davidwengier
Copy link
Contributor

I don't have any real concerns with this, but I don't think I can approve without tests. It's very hard to understand the real world impact otherwise. Tests would also help prove if the simplified approach I commented will actually work.

…d keep everything else intact in the text edit. Added test cases
@jordi1215
Copy link
Member Author

I don't have any real concerns with this, but I don't think I can approve without tests. It's very hard to understand the real world impact otherwise. Tests would also help prove if the simplified approach I commented will actually work.

For some reason it totally slipped my mind to add in some tests. I added some by hard coding the two different line endings in the string. I confirmed that the tests fail before these commits and pass after the changes made so far in this PR. Please let me know if I should add more tests. I figured since we have very comprehensive tests on all sorts of razor-related formatting and the fact that these changes didn't fail any of the previous tests, we should be fine.

Copy link
Contributor

@davidwengier davidwengier left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome job. My comments are really just small tweaks, but hopefully we can massively increase the test coverage.

@@ -37,6 +37,44 @@ public interface Bar
""");
}

[Fact]
public async Task FormatsCodeBlockDirectiveWithCRLFLineEnding()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These tests are great, and I was going to ask you to create a version of the Formats_CodeBlockDirectiveWithMarkup_NonBraced test as well (because it adds a newline with trailing whitespace, which these don't cover), but then I thought about it more, and I think it would be more interesting if FormattingTestBase was modified to replace \r\n with \n, and we run every test with both line endings.

ie, duplicate these two lines again, but with input and expected having had their line endings replaced with LF
https://github.com/dotnet/razor/blob/main/src/Razor/test/Microsoft.AspNetCore.Razor.LanguageServer.Test/Formatting_NetFx/FormattingTestBase.cs#L54

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes sense! That would be a better way to widen the test coverage

return minimalEdits;
}

private (int crlfCount, int lfCount) CountLineEndings(SourceText sourceText)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: This method could just return a bool and be called HasLFLineEndings.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had a couple of thoughts/questions about this method:

  1. Has SourceText.Lines already been populated at this point? I'd be surprised if it isn't, since this occurs after all of the formatting passes. Given that, would it be more efficient to just walk through the Lines collection and check the line ending? That would give you the exact span of each line ending, so you wouldn't need to walk every character in the SourceText.
  2. Should we be concerned about other legal "new line" characters? Here are the others that I know of:

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, this method does not access instance data and could be static. In fact, it's probably useful enough as a utility function to move it to SourceTextExtensions and make it an extension method on SourceText.

Copy link
Member

@DustinCampbell DustinCampbell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me, though I ask that you address @davidwengier's feedback to use a simple foreach loop rather than LINQ to avoid unnecessary allocations.

I added several suggestions to improve efficiency, facilitate code reuse, and improve testing. I added a lot of explanation and samples in case you find it useful.

return minimalEdits;
}

private (int crlfCount, int lfCount) CountLineEndings(SourceText sourceText)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had a couple of thoughts/questions about this method:

  1. Has SourceText.Lines already been populated at this point? I'd be surprised if it isn't, since this occurs after all of the formatting passes. Given that, would it be more efficient to just walk through the Lines collection and check the line ending? That would give you the exact span of each line ending, so you wouldn't need to walk every character in the SourceText.
  2. Should we be concerned about other legal "new line" characters? Here are the others that I know of:

Comment on lines 267 to 286
var crlfCount = 0;
var lfCount = 0;

for (var i = 0; i < sourceText.Length; i++)
{
if (sourceText[i] == '\r')
{
if (i + 1 < sourceText.Length && sourceText[i + 1] == '\n')
{
crlfCount++;
i++; // Skip the next character as it's part of the CRLF sequence
}
}
else if (sourceText[i] == '\n')
{
lfCount++;
}
}

return (crlfCount, lfCount);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For an algorithm like this, I might do a few things differently to improve the efficiency. You can take these suggestions or leave them! Or, you can take an entirely different approach! I just wanted to share a couple of opportunities:

  1. I would extract SourceText.Length to a local variable to avoid accessing it each iteration of the loop. SourceText.Length is an abstract property and is overridden by every SourceText implementation. So, I wouldn't expect the JIT to do any smart inlining here.
  2. I would capture the first character before the loop and then start the loop at index 1. That way, I could avoid the extra length check inside the loop for \r.
  3. I would likely extract sourceText[i] to a local variable or use a switch statement to avoid indexing extra times. This allows you to walk characters in pairs -- the previous and current character.
  4. This doesn't help with efficiency so much, but C# tuples are mutable structs. At the method body level, it can be useful to think of them as little packs of variables. So, you could just declare a single local at the top of the method for the result tuple and just update the fields. This won't affect efficiency and is largely a stylistic choice in this case, but I thought it might be useful to mention.
Suggested change
var crlfCount = 0;
var lfCount = 0;
for (var i = 0; i < sourceText.Length; i++)
{
if (sourceText[i] == '\r')
{
if (i + 1 < sourceText.Length && sourceText[i + 1] == '\n')
{
crlfCount++;
i++; // Skip the next character as it's part of the CRLF sequence
}
}
else if (sourceText[i] == '\n')
{
lfCount++;
}
}
return (crlfCount, lfCount);
var result = (crlfCount: 0, lfCount: 0);
var length = sourceText.Length;
if (length == 0)
{
return result;
}
var previous = sourceText[0];
if (previous == '\n')
{
result.lfCount++;
}
for (var i = 1; i < length; i++)
{
var current = sourceText[i];
if (current == '\n')
{
if (previous == '\r')
{
result.crlfCount++;
// Skip ahead to avoid counting the '\n' again during the next iteration.
// However, we need to be careful not to index past the end of the SourceText!
if (++i < length)
{
// Set previous to the character *after* current. And, since we've already
// set previous, we can continue the loop. Otherwise, previous will get set
// to the wrong value below.
previous = sourceText[i];
continue;
}
}
else
{
result.lfCount++;
}
}
previous = current;
}
return result;

As I mentioned above, it might just be simpler (and more efficient) to walk through SourceText.Lines. However, I wanted to provide some suggestions in case you keep this algorithm. Also, please note that I have not tested this code! I wrote it directly into the GitHub comment field. So, please take the accuracy with a grain of salt. I've only had one cup of coffee this morning. 😄

return minimalEdits;
}

private (int crlfCount, int lfCount) CountLineEndings(SourceText sourceText)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, this method does not access instance data and could be static. In fact, it's probably useful enough as a utility function to move it to SourceTextExtensions and make it an extension method on SourceText.

Comment on lines 43 to 56
await RunFormattingTestAsync(
input:
"@code {\r\n" +
" public class Foo{}\r\n" +
" public interface Bar {\r\n" +
"}\r\n" +
"}",
expected:
"@code {\r\n" +
" public class Foo { }\r\n" +
" public interface Bar\r\n" +
" {\r\n" +
" }\r\n" +
"}");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider adding test helpers to add the line breaks for you. Something like these would make it a little less error prone to write new tests.

public static string CodeFromLines(string lineEnding, params string[] lines)
{
    return string.Join(lineEnding, lines);
}

Then, the tests can be written by passing in the line text without line endings like so:

Suggested change
await RunFormattingTestAsync(
input:
"@code {\r\n" +
" public class Foo{}\r\n" +
" public interface Bar {\r\n" +
"}\r\n" +
"}",
expected:
"@code {\r\n" +
" public class Foo { }\r\n" +
" public interface Bar\r\n" +
" {\r\n" +
" }\r\n" +
"}");
await RunFormattingTestAsync(
input: CodeFromLines("\r\n",
"@code {",
" public class Foo{}",
" public interface Bar {",
"}",
"}"),
expected: CodeFromLines("\r\n",
"@code {",
" public class Foo { }",
" public interface Bar",
" {",
" }",
"}");

Even better, since this is test code, efficiency is less worrisome than in production code. So, you could write a helper that uses a SourceText to parse the lines and use that to build up a new string with different line endings. Something like the following would do the trick (Note: untested code!).

public static string NormalizeLineEndings(string lineEnding, string code)
{
    using var _ = StringBuilderPool.GetPooledObject(out var builder);

    var text = SourceText.From(code);

    foreach (var line in text.Lines)
    {
        builder.Append(text.GetSubTextString(line.Span));

        if (line.End < text.Length)
        {
            builder.Append(lineEnding);
        }
    }

    return builder.ToString();
}

Then, the tests could continue using raw string literals like so:

Suggested change
await RunFormattingTestAsync(
input:
"@code {\r\n" +
" public class Foo{}\r\n" +
" public interface Bar {\r\n" +
"}\r\n" +
"}",
expected:
"@code {\r\n" +
" public class Foo { }\r\n" +
" public interface Bar\r\n" +
" {\r\n" +
" }\r\n" +
"}");
await RunFormattingTestAsync(
input: NormalizeLineEndings("\r\n", """
@code {
public class Foo{}
public interface Bar {
}
}
""",
expected: NormalizeLineEndings("\r\n", """
@code {
public class Foo { }
public interface Bar
{
}
}
""");

Even, even better, the test helper could be extension method on string.

public static string ChangeLineEndingsTo(this string code, string lineEnding);

foreach (var line in text.Lines)
{
var lineBreakSpan = TextSpan.FromBounds(line.End, line.EndIncludingLineBreak);
var lineBreak = line.Text?.ToString(lineBreakSpan) ?? string.Empty;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

line.Text is just returning you the text you already have, and already know is not null. But also, can we just do this without creating any strings at all, and just using if (line.EndIncludingLineBreak - line.End == 2)?

Having said that, I have a wacky idea and I don't think need this method at all, but I can't be sure until I've debugged through a test and see for myself what things actually look like, so will leave that one for my TODO list later :)

@jordi1215
Copy link
Member Author

jordi1215 commented Aug 30, 2024

Dear reviewers, as I am expanding the test coverage I came across a new bug. The indentation space from the razor formatting service is not working for the @section Scripts block for LF line-ending files(see video attached).
(Note that the video captured is on my Mac where the original line adding bug doesn't repro)
Should I:

  1. Fix the indentation bug in this PR so that the more comprehensive razor formatting test (on both line endings) that I am adding can pass?
  2. Merge this PR without the added tests and open a new PR for the indentation bug and add the tests in that PR?

Thanks!

Screen.Recording.2024-08-29.at.4.58.42.PM.mov

@davidwengier
Copy link
Contributor

If the fix is quick and easy, then fixing it would be great. If not, the best thing to do is to add the test, but skip it and create an issue for the bug. That way when it comes time to fix, half of the work has already been done.

Copy link
Member

@DustinCampbell DustinCampbell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! Just a couple of small nits from me.

@jordi1215 jordi1215 merged commit 148d71a into main Sep 4, 2024
12 checks passed
@jordi1215 jordi1215 deleted the dev/jordi1215/fix-LF-Format branch September 4, 2024 21:43
@dotnet-policy-service dotnet-policy-service bot added this to the Next milestone Sep 4, 2024
@davidwengier
Copy link
Contributor

Late to this, but I love the fact that you only skipped the LF part of the failing tests, and not the whole thing <3

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
5 participants