Fix LF line-ending auto format bug #10802

jordi1215 · 2024-08-27T22:33:33Z

This PR fixes #5589, #4349, and #10466: the bug that adds extra lines when VS Code users format a Razor file that uses LF line endings.

Root Cause

This bug occurs for C# code in a Razor file when VS Code users use LF line endings for that file. To the best of my understanding, the Razor server formats the csharpSourceText, which is generated with the operating system's default line ending (LF on Unix, CRLF on Windows, which is why this doesn't repro on Mac). Once the csharpSourceText is formatted, it is compared against the C# portion of the original document, which may use LF line endings. When it does, the formatting service adds a \r before every \n.

Summary of the changes

I implemented a method in RazorFormattingService.cs that normalizes the TextEdits returned to the client. It counts the occurrences of CRLF and LF line endings in the original text. If LF line endings are more prevalent, it removes any CR characters (\r) from the text edits so that they stay consistent with the LF style.
I added a second run to all the current Razor formatting tests in which we flip the line endings of the files from the system default (LF to CRLF and vice versa). Doing so revealed more underlying issues with LF line-ending formatting, which are tracked in issue #10836, "Formatting Test Failed for GenericComponentWithCascadingTypeParameter_Nested() in LF line ending".
As a result of the added tests, I tracked down one indentation bug in which a temporarily stored indentation value was overwritten. It only appeared in LF line-ending files because the line break between lines is one character shorter (\n instead of \r\n). I added an if statement that prevents the value from being overwritten.

The TextEdit normalization technique introduced in this PR can be reverted if a better solution is found. There are two main open issues:

Getting the line ending information from the client.
Generating the csharpSourceText with the correct line ending, or converting it to the correct one when it is loaded.
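
As a rough illustration of the normalization described above, the sketch below counts CRLF and LF endings in the original text and, when LF is more common, strips \r from each edit's new text. The Edit record and the LineEndingNormalizer class are hypothetical stand-ins written against plain strings; they are not the types or methods used in RazorFormattingService.cs.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for an LSP text edit; not the real TextEdit type.
public sealed record Edit(int Start, int Length, string NewText);

public static class LineEndingNormalizer
{
    // True when bare LF endings outnumber CRLF endings in the original text.
    public static bool PrefersLf(string original)
    {
        int crlf = 0, lf = 0;
        for (var i = 0; i < original.Length; i++)
        {
            if (original[i] != '\n')
            {
                continue;
            }

            if (i > 0 && original[i - 1] == '\r')
            {
                crlf++;
            }
            else
            {
                lf++;
            }
        }

        return lf > crlf;
    }

    // Drops '\r' from the new text of every edit when the document prefers LF.
    public static IReadOnlyList<Edit> Normalize(string original, IReadOnlyList<Edit> edits)
    {
        if (!PrefersLf(original))
        {
            return edits;
        }

        return edits.Select(e => e with { NewText = e.NewText.Replace("\r", "") }).ToList();
    }
}
```

Only the inserted text changes here; the edit positions are left alone, which matches the description of keeping everything else in the text edit intact.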

davidwengier · 2024-08-27T23:34:38Z

I don't have any real concerns with this, but I don't think I can approve without tests. It's very hard to understand the real world impact otherwise. Tests would also help prove if the simplified approach I commented will actually work.

jordi1215 · 2024-08-29T04:18:20Z

> I don't have any real concerns with this, but I don't think I can approve without tests. It's very hard to understand the real world impact otherwise. Tests would also help prove if the simplified approach I commented will actually work.

For some reason it totally slipped my mind to add in some tests. I added some by hard coding the two different line endings in the string. I confirmed that the tests fail before these commits and pass after the changes made so far in this PR. Please let me know if I should add more tests. I figured since we have very comprehensive tests on all sorts of razor-related formatting and the fact that these changes didn't fail any of the previous tests, we should be fine.

davidwengier

Awesome job. My comments are really just small tweaks, but hopefully we can massively increase the test coverage.

davidwengier · 2024-08-29T06:36:34Z

...crosoft.AspNetCore.Razor.LanguageServer.Test/Formatting_NetFx/CodeDirectiveFormattingTest.cs

@@ -37,6 +37,44 @@ public interface Bar
                    """);
    }

+    [Fact]
+    public async Task FormatsCodeBlockDirectiveWithCRLFLineEnding()


These tests are great, and I was going to ask you to create a version of the Formats_CodeBlockDirectiveWithMarkup_NonBraced test as well (because it adds a newline with trailing whitespace, which these don't cover), but then I thought about it more, and I think it would be more interesting if FormattingTestBase was modified to replace \r\n with \n, and we run every test with both line endings.

That is, duplicate these two lines, but with the line endings in input and expected replaced with LF:
https://github.com/dotnet/razor/blob/main/src/Razor/test/Microsoft.AspNetCore.Razor.LanguageServer.Test/Formatting_NetFx/FormattingTestBase.cs#L54

jordi1215 · 2024-08-29

This makes sense! That would be a better way to widen the test coverage.
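
A minimal sketch of the "run every test with both line endings" idea, assuming the test body can be passed in as a delegate. LineEndingTestHelper, RunWithBothLineEndingsAsync, and the runTest parameter are hypothetical names; the real FormattingTestBase may be structured quite differently.

```csharp
using System;
using System.Threading.Tasks;

public static class LineEndingTestHelper
{
    // Converts every CRLF to LF, so mixed input ends up LF-only.
    public static string ToLf(string text) => text.Replace("\r\n", "\n");

    // Converts to LF first so existing CRLFs are not doubled into "\r\r\n".
    public static string ToCrlf(string text) => ToLf(text).Replace("\n", "\r\n");

    // Runs the same formatting check twice: once with CRLF and once with LF.
    // 'runTest' is a hypothetical delegate standing in for the real test body.
    public static async Task RunWithBothLineEndingsAsync(
        string input, string expected, Func<string, string, Task> runTest)
    {
        await runTest(ToCrlf(input), ToCrlf(expected));
        await runTest(ToLf(input), ToLf(expected));
    }
}
```

Normalizing both input and expected before each run keeps the assertion independent of the line endings the test file happens to be checked out with.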

davidwengier · 2024-08-29T06:37:58Z

src/Razor/src/Microsoft.CodeAnalysis.Razor.Workspaces/Formatting/RazorFormattingService.cs

+        return minimalEdits;
+    }
+
+    private (int crlfCount, int lfCount) CountLineEndings(SourceText sourceText)


Nit: This method could just return a bool and be called HasLFLineEndings.

DustinCampbell

This looks good to me, though I ask that you address @davidwengier's feedback to use a simple foreach loop rather than LINQ to avoid unnecessary allocations.

I added several suggestions to improve efficiency, facilitate code reuse, and improve testing. I added a lot of explanation and samples in case you find it useful.

DustinCampbell · 2024-08-29T15:00:30Z

src/Razor/src/Microsoft.CodeAnalysis.Razor.Workspaces/Formatting/RazorFormattingService.cs

+        return minimalEdits;
+    }
+
+    private (int crlfCount, int lfCount) CountLineEndings(SourceText sourceText)


I had a couple of thoughts/questions about this method:

Has SourceText.Lines already been populated at this point? I'd be surprised if it isn't, since this occurs after all of the formatting passes. Given that, would it be more efficient to just walk through the Lines collection and check the line ending? That would give you the exact span of each line ending, so you wouldn't need to walk every character in the SourceText.

Should we be concerned about other legal "new line" characters? Here are the others that I know of:

Next Line (U+0085)

Line Separator (U+2028)

Paragraph Separator (U+2029)
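
To make the Lines-based idea concrete, here is a minimal sketch that walks SourceText.Lines and inspects only the line-break span of each line. It assumes the Microsoft.CodeAnalysis package is referenced; the class name is hypothetical, and this is an illustration of the approach rather than the implementation that was merged.

```csharp
using Microsoft.CodeAnalysis.Text;

public static class SourceTextLineEndings
{
    // Sketch: true when bare LF line breaks outnumber CRLF line breaks.
    public static bool HasLFLineEndings(this SourceText text)
    {
        int crlf = 0, lf = 0;

        foreach (var line in text.Lines)
        {
            var breakLength = line.EndIncludingLineBreak - line.End;

            if (breakLength == 2)
            {
                // The only two-character line break is "\r\n".
                crlf++;
            }
            else if (breakLength == 1 && text[line.End] == '\n')
            {
                // A one-character break may also be '\r', U+0085, U+2028, or U+2029,
                // so check the character itself before counting it as LF.
                lf++;
            }
        }

        return lf > crlf;
    }
}
```

The other separators are simply ignored in this sketch: they count toward neither total, so they cannot tip a document toward LF.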

DustinCampbell · 2024-08-29T15:32:33Z

src/Razor/src/Microsoft.CodeAnalysis.Razor.Workspaces/Formatting/RazorFormattingService.cs

+        var crlfCount = 0;
+        var lfCount = 0;
+
+        for (var i = 0; i < sourceText.Length; i++)
+        {
+            if (sourceText[i] == '\r')
+            {
+                if (i + 1 < sourceText.Length && sourceText[i + 1] == '\n')
+                {
+                    crlfCount++;
+                    i++; // Skip the next character as it's part of the CRLF sequence
+                }
+            }
+            else if (sourceText[i] == '\n')
+            {
+                lfCount++;
+            }
+        }
+
+        return (crlfCount, lfCount);


For an algorithm like this, I might do a few things differently to improve the efficiency. You can take these suggestions or leave them! Or, you can take an entirely different approach! I just wanted to share a couple of opportunities:

I would extract SourceText.Length to a local variable to avoid accessing it each iteration of the loop. SourceText.Length is an abstract property and is overridden by every SourceText implementation. So, I wouldn't expect the JIT to do any smart inlining here.

I would capture the first character before the loop and then start the loop at index 1. That way, I could avoid the extra length check inside the loop for \r.

I would likely extract sourceText[i] to a local variable or use a switch statement to avoid indexing extra times. This allows you to walk characters in pairs -- the previous and current character.

This doesn't help with efficiency so much, but C# tuples are mutable structs. At the method body level, it can be useful to think of them as little packs of variables. So, you could just declare a single local at the top of the method for the result tuple and just update the fields. This won't affect efficiency and is largely a stylistic choice in this case, but I thought it might be useful to mention.

Suggested change

-        var crlfCount = 0;
-        var lfCount = 0;
-        for (var i = 0; i < sourceText.Length; i++)
-        {
-            if (sourceText[i] == '\r')
-            {
-                if (i + 1 < sourceText.Length && sourceText[i + 1] == '\n')
-                {
-                    crlfCount++;
-                    i++; // Skip the next character as it's part of the CRLF sequence
-                }
-            }
-            else if (sourceText[i] == '\n')
-            {
-                lfCount++;
-            }
-        }
-        return (crlfCount, lfCount);
+        var result = (crlfCount: 0, lfCount: 0);
+        var length = sourceText.Length;
+        if (length == 0)
+        {
+            return result;
+        }
+        var previous = sourceText[0];
+        if (previous == '\n')
+        {
+            result.lfCount++;
+        }
+        for (var i = 1; i < length; i++)
+        {
+            var current = sourceText[i];
+            if (current == '\n')
+            {
+                if (previous == '\r')
+                {
+                    result.crlfCount++;
+                    // Skip ahead to avoid counting the '\n' again during the next iteration.
+                    // However, we need to be careful not to index past the end of the SourceText!
+                    if (++i < length)
+                    {
+                        // Set previous to the character *after* current. And, since we've already
+                        // set previous, we can continue the loop. Otherwise, previous will get set
+                        // to the wrong value below.
+                        previous = sourceText[i];
+                        continue;
+                    }
+                }
+                else
+                {
+                    result.lfCount++;
+                }
+            }
+            previous = current;
+        }
+        return result;

As I mentioned above, it might just be simpler (and more efficient) to walk through SourceText.Lines. However, I wanted to provide some suggestions in case you keep this algorithm. Also, please note that I have not tested this code! I wrote it directly into the GitHub comment field. So, please take the accuracy with a grain of salt. I've only had one cup of coffee this morning. 😄
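
One input worth testing against the untested suggestion above is a CRLF immediately followed by a bare LF, such as "\r\n\n". Because the loop skips ahead after counting the CRLF, the following '\n' never appears as current and so does not seem to be counted. Below is a minimal sketch of a pair-walking variant without the skip, written against a plain string rather than SourceText so it stands alone. LineEndingCounter is a hypothetical name.

```csharp
public static class LineEndingCounter
{
    // Counts "\r\n" pairs and bare '\n' characters by tracking the previous character.
    public static (int crlfCount, int lfCount) CountLineEndings(string text)
    {
        var result = (crlfCount: 0, lfCount: 0);
        var previous = '\0';

        foreach (var current in text)
        {
            if (current == '\n')
            {
                if (previous == '\r')
                {
                    result.crlfCount++;
                }
                else
                {
                    result.lfCount++;
                }
            }

            previous = current;
        }

        return result;
    }
}
```

Since a '\n' is classified purely by the character before it, no skip-ahead is needed: the character after a CRLF is examined on the next iteration with previous already set to '\n'.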

DustinCampbell · 2024-08-29T15:35:03Z

src/Razor/src/Microsoft.CodeAnalysis.Razor.Workspaces/Formatting/RazorFormattingService.cs

+        return minimalEdits;
+    }
+
+    private (int crlfCount, int lfCount) CountLineEndings(SourceText sourceText)


Also, this method does not access instance data and could be static. In fact, it's probably useful enough as a utility function to move it to SourceTextExtensions and make it an extension method on SourceText.

DustinCampbell · 2024-08-29T16:10:36Z

...crosoft.AspNetCore.Razor.LanguageServer.Test/Formatting_NetFx/CodeDirectiveFormattingTest.cs

+        await RunFormattingTestAsync(
+            input:
+                "@code {\r\n" +
+                " public class Foo{}\r\n" +
+                "        public interface Bar {\r\n" +
+                "}\r\n" +
+                "}",
+            expected:
+                "@code {\r\n" +
+                "    public class Foo { }\r\n" +
+                "    public interface Bar\r\n" +
+                "    {\r\n" +
+                "    }\r\n" +
+                "}");


Consider adding a test helper to add the line breaks for you. Something like this would make it a little less error-prone to write new tests.

    public static string CodeFromLines(string lineEnding, params string[] lines)
    {
        return string.Join(lineEnding, lines);
    }

Then, the tests can be written by passing in the line text without line endings like so:

Suggested change

-        await RunFormattingTestAsync(
-            input:
-                "@code {\r\n" +
-                " public class Foo{}\r\n" +
-                "        public interface Bar {\r\n" +
-                "}\r\n" +
-                "}",
-            expected:
-                "@code {\r\n" +
-                "    public class Foo { }\r\n" +
-                "    public interface Bar\r\n" +
-                "    {\r\n" +
-                "    }\r\n" +
-                "}");
+        await RunFormattingTestAsync(
+            input: CodeFromLines("\r\n",
+                "@code {",
+                " public class Foo{}",
+                "        public interface Bar {",
+                "}",
+                "}"),
+            expected: CodeFromLines("\r\n",
+                "@code {",
+                "    public class Foo { }",
+                "    public interface Bar",
+                "    {",
+                "    }",
+                "}"));

Even better, since this is test code, efficiency is less worrisome than in production code. So, you could write a helper that uses a SourceText to parse the lines and use that to build up a new string with different line endings. Something like the following would do the trick (Note: untested code!).

    public static string NormalizeLineEndings(string lineEnding, string code)
    {
        using var _ = StringBuilderPool.GetPooledObject(out var builder);

        var text = SourceText.From(code);

        foreach (var line in text.Lines)
        {
            builder.Append(text.GetSubTextString(line.Span));

            if (line.End < text.Length)
            {
                builder.Append(lineEnding);
            }
        }

        return builder.ToString();
    }

Then, the tests could continue using raw string literals like so:

Suggested change

-        await RunFormattingTestAsync(
-            input:
-                "@code {\r\n" +
-                " public class Foo{}\r\n" +
-                "        public interface Bar {\r\n" +
-                "}\r\n" +
-                "}",
-            expected:
-                "@code {\r\n" +
-                "    public class Foo { }\r\n" +
-                "    public interface Bar\r\n" +
-                "    {\r\n" +
-                "    }\r\n" +
-                "}");
+        await RunFormattingTestAsync(
+            input: NormalizeLineEndings("\r\n", """
+                @code {
+                 public class Foo{}
+                        public interface Bar {
+                }
+                }
+                """),
+            expected: NormalizeLineEndings("\r\n", """
+                @code {
+                    public class Foo { }
+                    public interface Bar
+                    {
+                    }
+                }
+                """));

Even, even better, the test helper could be an extension method on string.

public static string ChangeLineEndingsTo(this string code, string lineEnding);
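
One possible body for that signature, sketched with only System.Text so it stands alone. It treats "\r\n", a lone '\r', and a lone '\n' as line breaks and leaves other separators untouched; that choice, and the StringLineEndingExtensions class name, are assumptions rather than anything specified in this review.

```csharp
using System.Text;

public static class StringLineEndingExtensions
{
    // Rewrites every "\r\n", lone '\r', and lone '\n' in 'code' as 'lineEnding'.
    public static string ChangeLineEndingsTo(this string code, string lineEnding)
    {
        var builder = new StringBuilder(code.Length);

        for (var i = 0; i < code.Length; i++)
        {
            var ch = code[i];

            if (ch == '\r')
            {
                // Treat "\r\n" as a single line break.
                if (i + 1 < code.Length && code[i + 1] == '\n')
                {
                    i++;
                }

                builder.Append(lineEnding);
            }
            else if (ch == '\n')
            {
                builder.Append(lineEnding);
            }
            else
            {
                builder.Append(ch);
            }
        }

        return builder.ToString();
    }
}
```

A test could then keep its raw string literal and append .ChangeLineEndingsTo("\n") or .ChangeLineEndingsTo("\r\n") to pick the variant under test.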

davidwengier · 2024-08-29T22:24:30Z

src/Razor/src/Microsoft.CodeAnalysis.Razor.Workspaces/Extensions/SourceTextExtensions.cs

+        foreach (var line in text.Lines)
+        {
+            var lineBreakSpan = TextSpan.FromBounds(line.End, line.EndIncludingLineBreak);
+            var lineBreak = line.Text?.ToString(lineBreakSpan) ?? string.Empty;


line.Text is just returning you the text you already have, and already know is not null. But also, can we just do this without creating any strings at all, and just using if (line.EndIncludingLineBreak - line.End == 2)?

Having said that, I have a wacky idea and I don't think we need this method at all, but I can't be sure until I've debugged through a test and seen for myself what things actually look like, so I will leave that one for my TODO list later :)

jordi1215 · 2024-08-30T00:14:34Z

Dear reviewers, while expanding the test coverage I came across a new bug. Indentation from the Razor formatting service does not work for the @section Scripts block in LF line-ending files (see the attached video).
(Note that the video was captured on my Mac, where the original extra-line bug doesn't repro.)
Should I:

Fix the indentation bug in this PR so that the more comprehensive Razor formatting tests (on both line endings) that I am adding can pass?
Merge this PR without the added tests, then open a new PR for the indentation bug and add the tests there?

Thanks!

Screen.Recording.2024-08-29.at.4.58.42.PM.mov

davidwengier · 2024-08-30T00:25:00Z

If the fix is quick and easy, then fixing it would be great. If not, the best thing to do is to add the test, but skip it and create an issue for the bug. That way when it comes time to fix, half of the work has already been done.

DustinCampbell

Looks good! Just a couple of small nits from me.

...r/test/Microsoft.AspNetCore.Razor.LanguageServer.Test/Formatting_NetFx/FormattingTestBase.cs

davidwengier · 2024-09-04T22:12:07Z

Late to this, but I love the fact that you only skipped the LF part of the failing tests, and not the whole thing <3

Implemented a line normalization function that prevents the language server from sending /r to LF line ending docs

e97ad73

jordi1215 requested a review from a team as a code owner August 27, 2024 22:33

davidwengier reviewed Aug 27, 2024

This was linked to issues Aug 28, 2024

Broken formatting on Razor (cshtml) files with LF EOL (vscode) #4349

Closed

Razor code formatting is unusable when the end of line is LF #10466

Closed

improved the line ending normalization where we only delete the \r and keep everything else intact in the text edit. Added test cases

7cca6bd

changed NormalizeLineEndings documentation to reflect the change in implementation

edea7fe

davidwengier approved these changes Aug 29, 2024

DustinCampbell approved these changes Aug 29, 2024

jordi1215 added 2 commits August 29, 2024 11:28

Iterate through text.Lines to count line ending. Moved HasLFLineEndings to SourceTextExtensions

1c7e136

changing the TextEdit in place instead of creating a copy

0ef7de5

ryzngard approved these changes Aug 29, 2024

davidwengier reviewed Aug 29, 2024

jordi1215 added 3 commits August 30, 2024 12:48

changed var name and remove hard-coded LF format tests

7e8e056

check if indentation location has been processed

c04f865

added LF line ending document to all previous razor formatting test cases

3dc6961

DustinCampbell reviewed Sep 3, 2024

jordi1215 added 3 commits September 4, 2024 09:37

merge with main

053f412

skipping some LF line ending formatting tests. Created an issue to track the progress

624e5d6

updated NormalizeLineEndings comment into XML format

95d7a72

DustinCampbell approved these changes Sep 4, 2024

delete unncessary lines and using directive, swap boolean values for productivity

b445345

jordi1215 merged commit 148d71a into main Sep 4, 2024
12 checks passed

jordi1215 deleted the dev/jordi1215/fix-LF-Format branch September 4, 2024 21:43

dotnet-policy-service bot added this to the Next milestone Sep 4, 2024

This was referenced Sep 6, 2024

[Automated] PRs inserted in VS build main-35305.94 #10846

Closed

[Automated] PRs inserted in VS build feature.debugger.main-35305.238 #10849

Closed

jordi1215 linked an issue Sep 11, 2024 that may be closed by this pull request

[BUG] cshtml formatter keeps adding blank lines microsoft/vscode-dotnettools#963

Open

dotnet-bot mentioned this pull request Sep 14, 2024

[Automated] PRs inserted in VS build feature.debugger.shadowDebug-35313.170 #10886

Closed

ryzngard mentioned this pull request Oct 1, 2024

Merge main into features/extract-to-component #10948

Merged

phil-allen-msft modified the milestones: Next, 17.12 P3 Oct 31, 2024
