Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[17.6] Cache source generator node tables only if their state counts match #66992

Merged
merged 10 commits into from
Apr 11, 2023

Conversation

jjonescz
Copy link
Member

@jjonescz jjonescz commented Feb 22, 2023

@jjonescz jjonescz marked this pull request as ready for review February 22, 2023 14:55
@jjonescz jjonescz requested a review from a team as a code owner February 22, 2023 14:55
@jjonescz jjonescz changed the title Check count before reusing batched items in BatchNode Check count before reusing cached items in BatchNode Feb 23, 2023
@chsienki chsienki self-assigned this Feb 23, 2023
cston
cston previously approved these changes Mar 16, 2023
@jjonescz jjonescz requested a review from a team as a code owner March 31, 2023 06:23
@jjonescz jjonescz changed the base branch from main to release/dev17.6 March 31, 2023 06:23
@jjonescz jjonescz changed the title Check count before reusing cached items in BatchNode [17.6] Check count before reusing cached items in BatchNode Mar 31, 2023
@chsienki
Copy link
Contributor

I think this fix is actually just masking an underlying issue. When the batch node gets the input table its being listed as cached even though it has changed. That in turn causes the issue that's being addressed here. If we fix the input table issue, I don't think we need to make this fix.

Looking at the input table, what's happening is that we're essentially shrinking the number of items from the previous generation pass, but the items that remain are (correctly) listed as cached. When we create the table from the builder we set IsCached based on the sum of the elements: https://github.com/dotnet/roslyn/blob/main/src/Compilers/Core/Portable/SourceGeneration/Nodes/NodeStateTable.cs#L73

In this case, that's not correct. Yes all the items in the table are cached, but the table itself shouldn't be considered cached, as it has less items than last time. I think we should track in the builder if we can consider the table to be cached or not. The only time we can say a table from a builder (as opposed to one we explicitly call AsCached on) can be cached is if all the elements are cached, and it has the same number as the previous table.

Interestingly this is almost exactly the same as the IsRemoved bug we previously fixed and has basically the same fix.

The following test is a fairly simple repro that demonstrates this behavior outside of the batch node case:

[Fact, WorkItem(61162, "https://github.com/dotnet/roslyn/issues/61162")]
public void IncrementalGenerator_Table_Smaller_Second_TimeAround()
{
    var items = ImmutableArray.Create(new[] { "a", "b", "c" });
    var generator = new IncrementalGeneratorWrapper(new PipelineCallbackGenerator(ctx =>
    {
        var transform1 = ctx.CompilationProvider.SelectMany((c, _) => items);
        var transform2 = transform1.Select((i, _) => i);

        ctx.RegisterSourceOutput(transform2, static (spc, i) =>
        {
            spc.AddSource(i, $@"// {i}");
        });
    }));

    var source = "System.Console.WriteLine();";
    var parseOptions = TestOptions.RegularPreview;
    Compilation compilation = CreateCompilation(source, options: TestOptions.DebugExeThrowing, parseOptions: parseOptions);

    GeneratorDriver driver = CSharpGeneratorDriver.Create(new[] { generator }, parseOptions: parseOptions);
    verify(ref driver, compilation, new[] {
        "// a",
        "// b",
        "// c"
        });

    items = ImmutableArray.Create(new[] { "a", "b" });

    verify(ref driver, compilation, new[] {
        "// a",
        "// b",
        });

    static void verify(ref GeneratorDriver driver, Compilation compilation, string[] generatedContent)
    {
        driver = driver.RunGeneratorsAndUpdateCompilation(compilation, out var outputCompilation, out var generatorDiagnostics);
        outputCompilation.VerifyDiagnostics();
        generatorDiagnostics.Verify();
        var trees = driver.GetRunResult().GeneratedTrees;
        Assert.Equal(generatedContent.Length, trees.Length);
        for(int i = 0; i < trees.Length; i++)
        {
            AssertEx.EqualOrDiff(generatedContent[i], trees[i].ToString());
        }
    }
}

This will fail right now, as transform2 will end up returning three items the second time through, when it should only return two.

Copy link
Contributor

@chsienki chsienki left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As per comment.

@jjonescz
Copy link
Member Author

jjonescz commented Apr 3, 2023

Thanks @chsienki for the investigation. I couldn't reproduce a scenario where my fix would fail. Your test fails simply because you never update the compilation hence the generator doesn't run the second time. This change makes the test pass (even without the fix in this PR):

     GeneratorDriver driver = CSharpGeneratorDriver.Create(new[] { generator }, parseOptions: parseOptions);
-    verify(ref driver, compilation, new[] {
+    verify(ref driver, ref compilation, new[] {
         "// a",
         "// b",
         "// c"
         });

     items = ImmutableArray.Create(new[] { "a", "b" });

-    verify(ref driver, compilation, new[] {
+    verify(ref driver, ref compilation, new[] {
         "// a",
         "// b",
         });

-    static void verify(ref GeneratorDriver driver, Compilation compilation, string[] generatedContent)
+    static void verify(ref GeneratorDriver driver, ref Compilation compilation, string[] generatedContent)
     {
-        driver = driver.RunGeneratorsAndUpdateCompilation(compilation, out var outputCompilation, out var generatorDiagnostics);
-        outputCompilation.VerifyDiagnostics();
+        driver = driver.RunGeneratorsAndUpdateCompilation(compilation, out compilation, out var generatorDiagnostics);
+        compilation.VerifyDiagnostics();
         generatorDiagnostics.Verify();
         var trees = driver.GetRunResult().GeneratedTrees;
         Assert.Equal(generatedContent.Length, trees.Length);
         for(int i = 0; i < trees.Length; i++)
         {
             AssertEx.EqualOrDiff(generatedContent[i], trees[i].ToString());
         }
     }

However, your suggestion makes sense, I rewrote the fix.

@chsienki
Copy link
Contributor

chsienki commented Apr 4, 2023

🤦 Yep, you're right about that.

Here's a test that *does* fail with the original fix
 [Fact, WorkItem(61162, "https://github.com/dotnet/roslyn/issues/61162")]
        public void IncrementalGenerator_Collect_SyntaxProvider2()
        {
            var generator = new IncrementalGeneratorWrapper(new PipelineCallbackGenerator(static ctx =>
            {
                var invokedMethodsProvider = ctx.SyntaxProvider
                    .CreateSyntaxProvider(
                        static (node, _) => node is InvocationExpressionSyntax,
                        static (ctx, ct) => ctx.SemanticModel.GetSymbolInfo(ctx.Node, ct).Symbol?.Name ?? "(method not found)")
                    .Select((n, _) => n);

                ctx.RegisterSourceOutput(invokedMethodsProvider, static (spc, invokedMethod) =>
                {
                    spc.AddSource(invokedMethod, "// " + invokedMethod);
                });
            }));

            var source = """
                System.Console.WriteLine();
                System.Console.ReadLine();
                """;
            var parseOptions = TestOptions.RegularPreview;
            Compilation compilation = CreateCompilation(source, options: TestOptions.DebugExeThrowing, parseOptions: parseOptions);

            GeneratorDriver driver = CSharpGeneratorDriver.Create(new[] { generator }, parseOptions: parseOptions);
            verify(ref driver, compilation, new[]
            {
                "// WriteLine",
                "// ReadLine"
            });

            replace(ref compilation, parseOptions, """
                System.Console.WriteLine();
                """);

            verify(ref driver, compilation, new[]
            {
                "// WriteLine"
            });

            replace(ref compilation, parseOptions, "_ = 0;");
            verify(ref driver, compilation, Array.Empty<string>());

            static void verify(ref GeneratorDriver driver, Compilation compilation, string[] generatedContent)
            {
                driver = driver.RunGeneratorsAndUpdateCompilation(compilation, out var outputCompilation, out var generatorDiagnostics);
                outputCompilation.VerifyDiagnostics();
                generatorDiagnostics.Verify();
                var trees = driver.GetRunResult().GeneratedTrees;
                Assert.Equal(generatedContent.Length, trees.Length);
                for (int i = 0; i < generatedContent.Length; i++)
                {
                    AssertEx.EqualOrDiff(generatedContent[i], trees[i].ToString());
                }
            }

            static void replace(ref Compilation compilation, CSharpParseOptions parseOptions, string source)
            {
                compilation = compilation.ReplaceSyntaxTree(compilation.SyntaxTrees.Single(), CSharpSyntaxTree.ParseText(source, parseOptions));
            }
        }

The new proposed fix does make that test pass, but I'm no longer convinced its actually the correct thing to do.

I was originally thinking this was an issue that would affect any node that removed tail entries but realize now that it doesn't because we track the removed entries. So remains the question as to why we're getting a cached result as the input when it shouldn't be?

Looking at it I think the bug is actually in the SyntaxNodeProvider. When we construct the filter table, we will modify any existing items, but don't carry over the removed ones.

https://github.com/dotnet/roslyn/blob/main/src/Compilers/Core/Portable/SourceGeneration/Nodes/PredicateSyntaxStrategy.cs#L101-L104

This surfaces as a correctness problem when the removed items are at the end of the table, but it also surfaces as a perf issue when the removed items are in the middle of the table.

Consider the following test
        [Fact, WorkItem(61162, "https://github.com/dotnet/roslyn/issues/61162")]
        public void IncrementalGenerator_Collect_SyntaxProvider3()
        {
            var generator = new IncrementalGeneratorWrapper(new PipelineCallbackGenerator(static ctx =>
            {
                var invokedMethodsProvider = ctx.SyntaxProvider
                    .CreateSyntaxProvider(
                        static (node, _) => node is InvocationExpressionSyntax,
                        static (ctx, ct) => ctx.SemanticModel.GetSymbolInfo(ctx.Node, ct).Symbol?.Name ?? "(method not found)")
                    .Select((n, _) => n)
                    .WithTrackingName("Select");

                ctx.RegisterSourceOutput(invokedMethodsProvider, static (spc, invokedMethod) =>
                {
                    spc.AddSource(invokedMethod, "// " + invokedMethod);
                });
            }));
            var source1 = """
                System.Console.WriteLine();
                System.Console.ReadLine();
                """;

            var source2 = """
                class C {
                    public void M()
                    {
                        System.Console.Clear();
                        System.Console.Beep();
                    }
                }
                """;

            var parseOptions = TestOptions.RegularPreview;
            Compilation compilation = CreateCompilation(new[] { source1, source2 }, options: TestOptions.DebugExeThrowing, parseOptions: parseOptions);

            GeneratorDriver driver = CSharpGeneratorDriver.Create(new[] { generator }, parseOptions: parseOptions, driverOptions: new GeneratorDriverOptions(IncrementalGeneratorOutputKind.None, trackIncrementalGeneratorSteps: true));
            verify(ref driver, compilation, new[]
            {
                "// WriteLine",
                "// ReadLine",
                "// Clear",
                "// Beep"
            });

            // edit part of source 1
            replace(ref compilation, parseOptions, """
                System.Console.WriteLine();
                System.Console.Write(' ');
                """);

            verify(ref driver, compilation, new[]
            {
                "// WriteLine",
                "// Write",
                "// Clear",
                "// Beep"
            });

            Assert.Collection(driver.GetRunResult().Results[0].TrackedSteps["Select"],
                (r) => Assert.Equal(IncrementalStepRunReason.Cached, r.Outputs.Single().Reason),
                (r) => Assert.Equal(IncrementalStepRunReason.Modified, r.Outputs.Single().Reason),
                (r) => Assert.Equal(IncrementalStepRunReason.Cached, r.Outputs.Single().Reason),
                (r) => Assert.Equal(IncrementalStepRunReason.Cached, r.Outputs.Single().Reason)
                );

            // remove second line of source 1
            replace(ref compilation, parseOptions, """
                System.Console.WriteLine();
                """);

            verify(ref driver, compilation, new[]
            {
                "// WriteLine",
                "// Clear",
                "// Beep"
            });

            Assert.Collection(driver.GetRunResult().Results[0].TrackedSteps["Select"],
                (r) => Assert.Equal(IncrementalStepRunReason.Cached, r.Outputs.Single().Reason),
                (r) => Assert.Equal(IncrementalStepRunReason.Removed, r.Outputs.Single().Reason),
                (r) => Assert.Equal(IncrementalStepRunReason.Cached, r.Outputs.Single().Reason),
                (r) => Assert.Equal(IncrementalStepRunReason.Cached, r.Outputs.Single().Reason)
                );

            static void verify(ref GeneratorDriver driver, Compilation compilation, string[] generatedContent)
            {
                driver = driver.RunGeneratorsAndUpdateCompilation(compilation, out var outputCompilation, out var generatorDiagnostics);
                outputCompilation.VerifyDiagnostics();
                generatorDiagnostics.Verify();
                var trees = driver.GetRunResult().GeneratedTrees;
                Assert.Equal(generatedContent.Length, trees.Length);
                for (int i = 0; i < generatedContent.Length; i++)
                {
                    AssertEx.EqualOrDiff(generatedContent[i], trees[i].ToString());
                }
            }

            static void replace(ref Compilation compilation, CSharpParseOptions parseOptions, string source)
            {
                compilation = compilation.ReplaceSyntaxTree(compilation.SyntaxTrees.First(), CSharpSyntaxTree.ParseText(source, parseOptions));
            }
        }

This will fail on the last collection assertion. The table is returning [[WriteLine, Cached], [Clear, Modified], [Beep, Modified]]. Clear and Beep weren't modified, and so their downstream transforms shouldn't run but they do. This occurs because they're replacing the 'missing' Write which isn't tracked as being removed.

Basically I think we're not correctly tracking removals in the PredicateSyntaxStrategy.

In general, I think there should probably be an invariant that the table size going in is the same as what comes out, before compaction. In practice this isn't true today, as we implicitly track empty slots, meaning tables can (and do) get shorter. We might want to think about changing that, as I think its probably been the route cause of a lot of these removal bugs. But until we do (its not a totally trivial amount of work) lets keep the 'we only cache if the table size is the same' logic.

HasTrackedSteps = hasTrackedSteps;

static bool areStatesCompatible(ImmutableArray<TableEntry> previous, ImmutableArray<TableEntry> current)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lets calculate this in the builder, and just pass in an 'IsCached' flag instead.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved this computation into the builder. I just realized you originally suggested we keep track of the flag throughout the builder, but it seems computing it once is simpler.

@chsienki
Copy link
Contributor

chsienki commented Apr 4, 2023

@jjonescz @jaredpar I think we should probably take this almost as-is for 17.6, while recognizing that its really just patching over the underlying issue. On main we can fix the SyntaxStrategy issue, then decide even later on if we want to consider re-architecting the way we handle the empty entries or not.

@jaredpar
Copy link
Member

jaredpar commented Apr 4, 2023

@chsienki if you both agree let's do that. I'd like to open a new issue that details what needs to change for the more complete fix as a part of merging this. That way we can track the work.

@jjonescz
Copy link
Member Author

jjonescz commented Apr 4, 2023

if you both agree let's do that. I'd like to open a new issue that details what needs to change for the more complete fix as a part of merging this. That way we can track the work.

I agree. Opened issues per Chris work item suggestions (one for the SyntaxNodeProvider fix we should do in main and the other for re-architecting the way we handle the empty entries):

@jjonescz jjonescz changed the title [17.6] Check count before reusing cached items in BatchNode [17.6] Cache source generator node tables only if their state counts match Apr 4, 2023
@jjonescz
Copy link
Member Author

jjonescz commented Apr 5, 2023

@cston @dotnet/roslyn-compiler for a second review

@cston cston dismissed their stale review April 5, 2023 18:29

Stale review

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Incremental Source Generator - Not generating empty file, when last syntax node is deleted
4 participants