Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Suboptimal SQL generation for query with optional navigation, its collection navigation and lateral join #27072

Closed
imb590 opened this issue Dec 29, 2021 · 7 comments
Assignees
Labels
area-query closed-fixed The issue has been fixed and is/will be included in the release indicated by the issue milestone. customer-reported Servicing-approved type-bug
Milestone

Comments

@imb590
Copy link

imb590 commented Dec 29, 2021

Consider the following code:

using System.Collections.Generic;
using System.ComponentModel.DataAnnotations.Schema;
using System.Linq;
using Microsoft.EntityFrameworkCore;
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Logging;

using var db = new Context();

var result = db.Set<EntityOne>()
    .Select(o => new
    {
        Ids = o.Two!.Children!.Select(c => new { ChildId = c.Id, ParentId = o.Two.Id }),
    });

System.Console.WriteLine(result.ToQueryString());

public class Context : DbContext
{
    private static ILoggerFactory LoggerFactory => new ServiceCollection().AddLogging(l => l.AddConsole().SetMinimumLevel(LogLevel.Trace)).BuildServiceProvider().GetRequiredService<ILoggerFactory>();

    public DbSet<EntityOne> Ones => Set<EntityOne>();
    public DbSet<EntityTwo> Twos => Set<EntityTwo>();

    protected override void OnConfiguring(DbContextOptionsBuilder optionsBuilder)
    {
        optionsBuilder
            .UseNpgsql("Host=stub")
            .EnableSensitiveDataLogging()
            .UseLoggerFactory(LoggerFactory);
    }
}

public class EntityOne
{
    public int Id { get; set; }

    public int? TwoId { get; set; }

    [ForeignKey(nameof(TwoId))]
    public EntityTwo? Two { get; set; }
}

public class EntityTwo
{
    public int Id { get; set; }

    public int? ParentId { get; set; }

    [ForeignKey(nameof(ParentId))]
    public EntityTwo? Parent { get; set; }

    [InverseProperty(nameof(Parent))]
    public ICollection<EntityTwo>? Children { get; set; }
}

The generated SQL is:

SELECT o."Id", t."Id", t0."ChildId", t0."ParentId"
FROM "Ones" AS o
LEFT JOIN "Twos" AS t ON o."TwoId" = t."Id"
LEFT JOIN LATERAL (
    SELECT t1."Id" AS "ChildId", t."Id" AS "ParentId"
    FROM "Twos" AS t1
    WHERE (t."Id" = t1."ParentId") OR ((t."Id" IS NULL) AND (t1."ParentId" IS NULL))
) AS t0 ON TRUE
ORDER BY o."Id", t."Id"

It is very suboptimal, since due to the unnecessary OR ((t."Id" IS NULL) AND (t1."ParentId" IS NULL)) clause in the inner query, it will iterate over all EntityTwo having ParentId = null for every EntityOne with TwoId = null, creating a new output row on every iteration.

It's possible to get rid of the iteration in the database with a workaround like:

db.Set<EntityOne>()
    .Select(o => new
    {
        Ids = o.Two!.Children!.Where(c => c.ParentId != null).Select(c => new { ChildId = c.Id, ParentId = o.Two.Id }),
    });

provider and version information

EF Core version: 6.0.1
Database provider: Npgsql.EntityFrameworkCore.PostgreSQL 6.0.2
Target framework: .NET 6

@roji
Copy link
Member

roji commented Dec 29, 2021

Can you please reopen this in https://github.com/dotnet/efcore? This is a general SQL generation question rather than anything specific to Npgsql.

@roji roji closed this as completed Dec 29, 2021
@imb590
Copy link
Author

imb590 commented Dec 29, 2021

isn't this already in https://github.com/dotnet/efcore?

@roji
Copy link
Member

roji commented Dec 29, 2021

Oh apologies, I misread the repo name, going a bit too quickly here.

@smitpatel
Copy link
Contributor

cc: @maumar

@maumar
Copy link
Contributor

maumar commented Jan 21, 2022

when nav expansion is rewriting the collection navigation we add a null check for the outer Id being not null

DbSet<EntityOne>()
    .LeftJoin(
        inner: DbSet<EntityTwo>(), 
        outerKeySelector: e => EF.Property<int?>(e, "TwoId"), 
        innerKeySelector: e0 => EF.Property<int?>(e0, "Id"), 
        resultSelector: (o, i) => new TransparentIdentifier<EntityOne, EntityTwo>(
            Outer = o, 
            Inner = i
        ))
    .Select(e => new { Ids = DbSet<EntityTwo>()
        .Where(e1 => EF.Property<int?>(e.Inner, "Id") != null && object.Equals(
            objA: (object)EF.Property<int?>(e.Inner, "Id"), 
            objB: (object)EF.Property<int?>(e1, "ParentId")))
        .Select(e1 => new { 
            ChildId = e1.Id, 
            ParentId = e.Inner.Id
         }) })

but we get rid of it (as a redundant null check) when we translate to JOIN/APPLY. So when null semantics rewriter processes the query it applies the OR ((t."Id" IS NULL) AND (t1."ParentId" IS NULL)) term because both sides are nullable. If we kept the null check as it was, we won't need to add them (since we do simplified null semantics here) and the result query would look something like this:

SELECT [o].[Id], [t].[Id], [t0].[ChildId], [t0].[ParentId]
FROM [Ones] AS [o]
LEFT JOIN [Twos] AS [t] ON [o].[TwoId] = [t].[Id]
OUTER APPLY (
    SELECT [t1].[Id] AS [ChildId], [t].[Id] AS [ParentId]
    FROM [Twos] AS [t1]
    WHERE [t].[Id] IS NOT NULL AND [t].[Id] = [t1].[ParentId]
) AS [t0]
ORDER BY [o].[Id], [t].[Id]

suggestion: we should keep the null check if we can't convert from APPLY to JOIN

@ajcvickers ajcvickers added this to the 7.0.0 milestone Jan 22, 2022
@maumar maumar removed this from the 7.0.0 milestone Jan 25, 2022
@maumar
Copy link
Contributor

maumar commented Jan 25, 2022

note: this is data corruption, but not a regression (ef5 had the bug already). @ajcvickers @smitpatel thoughts on patching this?

@maumar maumar added this to the 7.0.0 milestone Jan 25, 2022
@smitpatel smitpatel removed this from the 7.0.0 milestone Jan 26, 2022
@smitpatel
Copy link
Contributor

It should be easy fix by just remembering the earlier predicate rather than recreating it from join predicate extraction. Due to low risk and data corruption, we should patch this. Let's discuss in team meeting.

maumar added a commit that referenced this issue Jan 27, 2022
…igation, its collection navigation and lateral join

When expanding collection navigation we added a check to filter out rows for which parent value is null and only afterwards match inner and outer keys. This could happen when chaining collection navigation after optional reference. Problem was that during translation we try to convert the collection subquery to a join and if that succeeds we don't need the null check anymore - joins key comparison doesn't match nulls. However, we also remove the null check when we are not able to convert to JOIN and use APPLY instead.

Fix is to extract two versions of join predicate (with and without the null checks) and then use the one that we need, once we know whether JOIN or APPLY will be used.
maumar added a commit that referenced this issue Jan 28, 2022
…igation, its collection navigation and lateral join

When expanding collection navigation we added a check to filter out rows for which parent value is null and only afterwards match inner and outer keys. This could happen when chaining collection navigation after optional reference. Problem was that during translation we try to convert the collection subquery to a join and if that succeeds we don't need the null check anymore - joins key comparison doesn't match nulls. However, we also remove the null check when we are not able to convert to JOIN and use APPLY instead.

Fix is to save the original predicate before we try to extract the join predicate and then for APPLY case re-apply the original predicate and only use the extracted join predicate for the JOIN case.
maumar added a commit that referenced this issue Jan 28, 2022
…igation, its collection navigation and lateral join

When expanding collection navigation we added a check to filter out rows for which parent value is null and only afterwards match inner and outer keys. This could happen when chaining collection navigation after optional reference. Problem was that during translation we try to convert the collection subquery to a join and if that succeeds we don't need the null check anymore - joins key comparison doesn't match nulls. However, we also remove the null check when we are not able to convert to JOIN and use APPLY instead.

Fix is to save the original predicate before we try to extract the join predicate and then for APPLY case re-apply the original predicate and only use the extracted join predicate for the JOIN case.

Fixes #27072
@ajcvickers ajcvickers added this to the 6.0.x milestone Jan 28, 2022
@ajcvickers ajcvickers modified the milestones: 6.0.x, 6.0.3 Feb 14, 2022
@ajcvickers ajcvickers added closed-fixed The issue has been fixed and is/will be included in the release indicated by the issue milestone. and removed Servicing-consider labels Feb 14, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-query closed-fixed The issue has been fixed and is/will be included in the release indicated by the issue milestone. customer-reported Servicing-approved type-bug
Projects
None yet
Development

No branches or pull requests

5 participants