Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rationalize use of global filter when filtering is already done implicitly through joins #22914

Open
Tracked by #21459
stevendarby opened this issue Oct 7, 2020 · 6 comments

Comments

@stevendarby
Copy link
Contributor

stevendarby commented Oct 7, 2020

Background:

  • All entities have a TenantId column, which forms part of the primary key along with the Id.
  • All relationships between tables involve the TenantId from both tables.
  • A global filter is added to all entities to filter on TenantId = the current tenant ID.

Currently:
When querying and starting with one entity and including others, the query will apply the filter to every table using a separate input parameter with the current tenant ID for each table.

What I'd like to see:
One input parameter to filter the primary table, and then all the other tables that are joined to will be implicitly filtered by that through the join condition on TenantId. This is how I would write the query manually, but I appreciate that it may be easier said than done when it comes to translating expressions. I hope this is valid for consideration though.

@AndriySvyryd
Copy link
Member

@Snappyfoo Could you share the query where you experience this issue?

@stevendarby
Copy link
Contributor Author

stevendarby commented Oct 9, 2020

Sure, here's an example (not production code but illustrates what I mean)

public class Program
{
    public static void Main(string[] args)
    {
        var loggerFactory = new LoggerFactory();
        loggerFactory.AddProvider(new DebugLoggerProvider());

        using var connection = new SqliteConnection("Data Source=:memory:");
        connection.Open();

        var options = new DbContextOptionsBuilder()
            .UseSqlite(connection)
            .UseLoggerFactory(loggerFactory)
            .EnableSensitiveDataLogging()
            .Options;

        using var context = new MyDbContext(options, 42);
        context.Database.EnsureCreated();

        _ = context.Set<Post>().Where(x => x.Blog.Author.Name == "Bob").ToList();
    }
}

public class MyDbContext : DbContext
{
    public int TenantId { get; }

    public MyDbContext(DbContextOptions options, int tenantId) : base(options)
    {
        TenantId = tenantId;
    }

    protected override void OnModelCreating(ModelBuilder modelBuilder)
    {
        base.OnModelCreating(modelBuilder);

        var authorBuilder = modelBuilder.Entity<Author>();
        authorBuilder.HasKey(x => new {x.Id, x.TenantId});
        authorBuilder
            .HasMany(x => x.Blogs)
            .WithOne(x => x.Author)
            .HasForeignKey(x => new {x.AuthorId, x.TenantId});
        authorBuilder.HasQueryFilter(x => x.TenantId == TenantId);

        var blogBuilder = modelBuilder.Entity<Blog>();
        blogBuilder.HasKey(x => new { x.Id, x.TenantId });
        blogBuilder
            .HasMany(x => x.Posts)
            .WithOne(x => x.Blog)
            .HasForeignKey(x => new { x.BlogId, x.TenantId });
        blogBuilder.HasQueryFilter(x => x.TenantId == TenantId);

        var postBuilder = modelBuilder.Entity<Post>();
        postBuilder.HasKey(x => new { x.Id, x.TenantId });
        postBuilder.HasQueryFilter(x => x.TenantId == TenantId);
    }
}

public abstract class EntityBase
{
    public int Id { get; set; }
    public int TenantId { get; set; }
}

public class Author : EntityBase
{
    public string Name { get; set; }
    public ICollection<Blog> Blogs { get; set; }
}

public class Blog : EntityBase
{
    public string Url { get; set; }
    public int AuthorId { get; set; }
    public Author Author { get; set; }
    public ICollection<Post> Posts { get; set; }
}

public class Post : EntityBase
{
    public string Title { get; set; }
    public string Content { get; set; }
    public int BlogId { get; set; }
    public Blog Blog { get; set; }
}
Microsoft.EntityFrameworkCore.Database.Command: Information: Executed DbCommand (7ms) [Parameters=[@__ef_filter__TenantId_1='42' (DbType = String), @__ef_filter__TenantId_2='42' (DbType = String), @__ef_filter__TenantId_0='42' (DbType = String)], CommandType='Text', CommandTimeout='30']
SELECT "p"."Id", "p"."TenantId", "p"."BlogId", "p"."Content", "p"."Title"
FROM "Post" AS "p"
INNER JOIN (
    SELECT "b"."Id", "b"."TenantId", "b"."AuthorId", "b"."Url"
    FROM "Blog" AS "b"
    WHERE "b"."TenantId" = @__ef_filter__TenantId_1
) AS "t" ON ("p"."BlogId" = "t"."Id") AND ("p"."TenantId" = "t"."TenantId")
INNER JOIN (
    SELECT "a"."Id", "a"."TenantId", "a"."Name"
    FROM "Author" AS "a"
    WHERE "a"."TenantId" = @__ef_filter__TenantId_2
) AS "t0" ON ("t"."AuthorId" = "t0"."Id") AND ("t"."TenantId" = "t0"."TenantId")
WHERE ("p"."TenantId" = @__ef_filter__TenantId_0) AND ("t0"."Name" = 'Bob')

Each entity type needs a TenantId query filter applied to them when creating the model because each may be accessed individually. However, when a query involves multiple entities, and the joins between them contain a property that is being filtered with a common value in such a way as above, the filter only really needs to be applied via the final WHERE clause on the Post.TenantId - at least, coming from the perspective of how I would write the query manually*.

How generalizable this scenario is, I'm not sure. I also haven't delved into what this means for performance; this is driven mostly by a sense that it feels wrong to repeat the parameters and conditions, especially in much larger queries crossing many more tables.

* There would be other changes if I wrote it manually, but there is already a separately tracked issue for query filters causing joins to expand into large sub-queries with all columns selected.

@smitpatel
Copy link
Member

Cannot make determination of performance directly. If the table has a lot of data for a large number of tenant ids but only few rows for given tenant then applying tenant filter before join is perf beneficial.

Linking to #21459

@AndriySvyryd AndriySvyryd added this to the 6.0.0 milestone Oct 12, 2020
@stevendarby
Copy link
Contributor Author

Been thinking about this one again @smitpatel. It's interesting to know that the filter before join can be perf beneficial in some cases. I suppose if you were to take this any further, you would need to investigate the affect on perf under various scenarios.

One area I do not have enough knowledge on is how query plans are generated on the server, their quality, how the server optimizes and so on. It is conceivable to me that a query with lots of parameters is harder for the server to optimize. And where we have queries that join to dozens of entities, each with a TenantId filter, we have dozens of parameters.

Because of that I wondered - even if filtering before the join does turn out to be beneficial and something to keep - then as a separate related issue to consider: couldn't a single parameter be used in all those filters? In the LINQ, the same property is used to get the filter value so it seems feasible that the same SQL parameter could be used too. This just seems neater, if nothing else.

Thanks for considering this for 6.0.0.

@stevendarby
Copy link
Contributor Author

I have raised the issue of too many parameters as a separate issue: #24476

This issue remains more focused on the fact the condition might not have to be applied to each entity directly if it's implicitly applied through the join condition.

@ajcvickers ajcvickers modified the milestones: 6.0.0, Backlog May 6, 2021
stevendarby pushed a commit to stevendarby/efcore that referenced this issue Oct 25, 2022
@stevendarby
Copy link
Contributor Author

stevendarby commented Jul 24, 2023

Just coming back to this... I do wonder if detecting this pattern with global filters might be more trouble than it's worth, so I'm thinking if there's actually another feature request at the heart of this.

Just to recap: the only reason we use a TenantId query filter on every entity is so that when we can use any entity as the starting DbSet in a query and have it apply the TenantId filter to that for us. TenantId is always part of the composite key in navigations to other entities that may also be involved in the query thereafter, and thus those joins act as implicit filters for those. We don't really need explicit filters for every entity in the query.

Therefore I think what I'd really like is the ability to apply a filter to any entry 'DbSet' involved in a query. This is usually just one, when navigations are used for other entities, but there could be multiple when doing manual joins and so on. I wonder if this is already achievable with IQueryExpressionInterceptor? Not sure where to start but if someone thinks it might be doable and could give a couple of pointers, that'd be appreciated. Thanks

Note: the main motivation at this point is just SQL neatness. Entities with query filters get expanded out to subqueries and it make queries involving dozens of entities quite verbose and a bit harder to read through.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants