Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add an IncludeInner method #24955

Closed
GertArnold opened this issue May 22, 2021 · 10 comments
Closed

Add an IncludeInner method #24955

GertArnold opened this issue May 22, 2021 · 10 comments
Labels
closed-out-of-scope This is not something that will be fixed/implemented and the issue is closed. customer-reported

Comments

@GertArnold
Copy link

GertArnold commented May 22, 2021

The relational database model doesn't have a concept of mandatory children since ownership is defined by a foreign key in a child relation. As a consequence, in SQL, Include always has to translate to OUTER JOIN because there's no formal language to configure the association as 1-1..n (certainly not vendor-independent). It's always 1-0..n. A statement like

context.Parents.Include(p => p.Children)

should, of course, return all parent records, not just the ones with children. Hence the outer join.

There are three reasons why this OUTER JOIN may not always be desired.

  1. It's not uncommon for parents without children to be invalid or meaningless. Fetching them from the database by plain SQL would always involve an INNER JOIN because there is foreknowledge of the database content. (Which in such cases will be enforced by client-side validation).
  2. Irrespective of database content, developers may want to Include children and at the same time filter only parents with children. Here, INNER JOIN is better than OUTER JOIN + a predicate. Again, in plain SQL that would be a no-brainer.
  3. INNER JOINs may perform significantly better than OUTER JOINs, esp. in more complex queries, it would be useful for developers to have control over the type of join EF will execute.

So I wonder if there is a case for adding an IncludeInner (+ ThenIncludeInner) method that does exactly that: always translate to an inner join?

This will also work with filtered includes. It can also be made to work with split queries: one query without results for a parent will remove the parent from the end result.

It is expected from developers to have sufficient knowledge of SQL to understand how INNER and OUTER joins work an how they affect one another, esp. how one INNER JOIN can turn previous OUTER JOINs into de-facto INNER joins.

@roji
Copy link
Member

roji commented May 22, 2021

Rather than a special IncludeInner operator, users could express this via something like:

ctx.Parents.Include(b => b.Children).Where(p => p.Children.Any()).ToList();

This current version generates an OUTER JOIN and an EXISTS subquery, but maybe we could identify this pattern and generate an INNER JOIN instead... /cc @smitpatel

Regardless, @GertArnold there are various other scenarios where EF Core already generates an INNER JOIN, based on the precise situation and model configuration. Whether EF generates an OUTER or INNER join generally shouldn't be something the user specifies directly, but is rather inferred from EF based on the query and the model.

@smitpatel
Copy link
Contributor

Instead of going into pattern matching, we could have metadata API to configure that collection will always have 1 child. Just like how we have API to market that principal to dependent reference navigation is required. (i.e. Child will always exist for the given principal). This will be useful for any nav expansion not just include.

Apart from that, users are free to write LINQ when their joins contain additional information not configured in metadata manually using existing LINQ operators to generate inner/outer join as needed.

@ajcvickers
Copy link
Member

We discussed this in triage. As @smitpatel said, we should consider expanding optional/required dependents in metadata to better optimize Include for the model. @AndriySvyryd @smitpatel I can't find any existing issue to cover this.

In addition, we should have an issue on the backlog to support generating more efficient queries for cases for queries that assert requiredness without that being in the model.

@Liero
Copy link

Liero commented Jun 10, 2021

I went to github asking for the same as @GertArnold.

I would be perfectly ok with .Where(p => p.Children.Any()),
however from my experience it could be troublesome with filtered include:

Regarding model metadata: OK, but I still want to be able to to force inner join when writing query.
I often have situation, where Parent can exist without any Child, but I want to write special query with inner join.

I would like to be able to write query like:

SELECT * FROM Parents p INNER JOIN Children c ON p.Id = c.ParentId
WHERE c.Type = 1

which would return IQueryable<Parent> at the end.

ctx.Parents.Include(b => b.Children.Where(p => p.Type = 1)).Where(p => p.Children.Any()).ToList();

Use Case

I needed only parents with specific children in my OData endpoint and I find it difficult to with RawSql / DB View, because how do I apply Skip/Take or Count() at Parents' level?

I've ended up with two options:

  1. "OUTER JOIN and an EXISTS subquery" which too slow
  2. or stored procedure where I have to apply all OData operators manually, which out of my league

@smitpatel
Copy link
Contributor

You can always write a query with whatever joins you need without using navigations directly if you want to deviate from the configured metadata in the model.

@Liero
Copy link

Liero commented Jun 11, 2021

@smitpatel: if you mean this, then no, because it returns parents x children, not parents with children.

I need IQueryable<Parent> on the output, so the OData can do it's magic at the end.

@TrobinsonCincom
Copy link

I have to upvote the original suggestion and disagree with the idea that optional/required metadata should decide whether inner or outer joins should be used. You don't define foreign keys as an "inner foreign keys" or "outer foreign keys". Foreign keys are supposed to allow you to write both inner and outer joins depending on the scenario. Currently you have to write complex LINQ queries to get an inner join on a many to many relationship. Whereas if you were writing raw SQL you would just have to replace a single keyword with LEFT or INNER.

@begerard
Copy link

begerard commented Oct 1, 2021

Hey @smitpatel, do you have any update about this problem?

The current status is either slow SQL or RAW SQL (as Liero explained well), and I can't see how the metadata would help as it does depend of what I want to do with one Include specifically, not all the Includes on a navigation property.

@ajcvickers ajcvickers added closed-out-of-scope This is not something that will be fixed/implemented and the issue is closed. and removed type-enhancement area-perf propose-close area-query labels Nov 10, 2021
@ajcvickers ajcvickers removed this from the Backlog milestone Nov 10, 2021
@1nfected
Copy link

I happened to stumble upon a similar problem.

The way I got around this was to reverse my query, i.e. instead of querying the parent and including the child, I queried the child and included the parent, and then finally selecting the parent from the child and doing a distinct.

Something like:

var children = context.Children.Where(child => child.Type == 1).Include(child => child.Parent).ToList();
var parents = children.Select(child => child.Parent).Distinct();

This produces the desired query with INNER JOIN instead of LEFT JOIN.

@GertArnold
Copy link
Author

@1nfected Yes, that's a work-around. Note that the child collections aren't marked as loaded which is awkward when lazy loading is enabled: it should be disabled temporarily when accessing the collections. Also, when using AsNoTracking, there's no identity resolution by default and, hence, no distinct parents.

@roji roji closed this as not planned Won't fix, can't repro, duplicate, stale Jun 27, 2022
@ajcvickers ajcvickers reopened this Oct 16, 2022
@ajcvickers ajcvickers closed this as not planned Won't fix, can't repro, duplicate, stale Oct 16, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
closed-out-of-scope This is not something that will be fixed/implemented and the issue is closed. customer-reported
Projects
None yet
Development

No branches or pull requests

8 participants