Cosmos: Add Include support #16920
Comments
This is cross-document Include--backlogging. However... can you do in-memory Include? |
Preview 5 appears to have this functioning. Is there a reason it was removed in later work? |
@TeaBaerd The query pipeline had to be rewritten to remove the Relinq dependency |
Bummer. Thank you for the explanation @AndriySvyryd. |
Don't forget to vote (👍) for the features that are most important to you so we can prioritize better. |
If I am missing something please let me know, and I mean no offense by this. I would like to mention that, in my experience, not having I was attempting to query an inverse collection property and then filter it into a dictionary with each element key being a property of a navigation property on an element of the aforementioned collection, which would require
This initially led me to believe that my problem was related to #16730, but as I continued to change the rest of my query to eventually use Cosmos DB is paid for in RU/s; if people are legitimately expected to use this provider, then it would be appreciated if this basic quality of life feature, that can do what would otherwise require The following is what was attempted before finding the true culprit.

await entry.Collection(profile => profile.Aliases).Query().Include(alias => alias.Platform).ToDictionaryAsync(alias => alias.Platform.Name, alias => alias.Identification);

await entry.Collection(profile => profile.Aliases).Query().Include(alias => alias.Platform).Select(alias => new { Platform = alias.Platform.Name, Identification = alias.Identification }).ToDictionaryAsync(alias => alias.Platform, alias => alias.Identification);

(await entry.Collection(profile => profile.Aliases).Query().Include(alias => alias.Platform).Select(alias => new { Platform = alias.Platform.Name, Identification = alias.Identification }).ToListAsync()).ToDictionary(alias => alias.Platform, alias => alias.Identification);

The final solution is a lot more wasteful and clunky than it would have been with Include, but it works.

aliases = new Dictionary<string, string> { };
await foreach (Alias alias in entry.Collection(profile => profile.Aliases).Query().AsAsyncEnumerable())
{
    await context.Entry(alias).Reference(alias => alias.Platform).LoadAsync();
    aliases[alias.Platform.Name] = alias.Identification;
}

I've seen somewhere that Cosmos DB may not even support I would appreciate any help to make what I ended up with more efficient. |
@TheFanatr Your understanding is mostly correct. Include is indeed a useful feature, but EF will only be able to issue a more efficient query if the included data is in the same collection and the same partition. It's unlikely that we would support other kinds of Include, as they would be equivalent to the code you posted above. Consider embedding the related data by using owned types if performance is a concern. |
can't wait for this feature |
Please add support for this 🙏 |
Is there any news if this feature will be supported in / by v6? |
@braidenstiller We're currently figuring that out. |
Hello Guys, I'm sorry I've been scouring the docs for hours on end so I'm just gonna ask a related question here because I can't find any answer in the docs. I understand there is a limitation with the include function and it is not available for the Cosmos provider. If so, what alternatives do I have other than embedding the one document into another? I mean, embedding works fine, except that I cannot directly query for the embedded entity from the context. So I am forced to traverse the hierarchy. Example:
The problem is that querying for Regardless of the inefficiency, it results in very clunky code, with null checks and so on, not to mention the need to copy those results into another object to return a non-context Dto back to the user (it is a WebApi). On the other hand if I try to use embedded entities:
This all works fine, except that there is no way for me to query the context for SubEntities directly.
I completely understand why this is happening, but it kinda seems I'm out of options. I can't use Include, and I can't use embedding, and using neither results in a very ugly situation. Am I missing something here? Is there a way to make this cleaner that I just couldn't find? |
Is it at all possible to Include (or something equivalent) cross-document (but not cross-collection) related data right now? |
@helshabini and others, the Cosmos provider will default to implicit ownership in 6.0; this means that navigations will be automatically populated when you load the root entity type, without requiring Include. This issue could still track implementing cross-document include. |
That's great news. This makes the Cosmos provider far more usable. Thanks. |
+1 on still tracking cross-document include, as this would be incredibly useful. Some workloads require cross-document relationships, so being able to load them in an idiomatic way would be great. |
@braidenstiller I'm not a Cosmos expert, but the crucial point here is that Cosmos itself doesn't seem to support cross-document joins, which is a typical characteristic of document databases (as opposed to relational databases). Assuming I haven't missed anything, that would mean that EF Core would have to pull back documents and perform joins client-side, which is something I strongly believe it should not do, since that's likely to be extremely inefficient. If that's something you really want, you should be able to easily express this as follows:

_ = ctx.Foo
    .AsEnumerable()
    .Join(ctx.Bar, f => f.Id, b => b.FooId, (f, b) => new { Foo = f, Bar = b });

The AsEnumerable triggers the client-side evaluation, and makes it extremely clear what is going on here (and the fact that this is a potentially problematic query perf-wise). That shouldn't be hidden behind an innocuous-looking Include (especially for people coming from relational databases, who are used to that happening database-side). |
@roji that's a fair argument. I was coming at this as a limitation that EF Core could potentially address, rather than as a feature enhancement on top of Cosmos. Unfortunately that's correct: due to the lack of include support on the Cosmos side, the query would have to be done client-side. Personally I think the potential performance tradeoff is worth it, as some relationships just can't be embedded (unbound related lists for example). Official Cosmos documentation even advises to use related documents in these cases: Include would be useful here but perhaps it could be enabled in configuration as a feature flag and not implicitly executed. I haven't tested that explicit join syntax exactly but I'm fairly certain even that isn't supported at the moment. You have to retrieve the Id and query it explicitly (and importantly separately) from the root document query. |
That's a fair point and thanks for the link. My gut feeling is that the misuse potential of an EF Core Include implementation (considerably) outweighs the advantage, especially given that it's already possible to express the same operation explicitly via a client-side join (and if it isn't currently, it should be). This feels like the type of thing that shouldn't be made too easy, and especially too implicit/under the hood, but rather be a very explicit thing users opt into on a query-by-query basis.
That may be true, but that depends on exactly what your join conditions are like (you may do a join on any arbitrary properties on both the principal and dependent sides). |
@roji Maybe the way forward is, resolve this issue, and track another for the join capabilities (I think there may be one already). |
I did some research into this, and I do believe we can do something which makes sense. First, as pointed out by @braidenstiller above, the Cosmos docs do recommend using normalized modeling in certain scenarios, i.e. splitting information across multiple documents as an alternative to embedding all data in a single document. To implement this efficiently, we could use a multiple query loading strategy which would be somewhat similar to the split query implemented for relational databases. Assuming a one-to-many Blogs/Posts model and a query with a predicate over Blogs, we'd:
var blogIds = new[] { "2", "3", "123" };
var query = new QueryDefinition("SELECT c.Id, c.Name FROM Posts c WHERE ARRAY_CONTAINS(@BlogIds, c.BlogId)")
    .WithParameter("@BlogIds", blogIds);

The above approach efficiently uses an index over c.BlogId. However, before implementation we should confirm that a large number of IDs can be sent like this, without some arbitrary Cosmos upper limit or perf degradation.

Note that the above assumes a model where each Post has a BlogId, similar to relational foreign key modeling. Cosmos also supports each Blog having a list of Post IDs instead, in which case we'd read that property and load the Posts directly by their IDs in the second query. Although we don't have to support this model style, Cosmos many-to-many modeling (#23523) involves an array of IDs on each side (instead of a join table); the one-to-many array modeling could be done as part of that.

Note that the above is only a design - we don't currently have plans to implement this for 7.0. |
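For readers following along, here is a rough in-memory sketch of the two-query loading idea described above. It is not the EF Core implementation and does not talk to Cosmos: plain lists stand in for the containers, and the `Blog`, `Post` and `SplitLoadSketch` names are hypothetical, made up for illustration. It only shows the shape of the approach: filter the principals, collect their IDs, fetch the dependents whose `BlogId` is in that set, then stitch the results together on the client.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical model types for illustration only; they are not from the thread.
public sealed class Post
{
    public string Id { get; set; } = "";
    public string BlogId { get; set; } = "";
}

public sealed class Blog
{
    public string Id { get; set; } = "";
    public string Name { get; set; } = "";
    public List<Post> Posts { get; } = new List<Post>();
}

public static class SplitLoadSketch
{
    // In-memory stand-in for the two-query strategy; the enumerables play the
    // role of the Blogs and Posts documents. Assumes blog IDs are unique.
    public static List<Blog> LoadBlogsWithPosts(
        IEnumerable<Blog> blogContainer,
        IEnumerable<Post> postContainer,
        Func<Blog, bool> blogPredicate)
    {
        // "Query 1": load only the blogs that match the predicate.
        var blogs = blogContainer.Where(blogPredicate).ToList();

        // Collect the IDs that would be sent as the @BlogIds parameter.
        var blogIds = blogs.Select(b => b.Id).ToHashSet();

        // "Query 2": load only posts whose BlogId is in that set
        // (stands in for ARRAY_CONTAINS(@BlogIds, c.BlogId)).
        var posts = postContainer.Where(p => blogIds.Contains(p.BlogId)).ToList();

        // Client-side fix-up: attach each post to its blog.
        var blogsById = blogs.ToDictionary(b => b.Id);
        foreach (var post in posts)
        {
            blogsById[post.BlogId].Posts.Add(post);
        }

        return blogs;
    }
}
```

Calling `SplitLoadSketch.LoadBlogsWithPosts(blogs, posts, b => b.Name.StartsWith("A"))` would return only the matching blogs, each with its own posts attached. Posts belonging to blogs that did not match are never touched, which is the difference from pulling everything back with AsEnumerable.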
@roji The problem with not having at least a feature-flag-controlled client-side (inefficient) Include() is with 3rd party libraries where you can't control the implementation. Not sure if this is a good example but consider OData. I have a small application with a small dataset where doing some client-side joining is not a big problem. Of course OData + EFCore makes a great development stack, but on top of CosmosDB I can't use $expand properly to also pull navigation property data. I don't (think) I have control over the OData implementation, so I cannot simply explicitly call .AsEnumerable(). This may be true of other libraries also. Ideally, for such situations in which performance is not a concern, I could explicitly turn on some client-side processing flag. |
@jazzmanro the proposal in the above comment (#19526) would implement a pretty efficient join that's comparable to the split query feature we have for relational databases, and doesn't pull entire document collections like AsEnumerable would do. Assuming there's no blocker to implementing this, it would be available by default without any need for an opt-in (so OData would work out of the box). |
@roji sounds really great, wasn't sure exactly what the proposal would cover. Can't wait to have this. |
Almost two years later of "silence" - are there any plans for this feature ( I am running into a scenario where I have an aggregator root (entity) that contains another entity that has its own identity. I want to store the aggregator root and the entity in separate documents (call them "Invoices" and "Merchants")... yet if I do this, I will be unable to perform the following query:

DbSet<Invoices>().Include(invoice => invoice.Merchant)

or..

DbSet<Invoices>().Include(invoice => invoice.Merchant).ToList()

As @TheFanatr has pointed out previously, I get into the same problem and wild goose chasing - the $exception field being populated with |
following |
And support mapping owned types to other containers