Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cosmos: Optimize Find by querying by 'id' #17310

Closed
AndriySvyryd opened this issue Aug 20, 2019 · 14 comments
Closed

Cosmos: Optimize Find by querying by 'id' #17310

AndriySvyryd opened this issue Aug 20, 2019 · 14 comments
Labels
area-cosmos area-perf area-query closed-fixed The issue has been fixed and is/will be included in the release indicated by the issue milestone. type-enhancement
Milestone

Comments

@AndriySvyryd
Copy link
Member

AndriySvyryd commented Aug 20, 2019

Use ReadItem
Include the partition key if present

@AndriySvyryd
Copy link
Member Author

AndriySvyryd commented Nov 22, 2019

Depends on #7391 to make the general case work, but before that is done that we can at least handle the case when the PK is id and the partition key if present.

@AndriySvyryd
Copy link
Member Author

@1iveowl The other option I mentioned in the comment above is to just handle the simple case where the PK consists of id and the partition key, so the user would be calling Find(partitionKey, id) or Find(id, partitionKey) depending on how they’ve configured it.

The next step up from this would be handling PKs that include the partition key, but don’t include the id directly. In those cases we usually can calculated the id value using the value generator configured on it, which by default is IdValueGenerator

The above can be accomplished by detecting the query pattern generated by EntityFinder and instead of translating the query based on SelectExpression you'd need to create something like ReadItemExpression. Thinking about this it's actually a much more difficult change than I originally assumed.

@smitpatel for thoughts or advice

@AndriySvyryd
Copy link
Member Author

AndriySvyryd commented Mar 19, 2020

@1iveowl If the above doesn't fit in the time you can dedicate for this creating the extension you've proposed - DbSet<>. FindWithPartitionKeyAsync(partitionKey, primaryKey) would also be an acceptable temporary solution. For the implementation you can create a class derived from EntityFinder that would handle this.

@smitpatel
Copy link
Contributor

ShapedQueryExpression does allow having an expression different than SelectExpression. It may require some conditional code in ShapedQueryCompilingExpressionVisitor. Or we can factor it differently to call shaper specific part outside of query pipeline. It would be a difficult change. Shaper compiler on cosmos is more complicated than core.

@1iveowl
Copy link
Contributor

1iveowl commented Mar 20, 2020

@AndriySvyryd, @smitpatel Let's lock-in on the API first.

A few questions

...so the user would be calling Find(partitionKey, id) or Find(id, partitionKey) depending on how they’ve configured it

Where and how is the configuration of the DbSet determining the order of the two values specified/implied?

... this would be handling PKs that include the partition key, but don’t include the id directly. In those cases we usually can calculated the id value using the value generator configured on it, which by default is [IdValueGenerator] ...

How will DbSet<>Find(???) look for this? Would it be like this: DbSet<>.Find(myPartitionKey);? If so, how would that work if there're multiple documents with the same entity type in Cosmos and with the same Partition Key? Should Find return a list of those documents in that case?

For Find to be an effective and generic way to read one specific document - fast and with ReadItem and from a specific partition - I think that it is a prerequisite that the developer provide both the Primary Key (or the cosmos id) and the Partition Key, and in a structured way where it's clear which is what.

Suggestion:

Could the API be something like this: DbSet<>.Find("myPartitionKey:myId")?

In which case, we assume a string and introduce a Cosmos specific syntax for Find using : (or some other character or string) as the delimiter between the partition key and the id.

If no delimiter (:) is present (i.e. DbSet<>.Find("<some id>")) we assume that this <some id> is the myPartitionKey and consequently return a list in the case where there're more than one document with the entity type, in the partition.

The reason we don't assume myId in the single value scenario, is that this wouldn't work if the access key supplied is a resource token, rather than a master key. Alternatively we'd need to also check what type of access key is supplied and thrown a meaningful exception based on that, to guide the developer.

@AndriySvyryd
Copy link
Member Author

Where and how is the configuration of the DbSet determining the order of the two values specified/implied?

@1iveowl The PK is configured in this way:

    modelBuilder.Entity<Car>().HasKey(c => new { c.State, c.LicensePlate, c.PartitionKey });

How will DbSet<>Find(???) look for this? Would it be like this: DbSet<>.Find(myPartitionKey);? If so, how would that work if there're multiple documents with the same entity type in Cosmos and with the same Partition Key? Should Find return a list of those documents in that case?

User always needs to specify all PK values in Find. But the PK doesn't need to be the Cosmos id. Like in the configuration above the user would call

    context.Set<Car>().Find("WA", "999ZZZ", "1234");

Currently this would translate into something like

SELECT * from c
WHERE c.State = "WA" AND c.Plate = "999ZZZ" AND c.PartitionKey = "1234"

But in most cases we know how to generate the id value, which in this case would be "WA|999ZZZ".
So ideally we would want to call container.ReadItemAsync<>("WA|999ZZZ", "1234") instead

For Find to be an effective and generic way to read one specific document - fast and with ReadItem and from a specific partition - I think that it is a prerequisite that the developer provide both the Primary Key (or the cosmos id) and the Partition Key, and in a structured way where it's clear which is what.

Yes, so the simple case is where the entity type is configured like this:

    modelBuilder.Entity<Car>().HasKey(c => new { c.id, c.PartitionKey });

So the id and partition key values are specified explicitly:

    context.Set<Car>().Find("WA|999ZZZ", "1234");

Could the API be something like this: DbSet<>.Find("myPartitionKey:myId")?

This doesn't follow current API conventions and would be confusing for existing EF users.

@1iveowl
Copy link
Contributor

1iveowl commented Mar 20, 2020

Thank you @AndriySvyryd. Good comments.

One more thing regarding:

modelBuilder.Entity<Car>().HasKey(c => new { c.State, c.LicensePlate, c.PartitionKey });

This helps us build the query, but how do we identify the partition key explicitly? The example tricks us, because it's so obvious that the third value is the partition key. However it might as well have looked like this:. modelBuilder.Entity<Car>().HasKey(c => new { c.ValueA, c.ValueB, c.ValueC }); and now we don't know which one is the partition key any longer.

I guess for the generic Find to work, we would need to require that the developer also specifies: modelBuilder.Entity<Person>().HasPartitionKey(o => o.ValueC), because then we would know for sure that ValueC is the partition key.

@AndriySvyryd
Copy link
Member Author

I guess for the generic Find to work, we would need to require that the developer also specifies: modelBuilder.Entity().HasPartitionKey(o => o.ValueC), because then we would know for sure that ValueC is the partition key.

Yes

@1iveowl
Copy link
Contributor

1iveowl commented Mar 20, 2020

Regarding:

    context.Set<Car>().Find("WA", "999ZZZ", "1234");

Currently this would translate into something like

SELECT * from c
WHERE c.State = "WA" AND c.Plate = "999ZZZ" AND c.PartitionKey = "1234"

But in most cases we know how to generate the id value, which in this case would be "WA|999ZZZ".
So ideally we would want to call container.ReadItemAsync<>("WA|999ZZZ", "1234") instead

In this example, the assumption is made, that the id equals WA|999ZZZ, because the EF.Core/Cosmos IdValueProvider has been used to generate the id.

However, in a scenario were the Cosmos documents were not created by EF.Core/Cosmosthe id might look like: Car-WA-999ZZZ or something entirely different. This raises a need for an additional feature for overriding the IdValueGenerator to adapt to the id generation scheme used.

@AndriySvyryd
Copy link
Member Author

However, in a scenario were the Cosmos documents were not created by EF.Core/Cosmosthe id might look like: Car-WA-999ZZZ or something entirely different. This raises a need for an additional feature for overriding the IdValueGenerator to adapt to the id generation scheme used.

It's up to the user to provide the correct value generator, we'd just use whatever is configured on id. And if it has no value generator, then we'll fall back to the slow query

@1iveowl
Copy link
Contributor

1iveowl commented Mar 20, 2020

It's up to the user to provide the correct value generator, we'd just use whatever is configured on id. And if it has no value generator, then we'll fall back to the slow query

Ahh yes, using PropertyBuilder.HasValueGenerator Method.

AndriySvyryd added a commit that referenced this issue Apr 20, 2020
Add event ids
Make GetProperty public
Fix bad merge

Part of #17310
AndriySvyryd added a commit that referenced this issue Apr 20, 2020
Add event ids
Make GetProperty public
Fix bad merge

Part of #17310
@bricelam bricelam modified the milestones: Backlog, 5.0.0 Aug 20, 2020
@ajcvickers
Copy link
Member

@bricelam Should this be in rc1? /cc @AndriySvyryd

@AndriySvyryd
Copy link
Member Author

It shipped in preview 4

@bricelam
Copy link
Contributor

(I was cleaning up some closed issues on the Backlog)

@bricelam bricelam modified the milestones: 5.0.0, 5.0.0-preview4 Aug 21, 2020
@ajcvickers ajcvickers added the closed-fixed The issue has been fixed and is/will be included in the release indicated by the issue milestone. label Oct 20, 2020
@ajcvickers ajcvickers modified the milestones: 5.0.0-preview4, 5.0.0 Nov 7, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-cosmos area-perf area-query closed-fixed The issue has been fixed and is/will be included in the release indicated by the issue milestone. type-enhancement
Projects
None yet
Development

No branches or pull requests

5 participants