Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bad LINQ to SQL translation for filter on non-nullable boolean property #682

Closed
thomaslevesque opened this issue Aug 14, 2019 · 10 comments
Closed
Labels
discussion-wanted Need a discussion on an area LINQ

Comments

@thomaslevesque
Copy link
Contributor

(already reported in v2, problem is still here in v3)

Describe the bug

LINQ queries with a condition on a non-nullable boolean property are not correctly translated to SQL. The translation doesn't account for the fact that the property could be undefined in the JSON document.

To Reproduce

Assuming an object model like this:

public class Item
{
    [JsonProperty("id")]
    public string Id { get; set; }

    public bool IsDeleted { get; set; }
}

Make the following query:

var query = container.GetItemLinqQueryable<Item>().Where(i => !i.IsDeleted);

Expected behavior

Documents where IsDeleted is not defined should be returned, because in the C# model the property is not nullable, so its absence really means that it's false.

The generated SQL should be something like this:

SELECT * FROM root WHERE (NOT IS_DEFINED(root["IsDeleted"]) OR NOT root["IsDeleted"]) 

Or maybe

SELECT * FROM root WHERE (NOT (root["IsDeleted"] ?? false))

Actual behavior

Documents where IsDeleted is not defined are not returned, because NOT (undefined) doesn't evaluate to true.

The generated SQL looks like this:

SELECT * FROM root WHERE (NOT root["IsDeleted"])

Environment summary
SDK Version: .NET Core 2.1, Microsoft.Azure.Cosmos 3.1.1
OS Version (e.g. Windows, Linux, MacOSX): Windows 10

@j82w
Copy link
Contributor

j82w commented Aug 14, 2019

I would argue this behaving correctly. If you look at the ECMA standards undefined is not equal to false. Here is a related stackoverflow question.

c# the closes thing to undefined is null. A nullable bool that is null is not equal to false.

bool? test = null;
if(test == false){
  Console.Writeline("This will not be hit.");
}else{
 Console.Writeline("null is not equal to false");
}

What you are looking for is here.

var query = container.GetItemLinqQueryable<Item>().Where(i => !i.IsDeleted || !i.IsDefined());

@j82w j82w closed this as completed Aug 14, 2019
@thomaslevesque
Copy link
Contributor Author

Hi @j82w,

I agree that undefined isn't equal to false, however in this scenario, the query is driven by the C# model, where the IsDeleted property is not nullable. On deserialization, the property will be set to false. So for all intents and purposes, the property being undefined in the JSON document is equivalent to it being false in the C# model.

I'm aware of the IsDefined() trick, but I shouldn't have to do that in this situation. Having to use IsDefined() means that the abstraction (querying a C# object model) is leaking the fact that the property could in fact be undefined in the JSON document.

@j82w
Copy link
Contributor

j82w commented Aug 14, 2019

Hi @thomaslevesque ,

That's a good point. @kirankumarkolli what is your opinion?

@j82w j82w reopened this Aug 14, 2019
@j82w j82w added discussion-wanted Need a discussion on an area LINQ labels Aug 14, 2019
@joshlang
Copy link
Contributor

joshlang commented Aug 18, 2019

Hi @j82w,
I agree that undefined isn't equal to false, however in this scenario, the query is driven by the C# model, where the IsDeleted property is not nullable. On deserialization, the property will be set to false. So for all intents and purposes, the property being undefined in the JSON document is equivalent to it being false in the C# model.
I'm aware of the IsDefined() trick, but I shouldn't have to do that in this situation. Having to use IsDefined() means that the abstraction (querying a C# object model) is leaking the fact that the property could in fact be undefined in the JSON document.

I agree. We're running into this in our design too. We abstract out the Cosmos stuff but still need to educate devs and remind ourselves that "It might look like C#, but it's actually JSON so use .IsDefined()".

But I'm opinionated just because of the path we chose.

So even though I agree, I kinda would be worried about special-casing this. Also because of the path we chose (Type discrimination and exposing IQueryable) :)

Because what if there's two types of C# documents you're storing? One has your IsDeleted field, and the other doesn't.

container
    .GetItemLinqQueryable<IDictionary<string, object>>()
    .Where(x=>!(bool)x["IsDeleted"])
    .ToQueryDefinition()
    .QueryText;

>> SELECT VALUE root FROM root WHERE (NOT root["IsDeleted"]) 

If I'm using a Type field as a discriminator to know what to deserialize, I might be intending to filter out anything that doesn't have that field.

Further, what if it's an int instead? The default value in C# is zero. But what if I want to filter for when there's actually a zero in the document?

@kirankumarkolli
Copy link
Member

I agree that undefined isn't equal to false, however in this scenario, the query is driven by the C# model, where the IsDeleted property is not nullable. On deserialization, the property will be set to false. So for all intents and purposes, the property being undefined in the JSON document is equivalent to it being false in the C# model.

Isn't default-value after desalinization is also implementation dependent (ex: default value, default constructor).

@khdang
Copy link
Member

khdang commented Aug 20, 2019

The data and the data model don't match in this case (if the C# model is used to create data then we should only have boolean values for the property). It seems there're multiple versions of data (before and after IsDeleted is introduced). That means the scenario to filter on only the data set which has IsDeleted set to false is valid.

Translating !IsDeleted to both false equality and undefined assumes these two are equivalent (or implementation depended as Kiran mentioned above).

Consider the scenario where we have a property called IsNotDeleted, which has an opposite value of the IsDeleted property. Should we include undefined when filtering on IsNotDeleted = true? it depends on the scenario's definition of the undefined. So undefined can't seem to be implied with a particular value of the boolean property.

@khdang
Copy link
Member

khdang commented Aug 20, 2019

For the case where data model changes like in this issue, it seems there're a few options:

  • option 1: use with a model that can include changes (if we can foresee)
  • option 2: update all the data to match with the new model value
  • option 3: adjust the query to account for previous and current data model

@thomaslevesque
Copy link
Contributor Author

Because what if there's two types of C# documents you're storing? One has your IsDeleted field, and the other doesn't.

It depends on which type you're querying with. If the type has a non-nullable IsDeleted field, and the query includes something like !x.IsDeleted, translate to NOT (root["IsDeleted"] ?? false). If the type is a dictionary, as in your example, don't make any assumption and translate the query to NOT root["IsDeleted"].

If I'm using a Type field as a discriminator to know what to deserialize, I might be intending to filter out anything that doesn't have that field.

But in this case, shouldn't you filter on the Type field? Relying on the presence or absence of a field in the document seems brittle.

Further, what if it's an int instead? The default value in C# is zero. But what if I want to filter for when there's actually a zero in the document?

I don't think that's a very common scenario. If the property is not nullable in the C# model, you don't expect the field to be missing in the document ; and if it is missing, the value would be set to 0 in the C# object on deserialization anyway. If you really need to filter documents that do have an explicit value, you can always do it in SQL.

Isn't default-value after desalinization is also implementation dependent (ex: default value, default constructor).

Yes, but it doesn't really matter, I think. What matters here is the semantics of the model type you're using in the query. If a property is not nullable in the C# model, you shouldn't have to worry about whether it's present in the document or not.

The data and the data model don't match in this case (if the C# model is used to create data then we should only have boolean values for the property). It seems there're multiple versions of data (before and after IsDeleted is introduced). That means the scenario to filter on only the data set which has IsDeleted set to false is valid.

Indeed, it's a case where the IsDeleted property was introduced later.

Translating !IsDeleted to both false equality and undefined assumes these two are equivalent (or implementation depended as Kiran mentioned above).

Consider the scenario where we have a property called IsNotDeleted, which has an opposite value of the IsDeleted property. Should we include undefined when filtering on IsNotDeleted = true? it depends on the scenario's definition of the undefined. So undefined can't seem to be implied with a particular value of the boolean property.

That's a fair point. I didn't think about that. But maybe it could be handled with a [DefaultValue] attribute? If the property is not nullable, assume the default value (possibly specified with the [DefaultValue] attribute) when the property is missing in the document?

I realize it all seems unnecessarily complex, but the scenario I described is probably quite common, and having to specify || !x.IsDeleted.IsDefined() in the query isn't intuitive at all and is likely to cause a lot of bugs.

@khdang
Copy link
Member

khdang commented May 18, 2020

So in this case because the data and the model doesn't match, the specific behavior has to be specified. If we opt for a specific default behavior, it could lead to a different kind of confusion as in my previous comments.

Currently, we can set this behavior by adding the condition || !x.IsDeleted.IsDefined() as you mentioned. I think [DefaultValue] is an option we can consider. However, from a user point of view, he would also need to be aware of the need to specify the null-value behavior, in which case I would argue that it's comparably intuitive to add the not IsDefined(..) check.

We've taken note of this and will consider it for future change. I'm closing this issue for now.

@azzimuth
Copy link

I have a related problem but this time with a nullable field. Hope it's ok to share a link to SO: https://stackoverflow.com/questions/70062891/cosmos-db-iqueryable-on-nullable-fields

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
discussion-wanted Need a discussion on an area LINQ
Projects
None yet
Development

No branches or pull requests

6 participants