Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

For "Contains" queries of numeric values, just use the value in an "IN" statement instead of casting, converting, or using JSON. #33423

Closed
RobertPepkaSEL opened this issue Mar 28, 2024 · 4 comments

Comments

@RobertPepkaSEL
Copy link

Given a simple entity, such as:

class Todo { long Id { get; set; } }

With a SQL Server Table declaration like:

CREATE TABLE [Todos]
(
  [Id] BIGINT IDENTITY(1,1) NOT NULL CONSTRAINT [PK_Todos_Id] PRIMARY KEY,

and I want to find a set of these from a list:

var todoIds = new long[] { 1, 2, 3, 4, 5 };
context.Todos.Where(t => todoIds.Contains(t.Id)

Older Entity Framework Core generates a query along the lines of:

SELECT [t].[TodoId]
FROM [Todos] AS [t]
WHERE [t].[TodoId] IN (CAST(1 AS BIGINT), CAST(2 AS BIGINT), CAST(3 AS bigint), CAST(4 AS bigint), CAST(5 AS bigint));

And as of EFCore 8, it generates:

DECLARE @__ids_0 nvarchar(4000) = N'[1,2,3,4,5]';

SELECT [t].[Id]
FROM [Todos] AS [t]
WHERE [t].[Id] IN (
    SELECT [i].[value]
    FROM OPENJSON(@__ids_0) WITH ([value] bigint '$') AS [i]
)

Proposal:

If both the input type and the identifier types are numeric (int16, int32, int64), and the same type, just do an IN with the values directly. Don't convert, don't use OPENJSON conversion, just use the values. This will save time with conversion, string interpolation, and unnecessary inner queries. Ideally, EFCore would generate something like the following:

SELECT [t].[Id]
FROM [Todos] AS [t]
WHERE [t].[Id] IN (3, 4, 5, 6)
)

Alternatively, if a WHERE <> IN (SELECT ...) style solution is still desirable, use a table variable instead.

DECLARE @IdList TABLE (Id BIGINT);

-- Insert the desired values into the variable
INSERT INTO @IdList (Id)
VALUES (1), (2), (3), (4), (5);

-- Use the variable in the WHERE clause with the IN operator
SELECT [t].[Id]
FROM [Todos] AS [t]
WHERE [t].[Id] IN (SELECT Id FROM @IdList);
@roji
Copy link
Member

roji commented Mar 28, 2024

@RobertPepkaSEL you're discussing various issues above, which it's worth untangling.

  1. What makes you think the casts are in any way harmful and worth removing? They are simply a way for EF to strongly-type the constant values as bigint in SQL, since a number by default is int (try using int instead of long with EF 7 as your .NET ID type, and you'll see that there are no casts). It's true that these casts could be removed (since SQL Server would implicitly cast the int literals to bigint), but we have a bit of work before that can be done, and AFAIK there's no particular reason to do it (apart from SQL cleanliness)
  2. To understand why EF 8.0 uses OPENJSON instead of embedding simple number constants in the SQL, see IN() list queries are not parameterized, causing increased SQL Server CPU usage #13617. In a nutshell, embedding the constants means that there's a different SQL for each query, which defeats query plan caching and causes performance issues. This is completely unrelated to the casting discussed above.
  3. Regarding your suggesting to use a table variable (DECLARE @IdList TABLE ...), why do you think that's preferable to OPENJSON?

@RobertPepkaSEL
Copy link
Author

Thanks for responding and consideration!

What makes you think the casts are in any way harmful and worth removing? They are simply a way for EF to strongly-type the constant values as bigint in SQL, since a number by default is int (try using int instead of long with EF 7 as your .NET ID type, and you'll see that there are no casts). It's true that these casts could be removed (since SQL Server would implicitly cast the int literals to bigint), but we have a bit of work before that can be done, and AFAIK there's no particular reason to do it (apart from SQL cleanliness)

DBAs I work with help us analyze the SQL that entity framework generates, either during development, or after development when evaluating SQL Server's performance when there's a measured impact to a system. Output like this makes these much more challenging to read and understand. Obviously this is a simple example, but in the cases of very large lists, multiple lists, etc, the data casting overwhelms actual the query, cueing eye-rolling, etc.

To understand why EF 8.0 uses OPENJSON instead of embedding simple number constants in the SQL, see #13617. In a nutshell, embedding the constants means that there's a different SQL for each query, which defeats query plan caching and causes performance issues. This is completely unrelated to the casting discussed above.

That makes a ton of sense. Being able to have a cached query plan is really important for all kinds of queries. It's one of the chief complaints we get when evaluating the generated SQL queries, and a great reason to upgrade to EF 8.0 .

Regarding your suggesting to use a table variable (DECLARE @idlist TABLE ...), why do you think that's preferable to OPENJSON?

My initial ask was because that goes back to reviewing, analyzing, and approving the query. From a perspective of a DBA, the table form would look more idiomatic and understandable. It would avoid a step that requires OPENJSON, meaning it could continue to operate on older SQL Server versions like 2012. I haven't tested it, but I would guess that using a table and inserts may be faster than parsing JSON to get integers. Additionally, it may also meet the same kind of needs discussed in #13617 .

@roji
Copy link
Member

roji commented Mar 29, 2024

Output like this makes these much more challenging to read and understand.

OK. I think you're possibly overstating the complexity that CAST(8 AS bigint) adds to the SQL - it really is just a typed literal. But in any case, being able to eliminate the cast would require EF to be aware of which database types implicitly cast to which other types (e.g. int->bigint, but not bigint->int), which depends on #15586.

It would avoid a step that requires OPENJSON, meaning it could continue to operate on older SQL Server versions like 2012.

SQL Server 2012 went out of support in 2022; we also have an option to force SQL Server back to the older constant-based translation in case that's needed (see docs).

I haven't tested it, but I would guess that using a table and inserts may be faster than parsing JSON to get integers.

Guessing is always a bad idea when it comes to performance - my own testing showed the opposite, with OPENJSON being faster; declaring a temporary table (DECLARE @IdList TABLE (Id BIGINT)) has quite an overhead on SQL Server.

I'm going to go ahead and close this issue as everything seems to be working as designed, but please don't hesitate to post back if you have further questions or suggestions.

@roji roji closed this as not planned Won't fix, can't repro, duplicate, stale Mar 29, 2024
@roji roji added the closed-no-further-action The issue is closed and no further action is planned. label Mar 29, 2024
@RobertPepkaSEL
Copy link
Author

Thank you for your feedback :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants