Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

The behavior of Distinct in SqlServer and Cosmos is inconsistent #26547

Open
Varorbc opened this issue Nov 5, 2021 · 8 comments
Open

The behavior of Distinct in SqlServer and Cosmos is inconsistent #26547

Varorbc opened this issue Nov 5, 2021 · 8 comments
Assignees
Labels
area-cosmos area-query customer-reported needs-design punted-for-7.0 Originally planned for the EF Core 7.0 (EF7) release, but moved out due to resource constraints. type-bug
Milestone

Comments

@Varorbc
Copy link
Contributor

Varorbc commented Nov 5, 2021

https://docs.microsoft.com/en-us/dotnet/api/system.linq.queryable.distinct?view=net-6.0
https://docs.microsoft.com/en-us/ef/core/what-is-new/ef-core-6.0/whatsnew#distinct-queries

Include your code

[Table("Table_1")]
public partial class Table1
{
    [Key]
    public int Id { get; set; }

    public DateTime CreateTime { get; set; }
}

public partial class TestContext : DbContext
{
    public TestContext()
    {
    }

    public TestContext(DbContextOptions<TestContext> options)
        : base(options)
    {
    }

    public virtual DbSet<Table1> Table1 { get; set; }
}

var services = new ServiceCollection();
services.AddDbContext<TestContext>();
var serviceProvider = services.BuildServiceProvider();

var testContext = serviceProvider.GetRequiredService<TestContext>();
var sql = testContext.Table1.OrderBy(a => a.Id).Distinct().ToQueryString();
Console.WriteLine(sql);

Sql Server Behavior

SELECT DISTINCT [t].[Id], [t].[CreateTime]
FROM [Table_1] AS [t]

Cosmos Behavior

SELECT DISTINCT c
FROM root c
WHERE (c["Discriminator"] = "Table1")
ORDER BY c["Id"]

Include provider and version information

EF Core version:6.0.0-rc.2.21480.5
Database provider: Microsoft.EntityFrameworkCore.SqlServer/Microsoft.EntityFrameworkCore.Cosmos
Target framework: .NET 6.0
Operating system:Win11
IDE: Visual Studio 2022 17.0

@roji
Copy link
Member

roji commented Nov 5, 2021

@Varorbc providers aren't necessarily meant to behave in the same way, around this question or around others - it's perfectly OK to have different behaviors across providers when the database itself warrants that.

As discussed in #25696, the LINQ documentation for the Distinct operator clearly specifies that "the result sequence is unordered". And as discussed in that issue, one main issue in relational databases is that Distinct frequently causes queries to be pushed down into subqueries, but in SQL, subqueries return unordered resultsets; so if we can't guarantee ordering after Distinct in some/most cases, we generally shouldn't in any case (otherwise behavior is inconsistent depending on whether there's a SQL subquery or not).

I don't know if the same is true for Cosmos or not - it's a different database with different behavior; for one thing, uncorrelated subqueries aren't supported at all. So we'd have to investigate the Cosmos behavior more thoroughly to see what makes sense there.

Having said all that, I do agree this is something we should look into.

@John0King
Copy link

John0King commented Nov 5, 2021

@roji

And as discussed in that issue, one main issue in relational databases is that Distinct frequently causes queries to be pushed down into subqueries, but in SQL, subqueries return unordered resultsets;

can you explain more or a sample ?

and follow sql work perfectly for sqlserver

  SELECT DISTINCT <tb_filed> FROM <table>  ORDER BY <tb_filed> DESC

@roji
Copy link
Member

roji commented Nov 5, 2021

@John0King see #25696, this is discussed there (e.g. #25696 (comment))

@roji
Copy link
Member

roji commented Nov 5, 2021

SELECT DISTINCT <tb_filed> FROM <table> ORDER BY <tb_filed> DESC

@John0King note that I'm discussing subqueries

@John0King
Copy link

note that I'm discussing subqueries

😄 I am looking for a sample that do not work with discussing and subquerys, so I can understand you more clearly.

so far , it seems orderby first and then distinct is acceptable , and it even reasonable for C#

> var hashSet = new HashSet<int>();
. for (var i = 0; i < 10; i++)
. {
.     hashSet.Add(i);
. }
. 
. hashSet.Remove(2);
. hashSet.Add(1000);
. 
. hashSet.OrderBy(x=>x).Distinct().ToArray()
// int[10] { 0, 1, 3, 4, 5, 6, 7, 8, 9, 1000 }

> hashSet.OrderByDescending(x=>x).Distinct().ToArray()
// int[10] { 1000, 9, 8, 7, 6, 5, 4, 3, 1, 0 }

@roji
Copy link
Member

roji commented Nov 5, 2021

@John0King please fully read #25696, this has already been discussed there. Even if in LINQ to Objects Distinct "happens" to return ordered items (at least in some cases), the documentation clearly specifies that this isn't in the contract, so it could change.

@AndriySvyryd
Copy link
Member

AndriySvyryd commented Nov 5, 2021

Related info: https://docs.microsoft.com/en-us/azure/cosmos-db/sql/sql-query-keywords#distinct

The below queries are unsupported:
ORDER BY clause in the outer query
	SELECT VALUE COUNT(1) FROM (SELECT DISTINCT VALUE c.lastName FROM c) AS lastName ORDER BY lastName

@roji
Copy link
Member

roji commented Nov 5, 2021

Note to self: check specifically the behavior of ordering in the inner query (which contains distinct) as well as the outer query.

@ajcvickers ajcvickers added this to the 7.0.0 milestone Nov 5, 2021
@ajcvickers ajcvickers modified the milestones: 7.0.0, Backlog Apr 23, 2022
@ajcvickers ajcvickers added punted-for-7.0 Originally planned for the EF Core 7.0 (EF7) release, but moved out due to resource constraints. and removed propose-punt labels Apr 23, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-cosmos area-query customer-reported needs-design punted-for-7.0 Originally planned for the EF Core 7.0 (EF7) release, but moved out due to resource constraints. type-bug
Projects
None yet
Development

No branches or pull requests

5 participants