Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[MINOR]: Extract aggregate topk function to aggregate_topk.slt #8948

Merged
merged 4 commits into from
Jan 23, 2024
Merged

[MINOR]: Extract aggregate topk function to aggregate_topk.slt #8948

merged 4 commits into from
Jan 23, 2024

Conversation

mustafasrepo
Copy link
Contributor

@mustafasrepo mustafasrepo commented Jan 22, 2024

Update: this PR just moves a intermittently failing test into its own file. It still fails on different platforms

Which issue does this PR close?

Closes #.

Rationale for this change

Sometimes in my locale, the tests in this PR fails. However, in CI it seems that there is no problem. However, with current configuration, there are more than 1 possible valid result for these queries.

With this PR I changed these tests such that there is only one possible answer.

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

@github-actions github-actions bot added the sqllogictest SQL Logic Tests (.slt) label Jan 22, 2024
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

THANK YOU -- I hit the same thing as well

@alamb alamb changed the title [MINOR]: (Test change) Add new orders to make result unique [MINOR]: (Test change) Add new orders to make result unique in aggregate.slt Jan 22, 2024
@alamb
Copy link
Contributor

alamb commented Jan 22, 2024

cc @avantgardnerio

@avantgardnerio
Copy link
Contributor

@avantgardnerio
Copy link
Contributor

That being said, I think this code could / should be expanded to include multiple order bys.

@avantgardnerio
Copy link
Contributor

IIRC, it conceptually supports anything that implements Ord, so we can implement support by adding support for the OwnedRow type.

@mustafasrepo
Copy link
Contributor Author

I think this skips the rule it is intended to test if there are multiple order bys:

https://github.com/apache/arrow-datafusion/blob/795c71f3e8249847593a58dafe578ffcbcd3e012/datafusion/core/src/physical_optimizer/topk_aggregation.rs#L90

What if we change top order by to ORDER BY trace_id. This would also produce single correct result.
What I mean is that changing test from

select trace_id, other, MIN(timestamp) from traces group by trace_id, other order by MIN(timestamp) asc limit 4;

to

select trace_id, other, MIN(timestamp) from traces group by trace_id, other order by trace_id asc limit 4;

instead of

select trace_id, other, MIN(timestamp) from traces group by trace_id, other order by MIN(timestamp) asc, trace_id DESC limit 4;

Will it test desired property?

@alamb
Copy link
Contributor

alamb commented Jan 23, 2024

Will it test desired property?

I don't think so -- this test is pretty specific to ordering by MIN(..) as it invokes a special code path. However, given this is not reflected in the tests I think that is a gap.

Here is my suggestion:

  1. Move these tests into a new .slt file (aggregtes_topk.slt maybe?) in one PR
  2. Figure out how to improve the tests to be both determisitic as a second PR (I think we will have to adjust the data and/or the N in the limit), also adding an explain test to ensure the desired property is covered

My rationale for moving to a new file is that then this failure won't mask other failures in the (very large) aggregates.slt file and it will help to make that file more manageable

@mustafasrepo
Copy link
Contributor Author

Here is my suggestion:

  1. Move these tests into a new .slt file (aggregtes_topk.slt maybe?) in one PR
  2. Figure out how to improve the tests to be both determisitic as a second PR (I think we will have to adjust the data and/or the N in the limit), also adding an explain test to ensure the desired property is covered

My rationale for moving to a new file is that then this failure won't mask other failures in the (very large) aggregates.slt file and it will help to make that file more manageable

I moved the topk aggregate tests to their own file as your suggestion in 1 (Please note that I retracted the changes in the tests, this PR is just a restructure PR now.).

@alamb alamb changed the title [MINOR]: (Test change) Add new orders to make result unique in aggregate.slt [MINOR]: Extract aggregate topk function to aggregate_topk.slt Jan 23, 2024
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @mustafasrepo

I'll merge this PR and we can keep working to make the test deterministic as a follow on PR

@alamb alamb merged commit 04e147b into apache:main Jan 23, 2024
24 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
sqllogictest SQL Logic Tests (.slt)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants