Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Port sql tests written in Rust to sqllogictests #4495

Closed
Tracked by #4460
mvanschellebeeck opened this issue Dec 4, 2022 · 8 comments
Closed
Tracked by #4460

Port sql tests written in Rust to sqllogictests #4495

mvanschellebeeck opened this issue Dec 4, 2022 · 8 comments
Labels
enhancement New feature or request good first issue Good for newcomers help wanted Extra attention is needed sqllogictest SQL Logic Tests (.slt)

Comments

@mvanschellebeeck
Copy link
Contributor

Describe the solution you'd like
Port tests from datafusion/core/tests/sql to sqllogictests

@mvanschellebeeck mvanschellebeeck added the enhancement New feature or request label Dec 4, 2022
@mvanschellebeeck
Copy link
Contributor Author

@xudong963, @alamb - as a follow up should we also remove the ported test from the .rs files?

@xudong963
Copy link
Member

@xudong963, @alamb - as a follow up should we also remove the ported test from the .rs files?

Yes, I think it's nice to remove!

@xudong963 xudong963 added the sqllogictest SQL Logic Tests (.slt) label Dec 4, 2022
@alamb
Copy link
Contributor

alamb commented Dec 4, 2022

Yes, I think it's nice to remove!

I agree. Thank you

I also suspect that as we try to port more we will find other gaps (e.g. the lack of support for INSERT and normalization), and I hope to use those gaps as a way to drive adding new features to the sqllogic framework (either in DataFusion or contributing back upstream)

@alamb
Copy link
Contributor

alamb commented Dec 4, 2022

BTW I plan to port more of the tests in information_schema to the sqllogic framework. One thing I will need to do is to support "expected errors"

@melgenek
Copy link
Contributor

melgenek commented Jan 28, 2023

I migrated decimal.rs #5086. I took the tests as is, and transformed them into .slt. The problem is that ordering in these tests is not defined. For example, this test doesn't have an order by clause.

query RRI?R
select * from decimal_simple where c1 > c5;
----
0.00002 0.000000000002 3 false 0.000019
0.00003 0.000000000003 5 true 0.000011
0.00005 0.000000000005 8 false 0.000033

It seems that the fact that this test passes now in ci, and had passed before when it was in the decimal.rs, is that the Datafusion implementation hasn't yet changed significantly enough to cause the order of the results to change.

@jackwener wasn't this lucky with his union tests, and they eventually failed in the master branch #5095.

I'd like to introduce some determinism to the decimal tests, and probably some other tests that don't have explicit ordering. My question is what is the best way to do this?

I see that Datafusion uses both rowsort and order by. DuckDB states that it prefers an explicit order by. But, for example, CocroachDB and Risingwave use rowsort quite extensively.

Are there any guidelines for using or not using rowsort and order by in Datafusion?

@xudong963
Copy link
Member

Are there any guidelines for using or not using rowsort and order by in Datafusion?

I think both are ok, you can even use control sortmode rowsort for the whole file, which is easier.

@alamb
Copy link
Contributor

alamb commented Jan 29, 2023

Are there any guidelines for using or not using rowsort and order by in Datafusion?

@melgenek -- thank you raising this question; The use of rowsort vs order by I think depends on what the test is testing. In my view, if the test might be changed due to the fact there is a sort / order by on it (due to a different codepath for example) then rowsort should be used. If the ordering doesn't change the meaning of the test then order by is fine

I think in most cases, adding order by is probably just fine, but to maintain the semantics of existing tests I would probably default to using rowsort initially rather than changing the queries in the test

@alamb
Copy link
Contributor

alamb commented May 2, 2023

since this has had so many PRs attached to it, I am opening #6195 to track the final push. Thank you again to everyone who helped with this ticket

@alamb alamb closed this as completed May 2, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request good first issue Good for newcomers help wanted Extra attention is needed sqllogictest SQL Logic Tests (.slt)
Projects
None yet
Development

No branches or pull requests

4 participants