Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

minor: port some join tests to sqllogictests #5567

Merged
merged 5 commits into from
Mar 15, 2023
Merged

Conversation

ygf11
Copy link
Contributor

@ygf11 ygf11 commented Mar 13, 2023

Which issue does this PR close?

Part of #4495 and #4870

Rationale for this change

What changes are included in this PR?

This pr mainly ports tests whose context is join_context to join.slt.

Are these changes tested?

Yes.

Are there any user-facing changes?

@github-actions github-actions bot added core Core DataFusion crate sqllogictest SQL Logic Tests (.slt) labels Mar 13, 2023
statement ok
set datafusion.optimizer.repartition_joins = false;

include ./join.slt
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most join tests need run twice with different repartition_joins config(on/off).

This slt file is for tests when the config is off.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TIL "include 👍

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I verified the queries run as expected

     Running tests/sqllogictests/src/main.rs (/Users/alamb/Software/target-df2/debug/deps/sqllogictests-18a04f4d360aa062)
[join_disable_repartition_joins.slt] Running query: "set datafusion.optimizer.repartition_joins = false;"
[join_disable_repartition_joins.slt] Running query: "CREATE TABLE IF NOT EXISTS students(name TEXT, mark INT) AS VALUES
('Stuart', 28),
('Amina', 89),
('Christen', 50),
('Salma', 77),
('Samantha', 21);"
[join_disable_repartition_joins.slt] Running query: "CREATE TABLE IF NOT EXISTS grades(grade INT, min INT, max INT) AS VALUES
(1, 0, 14),
(2, 15, 35),
(3, 36, 55),
(4, 56, 79),
(5, 80, 100);"
[join_disable_repartition_joins.slt] Running query: "SELECT s.*, g.grade FROM students s join grades g on s.mark between g.min and g.max WHERE grade > 2 ORDER BY s.mark DESC"
[join_disable_repartition_joins.slt] Running query: "drop table IF EXISTS students;"
[join_disable_repartition_joins.slt] Running query: "drop table IF EXISTS grades;"
[join_disable_repartition_joins.slt] Running query: "CREATE TABLE IF NOT EXISTS test1(a int, b int) as select 1 as a, 2 as b;"
[join_disable_repartition_joins.slt] Running query: "CREATE TABLE IF NOT EXISTS test2(a int, b int) as select 1 as a, 2 as b;"
[join_disable_repartition_joins.slt] Running query: "SELECT * FROM test2 FULL JOIN test1 ON true;"
[join_disable_repartition_joins.slt] Running query: "drop table IF EXISTS test1;"
[join_disable_repartition_joins.slt] Running query: "drop table IF EXISTS test2;"
[join_disable_repartition_joins.slt] Running query: "CREATE TABLE IF NOT EXISTS t1(t1_id INT, t1_name TEXT, t1_int INT) AS VALUES
(11, 'a', 1),
(22, 'b', 2),
(33, 'c', 3),
(44, 'd', 4);"
...

@ygf11 ygf11 marked this pull request as ready for review March 14, 2023 09:44
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great to me -- thank you @ygf11

cc @jackwener @liukun4515 @mingmwang

statement ok
set datafusion.optimizer.repartition_joins = false;

include ./join.slt
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TIL "include 👍

statement ok
set datafusion.optimizer.repartition_joins = false;

include ./join.slt
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I verified the queries run as expected

     Running tests/sqllogictests/src/main.rs (/Users/alamb/Software/target-df2/debug/deps/sqllogictests-18a04f4d360aa062)
[join_disable_repartition_joins.slt] Running query: "set datafusion.optimizer.repartition_joins = false;"
[join_disable_repartition_joins.slt] Running query: "CREATE TABLE IF NOT EXISTS students(name TEXT, mark INT) AS VALUES
('Stuart', 28),
('Amina', 89),
('Christen', 50),
('Salma', 77),
('Samantha', 21);"
[join_disable_repartition_joins.slt] Running query: "CREATE TABLE IF NOT EXISTS grades(grade INT, min INT, max INT) AS VALUES
(1, 0, 14),
(2, 15, 35),
(3, 36, 55),
(4, 56, 79),
(5, 80, 100);"
[join_disable_repartition_joins.slt] Running query: "SELECT s.*, g.grade FROM students s join grades g on s.mark between g.min and g.max WHERE grade > 2 ORDER BY s.mark DESC"
[join_disable_repartition_joins.slt] Running query: "drop table IF EXISTS students;"
[join_disable_repartition_joins.slt] Running query: "drop table IF EXISTS grades;"
[join_disable_repartition_joins.slt] Running query: "CREATE TABLE IF NOT EXISTS test1(a int, b int) as select 1 as a, 2 as b;"
[join_disable_repartition_joins.slt] Running query: "CREATE TABLE IF NOT EXISTS test2(a int, b int) as select 1 as a, 2 as b;"
[join_disable_repartition_joins.slt] Running query: "SELECT * FROM test2 FULL JOIN test1 ON true;"
[join_disable_repartition_joins.slt] Running query: "drop table IF EXISTS test1;"
[join_disable_repartition_joins.slt] Running query: "drop table IF EXISTS test2;"
[join_disable_repartition_joins.slt] Running query: "CREATE TABLE IF NOT EXISTS t1(t1_id INT, t1_name TEXT, t1_int INT) AS VALUES
(11, 'a', 1),
(22, 'b', 2),
(33, 'c', 3),
(44, 'd', 4);"
...


# turn on repartition_joins
statement ok
set datafusion.optimizer.repartition_joins = true;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice. The only thing a bit concerning me, is

  • sqllogictest we output the sql, which makes the output noisy and probably not very useful, I was thinking to create a ticket to ouputit file name and indented test name. the similar way as cargo test
  • do you guys think we should create a .slt file per use case like disabled repartition join? or reuse join.slt?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sqllogictest we output the sql, which makes the output noisy and probably not very useful, I was thinking to create a ticket to ouputit file name and indented test name. the similar way as cargo test

I think this is a good idea

do you guys think we should create a .slt file per use case like disabled repartition join? or reuse join.slt?

I like the idea of reusing join.slt but it will likely make some tests in the future harder if they vary based on the setting (e.g. plan tests)

Copy link
Contributor Author

@ygf11 ygf11 Mar 15, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do you guys think we should create a .slt file per use case like disabled repartition join? or reuse join.slt?

I think it's worth creating this file now. I first tried adding everything to join.slt and found a lot of duplicated code, since most tests need to be run twice.

# query x
statement ok
set datafusion.optimizer.repartition_joins = false;

query ITI rowsort
SELECT xxx
----
a a a

# switch to another config
statement ok
set datafusion.optimizer.repartition_joins = true;

query ITI rowsort
SELECT xxx
----
a a a

Using include reduces a lot of codes, originally the addition was 1100+, now it is 655.

For the tests whose setting is complex, we can also add them to join.slt instead of adding a new slt file.

@alamb
Copy link
Contributor

alamb commented Mar 15, 2023

Thanks @ygf11 !

@alamb alamb merged commit 37138a5 into apache:main Mar 15, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core Core DataFusion crate sqllogictest SQL Logic Tests (.slt)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants