Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: struct field don't push down to TableScan #8774

Merged
merged 6 commits into from
Jan 8, 2024
Merged
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
176 changes: 173 additions & 3 deletions datafusion/optimizer/src/optimize_projections.rs
Original file line number Diff line number Diff line change
Expand Up @@ -624,7 +624,6 @@ fn outer_columns_helper(expr: &Expr, columns: &mut HashSet<Column>) -> bool {
let exprs = insubquery.subquery.outer_ref_columns.iter();
outer_columns_helper_multi(exprs, columns)
}
Expr::IsNotNull(expr) | Expr::IsNull(expr) => outer_columns_helper(expr, columns),
Expr::Cast(cast) => outer_columns_helper(&cast.expr, columns),
Expr::Sort(sort) => outer_columns_helper(&sort.expr, columns),
Expr::AggregateFunction(aggregate_fn) => {
Expand Down Expand Up @@ -677,6 +676,22 @@ fn outer_columns_helper(expr: &Expr, columns: &mut HashSet<Column>) -> bool {
.as_ref()
.map_or(true, |expr| outer_columns_helper(expr, columns))
}
Expr::SimilarTo(similar_to) => {
outer_columns_helper(&similar_to.expr, columns)
&& outer_columns_helper(&similar_to.pattern, columns)
}
Expr::TryCast(try_cast) => outer_columns_helper(&try_cast.expr, columns),
Expr::GetIndexedField(index) => outer_columns_helper(&index.expr, columns),
Expr::Not(expr)
| Expr::IsNotFalse(expr)
| Expr::IsFalse(expr)
| Expr::IsTrue(expr)
| Expr::IsNotTrue(expr)
| Expr::IsUnknown(expr)
| Expr::IsNotUnknown(expr)
| Expr::IsNotNull(expr)
| Expr::IsNull(expr)
| Expr::Negative(expr) => outer_columns_helper(expr, columns),
Expr::Column(_) | Expr::Literal(_) | Expr::Wildcard { .. } => true,
_ => false,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What would you think about removing this catchall (and and thus explicitly enumerating all Expr types so this type of bug can't happen again)?

Suggested change
_ => false,

An alternate idea would be to remove this explicit walk down the tree entirely and use https://docs.rs/datafusion/latest/datafusion/logical_expr/utils/fn.expr_to_columns.html instead?

This pushdown code may predate the more generaic tree walks

Copy link
Contributor Author

@haohuaijin haohuaijin Jan 7, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea. Already removed the catchall.

An alternate idea would be to remove this explicit walk down the tree entirely and use https://docs.rs/datafusion/latest/datafusion/logical_expr/utils/fn.expr_to_columns.html instead?

the outer_columns_helper function is used to collect all outer_ref_column that expr_to_columns function doesn't return these columns. Because expr_to_columns use TreeNode to walks the expr tree, but TreeNode don't walk the outer_ref_column
https://github.com/apache/arrow-datafusion/blob/dd4263f843e093c807d63edf73a571b1ba2669b5/datafusion/expr/src/tree_node/expr.rs#L69-L77

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the outer_columns_helper function is used to collect all outer_ref_column that expr_to_columns function doesn't return these columns. Because expr_to_columns use TreeNode to walks the expr tree, but TreeNode don't walk the outer_ref_column

I see -- thank you for the explanation, which makes sense. It fiddled a little with the code, and I think I have a cleanup here that uses the TreeNode walking to avoid having to enumerate Expr variants #8787

}
Expand Down Expand Up @@ -978,8 +993,8 @@ mod tests {
use arrow::datatypes::{DataType, Field, Schema};
use datafusion_common::{Result, TableReference};
use datafusion_expr::{
binary_expr, col, count, lit, logical_plan::builder::LogicalPlanBuilder,
table_scan, Expr, LogicalPlan, Operator,
binary_expr, col, count, lit, logical_plan::builder::LogicalPlanBuilder, not,
table_scan, try_cast, Expr, LogicalPlan, Operator,
};

fn assert_optimized_plan_equal(plan: &LogicalPlan, expected: &str) -> Result<()> {
Expand Down Expand Up @@ -1060,4 +1075,159 @@ mod tests {
\n TableScan: ?table? projection=[]";
assert_optimized_plan_equal(&plan, expected)
}

#[test]
fn test_struct_field_push_down() -> Result<()> {
let schema = Arc::new(Schema::new(vec![
Field::new("a", DataType::Int64, false),
Field::new_struct(
"s",
vec![
Field::new("x", DataType::Int64, false),
Field::new("y", DataType::Int64, false),
],
false,
),
]));

let table_scan = table_scan(TableReference::none(), &schema, None)?.build()?;
let plan = LogicalPlanBuilder::from(table_scan)
.project(vec![col("s").field("x")])?
.build()?;
let expected = "Projection: (?table?.s)[x]\
\n TableScan: ?table? projection=[s]";
assert_optimized_plan_equal(&plan, expected)
}

#[test]
fn test_neg_push_down() -> Result<()> {
let table_scan = test_table_scan()?;
let plan = LogicalPlanBuilder::from(table_scan)
.project(vec![-col("a")])?
.build()?;

let expected = "Projection: (- test.a)\
\n TableScan: test projection=[a]";
assert_optimized_plan_equal(&plan, expected)
}

#[test]
fn test_is_null() -> Result<()> {
let table_scan = test_table_scan()?;
let plan = LogicalPlanBuilder::from(table_scan)
.project(vec![col("a").is_null()])?
.build()?;

let expected = "Projection: test.a IS NULL\
\n TableScan: test projection=[a]";
assert_optimized_plan_equal(&plan, expected)
}

#[test]
fn test_is_not_null() -> Result<()> {
let table_scan = test_table_scan()?;
let plan = LogicalPlanBuilder::from(table_scan)
.project(vec![col("a").is_not_null()])?
.build()?;

let expected = "Projection: test.a IS NOT NULL\
\n TableScan: test projection=[a]";
assert_optimized_plan_equal(&plan, expected)
}

#[test]
fn test_is_true() -> Result<()> {
let table_scan = test_table_scan()?;
let plan = LogicalPlanBuilder::from(table_scan)
.project(vec![col("a").is_true()])?
.build()?;

let expected = "Projection: test.a IS TRUE\
\n TableScan: test projection=[a]";
assert_optimized_plan_equal(&plan, expected)
}

#[test]
fn test_is_not_true() -> Result<()> {
let table_scan = test_table_scan()?;
let plan = LogicalPlanBuilder::from(table_scan)
.project(vec![col("a").is_not_true()])?
.build()?;

let expected = "Projection: test.a IS NOT TRUE\
\n TableScan: test projection=[a]";
assert_optimized_plan_equal(&plan, expected)
}

#[test]
fn test_is_false() -> Result<()> {
let table_scan = test_table_scan()?;
let plan = LogicalPlanBuilder::from(table_scan)
.project(vec![col("a").is_false()])?
.build()?;

let expected = "Projection: test.a IS FALSE\
\n TableScan: test projection=[a]";
assert_optimized_plan_equal(&plan, expected)
}

#[test]
fn test_is_not_false() -> Result<()> {
let table_scan = test_table_scan()?;
let plan = LogicalPlanBuilder::from(table_scan)
.project(vec![col("a").is_not_false()])?
.build()?;

let expected = "Projection: test.a IS NOT FALSE\
\n TableScan: test projection=[a]";
assert_optimized_plan_equal(&plan, expected)
}

#[test]
fn test_is_unknown() -> Result<()> {
let table_scan = test_table_scan()?;
let plan = LogicalPlanBuilder::from(table_scan)
.project(vec![col("a").is_unknown()])?
.build()?;

let expected = "Projection: test.a IS UNKNOWN\
\n TableScan: test projection=[a]";
assert_optimized_plan_equal(&plan, expected)
}

#[test]
fn test_is_not_unknown() -> Result<()> {
let table_scan = test_table_scan()?;
let plan = LogicalPlanBuilder::from(table_scan)
.project(vec![col("a").is_not_unknown()])?
.build()?;

let expected = "Projection: test.a IS NOT UNKNOWN\
\n TableScan: test projection=[a]";
assert_optimized_plan_equal(&plan, expected)
}

#[test]
fn test_not() -> Result<()> {
let table_scan = test_table_scan()?;
let plan = LogicalPlanBuilder::from(table_scan)
.project(vec![not(col("a"))])?
.build()?;

let expected = "Projection: NOT test.a\
\n TableScan: test projection=[a]";
assert_optimized_plan_equal(&plan, expected)
}

#[test]
fn test_try_cast() -> Result<()> {
let table_scan = test_table_scan()?;
let plan = LogicalPlanBuilder::from(table_scan)
.project(vec![try_cast(col("a"), DataType::Float64)])?
.build()?;

let expected = "Projection: TRY_CAST(test.a AS Float64)\
\n TableScan: test projection=[a]";
assert_optimized_plan_equal(&plan, expected)
}
}
2 changes: 1 addition & 1 deletion datafusion/sqllogictest/test_files/dictionary.slt
Original file line number Diff line number Diff line change
Expand Up @@ -149,7 +149,7 @@ select count(*) from m1 where tag_id = '1000' and time < '2024-01-03T14:46:35+01
10

query RRR
select min(f5), max(f5), avg(f5) from m2 where tag_id = '1000' and time < '2024-01-03T14:46:35+01:00' group by type;
select min(f5), max(f5), avg(f5) from m2 where tag_id = '1000' and time < '2024-01-03T14:46:35+01:00' group by type order by min(f5);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just make result stable

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A different fix is here: #8769

----
100 600 350
700 1000 850
Expand Down
Loading