Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Minor: Update docstring on unwrap_cast_in_comparison #4145

Merged
merged 2 commits into from
Nov 9, 2022
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
45 changes: 31 additions & 14 deletions datafusion/optimizer/src/unwrap_cast_in_comparison.rs
Original file line number Diff line number Diff line change
Expand Up @@ -32,25 +32,42 @@ use datafusion_expr::{
};
use std::sync::Arc;

/// The rule can be used to the numeric binary comparison with literal expr, like below pattern:
/// `cast(left_expr as data_type) comparison_op literal_expr` or `literal_expr comparison_op cast(right_expr as data_type)`.
/// The data type of two sides must be equal, and must be signed numeric type now, and will support more data type later.
/// [`UnwrapCastInComparison`] attempts to remove casts from
/// comparisons to literals ([`ScalarValue`]s) by applying the casts
/// to the literals if possible. It is inspired by the optimizer rule
/// `UnwrapCastInBinaryComparison` of Spark.
///
/// If the binary comparison expr match above rules, the optimizer will check if the value of `literal`
/// is in within range(min,max) which is the range(min,max) of the data type for `left_expr` or `right_expr`.
/// Removing casts often improves performance because:
/// 1. The cast is done once (to the literal) rather than to every value
/// 2. Can enable other optimizations such as predicate pushdown that
/// don't support casting
///
/// If this is true, the literal expr will be casted to the data type of expr on the other side, and the result of
/// binary comparison will be `left_expr comparison_op cast(literal_expr, left_data_type)` or
/// `cast(literal_expr, right_data_type) comparison_op right_expr`. For better optimization,
/// the expr of `cast(literal_expr, target_type)` will be precomputed and converted to the new expr `new_literal_expr`
/// which data type is `target_type`.
/// If this false, do nothing.
/// The rule is applied to expressions of the following forms:
///
/// 1. `cast(left_expr as data_type) comparison_op literal_expr`
/// 2. `literal_expr comparison_op cast(left_expr as data_type)`
/// 3. `cast(literal_expr) IN (expr1, expr2, ...)`
/// 4. `literal_expr IN (cast(expr1) , cast(expr2), ...)`
///
/// If the expression matches one of the forms above, the rule will
/// ensure the value of `literal` is in within range(min, max) of the
/// expr's data_type, and if the scalar is within range, the literal
/// will be casted to the data type of expr on the other side, and the
/// cast will be removed from the other side.
///
/// This is inspired by the optimizer rule `UnwrapCastInBinaryComparison` of Spark.
/// # Example
///
/// `Filter: cast(c1 as INT64) > INT64(10)` will be optimized to `Filter: c1 > CAST(INT64(10) AS INT32),
/// and continue to be converted to `Filter: c1 > INT32(10)`, if the DataType of c1 is INT32.
/// If the DataType of c1 is INT32. Given the filter
///
/// ```text
/// Filter: cast(c1 as INT64) > INT64(10)`
/// ```
///
/// This rule will remove the cast and rewrite the expression to:
///
/// ```text
/// Filter: c1 > INT32(10)
/// ```
///
#[derive(Default)]
pub struct UnwrapCastInComparison {}
Expand Down