Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: Enhance case expression type coercion #5820

Merged
merged 6 commits into from
Apr 3, 2023

Conversation

Jefffrey
Copy link
Contributor

@Jefffrey Jefffrey commented Apr 1, 2023

Which issue does this PR close?

Closes #5538

Closes #5828

Rationale for this change

Given following:

SELECT
  CASE a
    WHEN b THEN 0
    WHEN c THEN 1
    ELSE 2
  END;

Should attempt to coerce a, b and c to common type.

And for:

SELECT
  CASE
    WHEN a THEN 0
    WHEN b THEN 1
    ELSE 2
  END;

Should coerce a and b to boolean type.

What changes are included in this PR?

Ensure case and when expressions are coerced to common type, similar to how currently then and else expressions are coerced to common type.

Also ensure when expressions for when case expression isn't provided to coerce to boolean type.

Are these changes tested?

Added to sqllogictest

Are there any user-facing changes?

@github-actions github-actions bot added core Core DataFusion crate optimizer Optimizer rules sqllogictest SQL Logic Tests (.slt) labels Apr 1, 2023
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @Jefffrey -- the code looks good to me

Prior to merge I think some additional tests are needed in this PR.

I also left some small code style suggestions, but I don't think they are required.

// ELSE b3
// END
//
// Then all aN (a1, a2, a3) must be converted to a common data type in the first example
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❤️

Comment on lines 357 to 361
// prepare types
let case_type = match &case.expr {
None => Ok(None),
Some(expr) => expr.get_type(&self.schema).map(Some),
}?;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can write this more concisely / functionally using transpose like this if you want

Suggested change
// prepare types
let case_type = match &case.expr {
None => Ok(None),
Some(expr) => expr.get_type(&self.schema).map(Some),
}?;
// prepare types
let case_type = case.expr.as_ref()
.map(|expr| expr.get_type(&self.schema))
.transpose()?;

The same basic transformation applies to the other match statements below -- I don't think it is a big deal I just figured I would point it out to you

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks that does look cleaner, will apply it

Comment on lines 218 to 222
# select case when type coercion
query I
select CASE 10.5 WHEN 0 THEN 1 ELSE 10 END as col;
----
10
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add some additional tests?

  1. Alternate Syntax:
select CASE 
  WHEN 10 = 5 THEN 1 
  WHEN 'true' THEN 1 
  ELSE 10 
END as col;

Also negative cases like

select CASE 10.5 WHEN 'not a number' THEN 1 ELSE 10 END as col;
select CASE 
  WHEN 10 = 5 THEN 1 
  WHEN 'not boolean' THEN 1 
  ELSE 10 
END as col;
select CASE 
  WHEN 10 = 5 THEN 1 
  WHEN 5 THEN 'not a number'
  ELSE 10 
END as col;
select CASE 
  WHEN 10 = 5 THEN 1 
  WHEN 5 THEN 0
  ELSE 'goo' 
END as col;

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a way in sqllogictests to change the config of DataFusion? Since for negative cases to produce error, would need to set datafusion.optimizer.skip_failed_rules to false, unless you mean for the negative case to fail as in the original issue?

Otherwise can add the negative case as a unit test in actual code

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Testing with sqllogictests turned out to be kinda finicky, as this PR is meant to enhance the type coercion, but not actually testing the actual casting itself (since some of those negative cases would pass the type coercion but fail at execution time due to actual contents of the data). So I figured it'd be easier to add more tests as unit tests for proper testing.

@Jefffrey Jefffrey marked this pull request as draft April 1, 2023 22:46
@Jefffrey
Copy link
Contributor Author

Jefffrey commented Apr 1, 2023

I've found another bug related to case expression casting, will raise issue and also fix as part of this PR since will be affecting similar code

Edit: the bug #5828

Edit2: fix added

@github-actions github-actions bot added the physical-expr Physical Expressions label Apr 2, 2023
@Jefffrey Jefffrey changed the title fix: Coerce case expression and when to common type fix: Enhance case expression type coercion Apr 2, 2023
@Jefffrey Jefffrey marked this pull request as ready for review April 2, 2023 00:02
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just think we need a few more tests and this PR will be ready to go

let when_value = as_boolean_array(&when_value)
.expect("WHEN expression did not return a BooleanArray");
let when_value = as_boolean_array(&when_value).map_err(|e| {
DataFusionError::Context(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤔

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah figured better to wrap in Context to hopefully provide more informative error message

@github-actions github-actions bot added the logical-expr Logical plan and expressions label Apr 3, 2023
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great to me -- thank you @Jefffrey

.map(|expr| expr.get_type(&schema))
.transpose()?;

// find common coercible types
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found this code really easy to read. Thank you @Jefffrey

@alamb alamb merged commit 3ff8d06 into apache:main Apr 3, 2023
@Jefffrey Jefffrey deleted the case_when_type_coercion branch April 3, 2023 21:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core Core DataFusion crate logical-expr Logical plan and expressions optimizer Optimizer rules physical-expr Physical Expressions sqllogictest SQL Logic Tests (.slt)
Projects
None yet
2 participants