Fix strpos invocation with dictionary and null #12712

Merged
merged 1 commit into from
Oct 3, 2024

36 changes: 15 additions & 21 deletions datafusion/functions/src/unicode/strpos.rs
@@ -23,7 +23,8 @@ use arrow::datatypes::{ArrowNativeType, DataType, Int32Type, Int64Type};

 use crate::string::common::StringArrayType;
 use crate::utils::{make_scalar_function, utf8_to_int_type};
-use datafusion_common::{exec_err, plan_err, Result};
+use datafusion_common::{exec_err, Result};
+use datafusion_expr::TypeSignature::Exact;
 use datafusion_expr::{ColumnarValue, ScalarUDFImpl, Signature, Volatility};

 #[derive(Debug)]
@@ -40,8 +41,20 @@ impl Default for StrposFunc {

 impl StrposFunc {
     pub fn new() -> Self {
+        use DataType::*;
         Self {
-            signature: Signature::user_defined(Volatility::Immutable),
+            signature: Signature::one_of(
+                vec![
+                    Exact(vec![Utf8, Utf8]),
+                    Exact(vec![Utf8, LargeUtf8]),
+                    Exact(vec![LargeUtf8, Utf8]),
+                    Exact(vec![LargeUtf8, LargeUtf8]),
+                    Exact(vec![Utf8View, Utf8View]),
+                    Exact(vec![Utf8View, Utf8]),
+                    Exact(vec![Utf8View, LargeUtf8]),
+                ],
+                Volatility::Immutable,
Comment on lines +47 to +56
Member Author

This restores the code from before 1b3608d. These are the types actually supported by `invoke`.

Contributor

Can we invert the order? From another function we have this comment:

// Planner attempts coercion to the target type starting with the most preferred candidate.
// For example, given input (Utf8View, Int64), it first tries coercing to (Utf8View, Int64).
// If that fails, it proceeds to (Utf8, Int64).

Member Author

Good comment. I just restored the code that used to be here.
How can a function implementor know what the preferred order is? Shouldn't this be the engine's responsibility, if it's given a choice? Unless, of course, the preferred candidate is function-specific.

Contributor

Great question, and I don't have the answer. I have just followed that advice since I first saw it a month or two back. I agree: I would hope the coercion logic could select the best candidate itself.
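
The ordering question in this thread can be illustrated with a toy model. The sketch below is not DataFusion's coercion code: `Ty`, `can_coerce`, `pick`, and `pr_order` are hypothetical names, and the coercion rule (anything may be cast to Utf8 or LargeUtf8, nothing else is cast) is an assumption made only for illustration. It also assumes that an exact match is preferred over any coercion. Under those assumptions, the candidate order decides the outcome whenever the arguments match no candidate exactly.

```rust
// Toy model of ordered signature matching. All names here are hypothetical;
// this is not DataFusion's implementation.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Ty {
    Null,
    Utf8,
    LargeUtf8,
    Utf8View,
}

// Assumed rule for this sketch: a type matches itself, and anything may be
// cast to Utf8 or LargeUtf8. No other casts are allowed.
fn can_coerce(from: Ty, to: Ty) -> bool {
    from == to || matches!(to, Ty::Utf8 | Ty::LargeUtf8)
}

// Return the argument types as-is on an exact match; otherwise return the
// first candidate, in list order, that every argument can be coerced to.
fn pick(candidates: &[[Ty; 2]], args: [Ty; 2]) -> Option<[Ty; 2]> {
    if candidates.contains(&args) {
        return Some(args);
    }
    candidates
        .iter()
        .copied()
        .find(|c| can_coerce(args[0], c[0]) && can_coerce(args[1], c[1]))
}

// The seven candidates in the order they are listed in this PR.
fn pr_order() -> Vec<[Ty; 2]> {
    use Ty::*;
    vec![
        [Utf8, Utf8],
        [Utf8, LargeUtf8],
        [LargeUtf8, Utf8],
        [LargeUtf8, LargeUtf8],
        [Utf8View, Utf8View],
        [Utf8View, Utf8],
        [Utf8View, LargeUtf8],
    ]
}

fn main() {
    // No candidate matches (Utf8View, Null) exactly, so order decides.
    let args = [Ty::Utf8View, Ty::Null];
    let listed = pr_order();
    let mut inverted = listed.clone();
    inverted.reverse();
    println!("{:?}", pick(&listed, args)); // Some([Utf8, Utf8])
    println!("{:?}", pick(&inverted, args)); // Some([Utf8View, LargeUtf8])
}
```

In this model the listed order casts the `Utf8View` argument to `Utf8`, while the inverted order leaves it as a view. That is the kind of difference the "most preferred candidate first" advice is meant to control.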

+            ),
             aliases: vec![String::from("instr"), String::from("position")],
         }
     }
@@ -71,25 +84,6 @@ impl ScalarUDFImpl for StrposFunc {
     fn aliases(&self) -> &[String] {
         &self.aliases
     }
-
-    fn coerce_types(&self, arg_types: &[DataType]) -> Result<Vec<DataType>> {
-        match arg_types {
-            [first, second ] => {
-                match (first, second) {
-                    (DataType::LargeUtf8 | DataType::Utf8View | DataType::Utf8, DataType::LargeUtf8 | DataType::Utf8View | DataType::Utf8) => Ok(arg_types.to_vec()),
-                    (DataType::Null, DataType::Null) => Ok(vec![DataType::Utf8, DataType::Utf8]),
-                    (DataType::Null, _) => Ok(vec![DataType::Utf8, second.to_owned()]),
-                    (_, DataType::Null) => Ok(vec![first.to_owned(), DataType::Utf8]),
-                    (DataType::Dictionary(_, value_type), DataType::LargeUtf8 | DataType::Utf8View | DataType::Utf8) => match **value_type {
-                        DataType::LargeUtf8 | DataType::Utf8View | DataType::Utf8 | DataType::Null | DataType::Binary => Ok(vec![*value_type.clone(), second.to_owned()]),
-                        _ => plan_err!("The STRPOS/INSTR/POSITION function can only accept strings, but got {:?}.", **value_type),
-                    },
-                    _ => plan_err!("The STRPOS/INSTR/POSITION function can only accept strings, but got {:?}.", arg_types)
-                }
-            },
-            _ => plan_err!("The STRPOS/INSTR/POSITION function can only accept strings, but got {:?}", arg_types)
-        }
-    }
 }

 fn strpos(args: &[ArrayRef]) -> Result<ArrayRef> {
10 changes: 10 additions & 0 deletions datafusion/sqllogictest/test_files/functions.slt
@@ -553,6 +553,16 @@ SELECT strpos(arrow_cast('helloworld', 'Dictionary(Int32, Utf8)'), 'world')
 ----
 6

+query I
+SELECT strpos('helloworld', NULL)
+----
+NULL
+
+query I
+SELECT strpos(arrow_cast('helloworld', 'Dictionary(Int32, Utf8)'), NULL)
+----
+NULL
+
 statement ok
 CREATE TABLE products (
 product_id INT PRIMARY KEY,
4 changes: 3 additions & 1 deletion datafusion/sqllogictest/test_files/scalar.slt
@@ -1907,8 +1907,10 @@ select position('' in '')
 1


-query error POSITION function can only accept strings
+query I
 select position(1 in 1)
+----
+1
Member Author

This query also used to fail before 1b3608d; I don't know why.

According to

// Any type can be coerced into strings
(Utf8 | LargeUtf8, _) => Some(type_into.clone()),

every type is implicitly coercible into Utf8.

Contributor

Doesn't this function only accept strings? Why do we return 1 now?
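
One way to read the new result, taking the implicit-cast rule quoted above at face value: if both integer arguments are cast to Utf8, `position(1 in 1)` becomes a search for "1" inside "1", which matches at position 1. The Rust sketch below uses a hypothetical standalone `strpos` helper to show that reading together with the NULL cases added in functions.slt. It is an illustration under that assumption, not the DataFusion implementation, and it models SQL NULL as `None`.

```rust
// Hypothetical standalone helper, not DataFusion's strpos.
// Returns the 1-based character position of `needle` in `haystack`,
// 0 when it is absent, and None (SQL NULL) when either argument is NULL.
fn strpos(haystack: Option<&str>, needle: Option<&str>) -> Option<i32> {
    let (h, n) = (haystack?, needle?);
    Some(match h.find(n) {
        // `find` yields a byte offset; count the characters before it
        // so multi-byte text is indexed by character, not by byte.
        Some(byte_idx) => h[..byte_idx].chars().count() as i32 + 1,
        None => 0,
    })
}

fn main() {
    // Assumption: the planner has already cast the integer 1 to the string "1".
    let one = 1.to_string();
    println!("{:?}", strpos(Some(one.as_str()), Some(one.as_str()))); // Some(1)
    println!("{:?}", strpos(Some("helloworld"), Some("world"))); // Some(6)
    println!("{:?}", strpos(Some("helloworld"), None)); // None
    println!("{:?}", strpos(Some(""), Some(""))); // Some(1)
}
```

Whether integers should be cast this way for a string-only function is the open question in this thread; the sketch only shows what the result is once the cast has happened.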



query I