Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: support 'LargeList' in array_pop_front and array_pop_back #8569

Merged
merged 6 commits into from
Dec 28, 2023
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
90 changes: 66 additions & 24 deletions datafusion/physical-expr/src/array_expressions.rs
Original file line number Diff line number Diff line change
Expand Up @@ -742,22 +742,78 @@ where
)?))
}

/// array_pop_back SQL function
pub fn array_pop_back(args: &[ArrayRef]) -> Result<ArrayRef> {
if args.len() != 1 {
return exec_err!("array_pop_back needs one argument");
}
fn general_pop_front_list<O: OffsetSizeTrait>(
array: &GenericListArray<O>,
) -> Result<ArrayRef>
where
i64: TryInto<O>,
{
let from_array = Int64Array::from(vec![2; array.len()]);
let to_array = Int64Array::from(
array
.iter()
.map(|arr| arr.map_or(0, |arr| arr.len() as i64))
.collect::<Vec<i64>>(),
);
general_array_slice::<O>(array, &from_array, &to_array)
}

let list_array = as_list_array(&args[0])?;
let from_array = Int64Array::from(vec![1; list_array.len()]);
fn general_pop_back_list<O: OffsetSizeTrait>(
array: &GenericListArray<O>,
) -> Result<ArrayRef>
where
i64: TryInto<O>,
{
let from_array = Int64Array::from(vec![1; array.len()]);
let to_array = Int64Array::from(
list_array
array
.iter()
.map(|arr| arr.map_or(0, |arr| arr.len() as i64 - 1))
.collect::<Vec<i64>>(),
);
let args = vec![args[0].clone(), Arc::new(from_array), Arc::new(to_array)];
array_slice(args.as_slice())
general_array_slice::<O>(array, &from_array, &to_array)
}

/// array_pop_front SQL function
pub fn array_pop_front(args: &[ArrayRef]) -> Result<ArrayRef> {
let array_data_type = args[0].data_type();
match array_data_type {
DataType::List(_) => {
let array = as_list_array(&args[0])?;
general_pop_front_list::<i32>(array)
}
DataType::LargeList(_) => {
let array = as_large_list_array(&args[0])?;
general_pop_front_list::<i64>(array)
}
_ => exec_err!(
"array_pop_front does not support type: {:?}",
array_data_type
),
}
}

/// array_pop_back SQL function
pub fn array_pop_back(args: &[ArrayRef]) -> Result<ArrayRef> {
if args.len() != 1 {
return exec_err!("array_pop_back needs one argument");
}

let array_data_type = args[0].data_type();
match array_data_type {
DataType::List(_) => {
let array = as_list_array(&args[0])?;
general_pop_back_list::<i32>(array)
}
DataType::LargeList(_) => {
let array = as_large_list_array(&args[0])?;
general_pop_back_list::<i64>(array)
}
_ => exec_err!(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comparing to
https://github.com/apache/arrow-datafusion/pull/8571/files#diff-48cc9cf1bfdb0214a9f625b384d1c4fd5967a9da61e8f22a5dc1c4c5800563b4R1301
no data types check, and the output error type mismatches, we may want to make array functions to have consistent codebase?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't get your point. What do you mean by no data type check? Because here are largelist and list type checks. @comphead

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for confusion,
what I was talking is

check_datatypes("array_positions", &[arr.values(), element])?;

which you do for array_positions method, but not doing for here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In array_positions, there are two arguments, array and element, which should be same data type. While for pop functions, they only accept one argument.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that is true, i missed that part. thanks @Weijun-H
what about output error types? in array positions you return not_impl and in this PR its exec_err, but likely the conditions looks same and the error type should also be the same?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, they should be the same, as exec_err . I will change them in a follow pr.

"array_pop_back does not support type: {:?}",
array_data_type
),
}
}

/// Appends or prepends elements to a ListArray.
Expand Down Expand Up @@ -881,20 +937,6 @@ pub fn gen_range(args: &[ArrayRef]) -> Result<ArrayRef> {
Ok(arr)
}

/// array_pop_front SQL function
pub fn array_pop_front(args: &[ArrayRef]) -> Result<ArrayRef> {
let list_array = as_list_array(&args[0])?;
let from_array = Int64Array::from(vec![2; list_array.len()]);
let to_array = Int64Array::from(
list_array
.iter()
.map(|arr| arr.map_or(0, |arr| arr.len() as i64))
.collect::<Vec<i64>>(),
);
let args = vec![args[0].clone(), Arc::new(from_array), Arc::new(to_array)];
array_slice(args.as_slice())
}

/// Array_append SQL function
pub fn array_append(args: &[ArrayRef]) -> Result<ArrayRef> {
if args.len() != 2 {
Expand Down
75 changes: 75 additions & 0 deletions datafusion/sqllogictest/test_files/array.slt
Original file line number Diff line number Diff line change
Expand Up @@ -913,18 +913,33 @@ select array_pop_back(make_array(1, 2, 3, 4, 5)), array_pop_back(make_array('h',
----
[1, 2, 3, 4] [h, e, l, l]

query ??
select array_pop_back(arrow_cast(make_array(1, 2, 3, 4, 5), 'LargeList(Int64)')), array_pop_back(arrow_cast(make_array('h', 'e', 'l', 'l', 'o'), 'LargeList(Utf8)'));
----
[1, 2, 3, 4] [h, e, l, l]

# array_pop_back scalar function #2 (after array_pop_back, array is empty)
query ?
select array_pop_back(make_array(1));
----
[]

query ?
select array_pop_back(arrow_cast(make_array(1), 'LargeList(Int64)'));
----
[]

# array_pop_back scalar function #3 (array_pop_back the empty array)
query ?
select array_pop_back(array_pop_back(make_array(1)));
----
[]

query ?
select array_pop_back(array_pop_back(arrow_cast(make_array(1), 'LargeList(Int64)')));
----
[]

# array_pop_back scalar function #4 (array_pop_back the arrays which have NULL)
query ??
select array_pop_back(make_array(1, 2, 3, 4, NULL)), array_pop_back(make_array(NULL, 'e', 'l', NULL, 'o'));
Expand All @@ -937,24 +952,44 @@ select array_pop_back(make_array(make_array(1, 2, 3), make_array(2, 9, 1), make_
----
[[1, 2, 3], [2, 9, 1], [7, 8, 9], [1, 2, 3], [1, 7, 4]]

query ?
select array_pop_back(arrow_cast(make_array(make_array(1, 2, 3), make_array(2, 9, 1), make_array(7, 8, 9), make_array(1, 2, 3), make_array(1, 7, 4), make_array(4, 5, 6)), 'LargeList(List(Int64))'));
----
[[1, 2, 3], [2, 9, 1], [7, 8, 9], [1, 2, 3], [1, 7, 4]]

# array_pop_back scalar function #6 (array_pop_back the nested arrays with NULL)
query ?
select array_pop_back(make_array(make_array(1, 2, 3), make_array(2, 9, 1), make_array(7, 8, 9), make_array(1, 2, 3), make_array(1, 7, 4), NULL));
----
[[1, 2, 3], [2, 9, 1], [7, 8, 9], [1, 2, 3], [1, 7, 4]]

query ?
select array_pop_back(arrow_cast(make_array(make_array(1, 2, 3), make_array(2, 9, 1), make_array(7, 8, 9), make_array(1, 2, 3), make_array(1, 7, 4), NULL), 'LargeList(List(Int64))'));
----
[[1, 2, 3], [2, 9, 1], [7, 8, 9], [1, 2, 3], [1, 7, 4]]

# array_pop_back scalar function #7 (array_pop_back the nested arrays with NULL)
query ?
select array_pop_back(make_array(make_array(1, 2, 3), make_array(2, 9, 1), make_array(7, 8, 9), NULL, make_array(1, 7, 4)));
----
[[1, 2, 3], [2, 9, 1], [7, 8, 9], ]

query ?
select array_pop_back(arrow_cast(make_array(make_array(1, 2, 3), make_array(2, 9, 1), make_array(7, 8, 9), NULL, make_array(1, 7, 4)), 'LargeList(List(Int64))'));
----
[[1, 2, 3], [2, 9, 1], [7, 8, 9], ]

# array_pop_back scalar function #8 (after array_pop_back, nested array is empty)
query ?
select array_pop_back(make_array(make_array(1, 2, 3)));
----
[]

query ?
select array_pop_back(arrow_cast(make_array(make_array(1, 2, 3)), 'LargeList(List(Int64))'));
----
[]

# array_pop_back with columns
query ?
select array_pop_back(column1) from arrayspop;
Expand All @@ -966,6 +1001,16 @@ select array_pop_back(column1) from arrayspop;
[]
[, 10, 11]

query ?
select array_pop_back(arrow_cast(column1, 'LargeList(Int64)')) from arrayspop;
----
[1, 2]
[3, 4, 5]
[6, 7, 8, ]
[, ]
[]
[, 10, 11]

## array_pop_front (aliases: `list_pop_front`)

# array_pop_front scalar function #1
Expand All @@ -974,36 +1019,66 @@ select array_pop_front(make_array(1, 2, 3, 4, 5)), array_pop_front(make_array('h
----
[2, 3, 4, 5] [e, l, l, o]

query ??
select array_pop_front(arrow_cast(make_array(1, 2, 3, 4, 5), 'LargeList(Int64)')), array_pop_front(arrow_cast(make_array('h', 'e', 'l', 'l', 'o'), 'LargeList(Utf8)'));
----
[2, 3, 4, 5] [e, l, l, o]

# array_pop_front scalar function #2 (after array_pop_front, array is empty)
query ?
select array_pop_front(make_array(1));
----
[]

query ?
select array_pop_front(arrow_cast(make_array(1), 'LargeList(Int64)'));
----
[]

# array_pop_front scalar function #3 (array_pop_front the empty array)
query ?
select array_pop_front(array_pop_front(make_array(1)));
----
[]

query ?
select array_pop_front(array_pop_front(arrow_cast(make_array(1), 'LargeList(Int64)')));
----
[]

# array_pop_front scalar function #5 (array_pop_front the nested arrays)
query ?
select array_pop_front(make_array(make_array(1, 2, 3), make_array(2, 9, 1), make_array(7, 8, 9), make_array(1, 2, 3), make_array(1, 7, 4), make_array(4, 5, 6)));
----
[[2, 9, 1], [7, 8, 9], [1, 2, 3], [1, 7, 4], [4, 5, 6]]

query ?
select array_pop_front(arrow_cast(make_array(make_array(1, 2, 3), make_array(2, 9, 1), make_array(7, 8, 9), make_array(1, 2, 3), make_array(1, 7, 4), make_array(4, 5, 6)), 'LargeList(List(Int64))'));
----
[[2, 9, 1], [7, 8, 9], [1, 2, 3], [1, 7, 4], [4, 5, 6]]

# array_pop_front scalar function #6 (array_pop_front the nested arrays with NULL)
query ?
select array_pop_front(make_array(NULL, make_array(1, 2, 3), make_array(2, 9, 1), make_array(7, 8, 9), make_array(1, 2, 3), make_array(1, 7, 4)));
----
[[1, 2, 3], [2, 9, 1], [7, 8, 9], [1, 2, 3], [1, 7, 4]]

query ?
select array_pop_front(arrow_cast(make_array(NULL, make_array(1, 2, 3), make_array(2, 9, 1), make_array(7, 8, 9), make_array(1, 2, 3), make_array(1, 7, 4)), 'LargeList(List(Int64))'));
----
[[1, 2, 3], [2, 9, 1], [7, 8, 9], [1, 2, 3], [1, 7, 4]]

# array_pop_front scalar function #8 (after array_pop_front, nested array is empty)
query ?
select array_pop_front(make_array(make_array(1, 2, 3)));
----
[]

query ?
select array_pop_front(arrow_cast(make_array(make_array(1, 2, 3)), 'LargeList(List(Int64))'));
----
[]

## array_slice (aliases: list_slice)

# array_slice scalar function #1 (with positive indexes)
Expand Down