feat: new UDFs for array max/min/sort #5505
Changes from 1 commit
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -138,15 +138,13 @@ The square root of a value. | |
|
||
## Collections | ||
|
||
### `ARRAY_LENGTH` | ||
### `ARRAY` | ||
|
||
```sql | ||
ARRAY_LENGTH(ARRAY[1, 2, 3]) | ||
ARRAY[col1, col2, ...] | ||
``` | ||
|
||
Given an array, return the number of elements in the array. | ||
|
||
If the supplied parameter is NULL the method returns NULL. | ||
Construct an array from a variable number of inputs. | ||
|
||
### ``ARRAY_CONTAINS`` | ||
|
||
|
@@ -158,31 +156,60 @@ Given an array, checks if a search value is contained in the array. | |
|
||
Accepts any `ARRAY` type. The type of the second param must match the element type of the `ARRAY`. | ||
|
||
### `JSON_ARRAY_CONTAINS` | ||
### `ARRAY_LENGTH` | ||
|
||
```sql | ||
JSON_ARRAY_CONTAINS('[1, 2, 3]', 3) | ||
ARRAY_LENGTH(ARRAY[1, 2, 3]) | ||
``` | ||
|
||
Given a `STRING` containing a JSON array, checks if a search value is contained in the array. | ||
Given an array, return the number of elements in the array. | ||
|
||
Returns `false` if the first parameter does not contain a JSON array. | ||
If the supplied parameter is NULL the method returns NULL. | ||
|
||
### `ARRAY` | ||
### ``ARRAY_MAX`` | ||
|
||
```sql | ||
ARRAY[col1, col2, ...] | ||
ARRAY_MAX(['foo', 'bar', 'baz']) | ||
``` | ||
|
||
Construct an array from a variable number of inputs. | ||
Returns the maximum value in an array of primitive elements. Arrays of arrays, maps, structs, or combinations of these are not supported. | ||
|
||
### `MAP` | ||
Array entries are compared according to their natural sort order, as the following examples illustrate for each data type: | ||
- ```array_max[-1, 2, NULL, 0] -> 2``` | ||
- ```array_max[false, NULL, true] -> true``` | ||
- ```array_max['Foo', 'Bar', NULL, 'baz'] -> 'baz'``` (lower-case characters are "greater" than upper-case characters) | ||
|
||
If the array field is NULL, or contains only NULLs, then NULL is returned. | ||
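The natural sort order described above matches Java's `Comparable` ordering, which the UDF in this PR relies on through `compareTo`. The following is a minimal sketch that reproduces the three documented examples. The class name `NaturalOrderDemo` and the helper `maxIgnoringNulls` are hypothetical and are not part of ksqlDB.

```java
import java.util.Arrays;
import java.util.List;

public class NaturalOrderDemo {

  // Hypothetical helper: returns the largest non-null element according to
  // compareTo, or null if the list is null or holds only nulls.
  public static <T extends Comparable<? super T>> T maxIgnoringNulls(final List<T> input) {
    if (input == null) {
      return null;
    }
    T candidate = null;
    for (final T value : input) {
      if (value != null && (candidate == null || value.compareTo(candidate) > 0)) {
        candidate = value;
      }
    }
    return candidate;
  }

  public static void main(final String[] args) {
    final Integer maxInt = maxIgnoringNulls(Arrays.asList(-1, 2, null, 0));
    final Boolean maxBool = maxIgnoringNulls(Arrays.asList(false, null, true));
    final String maxStr = maxIgnoringNulls(Arrays.asList("Foo", "Bar", null, "baz"));
    final String allNulls = maxIgnoringNulls(Arrays.asList((String) null, null));

    System.out.println(maxInt);   // 2
    System.out.println(maxBool);  // true (false sorts before true)
    System.out.println(maxStr);   // baz ('b' has a higher code point than 'F')
    System.out.println(allNulls); // null
  }
}
```

String comparison here is by UTF-16 code unit, which is why every upper-case ASCII letter sorts before every lower-case one.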
|
||
### ``ARRAY_MIN`` | ||
|
||
```sql | ||
MAP(key VARCHAR := value, ...) | ||
ARRAY_MIN(['foo', 'bar', 'baz']) | ||
``` | ||
|
||
Construct a map from specific key-value tuples. | ||
Returns the minimum value in an array of primitive elements. Arrays of arrays, maps, structs, or combinations of these are not supported. | ||
|
||
Array entries are compared according to their natural sort order, as the following examples illustrate for each data type: | ||
- ```array_min[-1, 2, NULL, 0] -> -1``` | ||
- ```array_min[false, NULL, true] -> false``` | ||
- ```array_min['Foo', 'Bar', NULL, 'baz'] -> 'Bar'``` | ||
|
||
If the array field is NULL, or contains only NULLs, then NULL is returned. | ||
|
||
### ``ARRAY_SORT`` | ||
|
||
```sql | ||
ARRAY_SORT(['foo', 'bar', 'baz']) | ||
``` | ||
|
||
Given an array of primitive elements, returns an array of the same elements sorted according to their natural sort order. Arrays of arrays, maps, structs, or combinations of these are not supported. Any NULLs in the array are always moved to the end. | ||
Review comment: I think it would be worth adding a note and example or two about the optional
Reply: yep, oversight on my part. adding now
||
|
||
For example: | ||
- ```array_sort[-1, 2, NULL, 0] -> [-1, 0, 2, NULL]``` | ||
- ```array_sort[false, NULL, true] -> [false, true, NULL]``` | ||
- ```array_sort['Foo', 'Bar', NULL, 'baz'] -> ['Bar', 'Foo', 'baz', NULL]``` | ||
|
||
If the array field is NULL then NULL is returned. | ||
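The `ArraySort` implementation in this PR sorts with `nullsLast(naturalOrder())` and, for a descending direction, with `nullsLast(Collections.reverseOrder())`. The sketch below shows what those two comparators produce for the documented inputs. The class `NullsLastSortDemo` and its helpers are hypothetical, and the helpers sort a copy purely so the demo can reuse its inputs.

```java
import static java.util.Comparator.naturalOrder;
import static java.util.Comparator.nullsLast;

import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class NullsLastSortDemo {

  // Hypothetical helper: ascending natural order, nulls at the end.
  public static <T extends Comparable<? super T>> List<T> sortAscending(final List<T> input) {
    final List<T> copy = new ArrayList<>(input);
    copy.sort(nullsLast(naturalOrder()));
    return copy;
  }

  // Hypothetical helper: descending natural order, nulls still at the end.
  public static <T extends Comparable<? super T>> List<T> sortDescending(final List<T> input) {
    final List<T> copy = new ArrayList<>(input);
    copy.sort(nullsLast(Collections.reverseOrder()));
    return copy;
  }

  public static void main(final String[] args) {
    final List<Integer> numbers = Arrays.asList(-1, 2, null, 0);
    final List<String> words = Arrays.asList("Foo", "Bar", null, "baz");

    System.out.println(sortAscending(numbers));  // [-1, 0, 2, null]
    System.out.println(sortDescending(numbers)); // [2, 0, -1, null]
    System.out.println(sortAscending(words));    // [Bar, Foo, baz, null]
    System.out.println(sortDescending(words));   // [baz, Foo, Bar, null]
  }
}
```

Because `nullsLast` wraps the reversed comparator rather than being reversed itself, NULLs stay at the end in both directions.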
|
||
### `AS_MAP` | ||
|
||
|
@@ -212,6 +239,25 @@ Returns the 1-indexed position of `str` in `args`, or 0 if not found. | |
If `str` is NULL, the return value is 0, because NULL is not considered | ||
to be equal to any value. FIELD is the complement to ELT. | ||
|
||
### `JSON_ARRAY_CONTAINS` | ||
|
||
```sql | ||
JSON_ARRAY_CONTAINS('[1, 2, 3]', 3) | ||
``` | ||
|
||
Given a `STRING` containing a JSON array, checks if a search value is contained in the array. | ||
|
||
Returns `false` if the first parameter does not contain a JSON array. | ||
|
||
### `MAP` | ||
|
||
```sql | ||
MAP(key VARCHAR := value, ...) | ||
``` | ||
|
||
Construct a map from specific key-value tuples. | ||
|
||
|
||
### `SLICE` | ||
|
||
```sql | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,49 @@ | ||
/* | ||
* Copyright 2020 Confluent Inc. | ||
* | ||
* Licensed under the Confluent Community License (the "License"); you may not use this file except | ||
* in compliance with the License. You may obtain a copy of the License at | ||
* | ||
* http://www.confluent.io/confluent-community-license | ||
* | ||
* Unless required by applicable law or agreed to in writing, software distributed under the License | ||
* is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OF ANY KIND, either express or implied. | ||
* See the License for the specific language governing permissions and limitations under the | ||
* License. | ||
*/ | ||
|
||
package io.confluent.ksql.function.udf.array; | ||
|
||
import io.confluent.ksql.function.udf.Udf; | ||
import io.confluent.ksql.function.udf.UdfDescription; | ||
import io.confluent.ksql.function.udf.UdfParameter; | ||
import java.util.List; | ||
|
||
/** | ||
* This UDF traverses the elements of an Array field to find and return the maximum contained value. | ||
*/ | ||
@UdfDescription( | ||
name = "array_max", | ||
description = "Return the maximum value from within an array of primitive values, according to" | ||
+ " their natural sort order. If the array is NULL, or contains only NULLs, return NULL.") | ||
public class ArrayMax { | ||
|
||
@Udf | ||
public <T extends Comparable<? super T>> T arrayMax(@UdfParameter( | ||
description = "The array from which to find the maximum value") final List<T> input) { | ||
Review comment: I'm guessing this isn't the intended description.
Reply: good catch, thx!
Reply: addressed
||
if (input == null) { | ||
return null; | ||
} | ||
|
||
T candidate = (T) null; | ||
for (T thisVal : input) { | ||
if (thisVal != null) { | ||
if (candidate == null) { | ||
candidate = thisVal; | ||
} else if (thisVal.compareTo(candidate) > 0) | ||
candidate = thisVal; | ||
} | ||
} | ||
return candidate; | ||
} | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,49 @@ | ||
/* | ||
* Copyright 2020 Confluent Inc. | ||
* | ||
* Licensed under the Confluent Community License (the "License"); you may not use this file except | ||
* in compliance with the License. You may obtain a copy of the License at | ||
* | ||
* http://www.confluent.io/confluent-community-license | ||
* | ||
* Unless required by applicable law or agreed to in writing, software distributed under the License | ||
* is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OF ANY KIND, either express or implied. | ||
* See the License for the specific language governing permissions and limitations under the | ||
* License. | ||
*/ | ||
|
||
package io.confluent.ksql.function.udf.array; | ||
|
||
import io.confluent.ksql.function.udf.Udf; | ||
import io.confluent.ksql.function.udf.UdfDescription; | ||
import io.confluent.ksql.function.udf.UdfParameter; | ||
import java.util.List; | ||
|
||
/** | ||
* This UDF traverses the elements of an Array field to find and return the minimum contained value. | ||
*/ | ||
@UdfDescription( | ||
name = "array_min", | ||
description = "Return the minimum value from within an array of primitive values, according to" | ||
+ " their natural sort order. If the array is NULL, or contains only NULLs, return NULL.") | ||
public class ArrayMin { | ||
|
||
@Udf | ||
public <T extends Comparable<? super T>> T arrayMin(@UdfParameter( | ||
description = "The array from which to find the minimum value") final List<T> input) { | ||
Review comment: I'm guessing this isn't the intended description.
Reply: fixed
||
if (input == null) { | ||
return null; | ||
} | ||
|
||
T candidate = (T) null; | ||
for (T thisVal : input) { | ||
if (thisVal != null) { | ||
if (candidate == null) { | ||
candidate = thisVal; | ||
} else if (thisVal.compareTo(candidate) < 0) | ||
candidate = thisVal; | ||
} | ||
} | ||
return candidate; | ||
} | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,63 @@ | ||
/* | ||
* Copyright 2020 Confluent Inc. | ||
* | ||
* Licensed under the Confluent Community License (the "License"); you may not use this file except | ||
* in compliance with the License. You may obtain a copy of the License at | ||
* | ||
* http://www.confluent.io/confluent-community-license | ||
* | ||
* Unless required by applicable law or agreed to in writing, software distributed under the License | ||
* is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OF ANY KIND, either express or implied. | ||
* See the License for the specific language governing permissions and limitations under the | ||
* License. | ||
*/ | ||
|
||
package io.confluent.ksql.function.udf.array; | ||
|
||
import static java.util.Comparator.naturalOrder; | ||
import static java.util.Comparator.nullsLast; | ||
|
||
import com.google.common.collect.Lists; | ||
import io.confluent.ksql.function.udf.Udf; | ||
import io.confluent.ksql.function.udf.UdfDescription; | ||
import io.confluent.ksql.function.udf.UdfParameter; | ||
import java.util.Collections; | ||
import java.util.List; | ||
|
||
/** | ||
* This UDF sorts the elements of an array according to their natural sort order. | ||
*/ | ||
@UdfDescription( | ||
name = "array_sort", | ||
description = "Sort an array of primitive values, according to their natural sort order. Any " | ||
+ "NULLs in the array will be placed at the end.") | ||
public class ArraySort { | ||
|
||
private static final List<String> SORT_DIRECTION_ASC = Lists.newArrayList("ASC", "ASCENDING"); | ||
private static final List<String> SORT_DIRECTION_DESC = Lists.newArrayList("DESC", "DESCENDING"); | ||
|
||
@Udf | ||
public <T extends Comparable<? super T>> List<T> arraySortDefault(@UdfParameter( | ||
description = "The array to sort") final List<T> input) { | ||
return arraySortWithDirection(input, "ASC"); | ||
} | ||
|
||
@Udf | ||
public <T extends Comparable<? super T>> List<T> arraySortWithDirection(@UdfParameter( | ||
description = "The array to sort") final List<T> input, | ||
@UdfParameter( | ||
description = "The sort direction: ASC or ASCENDING, DESC or DESCENDING") final String direction) { | ||
if (input == null || direction == null) { | ||
return null; | ||
} | ||
if (SORT_DIRECTION_ASC.contains(direction.toUpperCase())) { | ||
input.sort(nullsLast(naturalOrder())); | ||
} else if (SORT_DIRECTION_DESC.contains(direction.toUpperCase())) { | ||
input.sort(nullsLast(Collections.reverseOrder())); | ||
} else { | ||
return null; | ||
} | ||
return input; | ||
} | ||
|
||
} |
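One property of the `ArraySort` code above is that `input.sort(...)` reorders the list it is given and returns that same instance. The sketch below illustrates what that means for a caller and contrasts it with sorting a copy. Whether ksqlDB ever passes an immutable or shared list into a UDF is not established by this PR, so treat that scenario as an assumption. `InPlaceSortDemo`, `sortInPlace`, and `sortedCopy` are hypothetical names.

```java
import static java.util.Comparator.naturalOrder;
import static java.util.Comparator.nullsLast;

import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class InPlaceSortDemo {

  // Mirrors the approach in the PR: sorts the caller's list and returns it.
  public static <T extends Comparable<? super T>> List<T> sortInPlace(final List<T> input) {
    input.sort(nullsLast(naturalOrder()));
    return input;
  }

  // Hypothetical alternative: leaves the caller's list untouched.
  public static <T extends Comparable<? super T>> List<T> sortedCopy(final List<T> input) {
    final List<T> copy = new ArrayList<>(input);
    copy.sort(nullsLast(naturalOrder()));
    return copy;
  }

  public static void main(final String[] args) {
    final List<Integer> original = new ArrayList<>(Arrays.asList(3, 1, 2));
    final List<Integer> result = sortInPlace(original);
    System.out.println(result == original); // true: the argument itself was reordered

    // Assumed scenario: the caller hands over a read-only view.
    final List<Integer> readOnly = Collections.unmodifiableList(Arrays.asList(3, 1, 2));
    try {
      sortInPlace(readOnly);
    } catch (final UnsupportedOperationException e) {
      System.out.println("in-place sort rejected by read-only list");
    }
    System.out.println(sortedCopy(readOnly)); // [1, 2, 3]
  }
}
```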
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,90 @@ | ||
/* | ||
* Copyright 2020 Confluent Inc. | ||
* | ||
* Licensed under the Confluent Community License (the "License"); you may not use this file except | ||
* in compliance with the License. You may obtain a copy of the License at | ||
* | ||
* http://www.confluent.io/confluent-community-license | ||
* | ||
* Unless required by applicable law or agreed to in writing, software distributed under the License | ||
* is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OF ANY KIND, either express or implied. | ||
* See the License for the specific language governing permissions and limitations under the | ||
* License. | ||
*/ | ||
|
||
package io.confluent.ksql.function.udf.array; | ||
|
||
import static org.hamcrest.CoreMatchers.nullValue; | ||
import static org.hamcrest.MatcherAssert.assertThat; | ||
import static org.hamcrest.Matchers.is; | ||
import java.math.BigDecimal; | ||
import java.util.Arrays; | ||
import java.util.List; | ||
import org.junit.Test; | ||
|
||
public class ArrayMaxTest { | ||
|
||
private final ArrayMax udf = new ArrayMax(); | ||
|
||
@Test | ||
public void shouldFindBoolMax() { | ||
final List<Boolean> input = Arrays.asList(true, false, false); | ||
assertThat(udf.arrayMax(input), is(Boolean.TRUE)); | ||
} | ||
|
||
@Test | ||
public void shouldFindIntMax() { | ||
final List<Integer> input = Arrays.asList(1, 3, -2); | ||
assertThat(udf.arrayMax(input), is(3)); | ||
} | ||
|
||
@Test | ||
public void shouldFindBigIntMax() { | ||
final List<Long> input = Arrays.asList(1L, 3L, -2L); | ||
assertThat(udf.arrayMax(input), is(Long.valueOf(3))); | ||
} | ||
|
||
@Test | ||
public void shouldFindDoubleMax() { | ||
final List<Double> input = | ||
Arrays.asList(Double.valueOf(1.1), Double.valueOf(3.1), Double.valueOf(-1.1)); | ||
assertThat(udf.arrayMax(input), is(Double.valueOf(3.1))); | ||
} | ||
|
||
@Test | ||
public void shouldFindStringMax() { | ||
final List<String> input = Arrays.asList("foo", "food", "bar"); | ||
assertThat(udf.arrayMax(input), is("food")); | ||
} | ||
|
||
@Test | ||
public void shouldFindStringMaxMixedCase() { | ||
final List<String> input = Arrays.asList("foo", "Food", "bar"); | ||
assertThat(udf.arrayMax(input), is("foo")); | ||
} | ||
|
||
@Test | ||
public void shouldFindDecimalMax() { | ||
final List<BigDecimal> input = | ||
Arrays.asList(BigDecimal.valueOf(1.2), BigDecimal.valueOf(1.3), BigDecimal.valueOf(-1.2)); | ||
assertThat(udf.arrayMax(input), is(BigDecimal.valueOf(1.3))); | ||
} | ||
|
||
@Test | ||
public void shouldReturnNullForNullInput() { | ||
assertThat(udf.arrayMax((List<String>) null), is(nullValue())); | ||
} | ||
|
||
@Test | ||
public void shouldReturnNullForListOfNullInput() { | ||
final List<Integer> input = Arrays.asList(null, null, null); | ||
assertThat(udf.arrayMax(input), is(nullValue())); | ||
} | ||
|
||
@Test | ||
public void shouldReturnValueForMixedInput() { | ||
final List<String> input = Arrays.asList(null, "foo", null, "bar", null); | ||
assertThat(udf.arrayMax(input), is("foo")); | ||
} | ||
|
||
} |
Review comment: I know this array literal isn't intended to be a valid argument, but it may be worth using single quotes here since `ARRAY["foo", "bar"]` wouldn't parse.
Reply: fixed all the occurrences i could find