feat: new UDFs for array max/min/sort #5505

Merged
merged 4 commits into from
May 29, 2020
Changes from 1 commit
76 changes: 61 additions & 15 deletions docs/developer-guide/ksqldb-reference/scalar-functions.md
@@ -138,15 +138,13 @@ The square root of a value.

## Collections

### `ARRAY_LENGTH`
### `ARRAY`

```sql
ARRAY_LENGTH(ARRAY[1, 2, 3])
ARRAY[col1, col2, ...]
```

Given an array, return the number of elements in the array.

If the supplied parameter is NULL the method returns NULL.
Construct an array from a variable number of inputs.

### ``ARRAY_CONTAINS``

@@ -158,31 +156,60 @@ Given an array, checks if a search value is contained in the array.

Accepts any `ARRAY` type. The type of the second param must match the element type of the `ARRAY`.

### `JSON_ARRAY_CONTAINS`
### `ARRAY_LENGTH`

```sql
JSON_ARRAY_CONTAINS('[1, 2, 3]', 3)
ARRAY_LENGTH(ARRAY[1, 2, 3])
```

Given a `STRING` containing a JSON array, checks if a search value is contained in the array.
Given an array, return the number of elements in the array.

Returns `false` if the first parameter does not contain a JSON array.
If the supplied parameter is NULL the method returns NULL.

### `ARRAY`
### ``ARRAY_MAX``

```sql
ARRAY[col1, col2, ...]
ARRAY_MAX(["foo", "bar", "baz"])
```

Construct an array from a variable number of inputs.
Returns the maximum value from within a given array of primitive elements (not arrays of other arrays, or maps, or structs, or combinations thereof).

### `MAP`
Array entries are compared according to their natural sort order. The following examples show how each data type is ordered:
- ```array_max[-1, 2, NULL, 0] -> 2```
- ```array_max[false, NULL, true] -> true```
- ```array_max["Foo", "Bar", NULL, "baz"] -> "baz"``` (lower-case characters are "greater" than upper-case characters)

If the array field is NULL, or contains only NULLs, then NULL is returned.

### ``ARRAY_MIN``

```sql
MAP(key VARCHAR := value, ...)
ARRAY_MIN(["foo", "bar", "baz"])
```

Construct a map from specific key-value tuples.
Returns the minimum value from within a given array of primitive elements (not arrays of other arrays, or maps, or structs, or combinations thereof).

Array entries are compared according to their natural sort order. The following examples show how each data type is ordered:
- ```array_min[-1, 2, NULL, 0] -> -1```
- ```array_min[false, NULL, true] -> false```
- ```array_min["Foo", "Bar", NULL, "baz"] -> "Bar"```

If the array field is NULL, or contains only NULLs, then NULL is returned.
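
The ordering in the `ARRAY_MAX` and `ARRAY_MIN` examples matches Java's natural ordering (`Comparable`), which is what the UDF classes in this PR compare with via `compareTo`. As a rough illustration only, the sketch below reproduces the documented results with the standard library. `NaturalOrderSketch`, `max`, and `min` are hypothetical names used for this example and are not part of ksqlDB.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Objects;

// Illustrative sketch only: hypothetical helper, not the PR's implementation.
final class NaturalOrderSketch {

  // Largest non-null element by natural order; null for a null list or a list of only nulls.
  static <T extends Comparable<? super T>> T max(final List<T> input) {
    if (input == null) {
      return null;
    }
    return input.stream()
        .filter(Objects::nonNull)
        .max(Comparator.naturalOrder())
        .orElse(null);
  }

  // Smallest non-null element by natural order; null for a null list or a list of only nulls.
  static <T extends Comparable<? super T>> T min(final List<T> input) {
    if (input == null) {
      return null;
    }
    return input.stream()
        .filter(Objects::nonNull)
        .min(Comparator.naturalOrder())
        .orElse(null);
  }

  public static void main(final String[] args) {
    System.out.println(max(Arrays.asList(-1, 2, null, 0)));           // 2
    System.out.println(max(Arrays.asList(false, null, true)));        // true
    System.out.println(max(Arrays.asList("Foo", "Bar", null, "baz"))); // baz
    System.out.println(min(Arrays.asList("Foo", "Bar", null, "baz"))); // Bar
  }
}
```

Strings compare by character code, so every upper-case ASCII letter sorts before every lower-case one. That is why `"baz"` is the maximum and `"Bar"` the minimum in the mixed-case example.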

### ``ARRAY_SORT``

```sql
ARRAY_SORT(["foo", "bar", "baz"])
```

Contributor: I know this array literal isn't intended to be a valid argument, but it may be worth using single quotes here since ARRAY["foo", "bar"] wouldn't parse.

Contributor Author: fixed all the occurrences I could find

Given an array of primitive elements (not arrays of other arrays, or maps, or structs, or combinations thereof), returns an array of the same elements sorted according to their natural sort order. Any NULLs contained in the array will always be moved to the end.

Contributor: I think it would be worth adding a note and example or two about the optional ASC or DESC order specifier since that's pretty useful.

Contributor Author: yep, oversight on my part. Adding now.

For example:
- ```array_sort[-1, 2, NULL, 0] -> [-1, 0, 2, NULL]```
- ```array_sort[false, NULL, true] -> [false, true, NULL]```
- ```array_sort["Foo", "Bar", NULL, "baz"] -> ["Bar", "Foo", "baz", NULL]```

If the array field is NULL then NULL is returned.
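
The sort order described above, including the optional ASC or DESC direction raised in the review thread, corresponds to the `nullsLast` comparators used in the `ArraySort` class of this PR. The following is a minimal sketch of that ordering in plain Java, under the assumption that only the comparator behavior matters here. `SortOrderSketch` and `sorted` are hypothetical names, and the helper sorts a copy purely to keep the example side-effect free.

```java
import static java.util.Comparator.naturalOrder;
import static java.util.Comparator.nullsLast;

import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Illustrative sketch only: hypothetical helper, not part of ksqlDB.
final class SortOrderSketch {

  // Returns a sorted copy; nulls go last for both ascending and descending order.
  static <T extends Comparable<? super T>> List<T> sorted(
      final List<T> input, final boolean descending) {
    final List<T> copy = new ArrayList<>(input);
    if (descending) {
      copy.sort(nullsLast(Collections.reverseOrder()));
    } else {
      copy.sort(nullsLast(naturalOrder()));
    }
    return copy;
  }

  public static void main(final String[] args) {
    System.out.println(sorted(Arrays.asList(-1, 2, null, 0), false)); // [-1, 0, 2, null]
    System.out.println(sorted(Arrays.asList(-1, 2, null, 0), true));  // [2, 0, -1, null]
    System.out.println(sorted(Arrays.asList("Foo", "Bar", null, "baz"), true));
    // [baz, Foo, Bar, null]
  }
}
```

Because `nullsLast` wraps the reversed comparator rather than being reversed itself, NULLs stay at the end in descending order too.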

### `AS_MAP`

@@ -212,6 +239,25 @@ Returns the 1-indexed position of `str` in `args`, or 0 if not found.
If `str` is NULL, the return value is 0, because NULL is not considered
to be equal to any value. FIELD is the complement to ELT.

### `JSON_ARRAY_CONTAINS`

```sql
JSON_ARRAY_CONTAINS('[1, 2, 3]', 3)
```

Given a `STRING` containing a JSON array, checks if a search value is contained in the array.

Returns `false` if the first parameter does not contain a JSON array.

### `MAP`

```sql
MAP(key VARCHAR := value, ...)
```

Construct a map from specific key-value tuples.


### `SLICE`

@@ -0,0 +1,49 @@
/*
* Copyright 2020 Confluent Inc.
*
* Licensed under the Confluent Community License (the "License"); you may not use this file except
* in compliance with the License. You may obtain a copy of the License at
*
* http://www.confluent.io/confluent-community-license
*
* Unless required by applicable law or agreed to in writing, software distributed under the License
* is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and limitations under the
* License.
*/

package io.confluent.ksql.function.udf.array;

import io.confluent.ksql.function.udf.Udf;
import io.confluent.ksql.function.udf.UdfDescription;
import io.confluent.ksql.function.udf.UdfParameter;
import java.util.List;

/**
* This UDF traverses the elements of an Array field to find and return the maximum contained value.
*/
@UdfDescription(
    name = "array_max",
    description = "Return the maximum value from within an array of primitive values, according to"
        + " their natural sort order. If the array is NULL, or contains only NULLs, return NULL.")
public class ArrayMax {

  @Udf
  public <T extends Comparable<? super T>> T arrayMax(@UdfParameter(
      description = "The array to sort") final List<T> input) {

Contributor: I'm guessing this isn't the intended description.

Contributor Author: good catch, thx!

Contributor Author: addressed

    if (input == null) {
      return null;
    }

    // Track the largest non-null element seen so far; stays null if every element is null.
    T candidate = null;
    for (T thisVal : input) {
      if (thisVal != null) {
        if (candidate == null) {
          candidate = thisVal;
        } else if (thisVal.compareTo(candidate) > 0) {
          candidate = thisVal;
        }
      }
    }
    return candidate;
  }
}
@@ -0,0 +1,49 @@
/*
* Copyright 2020 Confluent Inc.
*
* Licensed under the Confluent Community License (the "License"); you may not use this file except
* in compliance with the License. You may obtain a copy of the License at
*
* http://www.confluent.io/confluent-community-license
*
* Unless required by applicable law or agreed to in writing, software distributed under the License
* is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and limitations under the
* License.
*/

package io.confluent.ksql.function.udf.array;

import io.confluent.ksql.function.udf.Udf;
import io.confluent.ksql.function.udf.UdfDescription;
import io.confluent.ksql.function.udf.UdfParameter;
import java.util.List;

/**
* This UDF traverses the elements of an Array field to find and return the minimum contained value.
*/
@UdfDescription(
    name = "array_min",
    description = "Return the minimum value from within an array of primitive values, according to"
        + " their natural sort order. If the array is NULL, or contains only NULLs, return NULL.")
public class ArrayMin {

  @Udf
  public <T extends Comparable<? super T>> T arrayMin(@UdfParameter(
      description = "The array to sort") final List<T> input) {

Contributor: I'm guessing this isn't the intended description.

Contributor Author: fixed

    if (input == null) {
      return null;
    }

    // Track the smallest non-null element seen so far; stays null if every element is null.
    T candidate = null;
    for (T thisVal : input) {
      if (thisVal != null) {
        if (candidate == null) {
          candidate = thisVal;
        } else if (thisVal.compareTo(candidate) < 0) {
          candidate = thisVal;
        }
      }
    }
    return candidate;
  }
}
@@ -0,0 +1,63 @@
/*
* Copyright 2020 Confluent Inc.
*
* Licensed under the Confluent Community License (the "License"); you may not use this file except
* in compliance with the License. You may obtain a copy of the License at
*
* http://www.confluent.io/confluent-community-license
*
* Unless required by applicable law or agreed to in writing, software distributed under the License
* is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and limitations under the
* License.
*/

package io.confluent.ksql.function.udf.array;

import static java.util.Comparator.naturalOrder;
import static java.util.Comparator.nullsLast;

import com.google.common.collect.Lists;
import io.confluent.ksql.function.udf.Udf;
import io.confluent.ksql.function.udf.UdfDescription;
import io.confluent.ksql.function.udf.UdfParameter;
import java.util.Collections;
import java.util.List;

/**
* This UDF sorts the elements of an array according to their natural sort order.
*/
@UdfDescription(
    name = "array_sort",
    description = "Sort an array of primitive values, according to their natural sort order. Any "
        + "NULLs in the array will be placed at the end.")
public class ArraySort {

  private static final List<String> SORT_DIRECTION_ASC = Lists.newArrayList("ASC", "ASCENDING");
  private static final List<String> SORT_DIRECTION_DESC = Lists.newArrayList("DESC", "DESCENDING");

  @Udf
  public <T extends Comparable<? super T>> List<T> arraySortDefault(@UdfParameter(
      description = "The array to sort") final List<T> input) {
    return arraySortWithDirection(input, "ASC");
  }

  @Udf
  public <T extends Comparable<? super T>> List<T> arraySortWithDirection(@UdfParameter(
      description = "The array to sort") final List<T> input,
      @UdfParameter(
          description = "The sort direction, ASC or DESC (case-insensitive)")
          final String direction) {
    // A NULL array or a NULL direction yields NULL rather than a NullPointerException.
    if (input == null || direction == null) {
      return null;
    }
    // The supplied list is sorted in place; NULL elements are always moved to the end.
    if (SORT_DIRECTION_ASC.contains(direction.toUpperCase())) {
      input.sort(nullsLast(naturalOrder()));
    } else if (SORT_DIRECTION_DESC.contains(direction.toUpperCase())) {
      input.sort(nullsLast(Collections.reverseOrder()));
    } else {
      // Unrecognized direction.
      return null;
    }
    return input;
  }

}
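
Note that `arraySortWithDirection` above calls `input.sort(...)`, so it reorders the list it is handed. If that ever needs to be avoided (for instance, when a caller passes an unmodifiable list), one option is to sort a copy. The sketch below illustrates that alternative under those assumptions and is not the PR's implementation. `NonMutatingSortSketch` and `sortedCopy` are hypothetical names.

```java
import static java.util.Comparator.naturalOrder;
import static java.util.Comparator.nullsLast;

import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

// Illustrative sketch only: hypothetical alternative, not part of ksqlDB.
final class NonMutatingSortSketch {

  // Returns a sorted copy, or null for a null list, null direction, or unknown direction.
  static <T extends Comparable<? super T>> List<T> sortedCopy(
      final List<T> input, final String direction) {
    if (input == null || direction == null) {
      return null;
    }
    final String dir = direction.toUpperCase(Locale.ROOT);
    final Comparator<T> order;
    if (dir.equals("ASC") || dir.equals("ASCENDING")) {
      order = nullsLast(naturalOrder());
    } else if (dir.equals("DESC") || dir.equals("DESCENDING")) {
      order = nullsLast(Collections.<T>reverseOrder());
    } else {
      return null;
    }
    // Copy first so the caller's list is never modified.
    final List<T> copy = new ArrayList<>(input);
    copy.sort(order);
    return copy;
  }

  public static void main(final String[] args) {
    final List<Integer> fixed = Collections.unmodifiableList(Arrays.asList(3, null, 1));
    System.out.println(sortedCopy(fixed, "desc")); // [3, 1, null]
    System.out.println(fixed);                     // [3, null, 1]
  }
}
```

The trade-off is one extra list allocation per call in exchange for leaving the input untouched.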
@@ -0,0 +1,90 @@
/*
* Copyright 2020 Confluent Inc.
*
* Licensed under the Confluent Community License (the "License"); you may not use this file except
* in compliance with the License. You may obtain a copy of the License at
*
* http://www.confluent.io/confluent-community-license
*
* Unless required by applicable law or agreed to in writing, software distributed under the License
* is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and limitations under the
* License.
*/

package io.confluent.ksql.function.udf.array;

import static org.hamcrest.CoreMatchers.nullValue;
import static org.hamcrest.MatcherAssert.assertThat;
import static org.hamcrest.Matchers.is;
import java.math.BigDecimal;
import java.util.Arrays;
import java.util.List;
import org.junit.Test;

public class ArrayMaxTest {

  private final ArrayMax udf = new ArrayMax();

  @Test
  public void shouldFindBoolMax() {
    final List<Boolean> input = Arrays.asList(true, false, false);
    assertThat(udf.arrayMax(input), is(Boolean.TRUE));
  }

  @Test
  public void shouldFindIntMax() {
    final List<Integer> input = Arrays.asList(1, 3, -2);
    assertThat(udf.arrayMax(input), is(3));
  }

  @Test
  public void shouldFindBigIntMax() {
    final List<Long> input = Arrays.asList(1L, 3L, -2L);
    assertThat(udf.arrayMax(input), is(Long.valueOf(3)));
  }

  @Test
  public void shouldFindDoubleMax() {
    final List<Double> input =
        Arrays.asList(Double.valueOf(1.1), Double.valueOf(3.1), Double.valueOf(-1.1));
    assertThat(udf.arrayMax(input), is(Double.valueOf(3.1)));
  }

  @Test
  public void shouldFindStringMax() {
    final List<String> input = Arrays.asList("foo", "food", "bar");
    assertThat(udf.arrayMax(input), is("food"));
  }

  @Test
  public void shouldFindStringMaxMixedCase() {
    final List<String> input = Arrays.asList("foo", "Food", "bar");
    assertThat(udf.arrayMax(input), is("foo"));
  }

  @Test
  public void shouldFindDecimalMax() {
    final List<BigDecimal> input =
        Arrays.asList(BigDecimal.valueOf(1.2), BigDecimal.valueOf(1.3), BigDecimal.valueOf(-1.2));
    assertThat(udf.arrayMax(input), is(BigDecimal.valueOf(1.3)));
  }

  @Test
  public void shouldReturnNullForNullInput() {
    assertThat(udf.arrayMax((List<String>) null), is(nullValue()));
  }

  @Test
  public void shouldReturnNullForListOfNullInput() {
    final List<Integer> input = Arrays.asList(null, null, null);
    assertThat(udf.arrayMax(input), is(nullValue()));
  }

  @Test
  public void shouldReturnValueForMixedInput() {
    final List<String> input = Arrays.asList(null, "foo", null, "bar", null);
    assertThat(udf.arrayMax(input), is("foo"));
  }

}