Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add bool_and and bool_or aggregate functions #314

Merged
merged 4 commits into from
Sep 8, 2022

Conversation

vibhatha
Copy link
Contributor

@vibhatha vibhatha commented Sep 6, 2022

Adding the bool_and and bool_or aggregation functions.

@vibhatha
Copy link
Contributor Author

vibhatha commented Sep 6, 2022

cc @jvanstraten

Copy link
Contributor

@jvanstraten jvanstraten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The null behavior of these functions is ridiculous, but apparently SQL defines them this way. LGTM

Copy link
Contributor

@jvanstraten jvanstraten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agh, just when I click the approve button I see the file this is defined in. functions_boolean.yaml exists. Why aren't they in there?

@vibhatha
Copy link
Contributor Author

vibhatha commented Sep 6, 2022

Agh, just when I click the approve button I see the file this is defined in. functions_boolean.yaml exists. Why aren't they in there?

Ah... yes. Better to move there.

@vibhatha
Copy link
Contributor Author

vibhatha commented Sep 6, 2022

Moved. cc @jvanstraten

@ianmcook
Copy link
Contributor

ianmcook commented Sep 6, 2022

What do these functions return when all values in the input are null?

@vibhatha
Copy link
Contributor Author

vibhatha commented Sep 6, 2022

@ianmcook good question. This is my understanding so far, as per these references 1 and 2, the latter says all the null value inputs are ignored. I belive then we have to define the default behavior to return FALSE as per my understanding. Which is lacking from the current definition. cc @jvanstraten can we include default values?

Is there a better way to do this?

@ianmcook
Copy link
Contributor

ianmcook commented Sep 6, 2022

FWIW, shell.duckdb.org returns empty cells (which I think means null values?) in this case

image

@jvanstraten
Copy link
Contributor

I would have assumed true for an empty bool_and and false for an empty bool_or, but I suppose I shouldn't have been blindly applying regular boolean logic to query engines. Experimentally, because I stopped trusting documentation for null behavior a while ago:

bool_and:

input postgres duckdb
empty null null
null null null
true true true
false false false
null, true true true
null, false false false
true, false false false

bool_or:

input postgres duckdb
empty null null
null null null
true true true
false false false
null, true true true
null, false false false
true, false true true
DuckDB

Version = 0.5.0

import duckdb

con = duckdb.connect(database=':memory:')

con.execute("create table tab_empty(x bool);")
con.execute("create table tab_null(x bool);")
con.execute("insert into tab_null values (null);")
con.execute("create table tab_true(x bool);")
con.execute("insert into tab_true values (true);")
con.execute("create table tab_false(x bool);")
con.execute("insert into tab_false values (false);")
con.execute("create table tab_null_true(x bool);")
con.execute("insert into tab_null_true values (null), (true);")
con.execute("create table tab_null_false(x bool);")
con.execute("insert into tab_null_false values (null), (false);")
con.execute("create table tab_true_false(x bool);")
con.execute("insert into tab_true_false values (true), (false);")
con.fetchall()

for table in ["tab_empty", "tab_null", "tab_true", "tab_false", "tab_null_true", "tab_null_false", "tab_true_false",]:
    con.execute(f"select bool_and(x), bool_or(x) from {table}")
    print(table, con.fetchall())
PostgreSQL

Version = 14 (using db-fiddle)

create table tab_empty(x bool);
create table tab_null(x bool);
insert into tab_null values (null);
create table tab_true(x bool);
insert into tab_true values (true);
create table tab_false(x bool);
insert into tab_false values (false);
create table tab_null_true(x bool);
insert into tab_null_true values (null), (true);
create table tab_null_false(x bool);
insert into tab_null_false values (null), (false);
create table tab_true_false(x bool);
insert into tab_true_false values (true), (false);
select bool_and(x), bool_or(x) from tab_empty

etc.

So they are, thankfully, consistent (when I first saw this I thought they wouldn't be), but nulls are indeed ignored (contrary to every other function that I've seen operate on bools), and an empty input (after ignoring nulls) is further special-cased to return null (which at least is consistent with other aggregates, just not consistent with how the rest of the world handles boolean algebra).

Comment on lines 113 to 138
-
name: "bool_and"
description: >
Aggregate boolean values and returns a boolean value.
If all of the values in the aggregation are true, function returns TRUE,
else it returns FALSE if any of the returned values are false. NULL values
are ignored by this function.
impls:
- args:
- value: boolean
name: x
nullability: DECLARED_OUTPUT
return: boolean
-
name: "bool_or"
description: >
Aggregate boolean values and returns a boolean value.
If any value in the aggregation is true, function returns TRUE,
else it returns FALSE if all of the returned values are false. NULL values
are ignored by this function.
impls:
- args:
- value: boolean
name: x
nullability: DECLARED_OUTPUT
return: boolean
Copy link
Contributor

@jvanstraten jvanstraten Sep 6, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At this point I might as well just do it like this:

Suggested change
-
name: "bool_and"
description: >
Aggregate boolean values and returns a boolean value.
If all of the values in the aggregation are true, function returns TRUE,
else it returns FALSE if any of the returned values are false. NULL values
are ignored by this function.
impls:
- args:
- value: boolean
name: x
nullability: DECLARED_OUTPUT
return: boolean
-
name: "bool_or"
description: >
Aggregate boolean values and returns a boolean value.
If any value in the aggregation is true, function returns TRUE,
else it returns FALSE if all of the returned values are false. NULL values
are ignored by this function.
impls:
- args:
- value: boolean
name: x
nullability: DECLARED_OUTPUT
return: boolean
-
name: "bool_and"
description: >
If any value in the input is false, false is returned. If the input is
empty or only contains nulls, null is returned. Otherwise, true is
returned.
impls:
- args:
- value: boolean
name: x
nullability: DECLARED_OUTPUT
return: boolean?
-
name: "bool_or"
description: >
If any value in the input is true, true is returned. If the input is
empty or only contains nulls, null is returned. Otherwise, false is
returned.
impls:
- args:
- value: boolean
name: x
nullability: DECLARED_OUTPUT
return: boolean?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the suggestion.

@jvanstraten jvanstraten changed the title feat: support bool_and and bool_or feat: add bool_and and bool_or aggregate functions Sep 6, 2022
Copy link
Contributor

@jvanstraten jvanstraten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just noticed that the functions ended up in the list of scalar functions instead of aggregate functions. Looks good otherwise...

@vibhatha
Copy link
Contributor Author

vibhatha commented Sep 8, 2022

@jvanstraten I added a aggregate_functions section.

@jvanstraten jvanstraten merged commit 52fa523 into substrait-io:main Sep 8, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants