-
Notifications
You must be signed in to change notification settings - Fork 1.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
UNNEST should not be allowed within CASE statements #13466
Comments
select case
when True then [1]
when False then generate_series(0,2)
end as x; |
@pkoppstein yes indeed, unnest() surfaces the issue here, since it actually modifies the number of rows in the query it becomes evident that the THEN part was evaluated even though WHEN was false Another way to prove that this is how duckdb currently works is to just throw in costly calculatations that should not be evaluated, like so: # without generate_series
%timeit -n 10000 (duckdb.query("select case when True then [1] end as x"))
# 44.8 µs ± 1.54 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
false_stmt = ["when False then generate_series(0,100000)"]
# 1 generate_series
%timeit -n 10000 (duckdb.query(f"select case when True then [1] {' '.join(false_stmt*1)} end as x"))
# 69.4 µs ± 2.06 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
# 10 generate_series
%timeit -n 10000 (duckdb.query(f"select case when True then [1] {' '.join(false_stmt*10)} end as x"))
# 369 µs ± 38.6 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each) |
FYI
|
Thanks for the report!
postgres=# select
case
when True then 1
when False then unnest(array[0,1,2])
end as x;
ERROR: set-returning functions are not allowed in CASE
LINE 4: when False then unnest(array[0,1,2])
^
-- HINT: You might be able to move the set-returning function into a LATERAL FROM item. I think that is also sane and we might want to switch to throwing an exception.
DuckDB short-circuits case statements. In your example this most likely does not happen because the individual case statements are constants (i.e. D create table ranges as select 999999999999999 range_bound;
D select case when true then [] else generate_series(0, range_bound) end as r from ranges;
┌─────────┐
│ r │
│ int64[] │
├─────────┤
│ [] │
└─────────┘
-- or with multiple values:
D create or replace table ranges as select 0 as range_bound union all select 999999999999999 range_bound;
D select case when range_bound > 10 then [] else generate_series(0, range_bound) end as r from ranges;
┌─────────┐
│ r │
│ int64[] │
├─────────┤
│ [0] │
│ [] │
└─────────┘
|
I see, beeing able to use unnest() as part of the select statement is nice (bigquery also does not allow this).
From a users perspective I don't think its intuitive or expected at all that the False part here has any impact on the output. Thanks! |
I agree that an exception is better, but the current behavior is consistent with the way aggregates work in SQL, which is the other way of changing result cardinality from within the postgres=# select case when true then 1 else count(*) end from generate_series(0,1000);
count
-------
1
(1 row) Now you can argue that is also confusing and I would agree :) |
wow, ok I see, that one is also definitely weird, puts a dent in SQL beeing declarative for me :) I'm fine with raising an exception in the case of unnest. But I'll leave the use-case I had here in case it might influence discussions down the line. For certain values of a given column, I needed to duplicate those rows with an incremental counter in a new column, e.g:
|
Perhaps that can be solved by moving the with input as (
select 1 as x
union all
select 5 as x
)
select
x,
unnest(case
when x = 1 then [x]
when x > 1 then generate_series(0,x)
end) as y
from input ; |
It does indeed! Nice one, thanks :) |
@Mytherin - You changed the title from CASE-statement does not short-circuit UNNEST evaluation to UNNEST should not be allowed within CASE statements But is that the appropriate conclusion? Consider:
This seems quite correct: (1) the logic of |
Related: #14012 (which should be titled something like "Errors during constant folding should be caught and only be raised where required" ) |
What happens?
When building a case statement it seems the THEN part is allways evaluated. This became evident when I had
unnest(generate_series(..))
in one of my case-stements, resulting in unexpected output.To Reproduce
Results in:
While I expected:
OS:
aarch64
DuckDB Version:
1.0.0
DuckDB Client:
Python
Full Name:
Thomas Rolfsnes
Affiliation:
Story House Egmont
What is the latest build you tested with? If possible, we recommend testing with the latest nightly build.
I have not tested with any build
Did you include all relevant data sets for reproducing the issue?
Yes
Did you include all code required to reproduce the issue?
Did you include all relevant configuration (e.g., CPU architecture, Python version, Linux distribution) to reproduce the issue?
The text was updated successfully, but these errors were encountered: