Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Optimizer] Infer is not null predicate from where clause #2254

Closed
Tracked by #2255
jackwener opened this issue Apr 17, 2022 · 4 comments
Closed
Tracked by #2255

[Optimizer] Infer is not null predicate from where clause #2254

jackwener opened this issue Apr 17, 2022 · 4 comments
Labels
enhancement New feature or request

Comments

@jackwener
Copy link
Member

jackwener commented Apr 17, 2022

Is your feature request related to a problem or challenge? Please describe what you are trying to do.
There are some predicate include implicit is not null.

select * from t1 where c1 > 0;  
-> 
select * from t1 where c1 > 0 and c1 is not null;

Why do this?

Expr in select expr and where expr is different.

Same expr in select don't include implicit is not null.

After this,

We can do this

@jackwener jackwener added the enhancement New feature or request label Apr 17, 2022
@jackwener
Copy link
Member Author

jackwener commented Apr 17, 2022

Note: It isn't a optimizer rule !

It is because the following transformation can't terminate in Finite State Machine of rule.

select * from t1 where c1 > 0;  
-> 
select * from t1 where c1 > 0 and c1 is not null;
->
select * from t1 where c1 > 0 and c1 is not null and c1 is not null;
->
.....

So, In fact, this is a rewrite in planner build (Specifically, it is in plan_selection).

@alamb
Copy link
Contributor

alamb commented Apr 18, 2022

I think this is a very subtle area of SQL and it depends on how the expression is being used

For example,

select * from t1 where c1 > 0;  

and

select * from t1 where c1 > 0 and c1 is not null;

Are semantically equivalent (produce the same answers), but these are not:

select c1 > 0  from t1;

and

select c1 > 0 and c1 is not null from t1;

The reason is that the semantics of the WHERE clause are that only rows that evaluate to true are passed (if the row evaluates to NULL or false then they don't pass).

However, in general, you still have to handle the three way logic with NULL -- I am not at all convinced we do this entirely correctly in the expression simplification pass. We should probably have two modes: used in where and used elsewhere or something 🤔

@jackwener
Copy link
Member Author

Great summary, this rewrite is from spark, I will read and consider carefully about this part.

I want to do this just because some expr rewrite is include this problem.

@jackwener
Copy link
Member Author

jackwener commented Apr 19, 2022

I am not at all convinced we do this entirely correctly in the expression simplification pass. We should probably have two modes: used in where and used elsewhere or something 🤔

Yes, This is where the problem lies. Other optimizer rules depend on implicit not null in where, which must handle this problem.

I think we should infer is not null, so other rule don't need to do infer not null but use directly is not null inferred by this rewrite. For example, convert outer join to left/right outer join or inner join can use is not null directly instead of doing it by self in rule.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

2 participants