Don't ignore failed optimizer rules #4615

crepererum · 2022-12-14T11:40:00Z

Describe the bug
Some logical optimizer rules like type coercion are actually essential for correct query execution, but are currently silently ignored when they failed.

To Reproduce
TBD

Expected behavior
Don't skip failed rules (this probably requires us to remove the non-failing OptimizerRule::optimize method).

Workaround
This behavior is controlled by a config option (that defaults to true meaning to silently skip failed errors).

You can see this in the datafusion cli:

DataFusion CLI v20.0.0
❯ show all;
+-----------------------------------------------------------+------------+
| name                                                      | setting    |
+-----------------------------------------------------------+------------+
...
| datafusion.optimizer.skip_failed_rules                    | true       |
...
+-----------------------------------------------------------+------------+
35 rows in set. Query took 0.038 seconds.

Datafusion can be configured to fail hard if an optimizer rule fails like

❯ -- default behavior
❯ set datafusion.optimizer.skip_failed_rules = false;
0 rows in set. Query took 0.023 seconds.

Additional context
See this discussion.

The text was updated successfully, but these errors were encountered:

jackwener · 2022-12-14T11:45:07Z

Make sense to me. It's a good discussion point.
Some rule is essential, we shouldn't tolerate them failed.
BTW, I think we also should make optimizer rule more stable instead of ignoring error.
IMHO, I think it's not good that tolerate error in optimizer. If a rule want to tolerate failed, I think it may be better handle inside itself.

alamb · 2022-12-14T20:22:00Z

I think the original "ignore optimizer rule if error" was added (maybe by @andygrove or @avantgardnerio ) because at that time there was no way to distinguish between "Optimizer didn't error but could not handle the query" and "optimizer hit a real error"

Now that Optimizer::try_optimize exists, which can distinguish between those two cases, I think removing OptimizerRile::optimize is 👍

alamb · 2022-12-14T20:28:51Z

Related #4619

HaoYang670 · 2022-12-15T02:20:36Z

Some logical optimizer rules like type coercion are actually essential for correct query execution

Some rule is essential, we shouldn't tolerate them failed

at that time there was no way to distinguish between "Optimizer didn't error but could not handle the query" and "optimizer hit a real error"

The type coercion and other rules for correctness should be moved in front of the optimizer.
It is the duty of the optimizer to make the query plan to achieve better performance, but not to correct the query.
In another word, the Datafusion should work even if we remove the optimizer.

It could be good if we could extract the type coercion from the optimizer and put it into a type checker for correcting the query. The pipeline could be very similar to a compiler:

SQL ---(parser)---> logical plan ---(type checker)---> verified logical plan ---(logical optimizer)---> optimized logical plan ...
          |                                 |
         \|/                               \|/
       Syntax error                     Type error

If we could do this, one different idea of the return type of the optimizer is that I think the optimizer should return an Option<LogicalPlan> instead of Result<Option<LogicalPlan>> or Result<LogicalPlan>. Because we should not allow an error (even if it is an Internal Error) returned from the optimizer, with the assumption that the input plan has been verified. The variant of the returned values should only be:

None: when the plan cannot be optimized.
Some(plan): return the optimized plan
panic!(..): there is a bug in the optimizer.

crepererum · 2022-12-15T08:59:53Z

On application of failing during an optimization may also be to detect malformed plans early. E.g. while working on #4370 I've figured that we could reject malformed regex patterns that are available during planning as literal early. I'm wondering if this should then be done by failing the optimization (and by fixing this very issue) or by inserting "terminal"/"always error" expressions into the plan that are later picked up by the physical planner.

alamb · 2022-12-15T19:18:38Z

The type coercion and other rules for correctness should be moved in front of the optimizer.
It is the duty of the optimizer to make the query plan to achieve better performance, but not to correct the query.
In another word, the Datafusion should work even if we remove the optimizer.

I agree this would be a good change

@liukun4515 made a similar observation / suggestion #3582

I'm wondering if this should then be done by failing the optimization (and by fixing this very issue) or by inserting "terminal"/"always error" expressions into the plan that are later picked up by the physical planner.

I recommend the inserting an "always error" expr node. The rationale being that you could have branches that are never actually hit in the query

A trivial example is like

CASE
  WHEN FALSE THEN 1 / 0 
  WHEN TRUE THEN 5.0
END

alamb · 2022-12-15T19:19:20Z

panic!(..): there is a bug in the optimizer.

I would personally rather have an InternalError rather than a panic

andygrove · 2022-12-16T00:02:48Z

original "ignore optimizer rule if error" was added (maybe by @andygrove or @avantgardnerio ) because at that time there was no way to distinguish between "Optimizer didn't error but could not handle the query" and "optimizer hit a real error"

That's correct. Now that we have the try_optimize method I support failing on errors and we will soon fix cases where rules incorrectly return errors.

HaoYang670 · 2022-12-21T05:50:41Z

I'm wondering if this should then be done by failing the optimization (and by fixing this very issue) or by inserting "terminal"/"always error" expressions into the plan that are later picked up by the physical planner.

My suggestion is to fail as early as possible if we have detected some errors in the logical plan.

I recommend the inserting an "always error" expr node. The rationale being that you could have branches that are never actually hit in the query

A trivial example is like
CASE
  WHEN FALSE THEN 1 / 0 
  WHEN TRUE THEN 5.0
END

Why do we cover up the error, instead of exposing them? Besides, we should not return a compile error at runtime, even if the branch won't be hit. No static typed language would do this (I guess SQL is not a dynamic typed language 🤔).

HaoYang670 · 2022-12-21T05:53:12Z

I tried to remove the config skip_failed_rules and met several test failures tracked in #4685.
Feel free to update the task-list if you find more and welcome to fix them if you have time!

alamb · 2022-12-21T20:25:39Z

Why do we cover up the error, instead of exposing them? Besides, we should not return a compile error at runtime, even if the branch won't be hit. No static typed language would do this (I guess SQL is not a dynamic typed language 🤔).

Well I think the rationale / question is if SQL expression semantics allow / require eager evaluation or if delayed evaluation

A more realistic example might be to use the CASE to explicitly guard against a divide by zero

CASE
  WHEN num <> 0 THEN value / num
  ELSE  0.0
END

So in that case evaluating value/num might cause a divide by zero if it were done prior to evaluating the WHEN guard 🤔

HaoYang670 · 2022-12-22T03:48:35Z

Well I think the rationale / question is if SQL expression semantics allow / require eager evaluation or if delayed evaluation

These are the variants for runtime behavior, not related to compile time checking I think.
I've done some tests on Postgres, and find the expression when FALSE then ... will be removed before type checking, so if you have the expr when FALSE then 1/0 in your query, it is always valid: https://onecompiler.com/postgresql/3yskvh35x

But if the invalid expression can not be removed at compile time, a Divide by zero error will always return before executing the query, even if the invalid code won't be touched at runtime: https://onecompiler.com/postgresql/3yskvqaxf

crepererum · 2023-01-02T08:13:30Z

The "when to check errors" is highly dependent on the database. I've worked with MS cloud SQL and Exasol before and in some cases, type errors (e.g. due to incompatible operators or functions that only accept a certain type) were only raised if at least a single row was passed throw it (post joins and where selections). These errors happened at query execution (not planning). From a user's perspective this was highly confusing. I recommend we take a look at the SQL standard and if it allows for it we bail out of invalid queries as early as possible, no matter if there's any data to query.

alamb · 2023-01-02T15:18:52Z

I believe that the SQL standard says it is "implementation defined":

6.3.3.3 Rule evaluation order

...

Where the precedence is not determined by the Formats or by parentheses, effective evaluation of expressions
is generally performed from left to right. However, it is implementation-dependent whether expressions are
actually evaluated left to right, particularly when operands or operators might cause conditions to be raised or
if the results of the expressions can be determined without completely evaluating all parts of the expression.

wolffcm · 2023-03-24T19:09:53Z

We recently ran into this when implementing gap-filling rule HandleGapFill for IOx which makes use of the logical planner:

it looks for a call to a UDF DATE_BIN_GAPFILL and if it exists it is removed (there is no actual implementation)
Inserts a GapFill node into the plan

https://github.com/influxdata/influxdb_iox/issues/7330

In this case the rule is really performing "lowering" (compiler jargon) in the sense that we need to get rid of this UDF for the query to be valid. Any error that occurs should be fatal for the query, so we can return an appropriate message to the user.

So I just want to voice that TypeCoercion isn't in a category by itself, in the sense that it needs to run and succeed for the query to be valid.

alamb · 2023-03-27T15:32:58Z

I updated the description with a workaround (to set datafusion.optimizer.skip_failed_rules to false).

alamb · 2023-03-27T16:49:51Z

FWIW @HaoYang670 's ticket #4685 describes the work needed to make sure all queries pass with skip_failed_rules set to false

alamb · 2023-04-10T10:43:04Z

I hope to look into this sometime over the next two weeks

alamb · 2023-05-09T14:30:37Z

@jackwener has a PR up to fix this ❤ #6265

crepererum added the bug Something isn't working label Dec 14, 2022

crepererum mentioned this issue Dec 14, 2022

add try_optimize() for all rules. #4599

Merged

HaoYang670 mentioned this issue Dec 20, 2022

Types are not checked when building case expression #4660

Closed

2 tasks

jonmmease mentioned this issue Jan 9, 2023

Numeric coercion failure for expressions passed to UDF #4864

Closed

jackwener mentioned this issue Jan 10, 2023

Incorrect results for join condition against current master branch #4844

Closed

This was referenced Feb 10, 2023

fix: correct expected error in test #5224

Merged

Move type_coercion to the front of logical optimizer #5235

Closed

jackwener mentioned this issue May 6, 2023

Disable skip_failed_rules optimizer config by default #6265

Merged

jackwener closed this as completed in #6265 May 10, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Don't ignore failed optimizer rules #4615

Don't ignore failed optimizer rules #4615

crepererum commented Dec 14, 2022 •

edited by alamb

Loading

jackwener commented Dec 14, 2022 •

edited

Loading

alamb commented Dec 14, 2022

alamb commented Dec 14, 2022

HaoYang670 commented Dec 15, 2022 •

edited

Loading

crepererum commented Dec 15, 2022

alamb commented Dec 15, 2022

alamb commented Dec 15, 2022

andygrove commented Dec 16, 2022

HaoYang670 commented Dec 21, 2022 •

edited

Loading

HaoYang670 commented Dec 21, 2022

alamb commented Dec 21, 2022 •

edited

Loading

HaoYang670 commented Dec 22, 2022

crepererum commented Jan 2, 2023

alamb commented Jan 2, 2023

wolffcm commented Mar 24, 2023

alamb commented Mar 27, 2023

alamb commented Mar 27, 2023

alamb commented Apr 10, 2023

alamb commented May 9, 2023

Don't ignore failed optimizer rules #4615

Don't ignore failed optimizer rules #4615

Comments

crepererum commented Dec 14, 2022 • edited by alamb Loading

jackwener commented Dec 14, 2022 • edited Loading

alamb commented Dec 14, 2022

alamb commented Dec 14, 2022

HaoYang670 commented Dec 15, 2022 • edited Loading

crepererum commented Dec 15, 2022

alamb commented Dec 15, 2022

alamb commented Dec 15, 2022

andygrove commented Dec 16, 2022

HaoYang670 commented Dec 21, 2022 • edited Loading

HaoYang670 commented Dec 21, 2022

alamb commented Dec 21, 2022 • edited Loading

HaoYang670 commented Dec 22, 2022

crepererum commented Jan 2, 2023

alamb commented Jan 2, 2023

wolffcm commented Mar 24, 2023

alamb commented Mar 27, 2023

alamb commented Mar 27, 2023

alamb commented Apr 10, 2023

alamb commented May 9, 2023

crepererum commented Dec 14, 2022 •

edited by alamb

Loading

jackwener commented Dec 14, 2022 •

edited

Loading

HaoYang670 commented Dec 15, 2022 •

edited

Loading

HaoYang670 commented Dec 21, 2022 •

edited

Loading

alamb commented Dec 21, 2022 •

edited

Loading