[SPARK-21871][SQL] Fix infinite loop when bytecode size is larger than spark.sql.codegen.hugeMethodLimit #19440
Conversation
Test build #82485 has finished for PR 19440 at commit
    s"`${SQLConf.WHOLESTAGE_HUGE_METHOD_LIMIT.key}`:\n$treeString")
  child match {
    // For batch file source scan, we should continue executing it
    case f: FileSourceScanExec if f.supportsBatch => // do nothing
I feel it's a little weird that WholeStageCodegenExec has specific error handling for each Spark plan. Could we handle this error inside FileSourceScanExec instead? For example, how about checking parent.isInstanceOf[WholeStageCodegenExec] in FileSourceScanExec?
If we do it in FileSourceScanExec, we are unable to know what caused the fallback. Right now, we have at least two reasons that can trigger the fallback. Ideally, we should not call WholeStageCodegenExec in doExecute at all.
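The mutual recursion being discussed can be sketched with a minimal, self-contained toy model (plain Python, not Spark's real API; the class names and sizes are stand-ins): the scan's doExecute wraps itself back in whole-stage codegen when it supports batch reads, so an unconditional fallback to child.execute() never terminates.

```python
class FileSourceScan:
    """Toy stand-in for FileSourceScanExec."""

    def __init__(self, supports_batch):
        self.supports_batch = supports_batch

    def execute(self, depth=0):
        if depth > 10:
            # Stand-in for the real never-ending loop between scan and codegen.
            raise RecursionError("fallback loops between scan and codegen")
        if self.supports_batch:
            # The re-entry into codegen that the reviewer suggests refactoring away.
            return WholeStageCodegen(self).execute(depth + 1)
        return "row-based scan"


class WholeStageCodegen:
    """Toy stand-in for WholeStageCodegenExec with an over-limit method."""

    def __init__(self, child, bytecode_size=9000, huge_method_limit=8000):
        self.child = child
        self.fallback = bytecode_size > huge_method_limit

    def execute(self, depth=0):
        if self.fallback:
            return self.child.execute(depth)  # unconditional fallback -> loop
        return "compiled whole-stage plan"
```

For a row-based child the fallback terminates normally; only the batch-capable scan re-enters codegen and loops, which is why the fix special-cases that child.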
Yea, I totally agree that we need to refactor this in the future. Anyway, it's OK for now.
Thanks for pinging! LGTM except for one comment.
  withSQLConf(SQLConf.WHOLESTAGE_MAX_NUM_FIELDS.key -> "202",
    SQLConf.WHOLESTAGE_HUGE_METHOD_LIMIT.key -> "8000") {
    // do not return batch, because whole stage codegen is disabled for wide table (>202 columns)
Is this comment wrong, or do I misunderstand it? It looks like it does return a batch, since it asserts supportsBatch.
This was copied and pasted; will fix it.
      return child.execute()
    s"`${SQLConf.WHOLESTAGE_HUGE_METHOD_LIMIT.key}`:\n$treeString")
  child match {
    // For batch file source scan, we should continue executing it
It's better to explain why we should continue it. Otherwise later readers may not understand it immediately.
LGTM except for two minor comments.
Test build #82494 has finished for PR 19440 at commit
Thanks! Merged to master.
@gatorsmile I have a question: should this also be handled in other execs? For example, like https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/InMemoryTableScanExec.scala#L306 and https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2ScanExec.scala#L111
…n spark.sql.codegen.hugeMethodLimit

When exceeding `spark.sql.codegen.hugeMethodLimit`, the runtime falls back to the Volcano iterator solution. This could cause an infinite loop when `FileSourceScanExec` can use the columnar batch to read the data. This PR is to fix the issue.

Added a test

Author: gatorsmile <[email protected]>

Closes apache#19440 from gatorsmile/testt.
What changes were proposed in this pull request?
When exceeding spark.sql.codegen.hugeMethodLimit, the runtime falls back to the Volcano iterator solution. This could cause an infinite loop when FileSourceScanExec can use the columnar batch to read the data. This PR is to fix the issue.
How was this patch tested?
Added a test
Added a test