Improve AQE support by capturing SQLPlan versions #1354
Signed-off-by: Ahmed Hussein <[email protected]>
@wjxiz1992 and @leewyang
Tested with eventlogs that previously failed to generate
Thanks @leewyang!
core/src/main/scala/com/nvidia/spark/rapids/tool/analysis/AppSQLPlanAnalyzer.scala
core/src/main/scala/org/apache/spark/sql/rapids/tool/store/SQLPlanModel.scala
Thanks @amahussein for adding this feature. LGTM!
Contributes to #1172, Fixes #1351
This PR changes the tools structure to support multiple versions of SqlPlan.
Before this PR, if AQE was enabled, only the last plan was kept in the AppBase.SqlPlans map.
For the sake of memory optimization, this PR adds the full implementation that supports capturing multiple versions, but it only caches the DSInformation extracted from old plans.
This allows the tools to generate the metadata of ReadV1 from original plans in case the metadata has been truncated in the finalPlan.
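The idea described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `PlanInfo`, `DataSourceRecord`, and `VersionedPlanSketch` are hypothetical, simplified stand-ins, and numbering versions from 0 is an assumption made for the example.

```scala
// Hypothetical, simplified stand-ins; NOT the actual classes added by this PR.
final case class PlanInfo(description: String, dataSources: Seq[String])

final case class DataSourceRecord(
    sqlId: Long,
    planVersion: Int,        // assumption: versions are numbered from 0
    location: String,
    fromFinalPlan: Boolean)

// Keeps only the most recent plan in full, plus the data-source records
// extracted from every plan version that was replaced.
final class VersionedPlanSketch(val sqlId: Long, firstPlan: PlanInfo) {
  private var latest: PlanInfo = firstPlan
  private var latestVersion: Int = 0
  private var cached: Vector[DataSourceRecord] = Vector.empty

  private def extract(
      plan: PlanInfo, version: Int, isFinal: Boolean): Vector[DataSourceRecord] =
    plan.dataSources.map(loc => DataSourceRecord(sqlId, version, loc, isFinal)).toVector

  // Called when an updated plan (for example, from AQE) arrives for the same SQL ID.
  // The old plan is dropped; only its data-source records are retained.
  def addPlan(newPlan: PlanInfo): Unit = {
    cached = cached ++ extract(latest, latestVersion, isFinal = false)
    latest = newPlan
    latestVersion += 1
  }

  def finalPlan: PlanInfo = latest
  def finalVersion: Int = latestVersion

  // Records from all versions: cached ones first, then those of the final plan.
  def allDataSourceRecords: Seq[DataSourceRecord] =
    cached ++ extract(latest, latestVersion, isFinal = true)
}

object VersionedPlanDemo {
  def main(args: Array[String]): Unit = {
    val model = new VersionedPlanSketch(1L, PlanInfo("initial", Seq("/data/full/path")))
    model.addPlan(PlanInfo("final", Seq("/data/...")))
    model.allDataSourceRecords.foreach(println)
  }
}
```

In this sketch, a record produced from the original plan is still available after the final plan replaces it, which is the property that lets untruncated metadata be recovered while only one full plan stays in memory.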
Change in output
New columns in data_source_information.csv:
sql_plan_version: an integer that represents the version number of the SqlPlanInfo that this row comes from.
from_final_plan: a Boolean (True/False) that indicates whether this row comes from the final plan.
Sample output file after the change:
data_source_information_after_change.csv
Headers of the new file:
Code Changes:
Removes AppBase.sqlPlans, replacing it with SqlManager.
Adds class SqlPlanModel, which keeps track of properties related to the SQL plan and its versions.
SQLPlanModelWithDSCaching, derived from SqlPlanModel, does not keep track of all previous PlanInfo. Instead, it only caches the DataSourceRecord, if any.
Future Work:
SQLPlanModel