Add min, median and max columns to AccumProfileResults #522
Signed-off-by: Niranjan Artal <[email protected]>
Thanks @nartal1
I have a few questions, since I am not familiar with that part of the logic.
core/src/main/scala/com/nvidia/spark/rapids/tool/profiling/CollectInformation.scala
None
// If metricType is size, average or timing, we want to use the update value to get the
// min, median, max, and total. Otherwise, we want to use the value.
if (metric.metricType == SIZE_METRIC || metric.metricType == TIMING_METRIC ||
Can we define this as a boolean function? Just to:
- be able to expand it if we want to support other metric types specific to a platform.
- be consistent with the way we handle ignoredExpressions and the types of supported joins.
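A minimal sketch of what such a boolean function could look like, assuming the metric type is a plain string. The object name `MetricTypes`, the function name `usesUpdateValue`, the `AVERAGE_METRIC` constant, and all string values below are hypothetical and are not taken from this PR:

```scala
// Hypothetical sketch: names and string values are assumptions, not the PR's code.
object MetricTypes {
  val SIZE_METRIC = "size"       // assumed value
  val TIMING_METRIC = "timing"   // assumed value
  val AVERAGE_METRIC = "average" // assumed constant and value

  // Metric types whose per-task update value feeds min/median/max/total.
  // Supporting a platform-specific type only requires adding it here.
  private val updateBasedTypes: Set[String] =
    Set(SIZE_METRIC, TIMING_METRIC, AVERAGE_METRIC)

  def usesUpdateValue(metricType: String): Boolean =
    updateBasedTypes.contains(metricType)
}
```

With a helper like this, the condition at the call site would reduce to a single call such as `MetricTypes.usesUpdateValue(metric.metricType)`.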
core/src/main/scala/com/nvidia/spark/rapids/tool/profiling/CollectInformation.scala
core/src/main/scala/com/nvidia/spark/rapids/tool/profiling/ClassWarehouse.scala
core/src/main/scala/com/nvidia/spark/rapids/tool/profiling/CollectInformation.scala
core/src/main/scala/com/nvidia/spark/rapids/tool/profiling/CollectInformation.scala
Co-authored-by: Ahmed Hussein <[email protected]>
Thanks @nartal1
This fixes #521
This PR calculates the min, median and max of each accumulator and adds them as columns to the sql_plan_metrics_for_application.csv output file. These statistics help in analyzing which Exec is taking the most time, or whether the median data size of a particular task is large, which could slow down the entire stage.
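For illustration only, the statistics described above could be computed from a sequence of per-task values roughly as follows. This is a sketch under stated assumptions, not the code in this PR: `MetricStats`, `Stats` and `summarize` are hypothetical names, values are assumed to be non-negative `Long`s, and for an even number of values the median is taken as the integer midpoint of the two middle values.

```scala
// Hypothetical sketch: not the PR's implementation.
object MetricStats {
  final case class Stats(min: Long, median: Long, max: Long, total: Long)

  // Returns None when there are no values to summarize.
  def summarize(values: Seq[Long]): Option[Stats] = {
    if (values.isEmpty) {
      None
    } else {
      val sorted = values.sorted
      val n = sorted.length
      val median =
        if (n % 2 == 1) {
          sorted(n / 2)
        } else {
          // Midpoint of the two middle values, using integer division.
          val lo = sorted(n / 2 - 1)
          val hi = sorted(n / 2)
          lo + (hi - lo) / 2
        }
      Some(Stats(sorted.head, median, sorted.last, sorted.sum))
    }
  }
}
```

A large gap between the median and the max for a given accumulator would point to a few outlier tasks rather than uniformly slow work.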
Sample Output: