Add min, median and max columns to AccumProfileResults #522

Merged
merged 4 commits into from
Aug 31, 2023

nartal1
Collaborator

@nartal1 nartal1 commented Aug 29, 2023

This fixes #521

This PR calculates and adds min, median and max columns to the sql_plan_metrics_for_application.csv output file. These statistics help identify which Exec is taking the most time, or whether the median data size of a particular task is large enough to slow down the entire stage.

Sample Output:

appIndex,sqlID,nodeID,nodeName,accumulatorId,name,min,median,max,total,metricType,stageIds
1,7,0,"Execute InsertIntoHadoopFsRelationCommand",0,"number of written files",0,0,0,2,"sum","1"
1,7,0,"Execute InsertIntoHadoopFsRelationCommand",1,"written output",0,0,0,751,"size","1"
1,7,1,"Exchange",6,"data size",0,16,16,16,"size","0,1"
1,7,1,"Exchange",7,"number of partitions",0,0,0,320,"sum","0,1"
1,15,5,"HashAggregate",335,"peak memory",0,262144,83886080,113802739712,"size","10"
1,15,5,"HashAggregate",337,"time in aggregation build",0,23612,51375,125070838,"timing","10"
1,15,7,"WholeStageCodegen (3)",340,"duration",0,23618,52598,125466719,"timing","10"
1,15,8,"BroadcastHashJoin",341,"number of output rows",0,0,0,72483658,"sum","10"
1,15,10,"Filter",342,"number of output rows",0,0,0,59007195589,"sum","10"
1,15,11,"ColumnarToRow",343,"number of output rows",0,0,0,79911857034,"sum","10"
1,15,11,"ColumnarToRow",344,"number of input batches",0,0,0,19512293,"sum","10"
1,15,3,"Exchange",332,"shuffle write time",0,1164323,804158883,5600970399,"nsTiming","10,11"
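
For readers who want to see how the four statistic columns relate, here is a minimal sketch of summarizing a list of per-task values into min, median, max and total. The names `StatSummary` and `fromValues` are hypothetical and are not taken from this PR; the choice of the lower middle value for an even count is also an assumption, made only to keep the result a `Long`.

```scala
// Hypothetical helper for illustration; not the tool's actual implementation.
final case class StatSummary(min: Long, median: Long, max: Long, total: Long)

object StatSummary {
  // Summarize per-task values. An empty input yields all zeros (an assumption).
  def fromValues(values: Seq[Long]): StatSummary = {
    if (values.isEmpty) {
      StatSummary(0L, 0L, 0L, 0L)
    } else {
      val sorted = values.sorted
      val n = sorted.length
      // For an even count, take the lower of the two middle values.
      val median = sorted((n - 1) / 2)
      StatSummary(sorted.head, median, sorted.last, sorted.sum)
    }
  }
}
```

With this sketch, a metric whose per-task values are mostly zero with a few large outliers would show a small median next to a large max, which is the kind of skew the new columns are meant to surface.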

@nartal1 nartal1 added the core_tools Scope the core module (scala) label Aug 29, 2023
@nartal1 nartal1 self-assigned this Aug 29, 2023
Collaborator

@amahussein amahussein left a comment

Thanks @nartal1
I have a few questions since I am not familiar with that part of the logic.

// If metricType is size, average or timing, we want to use the update value to get the
// min, median, max, and total. Otherwise, we want to use the value.
if (metric.metricType == SIZE_METRIC || metric.metricType == TIMING_METRIC ||

Can we define this as a boolean function? Just to:

  • be able to expand it if we want to support other metric types specific to a platform.
  • be consistent with our way of handling ignoredExpressions and the types of supported joins.
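
One possible shape for such a boolean function is sketched below. It is an illustration only: `MetricTypes` and `usesUpdateValue` are hypothetical names, the string values are assumed to match the metricType column in the sample output, and the set contains just the three types named in the code comment (size, average, timing), since the full condition is not visible in the snippet.

```scala
// Hypothetical sketch; names and constant values are assumptions, not the PR's code.
object MetricTypes {
  // "size" and "timing" appear in the sample output; "average" is assumed.
  val SIZE_METRIC = "size"
  val TIMING_METRIC = "timing"
  val AVERAGE_METRIC = "average"

  // Metric types whose statistics are computed from the update value.
  // Supporting a platform-specific type means adding one entry here.
  private val updateBasedTypes: Set[String] =
    Set(SIZE_METRIC, TIMING_METRIC, AVERAGE_METRIC)

  def usesUpdateValue(metricType: String): Boolean =
    updateBasedTypes.contains(metricType)
}
```

The call site would then read `if (MetricTypes.usesUpdateValue(metric.metricType))`, keeping the list of types in one place.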

Collaborator

@amahussein amahussein left a comment

Thanks @nartal1

@nartal1 nartal1 merged commit 48ccfec into NVIDIA:dev Aug 31, 2023
8 checks passed
Labels
core_tools Scope the core module (scala)
Development

Successfully merging this pull request may close these issues.

Profiling tool: Add min, med and max metrics to AccumProfileResults
2 participants