
[SPARK-30651][SQL] Add detailed information for Aggregate operators in EXPLAIN FORMATTED #27368

Closed
wants to merge 9 commits

Conversation

Contributor

@Eric5553 Eric5553 commented Jan 27, 2020

What changes were proposed in this pull request?

Currently EXPLAIN FORMATTED only reports the input attributes of HashAggregate/ObjectHashAggregate/SortAggregate, while EXPLAIN EXTENDED provides more information such as Keys, Functions, etc. This PR enhances EXPLAIN FORMATTED to match the original explain behavior.

Why are the changes needed?

The newly added EXPLAIN FORMATTED provides less information than the original EXPLAIN EXTENDED.

Does this PR introduce any user-facing change?

Yes. Taking the HashAggregate explain result as an example:

SQL

EXPLAIN FORMATTED
  SELECT
    COUNT(val) + SUM(key) as TOTAL,
    COUNT(key) FILTER (WHERE val > 1)
  FROM explain_temp1;

EXPLAIN EXTENDED

== Physical Plan ==
*(2) HashAggregate(keys=[], functions=[count(val#6), sum(cast(key#5 as bigint)), count(key#5)], output=[TOTAL#62L, count(key) FILTER (WHERE (val > 1))#71L])
+- Exchange SinglePartition, true, [id=#89]
   +- HashAggregate(keys=[], functions=[partial_count(val#6), partial_sum(cast(key#5 as bigint)), partial_count(key#5) FILTER (WHERE (val#6 > 1))], output=[count#75L, sum#76L, count#77L])
      +- *(1) ColumnarToRow
         +- FileScan parquet default.explain_temp1[key#5,val#6] Batched: true, DataFilters: [], Format: Parquet, Location: InMemoryFileIndex[file:/Users/XXX/spark-dev/spark/spark-warehouse/explain_temp1], PartitionFilters: [], PushedFilters: [], ReadSchema: struct<key:int,val:int>

EXPLAIN FORMATTED - BEFORE

== Physical Plan ==
* HashAggregate (5)
+- Exchange (4)
   +- HashAggregate (3)
      +- * ColumnarToRow (2)
         +- Scan parquet default.explain_temp1 (1)

...
...
(5) HashAggregate [codegen id : 2]
Input: [count#91L, sum#92L, count#93L]
...
...

EXPLAIN FORMATTED - AFTER

== Physical Plan ==
* HashAggregate (5)
+- Exchange (4)
   +- HashAggregate (3)
      +- * ColumnarToRow (2)
         +- Scan parquet default.explain_temp1 (1)

...
...
(5) HashAggregate [codegen id : 2]
Input: [count#91L, sum#92L, count#93L]
Keys: []
Functions: [count(val#6), sum(cast(key#5 as bigint)), count(key#5)]
Results: [(count(val#6)#84L + sum(cast(key#5 as bigint))#85L) AS TOTAL#78L, count(key#5)#86L AS count(key) FILTER (WHERE (val > 1))#87L]
Output: [TOTAL#78L, count(key) FILTER (WHERE (val > 1))#87L]
...
...

How was this patch tested?

Three tests added in explain.sql for HashAggregate/ObjectHashAggregate/SortAggregate.

@dilipbiswal
Contributor

dilipbiswal commented Jan 27, 2020

@Eric5553 Thanks for working on this. Looks good to me. cc @cloud-fan

@dilipbiswal
Contributor

@Eric5553 Since the implementation is the same across the aggregate operator variants, I was wondering if it makes sense to have a base class holding this common code? What do you think?

@maropu
Member

maropu commented Jan 28, 2020

ok to test

@Eric5553
Contributor Author

Eric5553 commented Jan 28, 2020

@Eric5553 Since the implementation is the same across the aggregate operator variants, I was wondering if it makes sense to have a base class holding this common code? What do you think?

@dilipbiswal Thanks so much for the review! Yes, this was a concern when I implemented the three aggregate operators. The groupingExpressions (shown as 'Keys') and aggregateExpressions (shown as 'Functions') are defined in each aggregate operator but not in a common superclass, so I think we cannot abstract the verboseStringWithOperatorId logic here until we abstract these aggregate attributes.

I think the visitor pattern proposed in the discussion of your initial PR would provide more flexibility. With it, we could, for example, handle input/output as a common rule.

I can give it a try if you have any suggestions on this, thanks!
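To make the visitor idea concrete, here is a minimal, hypothetical sketch; the node and visitor names below are illustrative stand-ins, not the actual Spark API. The visitor matches on the node type, so a common field such as output is rendered by one shared rule while each operator contributes only its own details:

```scala
// Hypothetical, simplified plan nodes (not the real Spark classes).
sealed trait PlanNode { def output: Seq[String] }
case class HashAggregateNode(keys: Seq[String], output: Seq[String]) extends PlanNode
case class ScanNode(table: String, output: Seq[String]) extends PlanNode

object ExplainVisitor {
  // Shared rule: every node prints its output attributes the same way.
  def common(node: PlanNode): String = s"Output: [${node.output.mkString(", ")}]"

  // Per-operator details are added by matching on the concrete node type.
  def visit(node: PlanNode): String = node match {
    case HashAggregateNode(keys, _) =>
      s"Keys: [${keys.mkString(", ")}]\n" + common(node)
    case ScanNode(table, _) =>
      s"Scan $table\n" + common(node)
  }
}
```

With this shape, adding a new shared field (e.g. inputs) touches only the common rule, not every operator.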

@maropu
Member

maropu commented Jan 28, 2020

Can't you do it like this?


abstract class XXXX extends UnaryExecNode {
  def groupingExpressions: Seq[NamedExpression]
  def aggregateExpressions: Seq[AggregateExpression]
  ...
}

case class HashAggregateExec(
    requiredChildDistributionExpressions: Option[Seq[Expression]],
    groupingExpressions: Seq[NamedExpression],
    aggregateExpressions: Seq[AggregateExpression],
    ...)
  extends XXXX with BlockingOperatorWithCodegen with AliasAwareOutputPartitioning {
  ...
}

@SparkQA

SparkQA commented Jan 28, 2020

Test build #117455 has finished for PR 27368 at commit 44a84d2.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@Eric5553
Contributor Author

@maropu Sure, I'll try with it. Thanks!

@Eric5553
Contributor Author

@maropu @dilipbiswal I've abstracted the EXPLAIN FORMATTED logic in c5946a3c1c41341a88df2101bbfe44385d3f5c37, please help review. And do we need to factor out more common logic for the three aggregate operators? Thanks!

@SparkQA

SparkQA commented Jan 28, 2020

Test build #117468 has finished for PR 27368 at commit c5946a3.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • abstract class AggregateExec(

@Eric5553
Contributor Author

retest this please

@SparkQA

SparkQA commented Jan 28, 2020

Test build #117477 has finished for PR 27368 at commit 9fabb05.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • abstract class AggregateExec(

@dilipbiswal
Contributor

And do we need to factor out more common logic for the three aggregate operators?

I think it's a good idea. What do you think @maropu

About the changes: in the cases where Keys or Functions are empty, does it make sense not to print them? cc @maropu @cloud-fan for their opinions.

@Eric5553
Contributor Author

Eric5553 commented Feb 4, 2020

About the changes: in the cases where Keys or Functions are empty, does it make sense not to print them? cc @maropu @cloud-fan for their opinions.

I think we should keep printing empty Keys or Functions, which indicates the node has none. Otherwise we can't tell whether the explain message means there are no Keys/Functions or we simply missed the details for them.
What do you think? @dilipbiswal @maropu @cloud-fan

@cloud-fan
Contributor

makes sense to me

@SparkQA

SparkQA commented Feb 4, 2020

Test build #117837 has finished for PR 27368 at commit 9c4fc24.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@maropu
Member

maropu commented Feb 5, 2020

I think we should keep printing empty Keys or Functions, which indicates the node has none. Otherwise we can't tell whether the explain message means there are no Keys/Functions or we simply missed the details for them.
What do you think? @dilipbiswal @maropu @cloud-fan

+1, too.

And do we need to factor out more common logic for the three aggregate operators?
I think it's a good idea. What do you think @maropu

Looks fine to me, but can you address it in a follow-up?

@maropu
Member

maropu commented Feb 5, 2020

I left some minor comments; the other parts look fine to me.

@Eric5553
Contributor Author

Eric5553 commented Feb 5, 2020

I left some minor comments; the other parts look fine to me.

I've addressed them in ec029df372bccf11ece3349b35cfa87232886505. Thanks so much for the review!

@SparkQA

SparkQA commented Feb 5, 2020

Test build #117900 has finished for PR 27368 at commit ec029df.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • abstract class BaseAggregateExec extends UnaryExecNode

@Eric5553
Contributor Author

Eric5553 commented Feb 5, 2020

retest this please

@cloud-fan
Contributor

retest this please

@SparkQA

SparkQA commented Feb 5, 2020

Test build #117925 has finished for PR 27368 at commit ec029df.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • abstract class BaseAggregateExec extends UnaryExecNode

@SparkQA

SparkQA commented Feb 5, 2020

Test build #117939 has finished for PR 27368 at commit cd3b444.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • abstract class BaseAggregateExec extends UnaryExecNode

@SparkQA

SparkQA commented Feb 8, 2020

Test build #118068 has finished for PR 27368 at commit 5b91b19.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile
Member

Also cc @maryannxue @hvanhovell

val groupingExpressions: Seq[NamedExpression]
val aggregateExpressions: Seq[AggregateExpression]
val aggregateAttributes: Seq[Attribute]
val resultExpressions: Seq[NamedExpression]
Contributor

These can be def, so we don't need to add override val in the aggregate classes.

Member

+1

Contributor Author

@cloud-fan @HyukjinKwon Thanks for the review; updated to def in dd0988a.

/**
 * Holds common logic for aggregate operators
 */
abstract class BaseAggregateExec extends UnaryExecNode {
Member

Shall we make it a trait?

Contributor Author

I see, changed it to a trait to make it consistent with other operators, e.g. HashJoin, BaseLimitExec.
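The trait-plus-def pattern settled on above can be sketched in a self-contained form (simplified stand-ins, not the actual Spark classes): the trait declares abstract defs and the shared explain logic, and each concrete case class satisfies the defs through its constructor parameters, since in Scala a val can implement an abstract def; no override val boilerplate is needed.

```scala
// Hypothetical, simplified mirror of the PR's pattern; names and the string
// format are illustrative, not the real Spark API.
trait BaseAggregate {
  def groupingExpressions: Seq[String]
  def aggregateExpressions: Seq[String]

  // Common EXPLAIN FORMATTED-style detail shared by all aggregate variants.
  def verboseStringWithOperatorId: String =
    s"Keys: [${groupingExpressions.mkString(", ")}]\n" +
    s"Functions: [${aggregateExpressions.mkString(", ")}]"
}

// Constructor params (vals) implement the trait's abstract defs directly.
case class HashAggregate(
    groupingExpressions: Seq[String],
    aggregateExpressions: Seq[String]) extends BaseAggregate

case class SortAggregate(
    groupingExpressions: Seq[String],
    aggregateExpressions: Seq[String]) extends BaseAggregate
```

Note that empty Keys still print as `Keys: []`, matching the empty-Keys printing agreed on earlier in this thread.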

@SparkQA

SparkQA commented Feb 10, 2020

Test build #118154 has finished for PR 27368 at commit dd0988a.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • trait BaseAggregateExec extends UnaryExecNode

@Eric5553
Contributor Author

retest this please

@SparkQA

SparkQA commented Feb 10, 2020

Test build #118171 has finished for PR 27368 at commit dd0988a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • trait BaseAggregateExec extends UnaryExecNode

@cloud-fan
Contributor

EXPLAIN FORMATTED is a new feature in 3.0 and this is a follow-up, so I'm merging to 3.0 as well.

Thanks, merging to master/3.0!

@cloud-fan cloud-fan closed this in 5919bd3 Feb 12, 2020
cloud-fan pushed a commit that referenced this pull request Feb 12, 2020
Closes #27368 from Eric5553/ExplainFormattedAgg.

Authored-by: Eric Wu <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
(cherry picked from commit 5919bd3)
Signed-off-by: Wenchen Fan <[email protected]>
@Eric5553
Contributor Author

@gatorsmile @cloud-fan @dilipbiswal @maropu @HyukjinKwon , thanks so much for your help!

cloud-fan pushed a commit that referenced this pull request Feb 21, 2020
### What changes were proposed in this pull request?
The style of `EXPLAIN FORMATTED` output needs to be improved. We’ve already got some observations/ideas in
#27368 (comment)
#27368 (comment)

Observations/Ideas:
1. Using a comma as the separator is not clear, especially since commas are also used inside the expressions.
2. Show the column counts first? For example, `Results [4]: …`
3. Currently the attribute names are automatically generated; these need to be refined.
4. Add an arguments field in common implementations, as `EXPLAIN EXTENDED` does by calling `argString` in `TreeNode.simpleString`. This will eliminate most of the existing minor differences between `EXPLAIN EXTENDED` and `EXPLAIN FORMATTED`.
5. Another possible improvement: the generated alias shouldn't include the attribute id; `collect_set(val, 0, 0)#123` looks clearer than `collect_set(val#456, 0, 0)#123`.

This PR currently addresses comments 2 & 4, and is open for more discussion on improving readability.

### Why are the changes needed?
The readability of `EXPLAIN FORMATTED` needs to be improved, which will help users better understand the query plan.

### Does this PR introduce any user-facing change?
Yes, `EXPLAIN FORMATTED` output style changed.

### How was this patch tested?
Updated expected results of test cases in explain.sql.

Closes #27509 from Eric5553/ExplainFormattedRefine.

Authored-by: Eric Wu <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
cloud-fan pushed a commit that referenced this pull request Feb 21, 2020
Closes #27509 from Eric5553/ExplainFormattedRefine.

Authored-by: Eric Wu <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
(cherry picked from commit 1f0300f)
Signed-off-by: Wenchen Fan <[email protected]>
@Eric5553 Eric5553 deleted the ExplainFormattedAgg branch March 13, 2020 06:50
sjincho pushed a commit to sjincho/spark that referenced this pull request Apr 15, 2020
Closes apache#27368 from Eric5553/ExplainFormattedAgg.

Authored-by: Eric Wu <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
sjincho pushed a commit to sjincho/spark that referenced this pull request Apr 15, 2020
Closes apache#27509 from Eric5553/ExplainFormattedRefine.

Authored-by: Eric Wu <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
Nnicolini pushed a commit to palantir/spark that referenced this pull request Jun 11, 2020
Closes apache#27368 from Eric5553/ExplainFormattedAgg.

Authored-by: Eric Wu <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>