Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

support EXPLAIN ANALYZE <QUERY> clause #608

Closed
waynexia opened this issue Nov 21, 2022 · 5 comments
Closed

support EXPLAIN ANALYZE <QUERY> clause #608

waynexia opened this issue Nov 21, 2022 · 5 comments
Assignees
Labels
C-feature Category Features
Milestone

Comments

@waynexia
Copy link
Member

waynexia commented Nov 21, 2022

What problem does the new feature solve?

DataFusion supports EXPLAIN ANALYZE <QUERY> clause, which can report the execution details like time spent, data processed or memory occupied. The follow is an example I got by EXPLAIN ANALYZE *tpc-h-q1*:

+-------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| plan_type         | plan                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    |
+-------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Plan with Metrics | SortExec: [l_returnflag@0 ASC NULLS LAST,l_linestatus@1 ASC NULLS LAST], metrics=[output_rows=4, elapsed_compute=6.502µs, spill_count=0, spilled_bytes=0]                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               |
|                   |   CoalescePartitionsExec, metrics=[output_rows=4, elapsed_compute=4.666µs, spill_count=0, spilled_bytes=0, mem_used=0]                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  |
|                   |     ProjectionExec: expr=[l_returnflag@0 as l_returnflag, l_linestatus@1 as l_linestatus, SUM(lineitem.l_quantity)@2 as sum_qty, SUM(lineitem.l_extendedprice)@3 as sum_base_price, SUM(lineitem.l_extendedprice * Int64(1) - lineitem.l_discount)@4 as sum_disc_price, SUM(lineitem.l_extendedprice * Int64(1) - lineitem.l_discount * Int64(1) + lineitem.l_tax)@5 as sum_charge, AVG(lineitem.l_quantity)@6 as avg_qty, AVG(lineitem.l_extendedprice)@7 as avg_price, AVG(lineitem.l_discount)@8 as avg_disc, COUNT(UInt8(1))@9 as count_order], metrics=[output_rows=4, elapsed_compute=791ns, spill_count=0, spilled_bytes=0, mem_used=0]                                                                                                                                                                                                                                                                                                                                          |
|                   |       AggregateExec: mode=FinalPartitioned, gby=[l_returnflag@0 as l_returnflag, l_linestatus@1 as l_linestatus], aggr=[SUM(lineitem.l_quantity), SUM(lineitem.l_extendedprice), SUM(lineitem.l_extendedprice * Int64(1) - lineitem.l_discount), SUM(lineitem.l_extendedprice * Int64(1) - lineitem.l_discount * Int64(1) + lineitem.l_tax), AVG(lineitem.l_quantity), AVG(lineitem.l_extendedprice), AVG(lineitem.l_discount), COUNT(UInt8(1))], metrics=[output_rows=4, elapsed_compute=29.541µs, spill_count=0, spilled_bytes=0, mem_used=0]                                                                                                                                                                                                                                                                                                                                                                                                                                         |
|                   |         CoalesceBatchesExec: target_batch_size=4096, metrics=[output_rows=8, elapsed_compute=12.415µs, spill_count=0, spilled_bytes=0, mem_used=0]                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      |
|                   |           RepartitionExec: partitioning=Hash([Column { name: "l_returnflag", index: 0 }, Column { name: "l_linestatus", index: 1 }], 2), metrics=[fetch_time=4.652945874s, repart_time=9.54µs, send_time=832ns]                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         |
|                   |             AggregateExec: mode=Partial, gby=[l_returnflag@5 as l_returnflag, l_linestatus@6 as l_linestatus], aggr=[SUM(lineitem.l_quantity), SUM(lineitem.l_extendedprice), SUM(lineitem.l_extendedprice * Int64(1) - lineitem.l_discount), SUM(lineitem.l_extendedprice * Int64(1) - lineitem.l_discount * Int64(1) + lineitem.l_tax), AVG(lineitem.l_quantity), AVG(lineitem.l_extendedprice), AVG(lineitem.l_discount), COUNT(UInt8(1))], metrics=[output_rows=8, elapsed_compute=424.034181ms, spill_count=0, spilled_bytes=0, mem_used=0]                                                                                                                                                                                                                                                                                                                                                                                                                                        |
|                   |               ProjectionExec: expr=[CAST(l_extendedprice@1 AS Decimal128(38, 4)) * CAST(Some(100),23,2 - CAST(l_discount@2 AS Decimal128(23, 2)) AS Decimal128(38, 4)) as CAST(lineitem.l_extendedprice AS Decimal128(38, 4)) * CAST(Decimal128(Some(100),23,2) - CAST(lineitem.l_discount AS Decimal128(23, 2)) AS Decimal128(38, 4))CAST(Decimal128(Some(100),23,2) - CAST(lineitem.l_discount AS Decimal128(23, 2)) AS Decimal128(38, 4))Decimal128(Some(100),23,2) - CAST(lineitem.l_discount AS Decimal128(23, 2))CAST(lineitem.l_discount AS Decimal128(23, 2))lineitem.l_discountDecimal128(Some(100),23,2)CAST(lineitem.l_extendedprice AS Decimal128(38, 4))lineitem.l_extendedprice, l_quantity@0 as l_quantity, l_extendedprice@1 as l_extendedprice, l_discount@2 as l_discount, l_tax@3 as l_tax, l_returnflag@4 as l_returnflag, l_linestatus@5 as l_linestatus], metrics=[output_rows=5916591, elapsed_compute=135.965413ms, spill_count=0, spilled_bytes=0, mem_used=0] |
|                   |                 CoalesceBatchesExec: target_batch_size=4096, metrics=[output_rows=5916591, elapsed_compute=48.487374ms, spill_count=0, spilled_bytes=0, mem_used=0]                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     |
|                   |                   FilterExec: l_shipdate@6 <= 10471, metrics=[output_rows=5916591, elapsed_compute=47.528668ms, spill_count=0, spilled_bytes=0, mem_used=0]                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             |
|                   |                     RepartitionExec: partitioning=RoundRobinBatch(2), metrics=[send_time=1.895954ms, fetch_time=2.323869047s, repart_time=1ns]                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          |
|                   |                       CsvExec: files=[Users/ruihang/repo/arrow-datafusion/benchmarks/data/lineitem.tbl], has_header=false, limit=None, projection=[l_quantity, l_extendedprice, l_discount, l_tax, l_returnflag, l_linestatus, l_shipdate], metrics=[]                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  |
|                   |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         |
+-------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

What does the feature do?

Implement the necessary functionalities, and try to add more features on DF (like support memory consumption calculating). Document of PostgreSQL

Implementation challenges

No response

@waynexia waynexia added the C-feature Category Features label Nov 21, 2022
@xtang xtang mentioned this issue Nov 30, 2022
24 tasks
@xtang xtang added this to the Release v0.1 milestone Nov 30, 2022
@francis-du
Copy link
Contributor

Hi,do you have any other details about this issue? I support the explain <Query> syntax and am very interested in this Issue. Do we need to use DataFusion or implement a new statistical analysis model.

@waynexia
Copy link
Member Author

DataFusion's EXPLAIN ANALYZE is brilliant. For the current stage we can just add support to invoke this functionality provided by DataFusion.

@francis-du
Copy link
Contributor

DataFusion's EXPLAIN ANALYZE is brilliant. For the current stage we can just add support to invoke this functionality provided by DataFusion.

OK, thank you. I got it. Can you assign this issue to me?

@francis-du
Copy link
Contributor

francis-du commented Jan 5, 2023

I think we already support EXPLAIN ANALYZE <QUERY> syntax in #546.
image

@waynexia
Copy link
Member Author

waynexia commented Jan 5, 2023

Verified... Sorry I didn't notice that 🫣

I'll close this issue as completed. Thanks (again) @francis-du

@waynexia waynexia closed this as completed Jan 5, 2023
@fengjiachun fengjiachun modified the milestones: v0.3, v0.1 Feb 15, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-feature Category Features
Projects
None yet
Development

No branches or pull requests

4 participants