Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update benchmarks on clickbench #5276

Closed
dudzicp opened this issue Feb 14, 2023 · 38 comments
Closed

Update benchmarks on clickbench #5276

dudzicp opened this issue Feb 14, 2023 · 38 comments
Labels
enhancement New feature or request

Comments

@dudzicp
Copy link

dudzicp commented Feb 14, 2023

In the clickbench Datafusion is currently ~5x slower than DuckDB on comparable machine. I think it would make sense to update the benchmark results with the latest improvements so that datafusion will be a viable/reasonable option for new projects

@dudzicp dudzicp added the enhancement New feature or request label Feb 14, 2023
@ozankabak
Copy link
Contributor

I think this is something we should do as soon as possible. We should not needlessly create a false impression of "slow" in the eyes of the community at large. Why try to correct it later when we can avoid it?

Our benchmark was not run on a comparable machine, and it is a very old version. I suggest we run it on c6a.metal, 500gb gp2 (that's the machine the majority is gravitating towards) with our current version. After that, we can identify which queries we are slow at, investigate the reasons and improve.

We (the Synnada team) is happy to help with investigations and take on improvement tasks.

@alamb
Copy link
Contributor

alamb commented Feb 14, 2023

I agree this would be great to do. @waitingkuo and @andygrove have worked on this in the past and perhaps they have thoughts.

@andygrove
Copy link
Member

I wasn't really too involved with this. I have been focussing more on TPC-H and TPC-DS, and I also see DuckDB performing about 5x faster than DataFusion. I have not spent time determining why this is the case yet.

The latest results are here. This is yet another half-finished project of mine, so I apologize for the lack of configuration info. I am mostly running with default configs.

https://sqlbenchmarks.io/sqlbench-h/results/env/workstation/sf10/single_node/

@waitingkuo
Copy link
Contributor

waitingkuo commented Feb 15, 2023

the codes to reproduce the benchmark are here https://github.com/ClickHouse/ClickBench/tree/main/datafusion
The way to update the data in the clickbench website is to update the json file in the results folder and send a PR https://github.com/ClickHouse/ClickBench/tree/main/datafusion/results

The current result listed in the website was benchmarked by datafusion v11, which was release around 6 months ago.

to imporve:

  • 1. use datafusion v18 to rerun the benchmark codes and send the PR. t

  • 2. at the time I wrote the benchmark codes, datafusion didnt support some features. e.g. it didn't support schema from parquet, so for some data type like timestamp, we need to load it as string and then cast it to timestamp explicitly. e.g.

this is the original sql queries

SELECT URLHash, EventDate, COUNT(*) AS PageViews FROM ...

https://github.com/ClickHouse/ClickBench/blob/main/duckdb-parquet/queries.sql#L41

this is the modified version for datafusion

SELECT "URLHash", "EventDate"::INT::DATE, COUNT(*) AS PageViews FROM

https://github.com/ClickHouse/ClickBench/blob/main/datafusion/queries.sql#L41

I did modify some quries so that it works in datafusion. To fix this, we need to verify whether the new datafusion work or not. If so we could update the quries. If not, we could fire the issue to improve

I'll do the first option soon (update to v18), it should be a quick improvement.
It'll take more time to do the second approach. Welcome for the contribution. I'll get back to this if there's no one work on this.

@ozankabak
Copy link
Contributor

@waitingkuo, if you create the issues that identify what is required to run the queries properly, we can help with addressing them.

@waitingkuo
Copy link
Contributor

@ozankabak thank you, i'll submit another ticket to do so.

I just submit the pr to update it to v18
ClickHouse/ClickBench#77

here's the summary:

1 of the query get 20 times boost
1 of the query remain 0.1 times performance
1 of the query still not working

except from these 3 outliers, we have 20% improvement in average

the histogram chart
x axis means: old execution time / new execution time the larger the more improvement, 1.0 means no change
image

here's the result query by query

Query 0: 0.9776802049030369
Query 1: 1.707491082045184
Query 2: 1.2640586797066016
Query 3: 0.9203807875378623
Query 4: 0.9241755388210855
Query 5: 0.94172723106135
Query 6: 1.649497487437186
Query 7: 1.8705738705738708
Query 8: 0.9532640949554897
Query 9: 0.09891081294396213
Query 10: 0.9790727043117446
Query 11: 1.0668953687821612
Query 12: 1.0256627922162727
Query 13: 0.9520421488719772
Query 14: 1.0535746389404925
Query 15: 1.1839474435875463
Query 16: 1.2163045644807655
Query 17: 1.0196377825847225
Query 18: 1.2491737143968116
Query 19: 0.9741868869385649
Query 20: 1.2916241203256522
Query 21: 0.9167798624671026
Query 22: 0.9743192470077857
Query 23: 1.0809550243337944
Query 24: 1.0715070085911764
Query 25: 1.868698910081744
Query 26: 1.8450380677343134
Query 27: 0.8860306599462013
Query 28: 18.859884645982497
Query 29: 2.9667709147771695
Query 30: 1.0043050430504306
Query 31: 0.9922544851934578
Query 32: doesn't work
Query 33: 1.2395144767868875
Query 34: 1.2316816529558063
Query 35: 1.1153739994479712
Query 36: 0.9655502392344499
Query 37: 0.9329896907216495
Query 38: 0.979089790897909
Query 39: 1.0419161676646707
Query 40: 0.9942028985507246
Query 41: 1.028301886792453
Query 42: 1.039426523297491

outlier queries:

Query 9: 0.09891081294396213 -> 10 times worse

SELECT "RegionID", SUM("AdvEngineID"), COUNT(*) AS c, AVG("ResolutionWidth"), COUNT(DISTINCT "UserID") FROM hits GROUP BY "RegionID" ORDER BY c DESC LIMIT 10;

i have no clue for now, perhaps due to DISTINCT
@alamb @tustvold @comphead do you have any idea for this?

Query 28: 18.859884645982497 -> almost 20 times better

SELECT REGEXP_REPLACE("Referer", '^https?://(?:www\.)?([^/]+)/.*$', '\1') AS k, AVG(length("Referer")) AS l, COUNT(*) AS c, MIN("Referer") FROM hits WHERE "Referer" <> '' GROUP BY k HAVING COUNT(*) > 100000 ORDER BY l DESC LIMIT 25;

i think this is benifit from the improvement of regexp_replace

Query 32: query doesn't work

SELECT "WatchID", "ClientIP", COUNT(*) AS c, SUM("IsRefresh"), AVG("ResolutionWidth") FROM hits GROUP BY "WatchID", "ClientIP" ORDER BY c DESC LIMIT 10;

when i run it, it's killed (same as the previous version)

willy@willybench:~/ClickBench/datafusion$ datafusion-cli -f create.sql q33.sql 
DataFusion CLI v18.0.0
0 rows in set. Query took 0.039 seconds.
Killed

@dudzicp
Copy link
Author

dudzicp commented Feb 15, 2023

With results from @andygrove being consistent with clickbench (~5x difference compared to duckdb), I think it make more sense to focus more on profiling to narrow the gap, rather than re running the benchmarks. Also when running the benchmarks, make sure you enable important config options mentioned by @korowa: #5142 (comment)

@comphead
Copy link
Contributor

Query 9: 0.09891081294396213 -> 10 times worse

SELECT "RegionID", SUM("AdvEngineID"), COUNT(*) AS c, AVG("ResolutionWidth"), COUNT(DISTINCT "UserID") FROM hits GROUP BY "RegionID" ORDER BY c DESC LIMIT 10;
i have no clue for now, perhaps due to DISTINCT
@alamb @tustvold @comphead do you have any idea for this?

@waitingkuo I can try to experiment if this is due to distinct.

@ozankabak
Copy link
Contributor

@waitingkuo, I have a few quick questions: (1) Which machine did your new results run on, is it the same machine? (2) Which features are missing to run the queries as is?

@waitingkuo
Copy link
Contributor

@ozankabak i used azure's Standard_F16s_v2 vm to do so which has 16 vcpu and 32GB memory
https://learn.microsoft.com/en-us/azure/virtual-machines/fsv2-series

i didnt looks into the issue that the process killed yet. my quick guess is that it runs out of memory

@waitingkuo
Copy link
Contributor

@comphead thank you
the quickest way is pull clickbench github repo and go tho this folder
https://github.com/ClickHouse/ClickBench/tree/main/datafusion

and then

# Download benchmark target data
wget --continue https://datasets.clickhouse.com/hits_compatible/hits.parquet
# launch datafusion-cli
datafusion-cli
DataFusion CLI v18.0.0
> CREATE EXTERNAL TABLE hits STORED AS PARQUET LOCATION 'hits.parquet';
> SELECT "RegionID", SUM("AdvEngineID"), COUNT(*) AS c, AVG("ResolutionWidth"), COUNT(DISTINCT "UserID") FROM hits GROUP BY "RegionID" ORDER BY c DESC LIMIT 10;

to run the full benchmark, you could do bash benchmark.sh which contains

  1. install latest datafusion
  2. download benchmark dataset
  3. execute 3 times per query
    it take around 20-30 minutes for me for a full benchmark generation

@jhorstmann
Copy link
Contributor

AFAIK one big difference regarding performance is that most other engines in the clickbench suite first load the data into memory or some other internal format, while datafusion scans directly from the parquet files. Extending the benchmark runner to support both use cases and report separate results would be very useful. The tcph benchmark runner for example has such a feature

@waitingkuo
Copy link
Contributor

@jhorstmann thank you for pointing out this

some system like duckdb or clickbench does provide two kinds of setting. e.g.
image

the original comparison @dudzicp posted is to compare duckdb(parquet) with datafusion. I think it's fair enough. Perhaps we could rename datafusion to datafusion(parquet) so that other people could easily understanding what they're comparing. (btw, we can judge it by loading time and datasize as well)

@alamb
Copy link
Contributor

alamb commented Feb 16, 2023

Perhaps we could rename datafusion to datafusion(parquet) so that other people could easily understanding what they're comparing. (btw, we can judge it by loading time and datasize as well)

I think this would be very helpful. Yes I think we should be only comparing datafusion performance to parquet files

@alamb
Copy link
Contributor

alamb commented Feb 16, 2023

@waitingkuo I can try to experiment if this is due to distinct.

@comphead thank you -- it would be super helpful if you could open a ticket with your analysis (if you haven't already)

I think doing something like posting the EXPLAIN plan and the output of EXPLAIN ANALYZE would be very helpful

@comphead
Copy link
Contributor

@alamb I'm running this on my machine now, not sure if its DISTINCT, once I have more details I'll file a ticket.

But anyway COUNT(DISTINCT X) is proven to be slow in most environments. If data is sorted by X you can jump between distinct values without scanning it, kinda skip scan. Otherwise people uses approximate distinct functions which are faster but has some inaccuracy

@comphead
Copy link
Contributor

I'll file a ticket if that possible to squeeze something from current distinct computation

❯ SELECT "RegionID", SUM("AdvEngineID"), COUNT(*) AS c, AVG("ResolutionWidth"), COUNT(DISTINCT "UserID") FROM hits GROUP BY "RegionID" ORDER BY c DESC LIMIT 10;
+----------+-----------------------+----------+---------------------------+-----------------------------+
| RegionID | SUM(hits.AdvEngineID) | c        | AVG(hits.ResolutionWidth) | COUNT(DISTINCT hits.UserID) |
+----------+-----------------------+----------+---------------------------+-----------------------------+
| 229      | 2077656               | 18295832 | 1506.085243130785         | 2845673                     |
| 2        | 441662                | 6687587  | 1479.8386542111527        | 1081016                     |
| 208      | 285925                | 4261812  | 1285.2593246722286        | 831676                      |
| 169      | 100887                | 3320229  | 1465.9073732564832        | 604583                      |
| 32       | 81498                 | 1843518  | 1538.0376568061718        | 216010                      |
| 34       | 161779                | 1792369  | 1548.360152401654         | 299479                      |
| 184      | 55526                 | 1755192  | 1506.8082967561384        | 322661                      |
| 42       | 108820                | 1542717  | 1587.1085208758313        | 243181                      |
| 107      | 120470                | 1516690  | 1548.6028970982863        | 272448                      |
| 51       | 98212                 | 1435578  | 1579.8860354505293        | 211505                      |
+----------+-----------------------+----------+---------------------------+-----------------------------+
10 rows in set. Query took 559.109 seconds.
❯ SELECT "RegionID", SUM("AdvEngineID"), COUNT(*) AS c, AVG("ResolutionWidth") FROM hits GROUP BY "RegionID" ORDER BY c DESC LIMIT 10;
+----------+-----------------------+----------+---------------------------+
| RegionID | SUM(hits.AdvEngineID) | c        | AVG(hits.ResolutionWidth) |
+----------+-----------------------+----------+---------------------------+
| 229      | 2077656               | 18295832 | 1506.085243130785         |
| 2        | 441662                | 6687587  | 1479.8386542111527        |
| 208      | 285925                | 4261812  | 1285.2593246722286        |
| 169      | 100887                | 3320229  | 1465.9073732564832        |
| 32       | 81498                 | 1843518  | 1538.0376568061718        |
| 34       | 161779                | 1792369  | 1548.360152401654         |
| 184      | 55526                 | 1755192  | 1506.8082967561384        |
| 42       | 108820                | 1542717  | 1587.1085208758313        |
| 107      | 120470                | 1516690  | 1548.6028970982863        |
| 51       | 98212                 | 1435578  | 1579.8860354505293        |
+----------+-----------------------+----------+---------------------------+
10 rows in set. Query took 32.514 seconds.

@alamb
Copy link
Contributor

alamb commented Feb 16, 2023

I'll file a ticket if that possible to squeeze something from current distinct computation

Makes sense

Also in general this query looks like it is a test of how fast grouping can be done (in addition to the fact that our DISTINCT aggregator is pretty slow at the moment)

There are a bunch of ideas floating around about how to make Grouping faster -- for example #4973 and some details on #1570

I think someone just has to commit the time to hammering it out

@dudzicp
Copy link
Author

dudzicp commented Feb 16, 2023

One thing I noticed is that duckdb loads partitioned parquet files: https://github.com/ClickHouse/ClickBench/blob/main/duckdb-parquet/benchmark.sh as opposed to datafusion: https://github.com/ClickHouse/ClickBench/blob/main/datafusion/benchmark.sh. I guess this can impact parallelism?

@korowa
Copy link
Contributor

korowa commented Feb 17, 2023

One thing I noticed is that duckdb loads partitioned parquet files: https://github.com/ClickHouse/ClickBench/blob/main/duckdb-parquet/benchmark.sh as opposed to datafusion: https://github.com/ClickHouse/ClickBench/blob/main/datafusion/benchmark.sh. I guess this can impact parallelism?

Not sure about DuckDB -- it seems to be able to parallelise reads of single file (at least latest version), but it definitely caused lack of parallelism for Datafusion

@Dandandan
Copy link
Contributor

The benchmark though should add parallelism by repartitioning after scanning - so I would expect aggregates to be executed in parallel (even though the scan won't be).

Distinct count might be mostly slow as the current implementation is very allocation-heavy.

An additional suggestion to better performance on the benchmark is to add a target-cpu configuration which might yield a ~10% improvement - currently it compiles with the default target.

@alamb
Copy link
Contributor

alamb commented Feb 17, 2023

Not sure about DuckDB -- it seems to be able to parallelise reads of single file (at least latest version), but it definitely caused lack of parallelism for Datafusion

@korowa s added the feature to parallelize reads of a single parquet file in DataFusion. #5295 proposes enabling this by default

Distinct count might be mostly slow as the current implementation is very allocation-heavy.

👍

@alamb
Copy link
Contributor

alamb commented Feb 17, 2023

The benchmark though should add parallelism by repartitioning after scanning - so I would expect aggregates to be executed in parallel (even though the scan won't be).

BTW When I ran

❯ SELECT "RegionID", SUM("AdvEngineID"), COUNT(*) AS c, AVG("ResolutionWidth"), COUNT(DISTINCT "UserID") FROM hits GROUP BY "RegionID" ORDER BY c DESC LIMIT 10;

Locally using datafusion-cli it kept all my cores quite busy, so I don't think parallelism is the bottleneck for that particular query

@dudzicp
Copy link
Author

dudzicp commented Feb 17, 2023

I have profiled the slowest query locally:

+------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| plan_type              | plan                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           |
+------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Plan with Metrics      | GlobalLimitExec: skip=0, fetch=10, metrics=[output_rows=10, elapsed_compute=28.984µs, spill_count=0, spilled_bytes=0, mem_used=0]                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              |
|                        |   SortPreservingMergeExec: [c@2 DESC], metrics=[output_rows=160, elapsed_compute=64.394µs, spill_count=0, spilled_bytes=0, mem_used=0]                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         |
|                        |     SortExec: fetch=10, expr=[c@2 DESC], metrics=[output_rows=160, elapsed_compute=358.247µs, spill_count=0, spilled_bytes=0]                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  |
|                        |       ProjectionExec: expr=[RegionID@0 as RegionID, SUM(hits.AdvEngineID)@1 as SUM(hits.AdvEngineID), COUNT(UInt8(1))@2 as c, AVG(hits.ResolutionWidth)@3 as AVG(hits.ResolutionWidth), COUNT(DISTINCT hits.UserID)@4 as COUNT(DISTINCT hits.UserID)], metrics=[output_rows=9040, elapsed_compute=64.742µs, spill_count=0, spilled_bytes=0, mem_used=0]                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        |
|                        |         AggregateExec: mode=FinalPartitioned, gby=[RegionID@0 as RegionID], aggr=[SUM(hits.AdvEngineID), COUNT(UInt8(1)), AVG(hits.ResolutionWidth), COUNT(DISTINCT hits.UserID)], metrics=[output_rows=9040, elapsed_compute=13.982147565s, spill_count=0, spilled_bytes=0, mem_used=0]                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       |
|                        |           CoalesceBatchesExec: target_batch_size=8192, metrics=[output_rows=72741, elapsed_compute=1.81247319s, spill_count=0, spilled_bytes=0, mem_used=0]                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    |
|                        |             RepartitionExec: partitioning=Hash([Column { name: "RegionID", index: 0 }], 16), input_partitions=16, metrics=[repart_time=255.989251ms, send_time=3.671844ms, fetch_time=849.033770928s]                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          |
|                        |               AggregateExec: mode=Partial, gby=[RegionID@0 as RegionID], aggr=[SUM(hits.AdvEngineID), COUNT(UInt8(1)), AVG(hits.ResolutionWidth), COUNT(DISTINCT hits.UserID)], metrics=[output_rows=72741, elapsed_compute=844.322413574s, spill_count=0, spilled_bytes=0, mem_used=0]                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        |
|                        |                 RepartitionExec: partitioning=RoundRobinBatch(16), input_partitions=1, metrics=[repart_time=1ns, send_time=49.116011103s, fetch_time=2.894295896s]                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             |
|                        |                   ParquetExec: limit=None, partitions={1 group: [[home/pdudzic/workspace/arrow-datafusion/data/hits.parquet]]}, projection=[RegionID, UserID, ResolutionWidth, AdvEngineID], metrics=[output_rows=99997497, elapsed_compute=1ns, spill_count=0, spilled_bytes=0, mem_used=0, row_groups_pruned=0, predicate_evaluation_errors=0, bytes_scanned=370608715, pushdown_rows_filtered=0, num_predicate_creation_errors=0, page_index_rows_filtered=0, time_elapsed_scanning_until_data=890.342µs, time_elapsed_scanning_total=51.99199476s, pushdown_eval_time=2ns, time_elapsed_processing=2.018826404s, page_index_eval_time=2ns, time_elapsed_opening=30.170968ms]                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               |
|                        |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                |
| Plan with Full Metrics | GlobalLimitExec: skip=0, fetch=10, metrics=[start_timestamp{partition=0}=2023-02-17 11:14:48.310026303 UTC, end_timestamp{partition=0}=2023-02-17 11:15:44.955640152 UTC, elapsed_compute{partition=0}=28.984µs, spill_count{partition=0}=0, spilled_bytes{partition=0}=0, mem_used{partition=0}=0, output_rows{partition=0}=10]                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               |
|                        |   SortPreservingMergeExec: [c@2 DESC], metrics=[start_timestamp{partition=0}=2023-02-17 11:14:48.310028748 UTC, end_timestamp{partition=0}=2023-02-17 11:15:44.955633517 UTC, elapsed_compute{partition=0}=64.394µs, spill_count{partition=0}=0, spilled_bytes{partition=0}=0, mem_used{partition=0}=0, output_rows{partition=0}=160]                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          |
|                        |     SortExec: fetch=10, expr=[c@2 DESC], metrics=[elapsed_compute=358.247µs, spill_count=0, spilled_bytes=0, output_rows=160, start_timestamp=2023-02-17 11:14:48.312445549 UTC, end_timestamp=2023-02-17 11:15:44.955523935 UTC]                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              |
|                        |       ProjectionExec: expr=[RegionID@0 as RegionID, SUM(hits.AdvEngineID)@1 as SUM(hits.AdvEngineID), COUNT(UInt8(1))@2 as c, AVG(hits.ResolutionWidth)@3 as AVG(hits.ResolutionWidth), COUNT(DISTINCT hits.UserID)@4 as COUNT(DISTINCT hits.UserID)], metrics=[start_timestamp{partition=0}=2023-02-17 11:14:48.312442196 UTC, end_timestamp{partition=0}=2023-02-17 11:15:44.026171651 UTC, elapsed_compute{partition=0}=3.841µs, spill_count{partition=0}=0, spilled_bytes{partition=0}=0, mem_used{partition=0}=0, output_rows{partition=0}=540, start_timestamp{partition=3}=2023-02-17 11:14:48.315138364 UTC, end_timestamp{partition=3}=2023-02-17 11:15:43.221529298 UTC, elapsed_compute{partition=3}=5.238µs, spill_count{partition=3}=0, spilled_bytes{partition=3}=0, mem_used{partition=3}=0, output_rows{partition=3}=544, start_timestamp{partition=1}=2023-02-17 11:14:48.315636125 UTC, end_timestamp{partition=1}=2023-02-17 11:15:43.478808450 UTC, elapsed_compute{partition=1}=3.911µs, spill_count{partition=1}=0, spilled_bytes{partition=1}=0, mem_used{partition=1}=0, output_rows{partition=1}=593, start_timestamp{partition=5}=2023-02-17 11:14:48.315643598 UTC, end_timestamp{partition=5}=2023-02-17 11:15:43.390033087 UTC, elapsed_compute{partition=5}=4.331µs, spill_count{partition=5}=0, spilled_bytes{partition=5}=0, mem_used{partition=5}=0, output_rows{partition=5}=557, start_timestamp{partition=9}=2023-02-17 11:14:48.319894441 UTC, end_timestamp{partition=9}=2023-02-17 11:15:42.996058615 UTC, elapsed_compute{partition=9}=5.238µs, spill_count{partition=9}=0, spilled_bytes{partition=9}=0, mem_used{partition=9}=0, output_rows{partition=9}=594, start_timestamp{partition=6}=2023-02-17 11:14:48.323561128 UTC, end_timestamp{partition=6}=2023-02-17 11:15:43.131314080 UTC, elapsed_compute{partition=6}=4.819µs, spill_count{partition=6}=0, spilled_bytes{partition=6}=0, mem_used{partition=6}=0, output_rows{partition=6}=533, start_timestamp{partition=7}=2023-02-17 11:14:48.324025924 UTC, end_timestamp{partition=7}=2023-02-17 11:15:43.243650102 UTC, elapsed_compute{partition=7}=5.307µs, spill_count{partition=7}=0, spilled_bytes{partition=7}=0, mem_used{partition=7}=0, output_rows{partition=7}=526, start_timestamp{partition=12}=2023-02-17 11:14:48.324034445 UTC, end_timestamp{partition=12}=2023-02-17 11:15:43.452520684 UTC, elapsed_compute{partition=12}=3.771µs, spill_count{partition=12}=0, spilled_bytes{partition=12}=0, mem_used{partition=12}=0, output_rows{partition=12}=559, start_timestamp{partition=13}=2023-02-17 11:14:48.324038216 UTC, start_timestamp{partition=4}=2023-02-17 11:14:48.324039194 UTC, end_timestamp{partition=13}=2023-02-17 11:15:42.823949671 UTC, elapsed_compute{partition=13}=4.191µs, spill_count{partition=13}=0, spilled_bytes{partition=13}=0, mem_used{partition=13}=0, output_rows{partition=13}=610, end_timestamp{partition=4}=2023-02-17 11:15:43.576639410 UTC, elapsed_compute{partition=4}=3.352µs, start_timestamp{partition=8}=2023-02-17 11:14:48.324040102 UTC, spill_count{partition=4}=0, spilled_bytes{partition=4}=0, start_timestamp{partition=10}=2023-02-17 11:14:48.324042057 UTC, end_timestamp{partition=8}=2023-02-17 11:15:43.705207468 UTC, mem_used{partition=4}=0, elapsed_compute{partition=8}=2.864µs, spill_count{partition=8}=0, output_rows{partition=4}=570, spilled_bytes{partition=8}=0, mem_used{partition=8}=0, output_rows{partition=8}=557, end_timestamp{partition=10}=2023-02-17 11:15:43.332722013 UTC, elapsed_compute{partition=10}=4.261µs, spill_count{partition=10}=0, spilled_bytes{partition=10}=0, mem_used{partition=10}=0, output_rows{partition=10}=588, start_timestamp{partition=11}=2023-02-17 11:14:48.324044921 UTC, end_timestamp{partition=11}=2023-02-17 11:15:43.473156515 UTC, elapsed_compute{partition=11}=2.793µs, spill_count{partition=11}=0, spilled_bytes{partition=11}=0, mem_used{partition=11}=0, output_rows{partition=11}=547, start_timestamp{partition=14}=2023-02-17 11:14:48.324047365 UTC, end_timestamp{partition=14}=2023-02-17 11:15:44.803820485 UTC, elapsed_compute{partition=14}=3.283µs, spill_count{partition=14}=0, spilled_bytes{partition=14}=0, mem_used{partition=14}=0, output_rows{partition=14}=566, start_timestamp{partition=15}=2023-02-17 11:14:48.324538492 UTC, end_timestamp{partition=15}=2023-02-17 11:15:43.234220363 UTC, elapsed_compute{partition=15}=3.771µs, spill_count{partition=15}=0, spilled_bytes{partition=15}=0, mem_used{partition=15}=0, output_rows{partition=15}=560, start_timestamp{partition=2}=2023-02-17 11:14:48.325566421 UTC, end_timestamp{partition=2}=2023-02-17 11:15:43.271351602 UTC, elapsed_compute{partition=2}=3.771µs, spill_count{partition=2}=0, spilled_bytes{partition=2}=0, mem_used{partition=2}=0, output_rows{partition=2}=596]                                      |
|                        |         AggregateExec: mode=FinalPartitioned, gby=[RegionID@0 as RegionID], aggr=[SUM(hits.AdvEngineID), COUNT(UInt8(1)), AVG(hits.ResolutionWidth), COUNT(DISTINCT hits.UserID)], metrics=[start_timestamp{partition=0}=2023-02-17 11:14:48.310140285 UTC, end_timestamp{partition=0}=2023-02-17 11:15:44.116437155 UTC, elapsed_compute{partition=0}=1.341536712s, spill_count{partition=0}=0, spilled_bytes{partition=0}=0, mem_used{partition=0}=0, output_rows{partition=0}=540, start_timestamp{partition=1}=2023-02-17 11:14:48.310146431 UTC, end_timestamp{partition=1}=2023-02-17 11:15:43.536071962 UTC, elapsed_compute{partition=1}=875.181925ms, spill_count{partition=1}=0, spilled_bytes{partition=1}=0, mem_used{partition=1}=0, output_rows{partition=1}=593, start_timestamp{partition=5}=2023-02-17 11:14:48.310152996 UTC, end_timestamp{partition=5}=2023-02-17 11:15:43.468226672 UTC, elapsed_compute{partition=5}=812.851263ms, spill_count{partition=5}=0, spilled_bytes{partition=5}=0, mem_used{partition=5}=0, output_rows{partition=5}=557, start_timestamp{partition=3}=2023-02-17 11:14:48.310160190 UTC, end_timestamp{partition=3}=2023-02-17 11:15:43.295569821 UTC, elapsed_compute{partition=3}=677.685474ms, spill_count{partition=3}=0, spilled_bytes{partition=3}=0, mem_used{partition=3}=0, output_rows{partition=3}=544, start_timestamp{partition=4}=2023-02-17 11:14:48.316690664 UTC, end_timestamp{partition=4}=2023-02-17 11:15:43.642816462 UTC, elapsed_compute{partition=4}=994.139982ms, spill_count{partition=4}=0, spilled_bytes{partition=4}=0, start_timestamp{partition=6}=2023-02-17 11:14:48.319870485 UTC, end_timestamp{partition=6}=2023-02-17 11:15:43.195183424 UTC, elapsed_compute{partition=6}=607.74759ms, spill_count{partition=6}=0, spilled_bytes{partition=6}=0, mem_used{partition=6}=0, start_timestamp{partition=2}=2023-02-17 11:14:48.319878587 UTC, output_rows{partition=6}=533, end_timestamp{partition=2}=2023-02-17 11:15:43.340678931 UTC, elapsed_compute{partition=2}=687.535939ms, spill_count{partition=2}=0, spilled_bytes{partition=2}=0, mem_used{partition=2}=0, start_timestamp{partition=9}=2023-02-17 11:14:48.319880961 UTC, output_rows{partition=2}=596, end_timestamp{partition=9}=2023-02-17 11:15:43.038256796 UTC, elapsed_compute{partition=9}=466.650221ms, spill_count{partition=9}=0, start_timestamp{partition=10}=2023-02-17 11:14:48.319882917 UTC, spilled_bytes{partition=9}=0, mem_used{partition=9}=0, start_timestamp{partition=7}=2023-02-17 11:14:48.319882917 UTC, output_rows{partition=9}=594, start_timestamp{partition=8}=2023-02-17 11:14:48.319883895 UTC, end_timestamp{partition=10}=2023-02-17 11:15:43.406124043 UTC, elapsed_compute{partition=10}=780.037423ms, spill_count{partition=10}=0, end_timestamp{partition=7}=2023-02-17 11:15:43.303612505 UTC, elapsed_compute{partition=7}=708.167018ms, mem_used{partition=4}=0, spill_count{partition=7}=0, spilled_bytes{partition=10}=0, spilled_bytes{partition=7}=0, mem_used{partition=7}=0, output_rows{partition=7}=526, output_rows{partition=4}=570, start_timestamp{partition=12}=2023-02-17 11:14:48.324016914 UTC, end_timestamp{partition=12}=2023-02-17 11:15:43.515950095 UTC, elapsed_compute{partition=12}=866.695466ms, end_timestamp{partition=8}=2023-02-17 11:15:43.781012260 UTC, elapsed_compute{partition=8}=1.092439299s, spill_count{partition=12}=0, spilled_bytes{partition=12}=0, spill_count{partition=8}=0, mem_used{partition=12}=0, spilled_bytes{partition=8}=0, output_rows{partition=12}=559, mem_used{partition=8}=0, output_rows{partition=8}=557, start_timestamp{partition=13}=2023-02-17 11:14:48.324022153 UTC, end_timestamp{partition=13}=2023-02-17 11:15:42.857814347 UTC, elapsed_compute{partition=13}=339.438836ms, spill_count{partition=13}=0, spilled_bytes{partition=13}=0, mem_used{partition=13}=0, output_rows{partition=13}=610, start_timestamp{partition=11}=2023-02-17 11:14:48.324025924 UTC, mem_used{partition=10}=0, output_rows{partition=10}=588, start_timestamp{partition=14}=2023-02-17 11:14:48.324025924 UTC, end_timestamp{partition=11}=2023-02-17 11:15:43.532833894 UTC, end_timestamp{partition=14}=2023-02-17 11:15:44.955514786 UTC, elapsed_compute{partition=11}=887.122749ms, spill_count{partition=11}=0, elapsed_compute{partition=14}=2.148712776s, spilled_bytes{partition=11}=0, spill_count{partition=14}=0, mem_used{partition=11}=0, output_rows{partition=11}=547, spilled_bytes{partition=14}=0, mem_used{partition=14}=0, output_rows{partition=14}=566, start_timestamp{partition=15}=2023-02-17 11:14:48.324524174 UTC, end_timestamp{partition=15}=2023-02-17 11:15:43.301627815 UTC, elapsed_compute{partition=15}=696.204892ms, spill_count{partition=15}=0, spilled_bytes{partition=15}=0, mem_used{partition=15}=0, output_rows{partition=15}=560]                           |
|                        |           CoalesceBatchesExec: target_batch_size=8192, metrics=[start_timestamp{partition=0}=2023-02-17 11:14:48.310137980 UTC, end_timestamp{partition=0}=2023-02-17 11:15:44.025947948 UTC, elapsed_compute{partition=0}=227.723776ms, spill_count{partition=0}=0, spilled_bytes{partition=0}=0, mem_used{partition=0}=0, output_rows{partition=0}=4632, start_timestamp{partition=1}=2023-02-17 11:14:48.310145034 UTC, end_timestamp{partition=1}=2023-02-17 11:15:43.478440803 UTC, elapsed_compute{partition=1}=143.107274ms, spill_count{partition=1}=0, spilled_bytes{partition=1}=0, mem_used{partition=1}=0, output_rows{partition=1}=4776, start_timestamp{partition=5}=2023-02-17 11:14:48.310150203 UTC, end_timestamp{partition=5}=2023-02-17 11:15:43.389730253 UTC, elapsed_compute{partition=5}=114.573329ms, spill_count{partition=5}=0, spilled_bytes{partition=5}=0, mem_used{partition=5}=0, output_rows{partition=5}=4339, start_timestamp{partition=3}=2023-02-17 11:14:48.310157815 UTC, end_timestamp{partition=3}=2023-02-17 11:15:43.220973429 UTC, elapsed_compute{partition=3}=78.070733ms, spill_count{partition=3}=0, spilled_bytes{partition=3}=0, mem_used{partition=3}=0, output_rows{partition=3}=4374, start_timestamp{partition=6}=2023-02-17 11:14:48.312683848 UTC, end_timestamp{partition=6}=2023-02-17 11:15:43.130552038 UTC, elapsed_compute{partition=6}=59.288848ms, spill_count{partition=6}=0, spilled_bytes{partition=6}=0, start_timestamp{partition=4}=2023-02-17 11:14:48.316684378 UTC, end_timestamp{partition=4}=2023-02-17 11:15:43.576342792 UTC, elapsed_compute{partition=4}=114.551884ms, spill_count{partition=4}=0, spilled_bytes{partition=4}=0, mem_used{partition=4}=0, output_rows{partition=4}=4474, start_timestamp{partition=2}=2023-02-17 11:14:48.316694506 UTC, mem_used{partition=6}=0, output_rows{partition=6}=4375, start_timestamp{partition=7}=2023-02-17 11:14:48.316696810 UTC, end_timestamp{partition=2}=2023-02-17 11:15:43.270949245 UTC, elapsed_compute{partition=2}=124.613836ms, spill_count{partition=2}=0, spilled_bytes{partition=2}=0, mem_used{partition=2}=0, output_rows{partition=2}=5022, end_timestamp{partition=7}=2023-02-17 11:15:43.242464331 UTC, elapsed_compute{partition=7}=142.8013ms, spill_count{partition=7}=0, spilled_bytes{partition=7}=0, mem_used{partition=7}=0, output_rows{partition=7}=4212, start_timestamp{partition=8}=2023-02-17 11:14:48.318235841 UTC, end_timestamp{partition=8}=2023-02-17 11:15:43.704948844 UTC, elapsed_compute{partition=8}=145.125975ms, spill_count{partition=8}=0, spilled_bytes{partition=8}=0, mem_used{partition=8}=0, output_rows{partition=8}=4623, start_timestamp{partition=9}=2023-02-17 11:14:48.319527493 UTC, end_timestamp{partition=9}=2023-02-17 11:15:42.995277786 UTC, elapsed_compute{partition=9}=61.351343ms, spill_count{partition=9}=0, spilled_bytes{partition=9}=0, mem_used{partition=9}=0, output_rows{partition=9}=4819, start_timestamp{partition=10}=2023-02-17 11:14:48.319539994 UTC, end_timestamp{partition=10}=2023-02-17 11:15:43.332380347 UTC, elapsed_compute{partition=10}=84.795162ms, spill_count{partition=10}=0, spilled_bytes{partition=10}=0, mem_used{partition=10}=0, output_rows{partition=10}=4695, start_timestamp{partition=12}=2023-02-17 11:14:48.320377885 UTC, end_timestamp{partition=12}=2023-02-17 11:15:43.452270162 UTC, elapsed_compute{partition=12}=117.915184ms, spill_count{partition=12}=0, spilled_bytes{partition=12}=0, mem_used{partition=12}=0, output_rows{partition=12}=4504, start_timestamp{partition=13}=2023-02-17 11:14:48.322111704 UTC, end_timestamp{partition=13}=2023-02-17 11:15:42.823078885 UTC, elapsed_compute{partition=13}=22.802736ms, spill_count{partition=13}=0, spilled_bytes{partition=13}=0, mem_used{partition=13}=0, output_rows{partition=13}=4618, start_timestamp{partition=14}=2023-02-17 11:14:48.323568880 UTC, end_timestamp{partition=14}=2023-02-17 11:15:44.803583372 UTC, elapsed_compute{partition=14}=187.230989ms, spill_count{partition=14}=0, spilled_bytes{partition=14}=0, mem_used{partition=14}=0, output_rows{partition=14}=4332, start_timestamp{partition=15}=2023-02-17 11:14:48.323583547 UTC, end_timestamp{partition=15}=2023-02-17 11:15:43.233986184 UTC, elapsed_compute{partition=15}=70.530072ms, spill_count{partition=15}=0, spilled_bytes{partition=15}=0, mem_used{partition=15}=0, output_rows{partition=15}=4466, start_timestamp{partition=11}=2023-02-17 11:14:48.323596468 UTC, end_timestamp{partition=11}=2023-02-17 11:15:43.472952088 UTC, elapsed_compute{partition=11}=117.990749ms, spill_count{partition=11}=0, spilled_bytes{partition=11}=0, mem_used{partition=11}=0, output_rows{partition=11}=4480]                                                                                                                                              |
|                        |             RepartitionExec: partitioning=Hash([Column { name: "RegionID", index: 0 }], 16), input_partitions=16, metrics=[fetch_time{partition=0, inputPartition=0}=52.989171766s, repart_time{partition=0, inputPartition=0}=16.174137ms, send_time{partition=0, inputPartition=0}=109.65µs, fetch_time{partition=1, inputPartition=0}=53.744701549s, repart_time{partition=1, inputPartition=0}=10.945932ms, send_time{partition=1, inputPartition=0}=79.341µs, fetch_time{partition=2, inputPartition=0}=54.090254881s, repart_time{partition=2, inputPartition=0}=9.422336ms, send_time{partition=2, inputPartition=0}=63.625µs, fetch_time{partition=3, inputPartition=0}=53.103560014s, repart_time{partition=3, inputPartition=0}=15.266894ms, send_time{partition=3, inputPartition=0}=131.159µs, fetch_time{partition=4, inputPartition=0}=53.085552535s, repart_time{partition=4, inputPartition=0}=26.351531ms, send_time{partition=4, inputPartition=0}=145.688µs, fetch_time{partition=5, inputPartition=0}=53.541228748s, repart_time{partition=5, inputPartition=0}=13.909033ms, send_time{partition=5, inputPartition=0}=79.271µs, fetch_time{partition=6, inputPartition=0}=53.849604816s, repart_time{partition=6, inputPartition=0}=10.402634ms, send_time{partition=6, inputPartition=0}=73.961µs, fetch_time{partition=7, inputPartition=0}=52.4037112s, repart_time{partition=7, inputPartition=0}=16.988631ms, send_time{partition=7, inputPartition=0}=92.398µs, fetch_time{partition=8, inputPartition=0}=53.268465493s, repart_time{partition=8, inputPartition=0}=12.06696ms, send_time{partition=8, inputPartition=0}=74.172µs, fetch_time{partition=9, inputPartition=0}=52.817845744s, repart_time{partition=9, inputPartition=0}=14.588802ms, send_time{partition=9, inputPartition=0}=108.603µs, fetch_time{partition=10, inputPartition=0}=52.741418943s, repart_time{partition=10, inputPartition=0}=15.342743ms, send_time{partition=10, inputPartition=0}=469.196µs, fetch_time{partition=11, inputPartition=0}=52.454397514s, repart_time{partition=11, inputPartition=0}=15.616731ms, send_time{partition=11, inputPartition=0}=66.561µs, fetch_time{partition=12, inputPartition=0}=52.291427138s, repart_time{partition=12, inputPartition=0}=21.156291ms, send_time{partition=12, inputPartition=0}=308.838µs, fetch_time{partition=13, inputPartition=0}=52.528659506s, repart_time{partition=13, inputPartition=0}=17.344265ms, send_time{partition=13, inputPartition=0}=571.653µs, fetch_time{partition=14, inputPartition=0}=53.422171583s, repart_time{partition=14, inputPartition=0}=15.074341ms, send_time{partition=14, inputPartition=0}=79.478µs, fetch_time{partition=15, inputPartition=0}=52.701599498s, repart_time{partition=15, inputPartition=0}=25.33799ms, send_time{partition=15, inputPartition=0}=1.21825ms]                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 |
|                        |               AggregateExec: mode=Partial, gby=[RegionID@0 as RegionID], aggr=[SUM(hits.AdvEngineID), COUNT(UInt8(1)), AVG(hits.ResolutionWidth), COUNT(DISTINCT hits.UserID)], metrics=[start_timestamp{partition=0}=2023-02-17 11:14:48.312477047 UTC, end_timestamp{partition=0}=2023-02-17 11:15:41.459634870 UTC, elapsed_compute{partition=0}=52.655373379s, spill_count{partition=0}=0, spilled_bytes{partition=0}=0, mem_used{partition=0}=0, output_rows{partition=0}=4568, start_timestamp{partition=1}=2023-02-17 11:14:48.312491854 UTC, end_timestamp{partition=1}=2023-02-17 11:15:42.136048948 UTC, elapsed_compute{partition=1}=53.609532373s, spill_count{partition=1}=0, spilled_bytes{partition=1}=0, mem_used{partition=1}=0, output_rows{partition=1}=4634, start_timestamp{partition=2}=2023-02-17 11:14:48.312504285 UTC, end_timestamp{partition=2}=2023-02-17 11:15:42.467883496 UTC, elapsed_compute{partition=2}=53.96355387s, spill_count{partition=2}=0, spilled_bytes{partition=2}=0, mem_used{partition=2}=0, output_rows{partition=2}=4634, start_timestamp{partition=3}=2023-02-17 11:14:48.312516647 UTC, end_timestamp{partition=3}=2023-02-17 11:15:41.539374261 UTC, elapsed_compute{partition=3}=53.000088545s, spill_count{partition=3}=0, spilled_bytes{partition=3}=0, mem_used{partition=3}=0, output_rows{partition=3}=4540, start_timestamp{partition=4}=2023-02-17 11:14:48.312529149 UTC, end_timestamp{partition=4}=2023-02-17 11:15:41.591829246 UTC, elapsed_compute{partition=4}=52.792748238s, spill_count{partition=4}=0, spilled_bytes{partition=4}=0, mem_used{partition=4}=0, output_rows{partition=4}=4522, start_timestamp{partition=5}=2023-02-17 11:14:48.312541860 UTC, end_timestamp{partition=5}=2023-02-17 11:15:41.933581308 UTC, elapsed_compute{partition=5}=53.335500039s, spill_count{partition=5}=0, spilled_bytes{partition=5}=0, mem_used{partition=5}=0, output_rows{partition=5}=4497, start_timestamp{partition=6}=2023-02-17 11:14:48.312554571 UTC, end_timestamp{partition=6}=2023-02-17 11:15:42.235194818 UTC, elapsed_compute{partition=6}=53.696138719s, spill_count{partition=6}=0, spilled_bytes{partition=6}=0, mem_used{partition=6}=0, output_rows{partition=6}=4533, start_timestamp{partition=7}=2023-02-17 11:14:48.312568051 UTC, end_timestamp{partition=7}=2023-02-17 11:15:40.936575648 UTC, elapsed_compute{partition=7}=52.133469366s, spill_count{partition=7}=0, spilled_bytes{partition=7}=0, mem_used{partition=7}=0, output_rows{partition=7}=4526, start_timestamp{partition=8}=2023-02-17 11:14:48.312580832 UTC, end_timestamp{partition=8}=2023-02-17 11:15:41.695139703 UTC, elapsed_compute{partition=8}=53.203193363s, spill_count{partition=8}=0, spilled_bytes{partition=8}=0, mem_used{partition=8}=0, output_rows{partition=8}=4487, start_timestamp{partition=9}=2023-02-17 11:14:48.312595219 UTC, end_timestamp{partition=9}=2023-02-17 11:15:41.282533015 UTC, start_timestamp{partition=10}=2023-02-17 11:14:48.316691642 UTC, end_timestamp{partition=10}=2023-02-17 11:15:41.212464107 UTC, elapsed_compute{partition=10}=52.463152609s, spill_count{partition=10}=0, spilled_bytes{partition=10}=0, mem_used{partition=10}=0, output_rows{partition=10}=4479, start_timestamp{partition=14}=2023-02-17 11:14:48.316696810 UTC, end_timestamp{partition=14}=2023-02-17 11:15:41.830572706 UTC, elapsed_compute{partition=14}=53.24051212s, spill_count{partition=14}=0, spilled_bytes{partition=14}=0, mem_used{partition=14}=0, output_rows{partition=14}=4667, start_timestamp{partition=13}=2023-02-17 11:14:48.316700093 UTC, end_timestamp{partition=13}=2023-02-17 11:15:41.075184840 UTC, elapsed_compute{partition=13}=51.29426684s, spill_count{partition=13}=0, spilled_bytes{partition=13}=0, mem_used{partition=13}=0, output_rows{partition=13}=4523, elapsed_compute{partition=9}=52.749075015s, spill_count{partition=9}=0, spilled_bytes{partition=9}=0, mem_used{partition=9}=0, output_rows{partition=9}=4470, start_timestamp{partition=11}=2023-02-17 11:14:48.316713433 UTC, start_timestamp{partition=15}=2023-02-17 11:14:48.316713433 UTC, end_timestamp{partition=15}=2023-02-17 11:15:41.215856386 UTC, elapsed_compute{partition=15}=52.402302582s, end_timestamp{partition=11}=2023-02-17 11:15:41.030949867 UTC, spill_count{partition=15}=0, spilled_bytes{partition=15}=0, mem_used{partition=15}=0, elapsed_compute{partition=11}=51.72354631s, spill_count{partition=11}=0, output_rows{partition=15}=4616, spilled_bytes{partition=11}=0, mem_used{partition=11}=0, output_rows{partition=11}=4502, start_timestamp{partition=12}=2023-02-17 11:14:48.316725026 UTC, end_timestamp{partition=12}=2023-02-17 11:15:40.791522237 UTC, elapsed_compute{partition=12}=52.059960206s, spill_count{partition=12}=0, spilled_bytes{partition=12}=0, mem_used{partition=12}=0, output_rows{partition=12}=4543] |
|                        |                 RepartitionExec: partitioning=RoundRobinBatch(16), input_partitions=1, metrics=[fetch_time{partition=0, inputPartition=0}=2.894295896s, repart_time{partition=0, inputPartition=0}=NOT RECORDED, send_time{partition=0, inputPartition=0}=49.116011103s]                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       |
|                        |                   ParquetExec: limit=None, partitions={1 group: [[home/pdudzic/workspace/arrow-datafusion/data/hits.parquet]]}, projection=[RegionID, UserID, ResolutionWidth, AdvEngineID], metrics=[num_predicate_creation_errors=0, time_elapsed_opening{partition=0}=30.170968ms, time_elapsed_scanning_until_data{partition=0}=890.342µs, time_elapsed_scanning_total{partition=0}=51.99199476s, time_elapsed_processing{partition=0}=2.018826404s, start_timestamp{partition=0}=2023-02-17 11:14:48.316720696 UTC, end_timestamp{partition=0}=2023-02-17 11:15:40.355567207 UTC, elapsed_compute{partition=0}=NOT RECORDED, spill_count{partition=0}=0, spilled_bytes{partition=0}=0, mem_used{partition=0}=0, output_rows{partition=0}=99997497, predicate_evaluation_errors{partition=0, filename=home/pdudzic/workspace/arrow-datafusion/data/hits.parquet}=0, row_groups_pruned{partition=0, filename=home/pdudzic/workspace/arrow-datafusion/data/hits.parquet}=0, bytes_scanned{partition=0, filename=home/pdudzic/workspace/arrow-datafusion/data/hits.parquet}=0, pushdown_rows_filtered{partition=0, filename=home/pdudzic/workspace/arrow-datafusion/data/hits.parquet}=0, pushdown_eval_time{partition=0, filename=home/pdudzic/workspace/arrow-datafusion/data/hits.parquet}=NOT RECORDED, page_index_rows_filtered{partition=0, filename=home/pdudzic/workspace/arrow-datafusion/data/hits.parquet}=0, page_index_eval_time{partition=0, filename=home/pdudzic/workspace/arrow-datafusion/data/hits.parquet}=NOT RECORDED, predicate_evaluation_errors{partition=0, filename=home/pdudzic/workspace/arrow-datafusion/data/hits.parquet}=0, row_groups_pruned{partition=0, filename=home/pdudzic/workspace/arrow-datafusion/data/hits.parquet}=0, bytes_scanned{partition=0, filename=home/pdudzic/workspace/arrow-datafusion/data/hits.parquet}=370608715, pushdown_rows_filtered{partition=0, filename=home/pdudzic/workspace/arrow-datafusion/data/hits.parquet}=0, pushdown_eval_time{partition=0, filename=home/pdudzic/workspace/arrow-datafusion/data/hits.parquet}=NOT RECORDED, page_index_rows_filtered{partition=0, filename=home/pdudzic/workspace/arrow-datafusion/data/hits.parquet}=0, page_index_eval_time{partition=0, filename=home/pdudzic/workspace/arrow-datafusion/data/hits.parquet}=NOT RECORDED]                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    |
|                        |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                |
| Output Rows            | 10                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             |
| Duration               | 56.643182661s                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  |
+------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
4 rows in set. Query took 56.664 seconds.

@Dandandan you are correct - we are repartitioning after scanning which is consistent with all cpus working at 100% just after the query was submitted.

One thing I dont understand - shouldnt the repartitioning be based on grouping key?

Also here is the output of EXPLAIN SELECT RegionID, SUM(AdvEngineID), COUNT(*) AS c, AVG(ResolutionWidth), COUNT(DISTINCT UserID) FROM hits GROUP BY RegionID ORDER BY c DESC LIMIT 10;

┌───────────────────────────┐
│           TOP_N           │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
│           Top 10          │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
│     count_star() DESC     │
└─────────────┬─────────────┘                             
┌─────────────┴─────────────┐
│       HASH_GROUP_BY       │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
│             #0            │
│          sum(#1)          │
│        count_star()       │
│          avg(#2)          │
│     count(DISTINCT #3)    │
└─────────────┬─────────────┘                             
┌─────────────┴─────────────┐
│         PROJECTION        │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
│          RegionID         │
│        AdvEngineID        │
│      ResolutionWidth      │
│           UserID          │
└─────────────┬─────────────┘                             
┌─────────────┴─────────────┐
│        PARQUET_SCAN       │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
│          RegionID         │
│           UserID          │
│      ResolutionWidth      │
│        AdvEngineID        │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
│           EC: 0           │
└───────────────────────────┘ 

from duckdb.

I guess @alamb is correct and grouping/aggregation code needs improvements.

@Dandandan
Copy link
Contributor

@dudzicp
The repartitioning shouldn't be on grouping key, as hash repartitioning itself is CPU intensive. So for optimal concurrency, you want that to be parallelized as well. The round-robin partitioning is very cheap to do in DataFusion.

In certain cases it might be beneficial to do hash repartition before the (partial)AggregateExec and only perform a single aggregation, now we're basically performing the work twice.

@Dandandan
Copy link
Contributor

Dandandan commented Feb 17, 2023

The last suggestion might be better for running in a non-distributed execution, the current rule is better optimized for distributed execution (as shuffling data might be expensive), so if we get some improvements there, we'll need to make that configurable.

@alamb
Copy link
Contributor

alamb commented Feb 18, 2023

I guess @alamb is correct and grouping/aggregation code needs improvements.

This is one of the last areas I think there are factors of 2 or more performance to be extracted given the current architecture 👍

@kmitchener
Copy link
Contributor

I've re-run the benchmark on a c6a.4xlarge instance similar to the other stateless engines in the benchmark. Submitted it here -> ClickHouse/ClickBench#81 . For nearly all requests, it performs significantly better, but I wonder if that's due in large part to the datafusion.optimizer.repartitioning_file_scans by default that was committed yesterday.

@alamb
Copy link
Contributor

alamb commented Feb 23, 2023

but I wonder if that's due in large part to the datafusion.optimizer.repartitioning_file_scans by default that was committed yesterday.

It would not at all surprise me

Thank you @kmitchener

@sundy-li
Copy link
Contributor

I guess @alamb is correct and grouping/aggregation code needs improvements.

That's right, duckdb has very impressive grouping/aggregation improvements. Details in blog.

ClickBench is almost for grouping/aggregation and high cardinality distinct, so If datafusion wants to have better performance in this bench, it must need a high-performance hashtable which suits for OLAP queries.

@jychen7
Copy link
Contributor

jychen7 commented Feb 26, 2023

The benchmark though should add parallelism by repartitioning after scanning - so I would expect aggregates to be executed in parallel (even though the scan won't be).

I recently look at Datafusion vs DuckDB for our use cases and one of our evaluation queries is a variant of ClickBench q23.

  • Datafusion v19.rc1: 12s
  • DuckDB v0.6.1: 0.6s (20x faster)

it aligns with "scan is not in parallel yet"? More detail here: #5404

@alamb
Copy link
Contributor

alamb commented Mar 10, 2023

Filed #5547 to track improving COUNT DISTINCT query performance

@Dandandan Dandandan changed the title Update benchmmarks on clickbench Update benchmarks on clickbench Mar 10, 2023
@jychen7
Copy link
Contributor

jychen7 commented Apr 11, 2023

I just run a benchmark on Datafusion v22 (use default setting): ClickHouse/ClickBench#97.

I am thinking to rename the result from Datafusion (single) to DataFusion (single parquet). So it is more aligned with DuckDB (parquet) and Clickhouse-local (single). Thoughts ❓

@jychen7
Copy link
Contributor

jychen7 commented Apr 11, 2023

🥂 cheers that #5325 fix the regression on q9

SELECT "RegionID", SUM("AdvEngineID"), COUNT(*) AS c, AVG("ResolutionWidth"), COUNT(DISTINCT "UserID") FROM hits GROUP BY "RegionID" ORDER BY c DESC LIMIT 10;

I also create an issue for the non-working q32: #5969

SELECT "WatchID", "ClientIP", COUNT(*) AS c, SUM("IsRefresh"), AVG("ResolutionWidth") FROM hits GROUP BY "WatchID", "ClientIP" ORDER BY c DESC LIMIT 10

Summary of queries that is 2x slower than DuckDB

ClickBench Datafusion v18 (parquet) Datafusion v22 (parquet) DuckDB 2022-11 (parquet) Issue Tracker
q0 0.22s 0.22s 0.03s
q4 2.66s 2.7s 0.79s
q8 2.87s 3s 0.9s
q9 52s 3.5s 1.29s #5325
#3516
q12 2s 2s 0.76s
q13 3.62s 3.7s 1.21s
q14 2.08s 2.08s 0.84s
q15 3.42s 3.5s 0.84s
q16 4.99s 5.0s 1.86s
q17 4.32s 4.34 1.81s
q18 10.74s 10.9s 3.49s
q30 2.88s 2.94s 0.92s
q31 5.22s 4.4s 1.14s
q32 - - 5.28s #5969
q33 8.54s 8.7s 3.15s
q34 9.01s 9.2s 3.19s
q35 3.66s 3.5s 0.88s
q36 0.4s 0.4s 0.18s
q39 0.77s 0.77s 0.32s

@comphead
Copy link
Contributor

@jychen7 glad to hear #5325 fixed
for #5969 are you running with artifact built in release mode?

@jychen7
Copy link
Contributor

jychen7 commented Apr 12, 2023

for #5969 are you running with artifact built in release mode?

yes, it is release mode. For detail, can refer to 'benchmark.sh' in https://github.com/ClickHouse/ClickBench/pull/97/files.

btw, q32 not working is a known problem. I didn't find related github issue, so create #5969 to track it

@alamb
Copy link
Contributor

alamb commented Jul 12, 2023

I expect DataFusion 28 to perform much better (and q32 now works) due to #6904

@alamb
Copy link
Contributor

alamb commented Jul 27, 2023

I filed #7108 to track updating the clickbench benchmarks again, let's continue the conversation there. Closing this ticket

@alamb alamb closed this as completed Jul 27, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests