doc: update tpc-h benchmark result
zhangli20 committed Oct 10, 2024
1 parent 427d084 commit 15a4e9f
Showing 6 changed files with 88 additions and 207 deletions.
10 changes: 4 additions & 6 deletions README.md
@@ -144,15 +144,13 @@ spark.sql.adaptive.localShuffleReader.enabled false

## Performance

Check [Benchmark Results](./benchmark-results/20240701-blaze300.md) with the latest date for the performance
comparison with vanilla Spark 3.3.3. The benchmark result shows that Blaze save about 50% time on TPC-DS/TPC-H 1TB datasets.
Stay tuned and join us for more upcoming thrilling numbers.
Check [TPC-H Benchmark Results](./benchmark-results/tpch.md).
The latest benchmark result shows that Blaze saves more than 50% of query time on the TPC-H 1TB dataset compared with vanilla Spark 3.5.

TPC-DS Query time: ([How can I run TPC-DS benchmark?](./tpcds/README.md))
![20240701-query-time-tpcds](./benchmark-results/spark-3.3-vs-blaze300-query-time-20240701.png)
Stay tuned and join us for more upcoming thrilling numbers.

TPC-H Query time:
![20240701-query-time-tpch](./benchmark-results/spark-3.3-vs-blaze300-query-time-20240701-tpch.png)
![tpch-blaze400-spark351.png](./benchmark-results/tpch-blaze400-spark351.png)

We also encourage you to benchmark Blaze and share the results with us. 🤗

201 changes: 0 additions & 201 deletions benchmark-results/20240701-blaze300.md

This file was deleted.

Binary file not shown.
Binary file not shown.
Binary file added benchmark-results/tpch-blaze400-spark351.png
84 changes: 84 additions & 0 deletions benchmark-results/tpch.md
@@ -0,0 +1,84 @@
# TPC-H 1TB Benchmark

### Versions
- Blaze version: [4.0.0](https://github.com/blaze-init/blaze/tree/v4.0.0)
- Vanilla Spark version: spark-3.5.1

### Environment
Hadoop 2.10.2 in cluster mode, running on 7 nodes. See [Kwai server conf](./kwai1-hardware-conf.md).
Java version: 1.8.0_102.

### Configuration

Common configurations:
```conf
spark.master yarn
spark.shuffle.service.enabled true
spark.shuffle.service.port 7337
spark.driver.memory 20g
spark.driver.memoryOverhead 4096
spark.executor.instances 10000
spark.dynamicAllocation.maxExecutors 10000
spark.io.compression.codec lz4
spark.sql.parquet.compression.codec zstd
# enabled in spark 3.5 by default
spark.sql.optimizer.runtime.bloomFilter.enabled true
# enable HashJoin for small tables, which is faster both in spark and blaze
# note: SortMergeJoin is still used for joining big tables with this configuration enabled
spark.sql.join.preferSortMergeJoin false
```
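
To pass settings like these on the command line instead of through a defaults file, the `key value` lines can be turned into `spark-submit --conf key=value` arguments. The following is a minimal sketch, not part of the benchmark tooling: `parse_conf` and `to_submit_args` are hypothetical helper names, and only a few of the settings above are included for brevity.

```python
import shlex

# A subset of the common configuration above, in "key value" form.
CONF_TEXT = """\
spark.master yarn
spark.shuffle.service.enabled true
# enabled in spark 3.5 by default
spark.sql.optimizer.runtime.bloomFilter.enabled true
spark.sql.join.preferSortMergeJoin false
"""


def parse_conf(text):
    """Parse 'key value' lines, skipping blank lines and '#' comments."""
    conf = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)  # split on the first run of whitespace
        conf[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
    return conf


def to_submit_args(conf):
    """Render each setting as a '--conf key=value' pair for spark-submit."""
    args = []
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    return args


conf = parse_conf(CONF_TEXT)
args = to_submit_args(conf)
print(" ".join(shlex.quote(a) for a in args))
```

Spark configuration keys are case-sensitive, so a helper like this should pass keys through unchanged rather than normalizing them.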

Configurations for vanilla Spark:
```conf
spark.executor.memory 4g
spark.executor.memoryOverhead 2048
spark.executor.cores 5
```

Configurations for Blaze:
Note: this configuration is widely used in the production environment of Kuaishou.inc and contains no benchmark-only optimizations. For example, setting `spark.blaze.forceShuffledHashJoin true` forces HashJoin instead of SortMergeJoin and yields much faster benchmark results, but that is unacceptable in a production environment.

```conf
spark.executor.memory 3g
spark.executor.memoryOverhead 3072
spark.blaze.enable true
spark.sql.extensions org.apache.spark.sql.blaze.BlazeSparkSessionExtension
spark.shuffle.manager org.apache.spark.sql.execution.blaze.shuffle.BlazeShuffleManager
```
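
The two executor profiles split memory differently, so it is worth checking that they request the same total per executor. The sketch below adds `spark.executor.memory` and `spark.executor.memoryOverhead` for each profile. It assumes the per-executor request is simply heap plus overhead, that a bare number for the overhead means MiB (Spark's documented default unit for that setting), and that no other memory settings are in play. `to_mib` and `executor_total_mib` are hypothetical helper names.

```python
def to_mib(setting):
    """Convert a Spark memory setting to MiB; bare numbers are taken as MiB."""
    s = setting.strip().lower()
    if s.endswith("g"):
        return int(s[:-1]) * 1024
    if s.endswith("m"):
        return int(s[:-1])
    return int(s)


# Executor memory settings copied from the two configuration blocks above.
PROFILES = {
    "vanilla": {
        "spark.executor.memory": "4g",
        "spark.executor.memoryOverhead": "2048",
    },
    "blaze": {
        "spark.executor.memory": "3g",
        "spark.executor.memoryOverhead": "3072",
    },
}


def executor_total_mib(conf):
    """Heap plus overhead: an approximation of the per-executor request."""
    return to_mib(conf["spark.executor.memory"]) + to_mib(
        conf["spark.executor.memoryOverhead"]
    )


totals = {name: executor_total_mib(conf) for name, conf in PROFILES.items()}
print(totals)  # {'vanilla': 6144, 'blaze': 6144}
```

Under these assumptions both profiles come to 6144 MiB per executor: the Blaze profile moves 1 GiB from the JVM heap to the overhead allocation without changing the total.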

### Benchmark result:
Blaze achieves a 2.12x speed-up in total query time compared with Spark 3.5, using the same CPU/memory resources.

![tpch-blaze400-spark351.png](tpch-blaze400-spark351.png)

| Query | Spark  | Blaze   | Speedup |
| --- | -------- | ------- | ------- |
| q01 | 40.473 | 19.834 | 2.04 |
| q02 | 20.527 | 11.639 | 1.76 |
| q03 | 69.091 | 31.199 | 2.21 |
| q04 | 59.58 | 16.585 | 3.59 |
| q05 | 100.958 | 52.267 | 1.93 |
| q06 | 26.713 | 7.928 | 3.37 |
| q07 | 64.729 | 28.175 | 2.30 |
| q08 | 64.465 | 35.043 | 1.84 |
| q09 | 103.011 | 53.203 | 1.94 |
| q10 | 46.543 | 21.805 | 2.13 |
| q11 | 16.458 | 8.561 | 1.92 |
| q12 | 26.626 | 13.784 | 1.93 |
| q13 | 53.072 | 15.445 | 3.44 |
| q14 | 31.561 | 9.279 | 3.40 |
| q15 | 59.57 | 19.212 | 3.10 |
| q16 | 14.533 | 5.944 | 2.44 |
| q17 | 141.243 | 54.49 | 2.59 |
| q18 | 129.022 | 79.808 | 1.62 |
| q19 | 19.561 | 10.149 | 1.93 |
| q20 | 42.451 | 15.934 | 2.66 |
| q21 | 177.553 | 107.276 | 1.66 |
| q22 | 17.429 | 8.244 | 2.11 |
| | | | |
| sum | 1325.169 | 625.804 | 2.12 |
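
The overall figure is the ratio of the summed query times, not an average of the per-query speedups. A short sketch that recomputes it from the table rows (times are copied from the table, in the units reported there; the variable names are illustrative):

```python
# (query, spark_time, blaze_time) rows copied from the table above.
ROWS = [
    ("q01", 40.473, 19.834), ("q02", 20.527, 11.639), ("q03", 69.091, 31.199),
    ("q04", 59.58, 16.585), ("q05", 100.958, 52.267), ("q06", 26.713, 7.928),
    ("q07", 64.729, 28.175), ("q08", 64.465, 35.043), ("q09", 103.011, 53.203),
    ("q10", 46.543, 21.805), ("q11", 16.458, 8.561), ("q12", 26.626, 13.784),
    ("q13", 53.072, 15.445), ("q14", 31.561, 9.279), ("q15", 59.57, 19.212),
    ("q16", 14.533, 5.944), ("q17", 141.243, 54.49), ("q18", 129.022, 79.808),
    ("q19", 19.561, 10.149), ("q20", 42.451, 15.934), ("q21", 177.553, 107.276),
    ("q22", 17.429, 8.244),
]

spark_total = sum(spark for _, spark, _ in ROWS)
blaze_total = sum(blaze for _, _, blaze in ROWS)

# Speedup is the ratio of total times; time saved is the complementary fraction.
speedup = spark_total / blaze_total
time_saved = 1 - blaze_total / spark_total

print(f"totals: spark={spark_total:.3f} blaze={blaze_total:.3f}")
print(f"speedup: {speedup:.2f}x")
print(f"time saved: {time_saved:.1%}")
```

The totals match the `sum` row (1325.169 and 625.804), giving a 2.12x speedup, which corresponds to roughly 52.8% less total query time.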
