
Commit

[BLAZE-587] Replace spark&lt;major&gt;&lt;minor&gt;&lt;patch&gt; pattern with spark-&lt;major&gt;&lt;minor&gt; for maven profile and shim name
SteNicholas committed Sep 24, 2024
1 parent a3073c9 commit cef4833
Showing 36 changed files with 157 additions and 247 deletions.
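The rename this commit applies is purely mechanical: a patch-pinned id such as `spark333` becomes the minor-version id `spark-3.3`, with the patch digit dropped. A minimal shell sketch of that mapping (hypothetical helper, not project code; assumes single-digit version components, as in every id touched by this commit):

```shell
#!/bin/sh
# Map an old patch-pinned shim id (sparkXYZ) to the new spark-X.Y scheme;
# the patch digit Z is simply dropped.
rename_shim() {
  v="${1#spark}"            # strip the "spark" prefix -> e.g. "333"
  major="${v%??}"           # drop last two digits     -> "3"
  minor_rest="${v#?}"       # drop first digit         -> "33"
  minor="${minor_rest%?}"   # drop last digit          -> "3"
  echo "spark-${major}.${minor}"
}

rename_shim spark333   # spark-3.3
rename_shim spark351   # spark-3.5
```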
2 changes: 1 addition & 1 deletion .github/workflows/build-ce7-releases.yml
@@ -11,7 +11,7 @@ jobs:
     runs-on: ubuntu-latest
     strategy:
       matrix:
-        sparkver: [spark303, spark320, spark324, spark333, spark351]
+        sparkver: [spark-3.0, spark-3.1, spark-3.2, spark-3.3, spark-3.5]
         blazever: [3.0.1]

     steps:
39 changes: 16 additions & 23 deletions .github/workflows/tpcds.yml
@@ -5,44 +5,37 @@ on:
   push:

 jobs:
-  test-spark303:
-    name: Test Spark303
+  test-spark-30:
+    name: Test spark-3.0
     uses: ./.github/workflows/tpcds-reusable.yml
     with:
-      sparkver: spark303
+      sparkver: spark-3.0
       sparkurl: https://archive.apache.org/dist/spark/spark-3.0.3/spark-3.0.3-bin-hadoop2.7.tgz

-  test-spark313:
-    name: Test Spark313
+  test-spark-31:
+    name: Test spark-3.1
     uses: ./.github/workflows/tpcds-reusable.yml
     with:
-      sparkver: spark313
+      sparkver: spark-3.1
       sparkurl: https://archive.apache.org/dist/spark/spark-3.1.3/spark-3.1.3-bin-hadoop2.7.tgz

-  test-spark320:
-    name: Test Spark320
+  test-spark-32:
+    name: Test spark-3.2
     uses: ./.github/workflows/tpcds-reusable.yml
     with:
-      sparkver: spark320
-      sparkurl: https://archive.apache.org/dist/spark/spark-3.2.0/spark-3.2.0-bin-hadoop2.7.tgz
-
-  test-spark324:
-    name: Test Spark324
-    uses: ./.github/workflows/tpcds-reusable.yml
-    with:
-      sparkver: spark324
+      sparkver: spark-3.2
       sparkurl: https://archive.apache.org/dist/spark/spark-3.2.4/spark-3.2.4-bin-hadoop2.7.tgz

-  test-spark333:
-    name: Test Spark333
+  test-spark-33:
+    name: Test spark-3.3
     uses: ./.github/workflows/tpcds-reusable.yml
     with:
-      sparkver: spark333
+      sparkver: spark-3.3
       sparkurl: https://archive.apache.org/dist/spark/spark-3.3.3/spark-3.3.3-bin-hadoop3.tgz

-  test-spark351:
-    name: Test Spark351
+  test-spark-35:
+    name: Test spark-3.5
     uses: ./.github/workflows/tpcds-reusable.yml
     with:
-      sparkver: spark351
-      sparkurl: https://archive.apache.org/dist/spark/spark-3.5.1/spark-3.5.1-bin-hadoop3.tgz
+      sparkver: spark-3.5
+      sparkurl: https://archive.apache.org/dist/spark/spark-3.5.2/spark-3.5.2-bin-hadoop3.tgz
18 changes: 9 additions & 9 deletions README.md
@@ -76,17 +76,17 @@ Specify shims package of which spark version that you would like to run on.

 Currently we have supported these shims:

-* spark303 - for spark3.0.x
-* spark313 - for spark3.1.x
-* spark324 - for spark3.2.x
-* spark333 - for spark3.3.x
-* spark351 - for spark3.5.x.
+* spark-3.0 - for spark3.0.x
+* spark-3.1 - for spark3.1.x
+* spark-3.2 - for spark3.2.x
+* spark-3.3 - for spark3.3.x
+* spark-3.5 - for spark3.5.x.

 You could either build Blaze in pre mode for debugging or in release mode to unlock the full potential of
 Blaze.

 ```shell
-SHIM=spark333 # or spark303/spark313/spark320/spark324/spark333/spark351
+SHIM=spark-3.3 # or spark-3.0/spark-3.1/spark-3.2/spark-3.3/spark-3.5
 MODE=release # or pre
 mvn package -P"${SHIM}" -P"${MODE}"
 ```
@@ -98,7 +98,7 @@ directory.

 You can use the following command to build a centos-7 compatible release:
 ```shell
-SHIM=spark333 MODE=release ./release-docker.sh
+SHIM=spark-3.3 MODE=release ./release-docker.sh
 ```
@@ -132,10 +132,10 @@ comparison with vanilla Spark 3.3.3. The benchmark result shows that Blaze save
 Stay tuned and join us for more upcoming thrilling numbers.

 TPC-DS Query time: ([How can I run TPC-DS benchmark?](./tpcds/README.md))
-![20240701-query-time-tpcds](./benchmark-results/spark333-vs-blaze300-query-time-20240701.png)
+![20240701-query-time-tpcds](./benchmark-results/spark-3.3-vs-blaze300-query-time-20240701.png)

 TPC-H Query time:
-![20240701-query-time-tpch](./benchmark-results/spark333-vs-blaze300-query-time-20240701-tpch.png)
+![20240701-query-time-tpch](./benchmark-results/spark-3.3-vs-blaze300-query-time-20240701-tpch.png)

 We also encourage you to benchmark Blaze and share the results with us. 🤗
4 changes: 2 additions & 2 deletions benchmark-results/20240701-blaze300.md
@@ -60,7 +60,7 @@ spark.sql.readSideCharPadding false
 ### TPC-DS Results
 Blaze saved 46% total query time comparing to spark, benchmarks using the above configuration.
 Query time comparison (seconds):
-![spark333-vs-blaze300-query-time-20240701.png](spark333-vs-blaze300-query-time-20240701.png)
+![spark-3.3-vs-blaze300-query-time-20240701.png](spark-3.3-vs-blaze300-query-time-20240701.png)

 |        | Blaze    | Spark    | Speedup(x) |
 | ------ | -------- | -------- | ---------- |
@@ -172,7 +172,7 @@ Query time comparison (seconds):
 ### TPC-H Results
 Blaze saved 55% total query time comparing to spark, benchmarks using the above configuration.
 Query time comparison (seconds):
-![spark333-vs-blaze300-query-time-20240701-tpch.png](spark333-vs-blaze300-query-time-20240701-tpch.png)
+![spark-3.3-vs-blaze300-query-time-20240701-tpch.png](spark-3.3-vs-blaze300-query-time-20240701-tpch.png)

 |        | Blaze   | Spark    | Speedup(x) |
 | ------ | ------- | -------- | ---------- |
3 changes: 1 addition & 2 deletions dev/docker-build/docker-compose.yml
@@ -11,8 +11,7 @@ services:
       - ./../../:/blaze:rw
       - ./../../target-docker:/blaze/target:rw
       - ./../../target-docker/spark-extension-target:/blaze/spark-extension/target:rw
-      - ./../../target-docker/spark-extension-shims-spark303-target:/blaze/spark-extension-shims-spark303/target:rw
-      - ./../../target-docker/spark-extension-shims-spark241kwaiae-target:/blaze/spark-extension-shims-spark241kwaiae/target:rw
+      - ./../../target-docker/spark-extension-shims-spark-3.0-target:/blaze/spark-extension-shims-spark-3.0/target:rw
       - ./../../target-docker/build-helper-proto-target:/blaze/build-helper/proto/target:rw
       - ./../../target-docker/build-helper-assembly-target:/blaze/build-helper/assembly/target:rw
     environment:
36 changes: 11 additions & 25 deletions pom.xml
@@ -261,9 +261,9 @@
     </profile>

     <profile>
-      <id>spark303</id>
+      <id>spark-3.0</id>
       <properties>
-        <shimName>spark303</shimName>
+        <shimName>spark-3.0</shimName>
         <shimPkg>spark-extension-shims-spark3</shimPkg>
         <javaVersion>1.8</javaVersion>
         <scalaVersion>2.12</scalaVersion>
@@ -275,9 +275,9 @@
     </profile>

     <profile>
-      <id>spark313</id>
+      <id>spark-3.1</id>
       <properties>
-        <shimName>spark313</shimName>
+        <shimName>spark-3.1</shimName>
         <shimPkg>spark-extension-shims-spark3</shimPkg>
         <javaVersion>1.8</javaVersion>
         <scalaVersion>2.12</scalaVersion>
@@ -289,23 +289,9 @@
     </profile>

     <profile>
-      <id>spark320</id>
+      <id>spark-3.2</id>
       <properties>
-        <shimName>spark320</shimName>
-        <shimPkg>spark-extension-shims-spark3</shimPkg>
-        <javaVersion>1.8</javaVersion>
-        <scalaVersion>2.12</scalaVersion>
-        <scalaLongVersion>2.12.15</scalaLongVersion>
-        <scalaTestVersion>3.2.9</scalaTestVersion>
-        <scalafmtVersion>3.0.0</scalafmtVersion>
-        <sparkVersion>3.2.0</sparkVersion>
-      </properties>
-    </profile>
-
-    <profile>
-      <id>spark324</id>
-      <properties>
-        <shimName>spark324</shimName>
+        <shimName>spark-3.2</shimName>
         <shimPkg>spark-extension-shims-spark3</shimPkg>
         <javaVersion>1.8</javaVersion>
         <scalaVersion>2.12</scalaVersion>
@@ -317,9 +317,9 @@
     </profile>

     <profile>
-      <id>spark333</id>
+      <id>spark-3.3</id>
       <properties>
-        <shimName>spark333</shimName>
+        <shimName>spark-3.3</shimName>
         <shimPkg>spark-extension-shims-spark3</shimPkg>
         <javaVersion>1.8</javaVersion>
         <scalaVersion>2.12</scalaVersion>
@@ -331,16 +331,16 @@
     </profile>

     <profile>
-      <id>spark351</id>
+      <id>spark-3.5</id>
       <properties>
-        <shimName>spark351</shimName>
+        <shimName>spark-3.5</shimName>
         <shimPkg>spark-extension-shims-spark3</shimPkg>
         <javaVersion>1.8</javaVersion>
         <scalaVersion>2.12</scalaVersion>
         <scalaLongVersion>2.12.15</scalaLongVersion>
         <scalaTestVersion>3.2.9</scalaTestVersion>
         <scalafmtVersion>3.0.0</scalafmtVersion>
-        <sparkVersion>3.5.1</sparkVersion>
+        <sparkVersion>3.5.2</sparkVersion>
       </properties>
     </profile>
   </profiles>
2 changes: 1 addition & 1 deletion release-docker.sh
@@ -1,6 +1,6 @@
 #!/bin/bash

-export SHIM="${SHIM:-spark303}"
+export SHIM="${SHIM:-spark-3.0}"
 export MODE="${MODE:-release}"

 docker-compose -f dev/docker-build/docker-compose.yml up
@@ -22,7 +22,7 @@ import com.thoughtworks.enableIf

 object InterceptedValidateSparkPlan extends Logging {

-  @enableIf(Seq("spark324", "spark333", "spark351").contains(System.getProperty("blaze.shim")))
+  @enableIf(Seq("spark-3.2", "spark-3.3", "spark-3.5").contains(System.getProperty("blaze.shim")))
   def validate(plan: SparkPlan): Unit = {
     import org.apache.spark.sql.execution.adaptive.BroadcastQueryStageExec
     import org.apache.spark.sql.execution.blaze.plan.NativeRenameColumnsBase
@@ -70,13 +70,12 @@ object InterceptedValidateSparkPlan extends Logging {
     }
   }

-  @enableIf(Seq("spark303", "spark313", "spark320").contains(System.getProperty("blaze.shim")))
+  @enableIf(Seq("spark-3.0", "spark-3.1").contains(System.getProperty("blaze.shim")))
   def validate(plan: SparkPlan): Unit = {
-    throw new UnsupportedOperationException(
-      "validate is not supported in spark 3.0.3 or 3.1.3 or spark 3.2.0")
+    throw new UnsupportedOperationException("validate is not supported in spark 3.0.3 or 3.1.3")
   }

-  @enableIf(Seq("spark324", "spark333", "spark351").contains(System.getProperty("blaze.shim")))
+  @enableIf(Seq("spark-3.2", "spark-3.3", "spark-3.5").contains(System.getProperty("blaze.shim")))
   private def errorOnInvalidBroadcastQueryStage(plan: SparkPlan): Unit = {
     import org.apache.spark.sql.execution.adaptive.InvalidAQEPlanException
     throw InvalidAQEPlanException("Invalid broadcast query stage", plan)
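The `@enableIf` guards in the file above select which `validate` body gets compiled in, based on the `blaze.shim` system property; this commit only changes the ids listed in those guards. A plain-shell sketch of the same dispatch rule (illustrative only, not project code; the function name is hypothetical):

```shell
#!/bin/sh
# Mirror of the guard conditions above: which validate() variant is
# active for a given shim id after this commit.
validate_impl() {
  case "$1" in
    spark-3.2|spark-3.3|spark-3.5) echo "full-validate" ;;
    spark-3.0|spark-3.1)           echo "unsupported" ;;
    *)                             echo "unknown-shim" ;;
  esac
}

validate_impl spark-3.3   # full-validate
validate_impl spark-3.0   # unsupported
```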
(Diffs for the remaining changed files are not shown.)