[BLAZE-587] Replace spark<major><minor><patch> pattern with spark-<major><minor> for maven profile and shim name #588

Merged: 1 commit, Sep 24, 2024
2 changes: 1 addition & 1 deletion .github/workflows/build-ce7-releases.yml
@@ -11,7 +11,7 @@ jobs:
     runs-on: ubuntu-latest
     strategy:
       matrix:
-        sparkver: [spark303, spark320, spark324, spark333, spark351]
+        sparkver: [spark-3.0, spark-3.1, spark-3.2, spark-3.3, spark-3.5]
         blazever: [3.0.1]
 
     steps:
39 changes: 16 additions & 23 deletions .github/workflows/tpcds.yml
@@ -5,44 +5,37 @@ on:
   push:
 
 jobs:
-  test-spark303:
-    name: Test Spark303
+  test-spark-30:
+    name: Test spark-3.0
     uses: ./.github/workflows/tpcds-reusable.yml
     with:
-      sparkver: spark303
+      sparkver: spark-3.0
       sparkurl: https://archive.apache.org/dist/spark/spark-3.0.3/spark-3.0.3-bin-hadoop2.7.tgz
 
-  test-spark313:
-    name: Test Spark313
+  test-spark-31:
+    name: Test spark-3.1
     uses: ./.github/workflows/tpcds-reusable.yml
     with:
-      sparkver: spark313
+      sparkver: spark-3.1
       sparkurl: https://archive.apache.org/dist/spark/spark-3.1.3/spark-3.1.3-bin-hadoop2.7.tgz
 
-  test-spark320:
-    name: Test Spark320
+  test-spark-32:
+    name: Test spark-3.2
     uses: ./.github/workflows/tpcds-reusable.yml
     with:
-      sparkver: spark320
-      sparkurl: https://archive.apache.org/dist/spark/spark-3.2.0/spark-3.2.0-bin-hadoop2.7.tgz
-
-  test-spark324:
-    name: Test Spark324
-    uses: ./.github/workflows/tpcds-reusable.yml
-    with:
-      sparkver: spark324
+      sparkver: spark-3.2
       sparkurl: https://archive.apache.org/dist/spark/spark-3.2.4/spark-3.2.4-bin-hadoop2.7.tgz
 
-  test-spark333:
-    name: Test Spark333
+  test-spark-33:
+    name: Test spark-3.3
    uses: ./.github/workflows/tpcds-reusable.yml
    with:
-      sparkver: spark333
+      sparkver: spark-3.3
       sparkurl: https://archive.apache.org/dist/spark/spark-3.3.3/spark-3.3.3-bin-hadoop3.tgz
 
-  test-spark351:
-    name: Test Spark351
+  test-spark-35:
+    name: Test spark-3.5
     uses: ./.github/workflows/tpcds-reusable.yml
     with:
-      sparkver: spark351
-      sparkurl: https://archive.apache.org/dist/spark/spark-3.5.1/spark-3.5.1-bin-hadoop3.tgz
+      sparkver: spark-3.5
+      sparkurl: https://archive.apache.org/dist/spark/spark-3.5.2/spark-3.5.2-bin-hadoop3.tgz
18 changes: 9 additions & 9 deletions README.md
@@ -76,17 +76,17 @@ Specify shims package of which spark version that you would like to run on.
 
 Currently we have supported these shims:
 
-* spark303 - for spark3.0.x
-* spark313 - for spark3.1.x
-* spark324 - for spark3.2.x
-* spark333 - for spark3.3.x
-* spark351 - for spark3.5.x.
+* spark-3.0 - for spark3.0.x
+* spark-3.1 - for spark3.1.x
+* spark-3.2 - for spark3.2.x
+* spark-3.3 - for spark3.3.x
+* spark-3.5 - for spark3.5.x.
 
 You could either build Blaze in pre mode for debugging or in release mode to unlock the full potential of
 Blaze.
 
 ```shell
-SHIM=spark333 # or spark303/spark313/spark320/spark324/spark333/spark351
+SHIM=spark-3.3 # or spark-3.0/spark-3.1/spark-3.2/spark-3.3/spark-3.5
 MODE=release # or pre
 mvn package -P"${SHIM}" -P"${MODE}"
 ```
@@ -98,7 +98,7 @@ directory.
 
 You can use the following command to build a centos-7 compatible release:
 ```shell
-SHIM=spark333 MODE=release ./release-docker.sh
+SHIM=spark-3.3 MODE=release ./release-docker.sh
 ```
 
 ## Run Spark Job with Blaze Accelerator
@@ -132,10 +132,10 @@ comparison with vanilla Spark 3.3.3. The benchmark result shows that Blaze save
 Stay tuned and join us for more upcoming thrilling numbers.
 
 TPC-DS Query time: ([How can I run TPC-DS benchmark?](./tpcds/README.md))
-![20240701-query-time-tpcds](./benchmark-results/spark333-vs-blaze300-query-time-20240701.png)
+![20240701-query-time-tpcds](./benchmark-results/spark-3.3-vs-blaze300-query-time-20240701.png)
 
 TPC-H Query time:
-![20240701-query-time-tpch](./benchmark-results/spark333-vs-blaze300-query-time-20240701-tpch.png)
+![20240701-query-time-tpch](./benchmark-results/spark-3.3-vs-blaze300-query-time-20240701-tpch.png)
 
 We also encourage you to benchmark Blaze and share the results with us. 🤗
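The rename follows a fixed old-to-new mapping of the profile ids listed above. For local scripts that still pass the old ids, the translation can be sketched as a small shell helper (the `map_shim` name is hypothetical, not part of the repo; note that spark320 and spark324 both collapse into the single spark-3.2 profile):

```shell
#!/bin/sh
# Translate a pre-BLAZE-587 shim id to the new spark-<major>.<minor> form.
map_shim() {
  case "$1" in
    spark303) echo "spark-3.0" ;;
    spark313) echo "spark-3.1" ;;
    spark320|spark324) echo "spark-3.2" ;;   # two old patch-level ids, one new profile
    spark333) echo "spark-3.3" ;;
    spark351) echo "spark-3.5" ;;
    *) echo "unknown shim: $1" >&2; return 1 ;;
  esac
}

map_shim spark333   # prints spark-3.3
```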
4 changes: 2 additions & 2 deletions benchmark-results/20240701-blaze300.md
@@ -60,7 +60,7 @@ spark.sql.readSideCharPadding false
 ### TPC-DS Results
 Blaze saved 46% total query time comparing to spark, benchmarks using the above configuration.
 Query time comparison (seconds):
-![spark333-vs-blaze300-query-time-20240701.png](spark333-vs-blaze300-query-time-20240701.png)
+![spark-3.3-vs-blaze300-query-time-20240701.png](spark-3.3-vs-blaze300-query-time-20240701.png)
 
 |        | Blaze    | Spark    | Speedup(x) |
 | ------ | -------- | -------- | ---------- |
@@ -172,7 +172,7 @@ Query time comparison (seconds):
 ### TPC-H Results
 Blaze saved 55% total query time comparing to spark, benchmarks using the above configuration.
 Query time comparison (seconds):
-![spark333-vs-blaze300-query-time-20240701-tpch.png](spark333-vs-blaze300-query-time-20240701-tpch.png)
+![spark-3.3-vs-blaze300-query-time-20240701-tpch.png](spark-3.3-vs-blaze300-query-time-20240701-tpch.png)
 
 |        | Blaze   | Spark    | Speedup(x) |
 | ------ | ------- | -------- | ---------- |
3 changes: 1 addition & 2 deletions dev/docker-build/docker-compose.yml
@@ -11,8 +11,7 @@ services:
       - ./../../:/blaze:rw
       - ./../../target-docker:/blaze/target:rw
       - ./../../target-docker/spark-extension-target:/blaze/spark-extension/target:rw
-      - ./../../target-docker/spark-extension-shims-spark303-target:/blaze/spark-extension-shims-spark303/target:rw
-      - ./../../target-docker/spark-extension-shims-spark241kwaiae-target:/blaze/spark-extension-shims-spark241kwaiae/target:rw
+      - ./../../target-docker/spark-extension-shims-spark-3.0-target:/blaze/spark-extension-shims-spark-3.0/target:rw
       - ./../../target-docker/build-helper-proto-target:/blaze/build-helper/proto/target:rw
       - ./../../target-docker/build-helper-assembly-target:/blaze/build-helper/assembly/target:rw
     environment:
36 changes: 11 additions & 25 deletions pom.xml
@@ -261,9 +261,9 @@
     </profile>
 
     <profile>
-      <id>spark303</id>
+      <id>spark-3.0</id>
       <properties>
-        <shimName>spark303</shimName>
+        <shimName>spark-3.0</shimName>
         <shimPkg>spark-extension-shims-spark3</shimPkg>
         <javaVersion>1.8</javaVersion>
         <scalaVersion>2.12</scalaVersion>
@@ -275,9 +275,9 @@
     </profile>
 
     <profile>
-      <id>spark313</id>
+      <id>spark-3.1</id>
       <properties>
-        <shimName>spark313</shimName>
+        <shimName>spark-3.1</shimName>
         <shimPkg>spark-extension-shims-spark3</shimPkg>
         <javaVersion>1.8</javaVersion>
         <scalaVersion>2.12</scalaVersion>
@@ -289,23 +289,9 @@
     </profile>
 
     <profile>
-      <id>spark320</id>
+      <id>spark-3.2</id>
       <properties>
-        <shimName>spark320</shimName>
-        <shimPkg>spark-extension-shims-spark3</shimPkg>
-        <javaVersion>1.8</javaVersion>
-        <scalaVersion>2.12</scalaVersion>
-        <scalaLongVersion>2.12.15</scalaLongVersion>
-        <scalaTestVersion>3.2.9</scalaTestVersion>
-        <scalafmtVersion>3.0.0</scalafmtVersion>
-        <sparkVersion>3.2.0</sparkVersion>
-      </properties>
-    </profile>
-
-    <profile>
-      <id>spark324</id>
-      <properties>
-        <shimName>spark324</shimName>
+        <shimName>spark-3.2</shimName>
         <shimPkg>spark-extension-shims-spark3</shimPkg>
         <javaVersion>1.8</javaVersion>
         <scalaVersion>2.12</scalaVersion>
@@ -317,9 +303,9 @@
     </profile>
 
     <profile>
-      <id>spark333</id>
+      <id>spark-3.3</id>
       <properties>
-        <shimName>spark333</shimName>
+        <shimName>spark-3.3</shimName>
         <shimPkg>spark-extension-shims-spark3</shimPkg>
         <javaVersion>1.8</javaVersion>
         <scalaVersion>2.12</scalaVersion>
@@ -331,16 +317,16 @@
     </profile>
 
     <profile>
-      <id>spark351</id>
+      <id>spark-3.5</id>
       <properties>
-        <shimName>spark351</shimName>
+        <shimName>spark-3.5</shimName>
         <shimPkg>spark-extension-shims-spark3</shimPkg>
         <javaVersion>1.8</javaVersion>
         <scalaVersion>2.12</scalaVersion>
         <scalaLongVersion>2.12.15</scalaLongVersion>
         <scalaTestVersion>3.2.9</scalaTestVersion>
         <scalafmtVersion>3.0.0</scalafmtVersion>
-        <sparkVersion>3.5.1</sparkVersion>
+        <sparkVersion>3.5.2</sparkVersion>
       </properties>
     </profile>
   </profiles>
2 changes: 1 addition & 1 deletion release-docker.sh
@@ -1,6 +1,6 @@
 #!/bin/bash
 
-export SHIM="${SHIM:-spark303}"
+export SHIM="${SHIM:-spark-3.0}"
 export MODE="${MODE:-release}"
 
 docker-compose -f dev/docker-build/docker-compose.yml up
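Since release-docker.sh now defaults `SHIM` to spark-3.0, an old-style id passed from the environment would only fail later inside Maven. A pre-flight guard can be sketched as follows (the `check_shim` helper is hypothetical, not part of the repo; the accepted set mirrors the profiles in pom.xml after this PR):

```shell
#!/bin/sh
# Guard sketch: accept only the new-style profile ids before starting a release build.
check_shim() {
  case "$1" in
    spark-3.0|spark-3.1|spark-3.2|spark-3.3|spark-3.5) return 0 ;;
    *) return 1 ;;
  esac
}

SHIM="${SHIM:-spark-3.0}"
check_shim "$SHIM" || { echo "unsupported SHIM '$SHIM' (use spark-<major>.<minor>)" >&2; exit 1; }
echo "building shim $SHIM"
```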
@@ -22,7 +22,7 @@ import com.thoughtworks.enableIf
 
 object InterceptedValidateSparkPlan extends Logging {
 
-  @enableIf(Seq("spark324", "spark333", "spark351").contains(System.getProperty("blaze.shim")))
+  @enableIf(Seq("spark-3.2", "spark-3.3", "spark-3.5").contains(System.getProperty("blaze.shim")))
   def validate(plan: SparkPlan): Unit = {
     import org.apache.spark.sql.execution.adaptive.BroadcastQueryStageExec
     import org.apache.spark.sql.execution.blaze.plan.NativeRenameColumnsBase
@@ -70,13 +70,12 @@ }
     }
   }
 
-  @enableIf(Seq("spark303", "spark313", "spark320").contains(System.getProperty("blaze.shim")))
+  @enableIf(Seq("spark-3.0", "spark-3.1").contains(System.getProperty("blaze.shim")))
   def validate(plan: SparkPlan): Unit = {
-    throw new UnsupportedOperationException(
-      "validate is not supported in spark 3.0.3 or 3.1.3 or spark 3.2.0")
+    throw new UnsupportedOperationException("validate is not supported in spark 3.0.3 or 3.1.3")
   }
 
-  @enableIf(Seq("spark324", "spark333", "spark351").contains(System.getProperty("blaze.shim")))
+  @enableIf(Seq("spark-3.2", "spark-3.3", "spark-3.5").contains(System.getProperty("blaze.shim")))
   private def errorOnInvalidBroadcastQueryStage(plan: SparkPlan): Unit = {
     import org.apache.spark.sql.execution.adaptive.InvalidAQEPlanException
     throw InvalidAQEPlanException("Invalid broadcast query stage", plan)
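The `@enableIf` annotations above select which `validate` body gets compiled in, keyed on the `blaze.shim` system property, so the renamed ids must be updated in both branches. The gating amounts to a set-membership check, sketched here in shell for illustration (the `supports_validate` helper is hypothetical):

```shell
#!/bin/sh
# Analogue of the compile-time @enableIf gate: the real validate body is only
# compiled in for the spark-3.2/3.3/3.5 shims; spark-3.0/3.1 get the throwing stub.
supports_validate() {
  case "$1" in
    spark-3.2|spark-3.3|spark-3.5) return 0 ;;   # real validate body compiled in
    *) return 1 ;;                               # stub: UnsupportedOperationException
  esac
}

supports_validate spark-3.3 && echo "validate available"   # prints: validate available
```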