Speed up build: unnecessary invalidation in the incremental recompile mode [databricks] (#9698)

Fixes #9691 

This PR reduces unnecessary IncrementalCompiler output invalidations caused by jar updates:

- Update the build info only when the HEAD revision changes, or on clean (see the sketch after this list)
- Produce the final aggregator jar only if the shade plugin output changed
- Skip generating META-INF/ShimServiceProvider if it already exists
- Restrict `-Xsource:2.13` to the Scala 2.13 build; it was not breaking anything, but it does not make sense in the 2.12 build
- Replace the `skip` property with `maven.scaladoc.skip` in docs and scripts as a follow-up to #9615
- Fix binary-dedupe so it does not look outside the target/parallel-world dir
- Add an optional skip (`-Drapids.jni.unpack.skip`) for unpacking nightly dependencies; a general caching solution is still needed
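
The first two items share one pattern: compute the would-be output, compare it against what is
already on disk, and rewrite only on a real change, so that downstream incremental compilation is
not invalidated. Below is a minimal Bash sketch of both guards; the actual logic lives in Maven
antrun executions in `pom.xml` and `aggregator/pom.xml`, and the paths, jar names, and the
`$PROJECT_VERSION`/`$JNI_VERSION`/`$SPARK_CLASSIFIER` variables here are illustrative stand-ins.

```bash
#!/usr/bin/env bash

# Guard 1: regenerate the version-info file only when the HEAD revision moved,
# because rewriting it on every build needlessly invalidates compiler output.
BUILD_INFO=target/classes/rapids4spark-version-info.properties  # illustrative path
current_rev=$(git rev-parse HEAD)
saved_rev=$(grep -s '^revision=' "$BUILD_INFO" | cut -d= -f2)
if [[ "$current_rev" == "$saved_rev" ]]; then
  echo "Git revision unchanged: keeping $BUILD_INFO"
else
  echo "Generating new version info file"
  mkdir -p "$(dirname "$BUILD_INFO")"
  build/build-info "$PROJECT_VERSION" "$JNI_VERSION" > "$BUILD_INFO"
fi

# Guard 2: republish the aggregator jar for downstream modules only when the
# shade plugin output actually changed.
new=target/new-classes
old=target/old-classes
rm -rf "$new" && mkdir -p "$new" "$old"
unzip -q target/aggregator-shaded.jar -d "$new"  # illustrative jar name
if diff -qr "$old" "$new" > /dev/null; then
  echo "Aggregator jar unchanged"
else
  echo "Aggregator jar changed, recreating final jar"
  rm -rf "$old" && mv "$new" "$old"
  cp target/aggregator-shaded.jar "target/aggregator-$SPARK_CLASSIFIER.jar"
fi
```

Guard 2 diffs the unzipped contents rather than checksumming the jars because zip metadata, such
as entry timestamps, is expected to differ between otherwise identical builds.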

Repeated executions

| Wall clock (pre-PR) | Wall clock (post-PR) | Maven command |
| -- | -- | -- |
| `01:10 min` | `13.004 s` | `mvn package -pl tests -am -Dbuildver=330 \` <br> `    -Dmaven.scaladoc.skip -DskipTests` |
| `01:23 min` | `25.801 s` | `mvn package -pl tests -am -Dbuildver=330 \` <br> `    -Dmaven.scaladoc.skip -Dsuffixes=rapids.CastOpSuite \` <br> `    -Dtests='Cast from string to timestamp'` |
| `01:30 min` | `30.00 s` | `mvn package -Dbuildver=330 \` <br> `    -Ddist.jar.compress=false -DskipTests -Dmaven.scaladoc.skip` |
| `48.769 s` | `19.253 s` | `mvn -T1C package -pl dist -am -Dbuildver=330 \` <br> `    -Ddist.jar.compress=false -DskipTests \` <br> `    -Drapids.jni.unpack.skip=true` |
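
The table reports repeated executions. A hypothetical way to reproduce a row, assuming a local
Maven repository and incremental build state warmed by a first run:

```bash
# The first run populates the incremental compiler state; the timed second run
# exercises the new invalidation guards and should finish much faster.
mvn package -pl tests -am -Dbuildver=330 -Dmaven.scaladoc.skip -DskipTests
time mvn package -pl tests -am -Dbuildver=330 -Dmaven.scaladoc.skip -DskipTests
```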

---------
Signed-off-by: Gera Shegalov <[email protected]>
Co-authored-by: Jason Lowe <[email protected]>
gerashegalov authored Nov 17, 2023
1 parent 2759c66 commit 30c3df3
Showing 16 changed files with 408 additions and 101 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/mvn-verify-check.yml
@@ -27,7 +27,7 @@ env:
COMMON_MVN_FLAGS: >-
-Ddist.jar.compress=false
-DskipTests
-Dskip
-Dmaven.scaladoc.skip
jobs:
get-shim-versions-from-dist:
12 changes: 9 additions & 3 deletions CONTRIBUTING.md
@@ -218,19 +218,25 @@ for a single Spark version Shim alone.
To this end, in a pre-production build you can set the Boolean property
`dist.jar.compress` to `false`; its default value is `true`.
Furthermore, after the first build execution on a clean repository, the spark-rapids-jni
SNAPSHOT dependency typically does not change until the next nightly CI build, or until the next
install to the local Maven repo if you are working on a change to the native code. You can
therefore save the significant time spent repeatedly unpacking these dependencies by adding
`-Drapids.jni.unpack.skip` to the `dist` build command.
The time saved is even more significant if you are merely changing the `aggregator` module, or
the `dist` module, or just incorporating changes from
[spark-rapids-jni](https://github.com/NVIDIA/spark-rapids-jni/blob/branch-23.04/CONTRIBUTING.md#local-testing-of-cross-repo-contributions-cudf-spark-rapids-jni-and-spark-rapids).
For example, to quickly repackage `rapids-4-spark` after the
initial `./build/buildall` you can iterate by invoking
```Bash
mvn package -pl dist -PnoSnapshots -Ddist.jar.compress=false
mvn package -pl dist -PnoSnapshots -Ddist.jar.compress=false -Drapids.jni.unpack.skip
```
or similarly
```Bash
./build/buildall --rebuild-dist-only --option="-Ddist.jar.compress=false"
./build/buildall --rebuild-dist-only --option="-Ddist.jar.compress=false -Drapids.jni.unpack.skip"
```
## Code contributions
@@ -282,7 +288,7 @@ Before proceeding with importing spark-rapids into IDEA or switching to a differ
profile, execute the install phase with the corresponding `buildver`, e.g. for Spark 3.4.0:
```bash
mvn clean install -Dbuildver=340 -Dskip -DskipTests
mvn clean install -Dbuildver=340 -Dmaven.scaladoc.skip -DskipTests
```
##### Importing the project
72 changes: 70 additions & 2 deletions aggregator/pom.xml
@@ -39,6 +39,10 @@
<rapids.shade.package>com.nvidia.shaded.spark</rapids.shade.package>
<rapids.compressed.artifact>false</rapids.compressed.artifact>
<rapids.shim.jar.test.phase>none</rapids.shim.jar.test.phase>
<rapids.default.jar.excludePattern>**/*</rapids.default.jar.excludePattern>
<rapids.default.jar.phase>initialize</rapids.default.jar.phase>
<!-- have Maven register the attached artifact, which we later replace -->
<rapids.shim.jar.phase>initialize</rapids.shim.jar.phase>
</properties>
<dependencies>
<dependency>
@@ -73,7 +77,6 @@
<artifactId>maven-shade-plugin</artifactId>
<configuration>
<shadedArtifactAttached>true</shadedArtifactAttached>
<shadedClassifierName>${spark.version.classifier}</shadedClassifierName>
<artifactSet>
<excludes>
<exclude>org.slf4j:*</exclude>
@@ -108,13 +111,78 @@
<executions>
<execution>
<id>main-${spark.version.classifier}</id>
<phase>package</phase>
<phase>compile</phase>
<goals>
<goal>shade</goal>
</goals>
</execution>
</executions>
</plugin>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-antrun-plugin</artifactId>
<executions>
<execution>
<id>init-dirs</id>
<phase>initialize</phase>
<goals><goal>run</goal></goals>
<configuration>
<target>
<mkdir dir="${project.build.outputDirectory}"/>
</target>
</configuration>
</execution>
<execution>
<id>generate-build-info</id>
<phase>none</phase>
</execution>
<execution>
<id>create-aggregator-for-downstream-if-content-changed</id>
<goals><goal>run</goal></goals>
<phase>process-classes</phase>
<configuration>
<target>
<taskdef resource="net/sf/antcontrib/antcontrib.properties"/>
<property name="realAggJar"
location="${project.build.outputDirectory}/../${project.build.finalName}-shaded.jar"/>
<property name="aggJarForDownstream"
location="${project.build.outputDirectory}/../${project.build.finalName}-${spark.version.classifier}.jar"/>
<property name="newClassesDir" location="${project.build.outputDirectory}/../new-classes"/>
<property name="oldClassesDir" location="${project.build.outputDirectory}/../old-classes"/>
<echo>Checking if need to recreate: ${aggJarForDownstream}</echo>
<!-- using diff instead of checksum to deal with the expected zip metadata diff -->

<!-- make sure we start with a clean new dir -->
<mkdir dir="${newClassesDir}"/>
<delete dir="${newClassesDir}"/>
<unzip src="${realAggJar}" dest="${newClassesDir}"/>
<mkdir dir="${oldClassesDir}"/>

<exec executable="diff"
resultproperty="diff.result">
<arg value="-q"/>
<arg value="-r"/>
<arg value="${oldClassesDir}"/>
<arg value="${newClassesDir}"/>
</exec>

<ac:if xmlns:ac="antlib:net.sf.antcontrib">
<equals arg1="0" arg2="${diff.result}"/>
<then>
<echo>Aggregator jar unchanged</echo>
</then>
<else>
<echo>Aggregator jar changed, recreating final jar</echo>
<delete dir="${oldClassesDir}"/>
<move file="${newClassesDir}" tofile="${oldClassesDir}"/>
<copy file="${realAggJar}" tofile="${aggJarForDownstream}"/>
</else>
</ac:if>
</target>
</configuration>
</execution>
</executions>
</plugin>
<plugin>
<groupId>org.jacoco</groupId>
<artifactId>jacoco-maven-plugin</artifactId>
4 changes: 2 additions & 2 deletions build/buildall
@@ -262,7 +262,7 @@ function build_single_shim() {
-DskipTests \
-Dbuildver="$BUILD_VER" \
-Drat.skip="$SKIP_CHECKS" \
-Dskip \
-Dmaven.scaladoc.skip \
-Dmaven.scalastyle.skip="$SKIP_CHECKS" \
-pl aggregator -am > "$LOG_FILE" 2>&1 || {
[[ "$LOG_FILE" != "/dev/tty" ]] && echo "$LOG_FILE:" && tail -20 "$LOG_FILE" || true
@@ -303,5 +303,5 @@ time (
echo "Resuming from $joinShimBuildFrom build only using $BASE_VER"
$MVN $FINAL_OP -rf $joinShimBuildFrom $MODULE_OPT $MVN_PROFILE_OPT $INCLUDED_BUILDVERS_OPT \
-Dbuildver="$BASE_VER" \
-DskipTests -Dskip
-DskipTests -Dmaven.scaladoc.skip
)
19 changes: 17 additions & 2 deletions dist/pom.xml
@@ -45,6 +45,7 @@
<dist.jar.name>${project.build.directory}/${project.build.finalName}-${jni.classifier}.jar</dist.jar.name>
<dist.jar.pom.url>jar:file:${dist.jar.name}!/META-INF/maven/${project.groupId}/${project.artifactId}/pom.xml</dist.jar.pom.url>
<rapids.default.jar.phase>none</rapids.default.jar.phase>
<rapids.jni.unpack.skip>false</rapids.jni.unpack.skip>
</properties>
<profiles>
<profile>
@@ -323,6 +324,19 @@
</target>
</configuration>
</execution>
<execution>
<id>copy-jni-and-ucx-classes</id>
<!-- after optional unpack -->
<phase>process-resources</phase>
<goals><goal>run</goal></goals>
<configuration>
<target>
<copy todir="${project.build.directory}/parallel-world">
<fileset dir="${project.build.directory}/jni-deps"/>
</copy>
</target>
</configuration>
</execution>
<execution>
<phase>verify</phase>
<goals>
@@ -447,21 +461,22 @@ self.log("... OK")
<goal>unpack</goal>
</goals>
<configuration>
<skip>${rapids.jni.unpack.skip}</skip>
<artifactItems>
<!-- when adding new artifacts, set `overWrite` to true -->
<artifactItem>
<groupId>com.nvidia</groupId>
<artifactId>spark-rapids-jni</artifactId>
<classifier>${jni.classifier}</classifier>
<excludes>META-INF/**</excludes>
<outputDirectory>${project.build.directory}/parallel-world</outputDirectory>
<outputDirectory>${project.build.directory}/jni-deps</outputDirectory>
<overWrite>true</overWrite>
</artifactItem>
<artifactItem>
<groupId>org.openucx</groupId>
<artifactId>jucx</artifactId>
<excludes>META-INF/**</excludes>
<outputDirectory>${project.build.directory}/parallel-world</outputDirectory>
<outputDirectory>${project.build.directory}/jni-deps</outputDirectory>
<overWrite>true</overWrite>
</artifactItem>
</artifactItems>
4 changes: 2 additions & 2 deletions dist/scripts/binary-dedupe.sh
@@ -1,6 +1,6 @@
#!/bin/bash

# Copyright (c) 2021-2022, NVIDIA CORPORATION.
# Copyright (c) 2021-2023, NVIDIA CORPORATION.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
@@ -157,7 +157,7 @@ mv "$SPARK3XX_COMMON_DIR" parallel-world/
# Determine the list of unshimmed class files
UNSHIMMED_LIST_TXT=unshimmed-result.txt
echo "$((++STEP))/ creating sorted list of unshimmed classes > $UNSHIMMED_LIST_TXT"
find . -name '*.class' -not -path './parallel-world/spark3*' | \
find ./parallel-world -name '*.class' -not -path './parallel-world/spark3*' | \
cut -d/ -f 3- | sort > "$UNSHIMMED_LIST_TXT"

function verify_same_sha_for_unshimmed() {
2 changes: 1 addition & 1 deletion integration_tests/src/assembly/bin.xml
@@ -47,7 +47,7 @@
<outputDirectory>integration_tests</outputDirectory>
</file>
<file>
<source>${project.build.directory}/extra-resources/rapids4spark-version-info.properties</source>
<source>${project.build.outputDirectory}/rapids4spark-version-info.properties</source>
<outputDirectory>integration_tests</outputDirectory>
</file>
</files>
2 changes: 1 addition & 1 deletion jenkins/databricks/build.sh
@@ -150,7 +150,7 @@ $MVN_CMD -B -Ddatabricks -Dbuildver=$BUILDVER clean package -DskipTests $MVN_OPT
if [[ "$WITH_DEFAULT_UPSTREAM_SHIM" != "0" ]]; then
echo "Building the default Spark shim and creating a two-shim dist jar"
UPSTREAM_BUILDVER=$($MVN_CMD help:evaluate -q -pl dist -Dexpression=buildver -DforceStdout)
$MVN_CMD -B package -pl dist -am -DskipTests -Dskip $MVN_OPT \
$MVN_CMD -B package -pl dist -am -DskipTests -Dmaven.scaladoc.skip $MVN_OPT \
-Dincluded_buildvers=$UPSTREAM_BUILDVER,$BUILDVER
fi

2 changes: 1 addition & 1 deletion jenkins/spark-premerge-build.sh
@@ -29,7 +29,7 @@ fi

CUDA_CLASSIFIER=${CUDA_CLASSIFIER:-'cuda11'}
MVN_CMD="mvn -Dmaven.wagon.http.retryHandler.count=3"
MVN_BUILD_ARGS="-Drat.skip=true -Dskip -Dmaven.scalastyle.skip=true -Dcuda.version=$CUDA_CLASSIFIER"
MVN_BUILD_ARGS="-Drat.skip=true -Dmaven.scaladoc.skip -Dmaven.scalastyle.skip=true -Dcuda.version=$CUDA_CLASSIFIER"

mvn_verify() {
echo "Run mvn verify..."
72 changes: 54 additions & 18 deletions pom.xml
@@ -819,6 +819,7 @@
<cloudera.repo.enabled>false</cloudera.repo.enabled>
<bloop.installPhase>install</bloop.installPhase>
<bloop.configDirectory>${spark.rapids.source.basedir}/.bloop</bloop.configDirectory>
<build.info.path>${project.build.outputDirectory}/rapids4spark-version-info.properties</build.info.path>
</properties>

<dependencyManagement>
@@ -966,30 +967,64 @@
</target>
</configuration>
</execution>
<execution>
<id>setup-dirs</id>
<phase>initialize</phase>
<goals><goal>run</goal></goals>
<configuration>
<target>
<mkdir dir="${project.build.directory}/extra-resources"/>
<mkdir dir="${project.build.directory}/tmp"/>
</target>
</configuration>
</execution>
<execution>
<id>generate-build-info</id>
<phase>generate-resources</phase>
<configuration>
<!-- Execute the shell script to generate the plugin build information. -->
<target name="build-info">
<mkdir dir="${project.build.directory}/extra-resources"/>
<mkdir dir="${project.build.directory}/tmp"/>
<exec executable="bash"
output="${project.build.directory}/extra-resources/rapids4spark-version-info.properties"
resultproperty="build-info.exitCode"
errorproperty="build-info.errorMsg"
failonerror="false">
<arg value="${spark.rapids.source.basedir}/build/build-info"/>
<arg value="${project.version}"/>
<arg value="${spark-rapids-jni.version}"/>
<exec executable="git" outputproperty="git.head.revision">
<arg value="rev-parse"/>
<arg value="HEAD"/>
</exec>
<fail message="exec build-info.sh failed, exit code is ${build-info.exitCode}, error msg is ${build-info.errorMsg}">
<condition>
<not>
<equals arg1="${build-info.exitCode}" arg2="0"/>
</not>
</condition>
</fail>
<property file="${build.info.path}" prefix="saved.build-info"/>
<echo>
Comparing git revisions:
previous=${saved.build-info.revision}
current=${git.head.revision}
</echo>
<taskdef resource="net/sf/antcontrib/antcontrib.properties"/>
<ac:if xmlns:ac="antlib:net.sf.antcontrib">
<equals arg1="${git.head.revision}" arg2="${saved.build-info.revision}"/>
<then>
<echo>
Git revisions unchanged: skipping version info file generation.
Delete ${build.info.path} or mvn clean if regeneration desired.
This will force full Scala code rebuild in downstream modules.
</echo>
</then>
<else>
<echo>Generating new version info file</echo>
<mkdir dir="${project.build.outputDirectory}"/>
<exec executable="bash"
output="${build.info.path}"
resultproperty="build-info.exitCode"
errorproperty="build-info.errorMsg"
failonerror="false">
<arg value="${spark.rapids.source.basedir}/build/build-info"/>
<arg value="${project.version}"/>
<arg value="${spark-rapids-jni.version}"/>
</exec>
<fail message="exec build-info.sh failed, exit code is ${build-info.exitCode}, error msg is ${build-info.errorMsg}">
<condition>
<not>
<equals arg1="${build-info.exitCode}" arg2="0"/>
</not>
</condition>
</fail>
</else>
</ac:if>
</target>
</configuration>

@@ -1049,6 +1084,7 @@
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-compiler-plugin</artifactId>
<version>3.11.0</version>
<executions>
<execution>
<id>default-compile</id>
@@ -1118,8 +1154,8 @@
<arg>-Xfatal-warnings</arg>
<!-- #endif scala-2.12 -->
<arg>-Wconf:cat=lint-adapted-args:e</arg>
<arg>-Xsource:2.13</arg>
<!-- #if scala-2.13 --><!--
<arg>-Xsource:2.13</arg>
<arg>-Ywarn-unused:locals,patvars,privates</arg>
<arg>-Wconf:cat=deprecation:wv,any:e</arg>
<arg>-Wconf:cat=scaladoc:wv</arg>
(Diffs for the remaining changed files are omitted.)