Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-4281][Build] Package Yarn shuffle service into its own jar #3147

Closed
wants to merge 6 commits into from
Closed
Show file tree
Hide file tree
Changes from 2 commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions make-distribution.sh
Original file line number Diff line number Diff line change
Expand Up @@ -181,6 +181,7 @@ echo "Spark $VERSION$GITREVSTRING built for Hadoop $SPARK_HADOOP_VERSION" > "$DI
# Copy jars
cp "$FWDIR"/assembly/target/scala*/*assembly*hadoop*.jar "$DISTDIR/lib/"
cp "$FWDIR"/examples/target/scala*/spark-examples*.jar "$DISTDIR/lib/"
cp "$FWDIR"/network/yarn/target/scala*/spark-network-yarn*.jar "$DISTDIR/lib/"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if you adjust the title you'll need to change this too


# Copy example sources (needed for python and SQL)
mkdir -p "$DISTDIR/examples/src/main"
Expand Down
4 changes: 2 additions & 2 deletions network/common/pom.xml
Original file line number Diff line number Diff line change
Expand Up @@ -41,12 +41,12 @@
<groupId>io.netty</groupId>
<artifactId>netty-all</artifactId>
</dependency>

<!-- Provided dependencies -->
<dependency>
<groupId>org.slf4j</groupId>
<artifactId>slf4j-api</artifactId>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why did you move the comment up here? Should this be at the provided scope?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, actually Yarn already provides slf4j so it doesn't need to be a core dependency. For standalone mode, this is also already required by Spark so it doesn't need to be a core dependency there either. HOWEVER I just realized I forgot to actually make it provided by adding the tag.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok I added the tag

</dependency>

<!-- Provided dependencies -->
<dependency>
<groupId>com.google.guava</groupId>
<artifactId>guava</artifactId>
Expand Down
4 changes: 2 additions & 2 deletions network/shuffle/pom.xml
Original file line number Diff line number Diff line change
Expand Up @@ -42,12 +42,12 @@
<artifactId>spark-network-common_2.10</artifactId>
<version>${project.version}</version>
</dependency>

<!-- Provided dependencies -->
<dependency>
<groupId>org.slf4j</groupId>
<artifactId>slf4j-api</artifactId>
</dependency>

<!-- Provided dependencies -->
<dependency>
<groupId>com.google.guava</groupId>
<artifactId>guava</artifactId>
Expand Down
33 changes: 33 additions & 0 deletions network/yarn/pom.xml
Original file line number Diff line number Diff line change
Expand Up @@ -54,5 +54,38 @@
<build>
<outputDirectory>target/scala-${scala.binary.version}/classes</outputDirectory>
<testOutputDirectory>target/scala-${scala.binary.version}/test-classes</testOutputDirectory>
<plugins>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you mind adding a comment on what this is doing? I don't know much about the shading plugin, and accordingly I'm not certain why shading is equivalent to producing an assembly jar.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My understanding is that the shading plugin is primarily used to create uber jars, and the shading dependency part is just a generally useful thing to do in this process: http://maven.apache.org/plugins/maven-shade-plugin/. This is how we create assembly jars in say the example and core modules, except the only difference here is that we don't actually need to shade any dependencies. I think this is a pretty standard thing to do and I'm not sure if a comment is necessary.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey I agree with @andrewor14 on this one - it's used pervasively in our maven build and I'm not sure we want to comment it everywhere.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gotcha. Perhaps just <!-- Creates a fat jar, sourced from assembly/pom.xml -->?

<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-shade-plugin</artifactId>
<configuration>
<shadedArtifactAttached>false</shadedArtifactAttached>
<outputFile>${project.build.directory}/scala-${scala.binary.version}/spark-network-yarn-${project.version}-hadoop${hadoop.version}.jar</outputFile>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we name this something like spark-yarn-shuffle-hadoop${hadoop.version}? The current name is very generic and, unlike our internal build, this will be user-facing since some folks might need to actually copy this jar into a location for YARN. It might be good to make it very obvious what this is.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also - if this is going to be compatible across multiple YARN versions, maybe we should actually just put the Spark version instead: spark-${project.version}-yarn-shuffle.

<artifactSet>
<includes>
<include>*:*</include>
</includes>
</artifactSet>
<filters>
<filter>
<artifact>*:*</artifact>
<excludes>
<exclude>META-INF/*.SF</exclude>
<exclude>META-INF/*.DSA</exclude>
<exclude>META-INF/*.RSA</exclude>
</excludes>
</filter>
</filters>
</configuration>
<executions>
<execution>
<phase>package</phase>
<goals>
<goal>shade</goal>
</goals>
</execution>
</executions>
</plugin>
</plugins>
</build>
</project>
8 changes: 4 additions & 4 deletions project/SparkBuild.scala
Original file line number Diff line number Diff line change
Expand Up @@ -38,12 +38,12 @@ object BuildCommons {
"streaming-flume", "streaming-kafka", "streaming-mqtt", "streaming-twitter",
"streaming-zeromq").map(ProjectRef(buildLocation, _))

val optionallyEnabledProjects@Seq(yarn, yarnStable, yarnAlpha, networkYarn, java8Tests,
sparkGangliaLgpl, sparkKinesisAsl) = Seq("yarn", "yarn-stable", "yarn-alpha", "network-yarn",
val optionallyEnabledProjects@Seq(yarn, yarnStable, yarnAlpha, java8Tests,
sparkGangliaLgpl, sparkKinesisAsl) = Seq("yarn", "yarn-stable", "yarn-alpha",
"java8-tests", "ganglia-lgpl", "kinesis-asl").map(ProjectRef(buildLocation, _))

val assemblyProjects@Seq(assembly, examples) = Seq("assembly", "examples")
.map(ProjectRef(buildLocation, _))
val assemblyProjects@Seq(assembly, examples, networkYarn) =
Seq("assembly", "examples", "network-yarn").map(ProjectRef(buildLocation, _))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be good to make sure the name here matches what is in the maven build. You can modify the name using "jarName in assembly" setting. You'll want to modify it just for the network-yarn project though.


val tools = ProjectRef(buildLocation, "tools")
// Root project.
Expand Down