[SPARK-4281][Build] Package Yarn shuffle service into its own jar #3147

Closed
wants to merge 6 commits into from

andrewor14
Contributor

This is another addendum to #3082, which added the Yarn shuffle service to run inside the NM. This PR makes the feature much more usable by packaging enough dependencies into the jar to run the service inside an NM. After these changes, the user can run ./make-distribution.sh and find a spark-network-yarn*.jar in their lib directory. The equivalent change is done in SBT by making the network-yarn module an assembly project.

Andrew Or added 2 commits November 6, 2014 21:04
This allows make-distribution to create a small uber jar for the
network-yarn module, such that all uses of the Yarn shuffle service
can just drop this jar onto the NM classpath and start the shuffle
service after configuring the NM to include it.
@SparkQA

SparkQA commented Nov 7, 2014

Test build #23037 has started for PR 3147 at commit abcefd1.

  • This patch merges cleanly.

@andrewor14 andrewor14 changed the title [SPARK-4281] Package Yarn shuffle service into its own jar [SPARK-4281][Build] Package Yarn shuffle service into its own jar Nov 7, 2014
@andrewor14
Contributor Author

I have tested the changes in both maven and SBT.
@pwendell Can you review the changes here?

@SparkQA

SparkQA commented Nov 7, 2014

Test build #23037 has finished for PR 3147 at commit abcefd1.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class StandaloneWorkerShuffleService(sparkConf: SparkConf, securityManager: SecurityManager)
    • public class RetryingBlockFetcher

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23037/
Test PASSed.

@@ -41,12 +41,12 @@
      <groupId>io.netty</groupId>
      <artifactId>netty-all</artifactId>
    </dependency>

    <!-- Provided dependencies -->
    <dependency>
      <groupId>org.slf4j</groupId>
      <artifactId>slf4j-api</artifactId>
Contributor

Why did you move the comment up here? Should this be at the provided scope?

Contributor Author

Yeah, Yarn already provides slf4j, so it doesn't need to be a core dependency. For standalone mode, Spark already requires it, so it doesn't need to be a core dependency there either. However, I just realized I forgot to actually mark it as provided by adding the `<scope>provided</scope>` tag.

Contributor Author

Ok I added the tag

@SparkQA

SparkQA commented Nov 7, 2014

Test build #23056 has started for PR 3147 at commit 65db822.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Nov 7, 2014

Test build #23056 has finished for PR 3147 at commit 65db822.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23056/
Test PASSed.

  <artifactId>maven-shade-plugin</artifactId>
  <configuration>
    <shadedArtifactAttached>false</shadedArtifactAttached>
    <outputFile>${project.build.directory}/scala-${scala.binary.version}/spark-network-yarn-${project.version}-hadoop${hadoop.version}.jar</outputFile>
Contributor

Should we name this something like spark-yarn-shuffle-hadoop${hadoop.version}? The current name is very generic and, unlike our internal build, this will be user-facing since some folks might need to actually copy this jar into a location for YARN. It might be good to make it very obvious what this is.

Contributor

Also - if this is going to be compatible across multiple YARN versions, maybe we should actually just put the Spark version instead: spark-${project.version}-yarn-shuffle.

@pwendell
Contributor

pwendell commented Nov 8, 2014

Hey Andrew - this looks good. I added some comments, all about how we name the produced jar.

@SparkQA

SparkQA commented Nov 11, 2014

Test build #23191 has started for PR 3147 at commit bda58d0.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Nov 11, 2014

Test build #23191 has finished for PR 3147 at commit bda58d0.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23191/
Test PASSed.

@pwendell
Contributor

LGTM

@andrewor14
Contributor Author

OK, I'm merging this into master and 1.2.

@pwendell
Contributor

Okay, go ahead and merge.

@asfgit asfgit closed this in aa43a8d Nov 12, 2014
asfgit pushed a commit that referenced this pull request Nov 12, 2014
This is another addendum to #3082, which added the Yarn shuffle service to run inside the NM. This PR makes the feature much more usable by packaging enough dependencies into the jar to run the service inside an NM. After these changes, the user can run `./make-distribution.sh` and find a `spark-network-yarn*.jar` in their `lib` directory. The equivalent change is done in SBT by making the `network-yarn` module an assembly project.

Author: Andrew Or <[email protected]>

Closes #3147 from andrewor14/yarn-shuffle-build and squashes the following commits:

bda58d0 [Andrew Or] Fix line too long
81e9705 [Andrew Or] Merge branch 'master' of github.com:apache/spark into yarn-shuffle-build
fb7f398 [Andrew Or] Rename jar to spark-{VERSION}-yarn-shuffle.jar
65db822 [Andrew Or] Actually mark slf4j as provided
abcefd1 [Andrew Or] Do the same for SBT
c653028 [Andrew Or] Package network-yarn and its dependencies

(cherry picked from commit aa43a8d)
Signed-off-by: Andrew Or <[email protected]>
@andrewor14 andrewor14 deleted the yarn-shuffle-build branch November 13, 2014 01:26
@JoshRosen
Contributor

I ran the make-distribution.sh script and it complained about being unable to find the YARN shuffle jar:

[...]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 05:30 min
[INFO] Finished at: 2014-11-12T18:53:57-08:00
[INFO] Final Memory: 58M/564M
[INFO] ------------------------------------------------------------------------
cp: /Users/joshrosen/Documents/Spark/network/yarn/target/scala*/spark-*-yarn-shuffle.jar: No such file or directory

@arahuja
Contributor

arahuja commented Nov 13, 2014

I hit that as well when I did not provide -Pyarn to make-distribution.sh, but it works when that is provided.

@andrewor14
Contributor Author

Good catch. The jar won't be there unless -Pyarn is provided. I'll fix this shortly.

@andrewor14
Contributor Author

Alright, the hot fix is merged. Thanks for reporting.

asfgit pushed a commit that referenced this pull request Nov 13, 2014
This is introduced in #3147 and is failing builds without the `-Pyarn` profile.

Author: Andrew Or <[email protected]>

Closes #3250 from andrewor14/fix-yarn-shuffle-build and squashes the following commits:

42b3d37 [Andrew Or] Do not fail fast if Yarn shuffle jar does not exist

(cherry picked from commit a0fa1ba)
Signed-off-by: Andrew Or <[email protected]>
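
The failure reported in this thread comes from a `cp` whose glob matches nothing when the build ran without `-Pyarn`. Below is a minimal sketch of a guard that skips the copy in that case, assuming a POSIX-style shell. The function name `copy_yarn_shuffle_jar` and its two arguments are hypothetical, and this is an illustration of the idea, not the actual change made in #3250.

```shell
#!/bin/sh
# Hypothetical helper, not the code from #3250.
# $1: Spark source root, $2: distribution directory (both are assumed names).
copy_yarn_shuffle_jar() {
  src_root="$1"
  dist_dir="$2"
  copied=0
  # Same glob as in the error message reported in this thread.
  for jar in "$src_root"/network/yarn/target/scala*/spark-*-yarn-shuffle.jar; do
    # An unmatched glob is passed through literally, so check that the file exists.
    if [ -f "$jar" ]; then
      mkdir -p "$dist_dir/lib"
      cp "$jar" "$dist_dir/lib/"
      copied=1
    fi
  done
  if [ "$copied" -eq 0 ]; then
    echo "YARN shuffle jar not found (built without -Pyarn?); skipping copy"
  fi
  # Always succeed so a build without the YARN profile does not fail here.
  return 0
}
```

It could be called as `copy_yarn_shuffle_jar /path/to/spark /path/to/dist`, where both paths are placeholders. Returning success when the jar is missing keeps a non-YARN build from aborting, at the cost of only printing a notice.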