[MINOR][DOCS] Clarify that Spark apps should mark Spark as a 'provided' dependency, not package it #23938

Closed
srowen wants to merge 1 commit into apache:master from srowen:Provided

Conversation

@srowen (Member) commented Mar 2, 2019

What changes were proposed in this pull request?

Spark apps do not need to package Spark itself. In fact, doing so can cause problems in some cases. Our examples should show Spark declared as a 'provided' dependency.

Packaging Spark makes the app tens of megabytes bigger. It can also bring in conflicting dependencies that wouldn't otherwise be a problem. https://issues.apache.org/jira/browse/SPARK-26146 is what reminded me of this.
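
To make that concrete, a Maven dependency on a Spark module marked 'provided' looks roughly like the sketch below (the module and version are illustrative):

```xml
<!-- Spark classes are supplied by the cluster at runtime, so 'provided'
     scope keeps them out of the application's packaged jar. -->
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-sql_2.11</artifactId>
  <version>2.4.0</version>
  <scope>provided</scope>
</dependency>
```

The sbt equivalent is appending % "provided" to the dependency, e.g. "org.apache.spark" %% "spark-sql" % "2.4.0" % "provided".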

How was this patch tested?

Doc build

@srowen srowen self-assigned this Mar 2, 2019
@SparkQA commented Mar 2, 2019

Test build #102943 has finished for PR 23938 at commit f8fcc52.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@felixcheung (Member) left a comment

All for it. Could we add some explanation?

@dongjoon-hyun (Member) left a comment

Could you update the "Accessing OpenStack Swift from Spark" documentation, too?

@srowen (Member, Author) commented Mar 3, 2019

@dongjoon-hyun yeah, that one I wasn't sure about, as it's some support code that sounded like it was meant to be bundled in an app. @steveloughran, is that correct -- hadoop-cloud should be a compile-scope dependency, not provided by the cluster?

@steveloughran (Contributor)

You should compile with hadoop-cloud and add the JARs it pulls in to the Spark tarball placed on the shared cluster FS for YARN to pick up. I don't know about other deployment engines, I'm afraid. The build also adds them to SPARK_HOME/lib, which gives them to you for spark-standalone during spark-submit, either for anything related to JAR upload, or for any store which implements delegation tokens (HADOOP-14456, HADOOP-16068, etc.), so it collects the tokens for all stores listed in spark.yarn.hadoopFilesystems.

@srowen (Member, Author) commented Mar 4, 2019

@steveloughran to be clear, do you compile your app, or Spark, with this dependency? It sounds like "Spark", not the app. If so, I'll update this further.

@steveloughran (Contributor)

Sorry, yeah, Spark.
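
For reference, a minimal sketch of that build step, assuming the hadoop-cloud Maven profile that exists in the Spark source tree (the profile name comes from the Spark build, not from this thread):

```
# Build Spark itself with the hadoop-cloud module enabled, so the
# cloud-connector JARs land in the distribution alongside Spark's own jars.
./build/mvn -Phadoop-cloud -DskipTests clean package
```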

Even if the Spark team doesn't redistribute those JARs, it'd be really useful if the release process published the POM. That way, if you want your build to pick up the exact set of dependencies which are in sync with Spark, excluding all the stuff which will cause grief, you'd just add it as a dependency.
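
A purely hypothetical illustration of that idea: if such a POM were published (it was not at the time of this thread), a build could depend on it to stay in sync with Spark's connector versions, along these lines (artifact name and version are invented for illustration):

```xml
<!-- Hypothetical: import a published hadoop-cloud POM to inherit the exact
     connector dependency set Spark was built with. No such artifact was
     published when this was written. -->
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-hadoop-cloud_2.11</artifactId>
  <version>2.4.0</version>
  <type>pom</type>
</dependency>
```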

@srowen (Member, Author) commented Mar 4, 2019

Ah OK, on further review @steveloughran, the docs here are saying to include the dependency in your app, which would be the right thing if it's not bundled by Spark, and that's the current state of things for a default cluster. I think that much of the doc is then OK, and shouldn't change to mention 'provided'.
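
For contrast with 'provided': that doc's advice amounts to a default (compile) scope dependency, which does get bundled into the app because a stock Spark distribution doesn't ship it, roughly like this sketch (the hadoop-openstack connector is what that doc covers; the version is illustrative):

```xml
<!-- Default (compile) scope: the Swift connector is packaged with the app,
     since a stock Spark distribution does not include it. -->
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-openstack</artifactId>
  <version>2.7.7</version>
</dependency>
```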

srowen added a commit that referenced this pull request Mar 5, 2019
[MINOR][DOCS] Clarify that Spark apps should mark Spark as a 'provided' dependency, not package it

Closes #23938 from srowen/Provided.

Authored-by: Sean Owen <[email protected]>
Signed-off-by: Sean Owen <[email protected]>
(cherry picked from commit 3909223)
Signed-off-by: Sean Owen <[email protected]>

@srowen (Member, Author) commented Mar 5, 2019

Merged to master/2.4/2.3

srowen added a commit that referenced this pull request Mar 5, 2019
[MINOR][DOCS] Clarify that Spark apps should mark Spark as a 'provided' dependency, not package it

@srowen srowen closed this in 3909223 Mar 5, 2019
@srowen srowen deleted the Provided branch March 10, 2019 19:08
kai-chi pushed a commit to kai-chi/spark that referenced this pull request Jul 23, 2019
[MINOR][DOCS] Clarify that Spark apps should mark Spark as a 'provided' dependency, not package it

kai-chi pushed a commit to kai-chi/spark that referenced this pull request Jul 25, 2019
[MINOR][DOCS] Clarify that Spark apps should mark Spark as a 'provided' dependency, not package it

kai-chi pushed a commit to kai-chi/spark that referenced this pull request Aug 1, 2019
[MINOR][DOCS] Clarify that Spark apps should mark Spark as a 'provided' dependency, not package it