[MINOR][DOCS] Clarify that Spark apps should mark Spark as a 'provided' dependency, not package it #23938
Conversation
Test build #102943 has finished for PR 23938 at commit
All for it. Could we add some explanation?
Could you update the "Accessing OpenStack Swift from Spark" documentation, too?
@dongjoon-hyun yeah, that one I wasn't sure about, as it's some support code that sounded like it was meant to be bundled in an app. @steveloughran is that correct?
You should compile Spark with hadoop-cloud and add the JARs it pulls in to the Spark tarball placed on the shared cluster FS for YARN to pick up. I don't know about other deployment engines, I'm afraid. The build also adds them to SPARK_HOME/lib, which gives them to you for Spark standalone during spark-submit, either for anything related to JAR upload, or for any store which implements delegation tokens (HADOOP-14456, HADOOP-16068, etc.), so it collects the tokens for all stores listed in spark.yarn.hadoopFilesystems.
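For reference, a minimal sketch of the delegation-token setting mentioned above. The property is spelled `spark.yarn.access.hadoopFileSystems` in the Spark 2.4 YARN docs (the comment above abbreviates it), and the URIs below are purely illustrative:

```scala
import org.apache.spark.SparkConf

object TokenConfSketch extends App {
  // Sketch: ask Spark-on-YARN to collect delegation tokens at submit time
  // for extra filesystems/object stores. Bucket and namenode are made up.
  val conf = new SparkConf()
    .set("spark.yarn.access.hadoopFileSystems",
         "s3a://example-bucket/,hdfs://nn1:8020/")
  println(conf.get("spark.yarn.access.hadoopFileSystems"))
}
```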
@steveloughran to be clear, do you compile your app, or Spark, with this dependency? It sounds like "Spark", not the app. If so I'll update this further.
Sorry, yeah, Spark. Even if the Spark team doesn't redistribute those JARs, it'd be really useful if the release process published the POM. That way, if you want your build to pick up the exact set of dependencies which are in sync with Spark, excluding all the stuff which will cause grief, you'd just add it as a dependency.
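A hypothetical sketch of what that suggestion could look like from an sbt build, assuming the hadoop-cloud module were published under `spark-hadoop-cloud` coordinates (it was not at the time of this discussion); the version is illustrative:

```scala
// Hypothetical: a published spark-hadoop-cloud POM would let an app pull in
// the cloud-connector dependency set matching its Spark version, again as
// 'provided' since the cluster supplies those JARs at runtime.
libraryDependencies +=
  "org.apache.spark" %% "spark-hadoop-cloud" % "2.4.0" % "provided"
```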
Ah OK, on further review @steveloughran, the docs here are saying to include the dependency in your app, which would be the right thing if it's not bundled by Spark, and that's the current state of things for a default cluster. I think that much of the doc is then OK and shouldn't change to mention it.
Merged to master/2.4/2.3
What changes were proposed in this pull request?
Spark apps do not need to package Spark itself; in fact, doing so can cause problems in some cases. Our examples should show depending on Spark as a 'provided' dependency.
Packaging Spark makes the app tens of megabytes bigger. It can also bring in conflicting dependencies that wouldn't otherwise be a problem. https://issues.apache.org/jira/browse/SPARK-26146 is what reminded me of this.
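As an illustration of the recommendation, here is a minimal sbt sketch that marks Spark as 'provided' (the app name, Spark modules, and versions are illustrative; the Maven equivalent is `<scope>provided</scope>` on the Spark artifacts):

```scala
// build.sbt: minimal sketch; names and versions are illustrative.
name := "my-spark-app"
scalaVersion := "2.12.8"

// 'provided': compile against Spark, but don't bundle it in the assembly.
// The cluster's spark-submit supplies these classes at runtime, so packaging
// them only bloats the JAR and risks dependency conflicts.
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "2.4.0" % "provided",
  "org.apache.spark" %% "spark-sql"  % "2.4.0" % "provided"
)
```

With sbt-assembly, provided dependencies are left out of the fat JAR by default; the app is then launched with spark-submit, which puts the cluster's own Spark JARs on the classpath.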
How was this patch tested?
Doc build