
[SPARK-4298][Core] - The spark-submit cannot read Main-Class from Manifest. #3561

Closed · wants to merge 6 commits from brennonyork/SPARK-4298 into master

Conversation

brennonyork

Resolves a bug where the Main-Class from a .jar file wasn't being read in properly. This was caused by the fact that the primaryResource object was a URI and needed to be normalized through a call to .getPath before it could be passed into the JarFile object.
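A minimal sketch of the fix described above, with illustrative values (the variable name mirrors the description, but this is not the PR's exact code):

```scala
import java.net.URI
import java.util.jar.JarFile

// primaryResource arrives as a URI string, e.g. "file:/path/to/app.jar".
// Passing it straight to JarFile fails because JarFile expects a plain
// filesystem path; normalizing through URI.getPath yields one.
val primaryResource = "file:/path/to/app.jar"    // illustrative value
val localPath = new URI(primaryResource).getPath // "/path/to/app.jar"
val jar = new JarFile(localPath)
val mainClass = jar.getManifest.getMainAttributes.getValue("Main-Class")
jar.close()
```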

@AmplabJenkins

Can one of the admins verify this patch?

@JoshRosen
Contributor

Jenkins, this is ok to test.

@SparkQA

SparkQA commented Dec 5, 2014

Test build #24150 has started for PR 3561 at commit 8d20936.

  • This patch merges cleanly.

@JoshRosen
Contributor

This sounds good to me, although there's one edge case that I'm curious about: what if my main JAR is hosted in HDFS and has a URI like hdfs://path/to/my/library.jar? In that case, I think we'll run into problems when trying to read the main class. That's a problem in the current code, though, and this patch seems like a strict improvement over the current behavior.

I wonder how hard it would be to support this for JARs hosted in HDFS, since that might be useful when submitting jobs under cluster deploy mode. I suppose we could try downloading the JAR to read its manifest. @andrewor14, any thoughts here?
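One hedged sketch of the "download the JAR to read its manifest" idea, using the standard Hadoop FileSystem API. Nothing here is part of this PR, and the helper name is made up:

```scala
import java.io.File
import java.net.URI
import java.util.jar.JarFile
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

// Hypothetical helper: copy a remote JAR (e.g. hdfs://...) to a local
// temp file, then read Main-Class from its manifest.
def mainClassFromRemoteJar(jarUri: String): Option[String] = {
  val fs = FileSystem.get(new URI(jarUri), new Configuration())
  val local = File.createTempFile("spark-submit-", ".jar")
  fs.copyToLocalFile(new Path(jarUri), new Path(local.getAbsolutePath))
  val jar = new JarFile(local)
  try Option(jar.getManifest).flatMap(m =>
    Option(m.getMainAttributes.getValue("Main-Class")))
  finally jar.close()
}
```

As the comments below note, pulling the JAR down just to read one manifest attribute has other implications (extra I/O, Hadoop configuration dependencies), which is part of why the PR ended up supporting only the file scheme.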

@JoshRosen
Contributor

EDIT: mistakenly had file:// instead of hdfs:// in earlier comment.

@SparkQA

SparkQA commented Dec 5, 2014

Test build #24150 has finished for PR 3561 at commit 8d20936.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24150/

@brennonyork
Author

@JoshRosen I'm pretty sure we can support the hdfs:// URI scheme. I'll check whether, given an hdfs:// URI, Spark already has some sort of Hadoop Configuration object representing the connection; if not, we can always create one.

Also, can you help me understand why the tests failed? I'm seeing:

[error] (streaming/test:test) sbt.TestsFailedException: Tests unsuccessful

But that isn't really that helpful and, as with all the talk on the dev distro, I'm just wondering whether it's the patch that fails or a timing / sync issue (./dev/run-tests finishes without failure on my OS X machine).

@JoshRosen
Contributor

It looks like that failure is due to a (known) flaky Spark Streaming test:

[info] CheckpointSuite:
[info] - basic rdd checkpoints + dstream graph checkpoint recovery (17 seconds, 99 milliseconds)
[info] - persistence of conf through checkpoints (1 second, 180 milliseconds)
[info] - recovery with map and reduceByKey operations (7 seconds, 294 milliseconds)
[info] - recovery with invertible reduceByKeyAndWindow operation (9 seconds, 363 milliseconds)
[info] - recovery with saveAsHadoopFiles operation (7 seconds, 299 milliseconds)
[info] - recovery with saveAsNewAPIHadoopFiles operation (7 seconds, 292 milliseconds)
[info] - recovery with updateStateByKey operation (9 seconds, 272 milliseconds)
[info] - recovery with file input stream *** FAILED *** (10 seconds, 283 milliseconds)
[info]   List() was empty (CheckpointSuite.scala:336)
[info]   org.scalatest.exceptions.TestFailedException:
[info]   at org.scalatest.Assertions$class.newAssertionFailedException(Assertions.scala:500)
[info]   at org.scalatest.FunSuite.newAssertionFailedException(FunSuite.scala:1555)
[info]   at org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:466)
[info]   at org.apache.spark.streaming.CheckpointSuite$$anonfun$12.apply$mcV$sp(CheckpointSuite.scala:336)
[info]   at org.apache.spark.streaming.CheckpointSuite$$anonfun$12.apply(CheckpointSuite.scala:282)
[info]   at org.apache.spark.streaming.CheckpointSuite$$anonfun$12.apply(CheckpointSuite.scala:282)
[info]   at org.scalatest.Transformer$$anonfun$apply$1.apply$mcV$sp(Transformer.scala:22)
[info]   at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85)
[info]   at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
[info]   at org.scalatest.Transformer.apply(Transformer.scala:22)
[info]   at org.scalatest.Transformer.apply(Transformer.scala:20)
[info]   at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:166)
[info]   at org.scalatest.Suite$class.withFixture(Suite.scala:1122)
[info]   at org.scalatest.FunSuite.withFixture(FunSuite.scala:1555)
[...]

I could have Jenkins retest this, but to avoid spam I'll just wait until you push a new commit to handle the hdfs:// case.

@JoshRosen
Contributor

Hmm, it looks like there's already a JIRA for that particular test's flakiness: SPARK-1600.

@andrewor14
Contributor

Hey, it seems that we can only support this feature when the jar is local (i.e. with the file:/ URI scheme). In general I think it's hard to support this for all schemes unless we download the jar from HDFS or something, but that may have other implications. For this PR, I think it makes sense to only support the file scheme.

@brennonyork
Author

Given that we're only supporting the file scheme then, should we rerun the flaky streaming test to ensure everything is okay?

@andrewor14
Contributor

We could... but we'll need to re-run the tests anyway after you make your changes. Retest this please

@SparkQA

SparkQA commented Dec 9, 2014

Test build #24272 has started for PR 3561 at commit 8d20936.

  • This patch merges cleanly.

@JoshRosen
Contributor

Even if we're only going to support the file:// scheme for now, we should add a check that the URI actually uses the file:// scheme before stripping it off and treating the path as a local path. If we don't, we risk bugs where someone passes an hdfs:// path but we read the file from a local path.

(I think that this was implicit in my comments (and Andrew's), but I just wanted to say it a bit more explicitly to avoid any confusion).

@SparkQA

SparkQA commented Dec 10, 2014

Test build #24272 has finished for PR 3561 at commit 8d20936.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24272/

@SparkQA

SparkQA commented Dec 11, 2014

Test build #24380 has started for PR 3561 at commit c6dad68.

  • This patch merges cleanly.

@brennonyork
Author

@JoshRosen and @andrewor14, updated to add a match statement on the URI, supporting the file:// scheme. Figured a case statement would be the best approach if we want to support the hdfs:// scheme in the future. Thoughts?
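The shape being described is roughly the following sketch (not the PR's exact code; the error text echoes the messages that appear in the test output later in this thread):

```scala
import java.net.URI
import java.util.jar.JarFile

// Sketch: only treat the primary resource as a local path when its URI
// scheme is "file" (or absent); otherwise refuse, instead of silently
// trying to read a local path that doesn't exist.
def mainClassFromJar(primaryResource: String): String = {
  val uri = new URI(primaryResource)
  Option(uri.getScheme).getOrElse("file") match {
    case "file" =>
      val jar = new JarFile(uri.getPath)
      try jar.getManifest.getMainAttributes.getValue("Main-Class")
      finally jar.close()
    case _ =>
      sys.error("Cannot load main class from JAR: " + primaryResource)
  }
}
```

Adding an hdfs:// case later would then only mean adding another branch to the match.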

return
}
Contributor

I'm a little confused: what is the return type of this block? It seems to me that it's going to be Unit. Does this actually compile?

pushed mainClass assignment into match statement, removed spurious spaces, removed { } from case statements, removed return values
@brennonyork
Author

@andrewor14 updated per all the relevant requests and retested everything! :) Any other issues?

@SparkQA

SparkQA commented Dec 11, 2014

Test build #24382 has started for PR 3561 at commit 14daa20.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Dec 11, 2014

Test build #24380 has finished for PR 3561 at commit c6dad68.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • SparkSubmit.printErrorAndExit("Cannot load main class from JAR: " + primaryResource)
    • SparkSubmit.printErrorAndExit("Cannot load main class from JAR: " + primaryResource +

@andrewor14
Contributor

Maybe SparkSubmit threw an exception somewhere but it's swallowed in the tests?

@JoshRosen
Contributor

Aha, I found the problem: it looks like tests in SparkSubmitSuite end up calling SparkSubmit.main(), which sets system properties. I added some debug logging:

--- a/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala
+++ b/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala
@@ -328,6 +328,7 @@ object SparkSubmit {
     }

     for ((key, value) <- sysProps) {
+      println(s"Setting system property $key $value")
       System.setProperty(key, value)
     }

Now, when I run SparkSubmitSuite:

[info] SparkSubmitSuite:
[info] - prints usage on empty input (92 milliseconds)
[info] - prints usage with only --help (4 milliseconds)
[info] - prints error with unrecognized options (4 milliseconds)
Setting system property SPARK_SUBMIT true
Setting system property spark.app.name foo.jar
Setting system property spark.jars file:/Users/joshrosen/Documents/spark/core/foo.jar
Setting system property spark.master local[*]
[info] - handle binary specified but not class (11 milliseconds)
[info] - handles arguments with --key=val (14 milliseconds)
[info] - handles arguments to user program (2 milliseconds)
[info] - handles arguments to user program with name collision (2 milliseconds)
[info] - handles YARN cluster mode (22 milliseconds)
[info] - handles YARN client mode (8 milliseconds)
[info] - handles standalone cluster mode (7 milliseconds)
[info] - handles standalone client mode (3 milliseconds)
[info] - handles mesos client mode (3 milliseconds)
[info] - handles confs with flag equivalents (4 milliseconds)

A general fix for this issue is to use test fixtures that ensure that system properties which are modified in tests are restored to their old values after the tests finish. I have some code to do this, so I can see about submitting a separate PR to add that fixture.
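The kind of fixture being described could look roughly like this ScalaTest trait. This is a sketch under the assumption that saving and restoring `System.getProperties` around each test is sufficient; the trait name is hypothetical and this is not the actual code from the follow-up PR:

```scala
import java.util.Properties
import org.scalatest.{BeforeAndAfterEach, Suite}

// Sketch: snapshot the JVM's system properties before each test and
// restore them afterwards, so properties set by code under test
// (e.g. SparkSubmit.main) cannot leak into later tests.
trait RestoresSystemProperties extends BeforeAndAfterEach { this: Suite =>
  private var saved: Properties = _

  override protected def beforeEach(): Unit = {
    saved = System.getProperties.clone().asInstanceOf[Properties]
    super.beforeEach()
  }

  override protected def afterEach(): Unit = {
    try super.afterEach()
    finally System.setProperties(saved)
  }
}
```

A suite would then just mix the trait in (`class SparkSubmitSuite extends FunSuite with RestoresSystemProperties`), and ordering-dependent failures like the one above go away.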

@JoshRosen
Contributor

I've opened #3739 to try to systematically clean up our explicit usages of System.setProperty in the test code, as well as to address the implicit calls that broke this test.

@brennonyork
Author

@JoshRosen great catch! Sounds like this can't be accepted until #3739 is completed, but glad we have a resolution.

@JoshRosen
Contributor

I've merged #3739 into master, so this should be ready to retest.

@JoshRosen
Contributor

Jenkins, retest this please.

@SparkQA

SparkQA commented Dec 31, 2014

Test build #24924 has started for PR 3561 at commit 14daa20.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Dec 31, 2014

Test build #24924 has finished for PR 3561 at commit 14daa20.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24924/

@brennonyork
Author

Is `org.apache.spark.storage.BlockNotFoundException: Block taskresult_0 not found` a common timing-related test issue? If not, I can definitely look into what might be causing it. :/

@JoshRosen
Contributor

I'm still investigating; it might be caused by my PR, but it's not failing deterministically in all builds, so I'm not sure. I can dig in, but I'm fairly sure it's not caused by this PR.

@JoshRosen
Contributor

Jenkins, retest this please.

@SparkQA

SparkQA commented Dec 31, 2014

Test build #24938 has started for PR 3561 at commit 14daa20.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Dec 31, 2014

Test build #24938 has finished for PR 3561 at commit 14daa20.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24938/

@JoshRosen
Contributor

Looks like that's some unrelated test flakiness, since all tests are now passing. Since it doesn't seem like my PR broke any tests, let me go ahead and finish backporting my PR into the maintenance branches. After that, I'll loop back here, perform the minor edits that Andrew and I suggested, and merge this in.

(If you have time to do those edits yourself, it'd save me a little bit of time, but not a big deal if you don't get around to it before I merge).

Use string interpolation for error messages, moved comment line from original code to above its necessary code segment
@brennonyork
Author

@JoshRosen took care of the minor edits for ya!

@SparkQA

SparkQA commented Dec 31, 2014

Test build #24961 has started for PR 3561 at commit 5e0fce1.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Dec 31, 2014

Test build #24961 has finished for PR 3561 at commit 5e0fce1.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24961/
Test PASSed.

@JoshRosen
Contributor

I finished my backports of the other patch, so I'm going to merge this now. Thanks!

asfgit pushed a commit that referenced this pull request Dec 31, 2014
[SPARK-4298][Core] - The spark-submit cannot read Main-Class from Manifest.

Resolves a bug where the `Main-Class` from a .jar file wasn't being read in properly. This was caused by the fact that the `primaryResource` object was a URI and needed to be normalized through a call to `.getPath` before it could be passed into the `JarFile` object.

Author: Brennon York <[email protected]>

Closes #3561 from brennonyork/SPARK-4298 and squashes the following commits:

5e0fce1 [Brennon York] Use string interpolation for error messages, moved comment line from original code to above its necessary code segment
14daa20 [Brennon York] pushed mainClass assignment into match statement, removed spurious spaces, removed { } from case statements, removed return values
c6dad68 [Brennon York] Set case statement to support multiple jar URI's and enabled the 'file' URI to load the main-class
8d20936 [Brennon York] updated to reset the error message back to the default
a043039 [Brennon York] updated to split the uri and jar vals
8da7cbf [Brennon York] fixes SPARK-4298

(cherry picked from commit 8e14c5e)
Signed-off-by: Josh Rosen <[email protected]>
asfgit closed this in 8e14c5e Dec 31, 2014
asfgit pushed a commit that referenced this pull request Dec 31, 2014
[SPARK-4298][Core] - The spark-submit cannot read Main-Class from Manifest.

Resolves a bug where the `Main-Class` from a .jar file wasn't being read in properly. This was caused by the fact that the `primaryResource` object was a URI and needed to be normalized through a call to `.getPath` before it could be passed into the `JarFile` object.

Author: Brennon York <[email protected]>

Closes #3561 from brennonyork/SPARK-4298 and squashes the following commits:

5e0fce1 [Brennon York] Use string interpolation for error messages, moved comment line from original code to above its necessary code segment
14daa20 [Brennon York] pushed mainClass assignment into match statement, removed spurious spaces, removed { } from case statements, removed return values
c6dad68 [Brennon York] Set case statement to support multiple jar URI's and enabled the 'file' URI to load the main-class
8d20936 [Brennon York] updated to reset the error message back to the default
a043039 [Brennon York] updated to split the uri and jar vals
8da7cbf [Brennon York] fixes SPARK-4298

(cherry picked from commit 8e14c5e)
Signed-off-by: Josh Rosen <[email protected]>

Conflicts:
	core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala
asfgit pushed a commit that referenced this pull request Dec 31, 2014
[SPARK-4298][Core] - The spark-submit cannot read Main-Class from Manifest.

Resolves a bug where the `Main-Class` from a .jar file wasn't being read in properly. This was caused by the fact that the `primaryResource` object was a URI and needed to be normalized through a call to `.getPath` before it could be passed into the `JarFile` object.

Author: Brennon York <[email protected]>

Closes #3561 from brennonyork/SPARK-4298 and squashes the following commits:

5e0fce1 [Brennon York] Use string interpolation for error messages, moved comment line from original code to above its necessary code segment
14daa20 [Brennon York] pushed mainClass assignment into match statement, removed spurious spaces, removed { } from case statements, removed return values
c6dad68 [Brennon York] Set case statement to support multiple jar URI's and enabled the 'file' URI to load the main-class
8d20936 [Brennon York] updated to reset the error message back to the default
a043039 [Brennon York] updated to split the uri and jar vals
8da7cbf [Brennon York] fixes SPARK-4298

(cherry picked from commit 8e14c5e)
Signed-off-by: Josh Rosen <[email protected]>

Conflicts:
	core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala
@JoshRosen
Contributor

Alright, I've merged this to master (1.3.0), branch-1.2 (1.2.1), branch-1.1 (1.1.2), and branch-1.0 (1.0.3).
