
[SPARK-4298][Core] - The spark-submit cannot read Main-Class from Manifest. #3561

Closed · wants to merge 6 commits from brennonyork/SPARK-4298 into master

Conversation

brennonyork

Resolves a bug where the Main-Class from a .jar file wasn't being read in properly. This was caused by the fact that the primaryResource object was a URI and needed to be normalized through a call to .getPath before it could be passed into the JarFile object.
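A minimal sketch of the fix described above, with illustrative values (the variable name mirrors the description, but this is not the PR's exact code):

```scala
import java.net.URI
import java.util.jar.JarFile

// primaryResource arrives as a URI string, e.g. "file:/path/to/app.jar".
// Passing it straight to JarFile fails because JarFile expects a plain
// filesystem path; normalizing through URI.getPath yields one.
val primaryResource = "file:/path/to/app.jar"    // illustrative value
val localPath = new URI(primaryResource).getPath // "/path/to/app.jar"
val jar = new JarFile(localPath)
val mainClass = jar.getManifest.getMainAttributes.getValue("Main-Class")
jar.close()
```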

@AmplabJenkins

Can one of the admins verify this patch?

@JoshRosen
Contributor

Jenkins, this is ok to test.

@SparkQA

SparkQA commented Dec 5, 2014

Test build #24150 has started for PR 3561 at commit 8d20936.

  • This patch merges cleanly.

@JoshRosen
Contributor

This sounds good to me, although there's one edge case that I'm curious about: what if my main JAR is hosted in HDFS and has a URI like hdfs://path/to/my/library.jar? In that case, I think we'll run into problems when trying to read the main class. That's a problem in the current code, though, and this patch seems like a strict improvement over the current behavior.

I wonder how hard it would be to support this for JARs hosted in HDFS, since that might be useful when submitting jobs under cluster deploy mode. I suppose we could try downloading the JAR to read its manifest. @andrewor14, any thoughts here?
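One hedged sketch of the "download the JAR to read its manifest" idea, using the standard Hadoop FileSystem API. Nothing here is part of this PR, and the helper name is made up:

```scala
import java.io.File
import java.net.URI
import java.util.jar.JarFile
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

// Hypothetical helper: copy a remote JAR (e.g. hdfs://...) to a local
// temp file, then read Main-Class from its manifest.
def mainClassFromRemoteJar(jarUri: String): Option[String] = {
  val fs = FileSystem.get(new URI(jarUri), new Configuration())
  val local = File.createTempFile("spark-submit-", ".jar")
  fs.copyToLocalFile(new Path(jarUri), new Path(local.getAbsolutePath))
  val jar = new JarFile(local)
  try Option(jar.getManifest).flatMap(m =>
    Option(m.getMainAttributes.getValue("Main-Class")))
  finally jar.close()
}
```

As the comments below note, pulling the JAR down just to read one manifest attribute has other implications (extra I/O, Hadoop configuration dependencies), which is part of why the PR ended up supporting only the file scheme.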

@JoshRosen
Contributor

EDIT: mistakenly had file:// instead of hdfs:// in earlier comment.

@SparkQA

SparkQA commented Dec 5, 2014

Test build #24150 has finished for PR 3561 at commit 8d20936.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24150/

@brennonyork
Author

@JoshRosen I'm pretty sure we can support the hdfs:// URI scheme. I'll check whether, given an hdfs:// URI, Spark already has some sort of Hadoop Configuration object representing the connection; if not, we can always create one.

Also, can you help me understand why the tests failed? I'm seeing:

[error] (streaming/test:test) sbt.TestsFailedException: Tests unsuccessful

But that isn't really that helpful and, as with all the talk on the dev distro, I'm just wondering whether it's the patch that fails or a timing / sync issue (./dev/run-tests finishes without failure on my OS X machine).

@JoshRosen
Contributor

It looks like that failure is due to a (known) flaky Spark Streaming test:

[info] CheckpointSuite:
[info] - basic rdd checkpoints + dstream graph checkpoint recovery (17 seconds, 99 milliseconds)
[info] - persistence of conf through checkpoints (1 second, 180 milliseconds)
[info] - recovery with map and reduceByKey operations (7 seconds, 294 milliseconds)
[info] - recovery with invertible reduceByKeyAndWindow operation (9 seconds, 363 milliseconds)
[info] - recovery with saveAsHadoopFiles operation (7 seconds, 299 milliseconds)
[info] - recovery with saveAsNewAPIHadoopFiles operation (7 seconds, 292 milliseconds)
[info] - recovery with updateStateByKey operation (9 seconds, 272 milliseconds)
[info] - recovery with file input stream *** FAILED *** (10 seconds, 283 milliseconds)
[info]   List() was empty (CheckpointSuite.scala:336)
[info]   org.scalatest.exceptions.TestFailedException:
[info]   at org.scalatest.Assertions$class.newAssertionFailedException(Assertions.scala:500)
[info]   at org.scalatest.FunSuite.newAssertionFailedException(FunSuite.scala:1555)
[info]   at org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:466)
[info]   at org.apache.spark.streaming.CheckpointSuite$$anonfun$12.apply$mcV$sp(CheckpointSuite.scala:336)
[info]   at org.apache.spark.streaming.CheckpointSuite$$anonfun$12.apply(CheckpointSuite.scala:282)
[info]   at org.apache.spark.streaming.CheckpointSuite$$anonfun$12.apply(CheckpointSuite.scala:282)
[info]   at org.scalatest.Transformer$$anonfun$apply$1.apply$mcV$sp(Transformer.scala:22)
[info]   at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85)
[info]   at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
[info]   at org.scalatest.Transformer.apply(Transformer.scala:22)
[info]   at org.scalatest.Transformer.apply(Transformer.scala:20)
[info]   at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:166)
[info]   at org.scalatest.Suite$class.withFixture(Suite.scala:1122)
[info]   at org.scalatest.FunSuite.withFixture(FunSuite.scala:1555)
[...]

I could have Jenkins retest this, but to avoid spam I'll just wait until you push a new commit to handle the hdfs:// case.

@JoshRosen
Contributor

Hmm, it looks like there's already a JIRA for that particular test's flakiness: SPARK-1600.

@andrewor14
Contributor

Hey, it seems that we can only support this feature when the jar is local (i.e. with the file:/ URI scheme). In general I think it's hard to support this for all schemes unless we download the jar from HDFS or something, but that may have other implications. For this PR, I think it makes sense to only support the file scheme.

@brennonyork
Author

Given that we're only supporting the file scheme then, should we rerun the flaky streaming test to ensure everything is okay?

@andrewor14
Contributor

We could... but we'll need to re-run the tests anyway after you make your changes. Retest this please

@SparkQA

SparkQA commented Dec 9, 2014

Test build #24272 has started for PR 3561 at commit 8d20936.

  • This patch merges cleanly.

@JoshRosen
Contributor

Even if we're only going to support the file:// scheme for now, we should add a check that the URI actually uses the file:// scheme before stripping it off and treating the path as a local path. If we don't, we risk bugs where someone passes an hdfs:// path but we read the file from a local path.

(I think that this was implicit in my comments (and Andrew's), but I just wanted to say it a bit more explicitly to avoid any confusion).

@SparkQA

SparkQA commented Dec 10, 2014

Test build #24272 has finished for PR 3561 at commit 8d20936.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24272/

@SparkQA

SparkQA commented Dec 11, 2014

Test build #24380 has started for PR 3561 at commit c6dad68.

  • This patch merges cleanly.

@brennonyork
Author

@JoshRosen and @andrewor14, updated to add a match statement on the URI, supporting the file:// scheme. Figured a case statement would be the best approach if we want to support the hdfs:// scheme in the future. Thoughts?
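The shape being described is roughly the following sketch (not the PR's exact code; the error text echoes the messages that appear in the test output later in this thread):

```scala
import java.net.URI
import java.util.jar.JarFile

// Sketch: only treat the primary resource as a local path when its URI
// scheme is "file" (or absent); otherwise refuse, instead of silently
// trying to read a local path that doesn't exist.
def mainClassFromJar(primaryResource: String): String = {
  val uri = new URI(primaryResource)
  Option(uri.getScheme).getOrElse("file") match {
    case "file" =>
      val jar = new JarFile(uri.getPath)
      try jar.getManifest.getMainAttributes.getValue("Main-Class")
      finally jar.close()
    case _ =>
      sys.error("Cannot load main class from JAR: " + primaryResource)
  }
}
```

Adding an hdfs:// case later would then only mean adding another branch to the match.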

return
}
Contributor

I'm a little confused: what is the return type of this block? It seems to me that it's going to be Unit. Does this actually compile?

pushed mainClass assignment into match statement, removed spurious spaces, removed { } from case statements, removed return values
@brennonyork
Author

@andrewor14 updated per all the relevant requests and retested everything! :) Any other issues?

@SparkQA

SparkQA commented Dec 11, 2014

Test build #24382 has started for PR 3561 at commit 14daa20.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Dec 11, 2014

Test build #24380 has finished for PR 3561 at commit c6dad68.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • SparkSubmit.printErrorAndExit("Cannot load main class from JAR: " + primaryResource)
    • SparkSubmit.printErrorAndExit("Cannot load main class from JAR: " + primaryResource +

@andrewor14
Contributor

Maybe SparkSubmit threw an exception somewhere but it's swallowed in the tests?

@JoshRosen
Contributor

Aha, I found the problem: it looks like tests in SparkSubmitSuite end up calling SparkSubmit.main(), which sets system properties. I added some debug logging:

--- a/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala
+++ b/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala
@@ -328,6 +328,7 @@ object SparkSubmit {
     }

     for ((key, value) <- sysProps) {
+      println(s"Setting system property $key $value")
       System.setProperty(key, value)
     }

Now, when I run SparkSubmitSuite:

[info] SparkSubmitSuite:
[info] - prints usage on empty input (92 milliseconds)
[info] - prints usage with only --help (4 milliseconds)
[info] - prints error with unrecognized options (4 milliseconds)
Setting system property SPARK_SUBMIT true
Setting system property spark.app.name foo.jar
Setting system property spark.jars file:/Users/joshrosen/Documents/spark/core/foo.jar
Setting system property spark.master local[*]
[info] - handle binary specified but not class (11 milliseconds)
[info] - handles arguments with --key=val (14 milliseconds)
[info] - handles arguments to user program (2 milliseconds)
[info] - handles arguments to user program with name collision (2 milliseconds)
[info] - handles YARN cluster mode (22 milliseconds)
[info] - handles YARN client mode (8 milliseconds)
[info] - handles standalone cluster mode (7 milliseconds)
[info] - handles standalone client mode (3 milliseconds)
[info] - handles mesos client mode (3 milliseconds)
[info] - handles confs with flag equivalents (4 milliseconds)

A general fix for this issue is to use test fixtures that ensure that system properties which are modified in tests are restored to their old values after the tests finish. I have some code to do this, so I can see about submitting a separate PR to add that fixture.
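The kind of fixture being described could look roughly like this ScalaTest trait. This is a sketch under the assumption that saving and restoring `System.getProperties` around each test is sufficient; the trait name is hypothetical and this is not the actual code from the follow-up PR:

```scala
import java.util.Properties
import org.scalatest.{BeforeAndAfterEach, Suite}

// Sketch: snapshot the JVM's system properties before each test and
// restore them afterwards, so properties set by code under test
// (e.g. SparkSubmit.main) cannot leak into later tests.
trait RestoresSystemProperties extends BeforeAndAfterEach { this: Suite =>
  private var saved: Properties = _

  override protected def beforeEach(): Unit = {
    saved = System.getProperties.clone().asInstanceOf[Properties]
    super.beforeEach()
  }

  override protected def afterEach(): Unit = {
    try super.afterEach()
    finally System.setProperties(saved)
  }
}
```

A suite would then just mix the trait in (`class SparkSubmitSuite extends FunSuite with RestoresSystemProperties`), and ordering-dependent failures like the one above go away.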

@JoshRosen
Contributor

I've opened #3739 to try to systematically clean up our explicit usages of System.setProperty in the test code, as well as to address the implicit calls that broke this test.

@brennonyork
Author

@JoshRosen great catch! Sounds like this can't be accepted until #3739 is completed, but glad we have a resolution.

@JoshRosen
Contributor

I've merged #3739 into master, so this should be ready to retest.

@JoshRosen
Contributor

Jenkins, retest this please.

@SparkQA

SparkQA commented Dec 31, 2014

Test build #24924 has started for PR 3561 at commit 14daa20.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Dec 31, 2014

Test build #24924 has finished for PR 3561 at commit 14daa20.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24924/

@brennonyork
Author

Is `org.apache.spark.storage.BlockNotFoundException: Block taskresult_0 not found` a common timing-related test issue? If not, I can definitely look into what might be causing it. :/

@JoshRosen
Contributor

I'm still investigating; it might be caused by my PR, but it's not failing deterministically in all builds, so I'm not sure. I can dig in, but I'm fairly sure it's not caused by this PR.

@JoshRosen
Contributor

Jenkins, retest this please.

@SparkQA

SparkQA commented Dec 31, 2014

Test build #24938 has started for PR 3561 at commit 14daa20.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Dec 31, 2014

Test build #24938 has finished for PR 3561 at commit 14daa20.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24938/

@JoshRosen
Contributor

Looks like that's some unrelated test flakiness, since all tests are now passing. Since it doesn't seem like my PR broke any tests, let me go ahead and finish backporting my PR into the maintenance branches. After that, I'll loop back here, perform the minor edits that Andrew and I suggested, and merge this in.

(If you have time to do those edits yourself, it'd save me a little bit of time, but not a big deal if you don't get around to it before I merge).

Use string interpolation for error messages, moved comment line from original code to above its necessary code segment
@brennonyork
Author

@JoshRosen took care of the minor edits for ya!

@SparkQA

SparkQA commented Dec 31, 2014

Test build #24961 has started for PR 3561 at commit 5e0fce1.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Dec 31, 2014

Test build #24961 has finished for PR 3561 at commit 5e0fce1.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24961/
Test PASSed.

@JoshRosen
Contributor

I finished my backports of the other patch, so I'm going to merge this now. Thanks!

asfgit pushed a commit that referenced this pull request Dec 31, 2014
[SPARK-4298][Core] - The spark-submit cannot read Main-Class from Manifest.

Resolves a bug where the `Main-Class` from a .jar file wasn't being read in properly. This was caused by the fact that the `primaryResource` object was a URI and needed to be normalized through a call to `.getPath` before it could be passed into the `JarFile` object.

Author: Brennon York <[email protected]>

Closes #3561 from brennonyork/SPARK-4298 and squashes the following commits:

5e0fce1 [Brennon York] Use string interpolation for error messages, moved comment line from original code to above its necessary code segment
14daa20 [Brennon York] pushed mainClass assignment into match statement, removed spurious spaces, removed { } from case statements, removed return values
c6dad68 [Brennon York] Set case statement to support multiple jar URI's and enabled the 'file' URI to load the main-class
8d20936 [Brennon York] updated to reset the error message back to the default
a043039 [Brennon York] updated to split the uri and jar vals
8da7cbf [Brennon York] fixes SPARK-4298

(cherry picked from commit 8e14c5e)
Signed-off-by: Josh Rosen <[email protected]>
asfgit closed this in 8e14c5e Dec 31, 2014
asfgit pushed a commit that referenced this pull request Dec 31, 2014
[SPARK-4298][Core] - The spark-submit cannot read Main-Class from Manifest.

Resolves a bug where the `Main-Class` from a .jar file wasn't being read in properly. This was caused by the fact that the `primaryResource` object was a URI and needed to be normalized through a call to `.getPath` before it could be passed into the `JarFile` object.

Author: Brennon York <[email protected]>

Closes #3561 from brennonyork/SPARK-4298 and squashes the following commits:

5e0fce1 [Brennon York] Use string interpolation for error messages, moved comment line from original code to above its necessary code segment
14daa20 [Brennon York] pushed mainClass assignment into match statement, removed spurious spaces, removed { } from case statements, removed return values
c6dad68 [Brennon York] Set case statement to support multiple jar URI's and enabled the 'file' URI to load the main-class
8d20936 [Brennon York] updated to reset the error message back to the default
a043039 [Brennon York] updated to split the uri and jar vals
8da7cbf [Brennon York] fixes SPARK-4298

(cherry picked from commit 8e14c5e)
Signed-off-by: Josh Rosen <[email protected]>

Conflicts:
	core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala
asfgit pushed a commit that referenced this pull request Dec 31, 2014
[SPARK-4298][Core] - The spark-submit cannot read Main-Class from Manifest.

Resolves a bug where the `Main-Class` from a .jar file wasn't being read in properly. This was caused by the fact that the `primaryResource` object was a URI and needed to be normalized through a call to `.getPath` before it could be passed into the `JarFile` object.

Author: Brennon York <[email protected]>

Closes #3561 from brennonyork/SPARK-4298 and squashes the following commits:

5e0fce1 [Brennon York] Use string interpolation for error messages, moved comment line from original code to above its necessary code segment
14daa20 [Brennon York] pushed mainClass assignment into match statement, removed spurious spaces, removed { } from case statements, removed return values
c6dad68 [Brennon York] Set case statement to support multiple jar URI's and enabled the 'file' URI to load the main-class
8d20936 [Brennon York] updated to reset the error message back to the default
a043039 [Brennon York] updated to split the uri and jar vals
8da7cbf [Brennon York] fixes SPARK-4298

(cherry picked from commit 8e14c5e)
Signed-off-by: Josh Rosen <[email protected]>

Conflicts:
	core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala
@JoshRosen
Contributor

Alright, I've merged this to master (1.3.0), branch-1.2 (1.2.1), branch-1.1 (1.1.2), and branch-1.0 (1.0.3).
