[SPARK-23941][Mesos] Mesos task failed on specific spark app name #21014

tiboun · 2018-04-09T21:40:01Z

What changes were proposed in this pull request?

Shell-escape the application name passed to spark-submit, and change how conf attributes are shell-escaped.
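To illustrate why the application name needs escaping, here is a minimal sketch. It assumes a hypothetical `shellEscape` helper that uses POSIX single-quote wrapping; it is not the helper in Spark's Mesos scheduler, and `submitCommand` and the sample app name are made up for the example.

```scala
object AppNameEscaping {
  // Hypothetical helper (an assumption, not Spark's implementation):
  // wrap the value in single quotes and encode embedded single quotes as '\''.
  def shellEscape(value: String): String =
    "'" + value.replace("'", "'\\''") + "'"

  // Builds the command line as one string, the way it would be handed to a shell.
  def submitCommand(appName: String, escape: Boolean): String = {
    val nameArg = if (escape) shellEscape(appName) else appName
    s"spark-submit --name $nameArg --class org.apache.spark.examples.SparkPi"
  }

  def main(args: Array[String]): Unit = {
    val appName = "Spark Pi (test)"
    // Unescaped: the shell would split the name on spaces and choke on the parentheses.
    println(submitCommand(appName, escape = false))
    // Escaped: the whole name reaches spark-submit as a single argument.
    println(submitCommand(appName, escape = true))
  }
}
```

Single-quote wrapping is used here only because it is easy to reason about: inside single quotes a POSIX shell interprets nothing.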

How was this patch tested?

This patch was tested manually with Hive on Spark on Mesos, and with the use case described in the issue: the SparkPi application with a custom name that contains illegal shell characters.

With this PR, Hive on Spark on Mesos works like a charm with Hive 3.0.0-SNAPSHOT.

I state that this contribution is my original work and that I license the work to the project under the project’s open source license.

vanzin · 2018-04-09T21:48:40Z

ok to test

SparkQA · 2018-04-09T22:07:28Z

Test build #89077 has finished for PR 21014 at commit fb078eb.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

felixcheung

maybe not super critical, but do we need the same for other params, like mainClass, pyFiles?

tiboun · 2018-04-10T08:37:35Z

IMO, mainClass doesn't need the same treatment because spaces are not allowed in it, but to produce a correct command we could shell-escape it anyway and improve the error that is thrown.
pyFiles may need the same fix because it's a path.
Do you want me to apply shellEscape to both?

vanzin · 2018-04-10T17:15:30Z

> mainClass doesn't need the same because space is not allowed

It may have dollar signs and other things that the shell might want to interpret.
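As a small illustration of this point, the sketch below shows that JVM class names for Scala objects commonly contain `$`, which an unquoted shell command line would treat as the start of a variable expansion. The `Outer`/`Inner` objects and the `singleQuote` helper are hypothetical and exist only for this example.

```scala
object DollarInClassName {
  // Nested objects compile to JVM classes whose names contain '$'.
  object Outer { object Inner }

  // Hypothetical POSIX-style quoting helper (an assumption, not Spark's own code).
  def singleQuote(value: String): String =
    "'" + value.replace("'", "'\\''") + "'"

  def main(args: Array[String]): Unit = {
    val mainClass = Outer.Inner.getClass.getName
    // The raw name contains '$', so a shell would try to expand parts of it.
    println(mainClass)
    // Inside single quotes the shell leaves '$' untouched.
    println(singleQuote(mainClass))
  }
}
```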

tiboun · 2018-04-11T07:49:32Z

Ah yes, correct Vanzin, I didn't think about that use case. So I will apply shellEscape for both in this PR.

SparkQA · 2018-04-11T08:11:45Z

Test build #89180 has finished for PR 21014 at commit 4732c4b.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

tiboun · 2018-04-18T18:54:36Z

Hi, do I need to do anything else for this PR to be merged?

vanzin · 2018-04-18T23:33:08Z

Wouldn't it be simpler and safer to just shellEscape everything in options before returning it?

SparkQA · 2018-04-26T16:16:35Z

Test build #89892 has finished for PR 21014 at commit 2260e15.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

vanzin · 2018-04-26T17:04:03Z

What I meant was to actually do options.map(shellEscape) in the return line.
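A minimal sketch of that suggestion, under stated assumptions: `buildOptions` and its parameters are hypothetical stand-ins for the method that assembles the spark-submit options, and `shellEscape` here is a simple single-quote escaper rather than the scheduler's real helper.

```scala
object EscapeAllOptions {
  // Hypothetical escaper (an assumption): single-quote wrapping, ' encoded as '\''.
  def shellEscape(value: String): String =
    "'" + value.replace("'", "'\\''") + "'"

  // Collect the options unescaped, then escape every element once on the return line.
  def buildOptions(
      name: String,
      mainClass: String,
      conf: Seq[(String, String)]): Seq[String] = {
    val base = Seq("--name", name, "--class", mainClass)
    val confOptions = conf.flatMap { case (k, v) => Seq("--conf", s"$k=$v") }
    val options = base ++ confOptions
    options.map(shellEscape)
  }
}
```

Escaping in one place means a newly added option cannot be forgotten, and no value is escaped twice.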

tiboun · 2018-04-26T17:17:34Z

Ah yes, I understand what you mean, I'm going to fix it right now

SparkQA · 2018-04-26T17:41:54Z

Test build #89896 has finished for PR 21014 at commit 0038a23.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

vanzin · 2018-05-01T15:27:50Z

Merging to master / 2.3 / 2.2.

## What changes were proposed in this pull request?

Shell escaped the name passed to spark-submit and change how conf attributes are shell escaped.

## How was this patch tested?

This test has been tested manually with Hive-on-spark with mesos or with the use case described in the issue with the sparkPi application with a custom name which contains illegal shell characters.

With this PR, hive-on-spark on mesos works like a charm with hive 3.0.0-SNAPSHOT.

I state that this contribution is my original work and that I license the work to the project under the project’s open source license

Author: Bounkong Khamphousone <[email protected]>

Closes #21014 from tiboun/fix/SPARK-23941.

(cherry picked from commit 6782359)
Signed-off-by: Marcelo Vanzin <[email protected]>

* [SPARK-23816][CORE] Killed tasks should ignore FetchFailures.

SPARK-19276 ensured that FetchFailures do not get swallowed by other layers of exception handling, but it also meant that a killed task could look like a fetch failure. This is particularly a problem with speculative execution, where we expect to kill tasks as they are reading shuffle data. The fix is to ensure that we always check for killed tasks first.

Added a new unit test which fails before the fix, ran it 1k times to check for flakiness. Full suite of tests on jenkins.

Author: Imran Rashid <[email protected]>

Closes apache#20987 from squito/SPARK-23816.

(cherry picked from commit 10f45bb)
Signed-off-by: Marcelo Vanzin <[email protected]>

* [SPARK-24007][SQL] EqualNullSafe for FloatType and DoubleType might generate a wrong result by codegen.

`EqualNullSafe` for `FloatType` and `DoubleType` might generate a wrong result by codegen.

```scala
scala> val df = Seq((Some(-1.0d), None), (None, Some(-1.0d))).toDF()
df: org.apache.spark.sql.DataFrame = [_1: double, _2: double]

scala> df.show()
+----+----+
|  _1|  _2|
+----+----+
|-1.0|null|
|null|-1.0|
+----+----+

scala> df.filter("_1 <=> _2").show()
+----+----+
|  _1|  _2|
+----+----+
|-1.0|null|
|null|-1.0|
+----+----+
```

The result should be empty but the result remains two rows.

Added a test.

Author: Takuya UESHIN <[email protected]>

Closes apache#21094 from ueshin/issues/SPARK-24007/equalnullsafe.

(cherry picked from commit f09a9e9)
Signed-off-by: gatorsmile <[email protected]>

* [SPARK-23963][SQL] Properly handle large number of columns in query on text-based Hive table

## What changes were proposed in this pull request?

TableReader would get disproportionately slower as the number of columns in the query increased.

I fixed the way TableReader was looking up metadata for each column in the row. Previously, it had been looking up this data in linked lists, accessing each linked list by an index (column number). Now it looks up this data in arrays, where indexing by column number works better.

## How was this patch tested?

Manual testing
All sbt unit tests
python sql tests

Author: Bruce Robbins <[email protected]>

Closes apache#21043 from bersprockets/tabreadfix.

* [MINOR][DOCS] Fix comments of SQLExecution#withExecutionId

## What changes were proposed in this pull request?

Fix comment. Change `BroadcastHashJoin.broadcastFuture` to `BroadcastExchangeExec.relationFuture`: https://github.com/apache/spark/blob/d28d5732ae205771f1f443b15b10e64dcffb5ff0/sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/BroadcastExchangeExec.scala#L66

## How was this patch tested?

N/A

Author: seancxmao <[email protected]>

Closes apache#21113 from seancxmao/SPARK-13136.

(cherry picked from commit c303b1b)
Signed-off-by: hyukjinkwon <[email protected]>

* [SPARK-23941][MESOS] Mesos task failed on specific spark app name

## What changes were proposed in this pull request?

Shell escaped the name passed to spark-submit and change how conf attributes are shell escaped.

## How was this patch tested?

This test has been tested manually with Hive-on-spark with mesos or with the use case described in the issue with the sparkPi application with a custom name which contains illegal shell characters.

With this PR, hive-on-spark on mesos works like a charm with hive 3.0.0-SNAPSHOT.

I state that this contribution is my original work and that I license the work to the project under the project’s open source license

Author: Bounkong Khamphousone <[email protected]>

Closes apache#21014 from tiboun/fix/SPARK-23941.

(cherry picked from commit 6782359)
Signed-off-by: Marcelo Vanzin <[email protected]>

* [SPARK-23433][CORE] Late zombie task completions update all tasksets

Fetch failure lead to multiple tasksets which are active for a given stage. While there is only one "active" version of the taskset, the earlier attempts can still have running tasks, which can complete successfully. So a task completion needs to update every taskset so that it knows the partition is completed. That way the final active taskset does not try to submit another task for the same partition, and so that it knows when it is completed and when it should be marked as a "zombie".

Added a regression test.

Author: Imran Rashid <[email protected]>

Closes apache#21131 from squito/SPARK-23433.

(cherry picked from commit 94641fe)
Signed-off-by: Imran Rashid <[email protected]>

* [SPARK-23489][SQL][TEST][BRANCH-2.2] HiveExternalCatalogVersionsSuite should verify the downloaded file

## What changes were proposed in this pull request?

This is a backport of apache#21210 because `branch-2.2` also faces the same failures.

Although [SPARK-22654](https://issues.apache.org/jira/browse/SPARK-22654) made `HiveExternalCatalogVersionsSuite` download from Apache mirrors three times, it has been flaky because it didn't verify the downloaded file. Some Apache mirrors terminate the downloading abnormally, the *corrupted* file shows the following errors.

```
gzip: stdin: not in gzip format
tar: Child returned status 1
tar: Error is not recoverable: exiting now
22:46:32.700 WARN org.apache.spark.sql.hive.HiveExternalCatalogVersionsSuite:

===== POSSIBLE THREAD LEAK IN SUITE o.a.s.sql.hive.HiveExternalCatalogVersionsSuite, thread names: Keep-Alive-Timer =====

*** RUN ABORTED ***
  java.io.IOException: Cannot run program "./bin/spark-submit" (in directory "/tmp/test-spark/spark-2.2.0"): error=2, No such file or directory
```

This has been reported weirdly in two ways. For example, the above case is reported as Case 2 `no failures`.

- Case 1. [Test Result (1 failure / +1)](https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-sbt-hadoop-2.7/4389/)
- Case 2. [Test Result (no failures)](https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-maven-hadoop-2.6/4811/)

This PR aims to make `HiveExternalCatalogVersionsSuite` more robust by verifying the downloaded `tgz` file by extracting and checking the existence of `bin/spark-submit`. If it turns out that the file is empty or corrupted, `HiveExternalCatalogVersionsSuite` will do retry logic like the download failure.

## How was this patch tested?

Pass the Jenkins.

Author: Dongjoon Hyun <[email protected]>

Closes apache#21232 from dongjoon-hyun/SPARK-23489-2.

* [SPARK-23697][CORE] LegacyAccumulatorWrapper should define isZero correctly

## What changes were proposed in this pull request?

It's possible that Accumulators of Spark 1.x may no longer work with Spark 2.x. This is because `LegacyAccumulatorWrapper.isZero` may return wrong answer if `AccumulableParam` doesn't define equals/hashCode.

This PR fixes this by using reference equality check in `LegacyAccumulatorWrapper.isZero`.

## How was this patch tested?

a new test

Author: Wenchen Fan <[email protected]>

Closes apache#21229 from cloud-fan/accumulator.

(cherry picked from commit 4d5de4d)
Signed-off-by: Wenchen Fan <[email protected]>

* [SPARK-21278][PYSPARK] Upgrade to Py4J 0.10.6

This PR aims to bump Py4J in order to fix the following float/double bug. Py4J 0.10.5 fixes this (py4j/py4j#272) and the latest Py4J is 0.10.6.

**BEFORE**

```
>>> df = spark.range(1)
>>> df.select(df['id'] + 17.133574204226083).show()
+--------------------+
|(id + 17.1335742042)|
+--------------------+
|       17.1335742042|
+--------------------+
```

**AFTER**

```
>>> df = spark.range(1)
>>> df.select(df['id'] + 17.133574204226083).show()
+-------------------------+
|(id + 17.133574204226083)|
+-------------------------+
|       17.133574204226083|
+-------------------------+
```

Manual.

Author: Dongjoon Hyun <[email protected]>

Closes apache#18546 from dongjoon-hyun/SPARK-21278.

(cherry picked from commit c8d0aba)
Signed-off-by: Marcelo Vanzin <[email protected]>

* [SPARK-16406][SQL] Improve performance of LogicalPlan.resolve

`LogicalPlan.resolve(...)` uses linear searches to find an attribute matching a name. This is fine in normal cases, but gets problematic when you try to resolve a large number of columns on a plan with a large number of attributes.

This PR adds an indexing structure to `resolve(...)` in order to find potential matches quicker. This PR improves the reference resolution time for the following code by 4x (11.8s -> 2.4s):

```scala
val n = 4000
val values = (1 to n).map(_.toString).mkString(", ")
val columns = (1 to n).map("column" + _).mkString(", ")
val query =
  s"""
     |SELECT $columns
     |FROM VALUES ($values) T($columns)
     |WHERE 1=2 AND 1 IN ($columns)
     |GROUP BY $columns
     |ORDER BY $columns
     |""".stripMargin

spark.time(sql(query))
```

Existing tests.

Author: Herman van Hovell <[email protected]>

Closes apache#14083 from hvanhovell/SPARK-16406.

* [PYSPARK] Update py4j to version 0.10.7.

(cherry picked from commit cc613b5)
Signed-off-by: Marcelo Vanzin <[email protected]>
(cherry picked from commit 323dc3a)
Signed-off-by: Marcelo Vanzin <[email protected]>

* [SPARKR] Match pyspark features in SparkR communication protocol.

(cherry picked from commit 628c7b5)
Signed-off-by: Marcelo Vanzin <[email protected]>
(cherry picked from commit 16cd9ac)
Signed-off-by: Marcelo Vanzin <[email protected]>

* Keep old-style messages for AnalysisException with ambiguous references


felixcheung reviewed Apr 10, 2018


tiboun force-pushed the fix/SPARK-23941 branch from fb078eb to 4732c4b Compare April 11, 2018 07:55

tiboun force-pushed the fix/SPARK-23941 branch from 4732c4b to 2260e15 Compare April 26, 2018 15:57

fix call to spark-submit

0038a23

tiboun force-pushed the fix/SPARK-23941 branch from 2260e15 to 0038a23 Compare April 26, 2018 17:24

asfgit closed this in 6782359 May 1, 2018

krcz mentioned this pull request Jun 18, 2018

[SPARK-23464][MESOS] Fix mesos cluster scheduler options double-escaping #20641

Closed

This was referenced Jun 22, 2018

Fix mesos cluster scheduler options double-escaping mesosphere/spark#32

Closed

[SPARK-23941][MESOS] Mesos task failed on specific spark app name mesosphere/spark#33

Merged

samvantran mentioned this pull request Aug 2, 2018

[DCOS-38138] Update Spark CLI for shell-escape fix mesosphere/spark-build#388

Merged

farhan5900 mentioned this pull request Oct 20, 2020

[SPARK-33199] Mesos Task Failed when pyFiles and docker image option used together mesosphere/spark#89

Closed
