Mkim/branch 1.4.1 palantir1 #17

mingyukim · 2015-07-23T06:38:58Z

No description provided.

**Symptom.** If an executor in an application times out, `HeartbeatReceiver` attempts to kill it. After this happens, however, the application never gets an executor back even when there are cluster resources available.

**Cause.** The issue is that `sc.killExecutor` automatically assumes that the application wishes to adjust its resource requirements permanently downwards. This is not the intention in `HeartbeatReceiver`, however, which simply wants a replacement for the expired executor.

**Fix.** Differentiate between the intention to kill and the intention to replace an executor with a fresh one. More details can be found in the commit message.

Author: Andrew Or <[email protected]>

Closes apache#7107 from andrewor14/heartbeat-no-kill and squashes the following commits:

- 1cd2cd7 [Andrew Or] Add regression test for SPARK-8119
- 25a347d [Andrew Or] Reuse more code in scheduler backend
- 31ebd40 [Andrew Or] Differentiate between kill and replace

Conflicts: core/src/test/scala/org/apache/spark/HeartbeatReceiverSuite.scala

MKIM: It was not trivial to resolve the conflict
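The kill-versus-replace distinction described in this commit message can be illustrated with a small toy model. This is only a sketch and not Spark's scheduler code: `Allocation`, `kill`, `pending`, and the executor ids are hypothetical names made up for the illustration.

```scala
// Toy model only (hypothetical names), not Spark's actual scheduler backend.
final case class Allocation(target: Int, running: Set[String]) {
  // replace = false: the application is shrinking, so the target drops too.
  // replace = true: the executor is unhealthy; the target is kept so that
  // a fresh executor can be requested in its place.
  def kill(id: String, replace: Boolean): Allocation =
    if (!running.contains(id)) this
    else copy(
      target = if (replace) target else math.max(0, target - 1),
      running = running - id)

  // Number of executors still owed to the application.
  def pending: Int = math.max(0, target - running.size)
}

object KillVsReplace extends App {
  val start = Allocation(target = 2, running = Set("exec-1", "exec-2"))
  println(start.kill("exec-1", replace = false).pending) // plain kill: nothing is owed
  println(start.kill("exec-1", replace = true).pending)  // replace: one executor is owed
}
```

In this model a plain kill leaves nothing pending, which mirrors the symptom above, while a kill with `replace = true` leaves one executor pending.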

mingyukim · 2015-07-23T06:41:01Z

@punya for review. As noted in the commit message, resolving the conflict in core/src/test/scala/org/apache/spark/HeartbeatReceiverSuite.scala was non-trivial, so I didn't cherry-pick that change. It's a unit test.


…onfig option.

## What changes were proposed in this pull request?

Currently, the `OptimizeIn` optimizer converts an `In` expression into an `InSet` expression if the size of the set is greater than a constant, 10. This issue aims to add a configuration, `spark.sql.optimizer.inSetConversionThreshold`, for that threshold.

After this PR, `OptimizeIn` is configurable.

```scala
scala> sql("select a in (1,2,3) from (select explode(array(1,2)) a) T").explain()
== Physical Plan ==
WholeStageCodegen
:  +- Project [a#7 IN (1,2,3) AS (a IN (1, 2, 3))#8]
:     +- INPUT
+- Generate explode([1,2]), false, false, [a#7]
   +- Scan OneRowRelation[]

scala> sqlContext.setConf("spark.sql.optimizer.inSetConversionThreshold", "2")

scala> sql("select a in (1,2,3) from (select explode(array(1,2)) a) T").explain()
== Physical Plan ==
WholeStageCodegen
:  +- Project [a#16 INSET (1,2,3) AS (a IN (1, 2, 3))#17]
:     +- INPUT
+- Generate explode([1,2]), false, false, [a#16]
   +- Scan OneRowRelation[]
```

## How was this patch tested?

Pass the Jenkins tests (with a new testcase)

Author: Dongjoon Hyun <[email protected]>

Closes apache#12562 from dongjoon-hyun/SPARK-14796.
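The threshold rule described above can be sketched on a toy expression tree. This is an illustration under assumptions, not the Catalyst implementation: the `Expr`, `In`, and `InSet` types and the `optimize` helper here are hypothetical stand-ins.

```scala
// Toy expression types (hypothetical), not Catalyst's classes.
sealed trait Expr
final case class In(attr: String, values: Seq[Int]) extends Expr
final case class InSet(attr: String, values: Set[Int]) extends Expr

object OptimizeInSketch {
  // Convert In to InSet only when the list is longer than the threshold.
  def optimize(e: Expr, threshold: Int): Expr = e match {
    case In(attr, values) if values.size > threshold => InSet(attr, values.toSet)
    case other => other
  }
}
```

With a threshold of 10 a three-element list stays an `In`, and with a threshold of 2 it becomes an `InSet`, matching the two plans shown in the PR description.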

…aggregations

## What changes were proposed in this pull request?

Partial aggregations are generated in `EnsureRequirements`, but the planner fails to check whether the partial aggregation satisfies its sort requirements. For the following query:

```
val df2 = (0 to 1000).map(x => (x % 2, x.toString)).toDF("a", "b")
df2.createOrReplaceTempView("t2")
spark.sql("select max(b) from t2 group by a").explain(true)
```

Currently, no Sort operator is inserted before the partial aggregation, which breaks sort-based partial aggregation:

```
== Physical Plan ==
SortAggregate(key=[a#5], functions=[max(b#6)], output=[max(b)#17])
+- *Sort [a#5 ASC], false, 0
   +- Exchange hashpartitioning(a#5, 200)
      +- SortAggregate(key=[a#5], functions=[partial_max(b#6)], output=[a#5, max#19])
         +- LocalTableScan [a#5, b#6]
```

The correct plan is:

```
== Physical Plan ==
SortAggregate(key=[a#5], functions=[max(b#6)], output=[max(b)#17])
+- *Sort [a#5 ASC], false, 0
   +- Exchange hashpartitioning(a#5, 200)
      +- SortAggregate(key=[a#5], functions=[partial_max(b#6)], output=[a#5, max#19])
         +- *Sort [a#5 ASC], false, 0
            +- LocalTableScan [a#5, b#6]
```

## How was this patch tested?

Added tests in `PlannerSuite`.

Author: Takeshi YAMAMURO <[email protected]>

Closes apache#14865 from maropu/SPARK-17289.
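The missing check can be pictured with a toy plan model: before a sort-based aggregate runs, its child must already be ordered by the grouping keys, and a sort is inserted otherwise. The sketch below is a simplification under assumptions. `Plan`, `Scan`, `Sort`, `SortAggregate`, and `ensureSorted` are hypothetical names, not Spark's physical operators or the `EnsureRequirements` rule itself.

```scala
// Toy plan nodes (hypothetical), each reporting the ordering of its output.
sealed trait Plan { def ordering: Seq[String] }
final case class Scan(name: String) extends Plan {
  val ordering: Seq[String] = Seq.empty
}
final case class Sort(keys: Seq[String], child: Plan) extends Plan {
  val ordering: Seq[String] = keys
}
final case class SortAggregate(keys: Seq[String], child: Plan) extends Plan {
  val ordering: Seq[String] = keys
}

object EnsureSortSketch {
  // Insert a Sort below the aggregate unless the child is already
  // ordered by the grouping keys.
  def ensureSorted(agg: SortAggregate): SortAggregate =
    if (agg.child.ordering.startsWith(agg.keys)) agg
    else agg.copy(child = Sort(agg.keys, agg.child))
}
```

Applied to an aggregate over an unsorted scan, `ensureSorted` wraps the scan in a `Sort` on the grouping key, which corresponds to the extra `Sort` node in the corrected plan.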

Andrew Or and others added 2 commits July 22, 2015 23:32

Preparing 1.4.1-palantir1

b70a598

Update version number in package.scala

bd7a9bb

mingyukim added a commit that referenced this pull request Aug 12, 2015

Merge pull request #17 from palantir/mkim/branch-1.4.1-palantir1

79971c7

Mkim/branch 1.4.1 palantir1

mingyukim merged commit 79971c7 into branch-1.4.1-palantir Aug 12, 2015

robert3005 deleted the mkim/branch-1.4.1-palantir1 branch September 24, 2016 04:10
