
SKIPME merged Apache branch-1.6 #139

Merged: 17 commits into alteryx:csd-1.6 on Dec 29, 2015
Conversation

markhamstra

No description provided.

pwendell and others added 17 commits December 21, 2015 17:50
Author: Shixiong Zhu <[email protected]>

Closes apache#10424 from zsxwing/typo.

(cherry picked from commit 93da856)
Signed-off-by: Reynold Xin <[email protected]>
…ryServerSuite

This patch fixes a flaky "test jdbc cancel" test in HiveThriftBinaryServerSuite. This test is prone to a race condition which causes it to block indefinitely while waiting for an extremely slow query to complete, which caused many Jenkins builds to time out.

For more background, see my comments on apache#6207 (the PR which introduced this test).

Author: Josh Rosen <[email protected]>

Closes apache#10425 from JoshRosen/SPARK-11823.

(cherry picked from commit 2235cd4)
Signed-off-by: Josh Rosen <[email protected]>
Author: Shixiong Zhu <[email protected]>

Closes apache#10439 from zsxwing/kafka-message-handler-doc.

(cherry picked from commit 93db50d)
Signed-off-by: Tathagata Das <[email protected]>
…or Streaming

This PR adds Scala, Java and Python examples to show how to use Accumulator and Broadcast in Spark Streaming to support checkpointing.
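A minimal sketch of the Scala pattern these examples demonstrate, assuming illustrative names (`WordBlacklist`, `DroppedWordsCounter`): the Broadcast and Accumulator are held in lazily initialized singletons, so they can be re-created on first use after a StreamingContext is recovered from a checkpoint.

```
import org.apache.spark.{Accumulator, SparkContext}
import org.apache.spark.broadcast.Broadcast

// Broadcasts and Accumulators are not restored from a checkpoint, so hold
// them in lazily initialized singletons and re-create them on first use.
// Names and the blacklist contents are illustrative.
object WordBlacklist {
  @volatile private var instance: Broadcast[Seq[String]] = _
  def getInstance(sc: SparkContext): Broadcast[Seq[String]] = {
    if (instance == null) synchronized {
      if (instance == null) instance = sc.broadcast(Seq("a", "b", "c"))
    }
    instance
  }
}

object DroppedWordsCounter {
  @volatile private var instance: Accumulator[Long] = _
  def getInstance(sc: SparkContext): Accumulator[Long] = {
    if (instance == null) synchronized {
      if (instance == null) instance = sc.accumulator(0L, "DroppedWordsCounter")
    }
    instance
  }
}
```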

Author: Shixiong Zhu <[email protected]>

Closes apache#10385 from zsxwing/accumulator-broadcast-example.

(cherry picked from commit 20591af)
Signed-off-by: Tathagata Das <[email protected]>
…ay fields

Accessing null elements in an array field fails when Tungsten is enabled.
It works in Spark 1.3.1, and in Spark > 1.5 with Tungsten disabled.

This PR solves this by checking if the accessed element in the array field is null, in the generated code.

Example:
```
// Array of String
case class AS( as: Seq[String] )
val dfAS = sc.parallelize( Seq( AS ( Seq("a",null,"b") ) ) ).toDF
dfAS.registerTempTable("T_AS")
for (i <- 0 to 2) { println(i + " = " + sqlContext.sql(s"select as[$i] from T_AS").collect.mkString(","))}
```

With Tungsten disabled:
```
0 = [a]
1 = [null]
2 = [b]
```

With Tungsten enabled:
```
0 = [a]
15/12/22 09:32:50 ERROR Executor: Exception in task 7.0 in stage 1.0 (TID 15)
java.lang.NullPointerException
	at org.apache.spark.sql.catalyst.expressions.UnsafeRowWriters$UTF8StringWriter.getSize(UnsafeRowWriters.java:90)
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown Source)
	at org.apache.spark.sql.execution.TungstenProject$$anonfun$3$$anonfun$apply$3.apply(basicOperators.scala:90)
	at org.apache.spark.sql.execution.TungstenProject$$anonfun$3$$anonfun$apply$3.apply(basicOperators.scala:88)
	at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
	at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
	at scala.collection.Iterator$class.foreach(Iterator.scala:727)
	at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
```
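A hedged sketch of the fix's idea (not the literal generated code): guard the element access with a null check before reading the value.

```
import org.apache.spark.sql.catalyst.util.ArrayData
import org.apache.spark.unsafe.types.UTF8String

// Illustrative only: test the element for null before reading it, as the
// patched code generation now does for array fields.
def safeElementAt(array: ArrayData, i: Int): UTF8String =
  if (array.isNullAt(i)) null else array.getUTF8String(i)
```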

Author: pierre-borckmans <[email protected]>

Closes apache#10429 from pierre-borckmans/SPARK-12477_Tungsten-Projection-Null-Element-In-Array.

(cherry picked from commit 43b2a63)
Signed-off-by: Reynold Xin <[email protected]>
Allow the user to override MAVEN_OPTS (2GB wasn't sufficient for me).

Author: Adrian Bridgett <[email protected]>

Closes apache#10448 from abridgett/feature/do_not_force_maven_opts.

(cherry picked from commit ead6abf)
Signed-off-by: Josh Rosen <[email protected]>
…tbeat interval

Previously, the heartbeat RPC timeout was the default network timeout, which is the same value
the driver uses to declare executors dead. This means that if there is a network issue,
the executor can be declared dead after a single heartbeat attempt. There is a separate config
for the heartbeat interval, which is a better value to use for the heartbeat RPC. With
this change, the executor will make multiple heartbeat attempts even with RPC issues.
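An illustrative configuration (not part of the patch itself) showing the two settings involved:

```
import org.apache.spark.SparkConf

// With this change, each heartbeat RPC times out after
// spark.executor.heartbeatInterval rather than spark.network.timeout,
// so several attempts can fail before the driver declares the executor dead.
val conf = new SparkConf()
  .set("spark.executor.heartbeatInterval", "10s") // per-attempt RPC timeout
  .set("spark.network.timeout", "120s")           // driver's dead-executor threshold
```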

Author: Nong Li <[email protected]>

Closes apache#10365 from nongli/spark-12411.
…a is used

Fix an exception with the IBM JDK by removing the update field from the JavaVersion tuple, because the IBM JDK does not provide update information ('_xx').

Author: Kazuaki Ishizaki <[email protected]>

Closes apache#10463 from kiszk/SPARK-12502.

(cherry picked from commit 9e85bb7)
Signed-off-by: Kousuke Saruta <[email protected]>
…NSERT syntax

In the past Spark JDBC write only worked with technologies which support the following INSERT statement syntax (JdbcUtils.scala: insertStatement()):

INSERT INTO $table VALUES ( ?, ?, ..., ? )

But some technologies require a list of column names:

INSERT INTO $table ( $colNameList ) VALUES ( ?, ?, ..., ? )

This was blocking the use of, e.g., the Progress JDBC Driver for Cassandra.

Another limitation is that the first syntax relies on the DataFrame field ordering matching that of the target table. This works fine as long as the target table has been created by writer.jdbc().

If the target table contains more columns (not created by writer.jdbc()), then the insert fails due to a mismatch in the number of columns or their data types.

This PR switches to the recommended second INSERT syntax. Column names are taken from the DataFrame field names.
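A minimal sketch of generating the second syntax, with assumed names (not the exact JdbcUtils code):

```
import org.apache.spark.sql.types.StructType

// Build "INSERT INTO table (c1, c2, ...) VALUES (?, ?, ...)" from the
// DataFrame schema; illustrative, not the verbatim Spark implementation.
def insertStatement(table: String, schema: StructType): String = {
  val columns = schema.fields.map(_.name).mkString(", ")
  val placeholders = schema.fields.map(_ => "?").mkString(", ")
  s"INSERT INTO $table ($columns) VALUES ($placeholders)"
}
```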

Author: CK50 <[email protected]>

Closes apache#10380 from CK50/master-SPARK-12010-2.

(cherry picked from commit 502476e)
Signed-off-by: Sean Owen <[email protected]>
…i-Join

After reading the JIRA https://issues.apache.org/jira/browse/SPARK-12520, I double checked the code.

For example, users can do the Equi-Join like
  ```df.join(df2, 'name', 'outer').select('name', 'height').collect()```
- There is a bug in 1.5 and 1.4: the code simply ignores the third parameter (the join type) that users pass in. The join type actually used is `Inner`, even if the user specifies another type (e.g., `Outer`).
- After PR apache#8600, 1.6 no longer has this issue, but the description had not been updated.

I plan to submit another PR to fix 1.5 and to issue an error message if users specify a non-inner join type when using an equi-join.

Author: gatorsmile <[email protected]>

Closes apache#10477 from gatorsmile/pyOuterJoin.
The feature was first added in commit 7b877b2 but was later removed (probably by mistake) in commit fc8b581.
This change sets the default path of RDDs created via sc.textFile(...) to the path argument.

Here is the symptom:

* Using spark-1.5.2-bin-hadoop2.6:

```
scala> sc.textFile("/home/root/.bashrc").name
res5: String = null

scala> sc.binaryFiles("/home/root/.bashrc").name
res6: String = /home/root/.bashrc
```

* while using Spark 1.3.1:

```
scala> sc.textFile("/home/root/.bashrc").name
res0: String = /home/root/.bashrc

scala> sc.binaryFiles("/home/root/.bashrc").name
res1: String = /home/root/.bashrc
```
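A hedged sketch of the described change (not the verbatim patch): name the RDD after its input path at creation time, as sc.binaryFiles already does.

```
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD

// Illustrative wrapper: setName(path) is the behavior the fix restores
// inside SparkContext.textFile itself.
def namedTextFile(sc: SparkContext, path: String): RDD[String] =
  sc.textFile(path).setName(path)
```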

Author: Yaron Weinsberg <[email protected]>
Author: yaron <[email protected]>

Closes apache#10456 from wyaron/master.

(cherry picked from commit 73b70f0)
Signed-off-by: Kousuke Saruta <[email protected]>
ParamMap#filter uses `mutable.Map#filterKeys`. The return type of `filterKeys` is `collection.Map`, not `mutable.Map`, but the result is cast to `mutable.Map` using `asInstanceOf`, so we get a `ClassCastException`.
Also, the result of `Map#filterKeys` is not serializable; this is a Scala issue (https://issues.scala-lang.org/browse/SI-6654).
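An illustrative reproduction under Scala 2.10/2.11 semantics (the parameter names are made up):

```
import scala.collection.mutable

// filterKeys returns a collection.Map view, so the cast below would fail,
// and the view itself is not serializable (SI-6654).
val params = mutable.Map("maxIter" -> 10, "regParam" -> 0)
val view: collection.Map[String, Int] = params.filterKeys(_ == "maxIter")
// view.asInstanceOf[mutable.Map[String, Int]]   // ClassCastException

// The safe alternative: filter materializes a real mutable.Map.
val filtered: mutable.Map[String, Int] = params.filter { case (k, _) => k == "maxIter" }
```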

Author: Kousuke Saruta <[email protected]>

Closes apache#10381 from sarutak/SPARK-12424.

(cherry picked from commit 07165ca)
Signed-off-by: Kousuke Saruta <[email protected]>
…hrow Buffer underflow exception

Since we only need to implement `def skipBytes(n: Int)`, the code in apache#10213 could be simplified.

cc davies scwf
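A hedged sketch of the simplification (class and field names assumed): implement DataInput.skipBytes directly on Kryo's Input.skip instead of re-reading bytes manually.

```
import com.esotericsoftware.kryo.io.Input

// Kryo's Input.skip(long) already handles buffer refills, so skipBytes
// reduces to a single call. The wrapper name is illustrative.
class KryoInputBridge(input: Input) {
  def skipBytes(n: Int): Int = {
    input.skip(n.toLong)
    n
  }
}
```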

Author: Daoyuan Wang <[email protected]>

Closes apache#10253 from adrian-wang/kryo.

(cherry picked from commit a6d3853)
Signed-off-by: Kousuke Saruta <[email protected]>
Include the following changes:

1. Close `java.sql.Statement` (see the sketch after this list)
2. Fix incorrect `asInstanceOf`.
3. Remove unnecessary `synchronized` and `ReentrantLock`.
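A minimal sketch of the first item, with assumed names (not the exact Spark change): close the Statement in a finally block so it is released even when execution throws.

```
import java.sql.Connection

// Close the Statement even if executeUpdate throws.
def runUpdate(conn: Connection, sql: String): Unit = {
  val stmt = conn.createStatement()
  try {
    stmt.executeUpdate(sql)
  } finally {
    stmt.close()
  }
}
```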

Author: Shixiong Zhu <[email protected]>

Closes apache#10440 from zsxwing/findbugs.

(cherry picked from commit 710b411)
Signed-off-by: Shixiong Zhu <[email protected]>
markhamstra added a commit that referenced this pull request Dec 29, 2015
SKIPME merged Apache branch-1.6
markhamstra merged commit 7b9cab6 into alteryx:csd-1.6 on Dec 29, 2015
markhamstra pushed a commit to markhamstra/spark that referenced this pull request Nov 7, 2017
* Pass the actual iterable from the option to get files

* Split the original instance variables

* Explicitly set the type of the array