
SKIPME merged Apache branch-1.5 #134

Merged — 8 commits, Dec 18, 2015
Conversation

markhamstra

No description provided.

jerryshao and others added 8 commits December 10, 2015 15:32
… doc

With the merge of [SPARK-8337](https://issues.apache.org/jira/browse/SPARK-8337), the Python API now has the same functionality as the Scala/Java APIs, so this change updates the description to make it more precise.

zsxwing tdas, please review, thanks a lot.

Author: jerryshao <[email protected]>

Closes apache#10246 from jerryshao/direct-kafka-doc-update.

(cherry picked from commit 24d3357)
Signed-off-by: Shixiong Zhu <[email protected]>
…afe cross-JVM comparisons

In the current implementation of named expressions' `ExprIds`, we rely on a per-JVM AtomicLong to ensure that expression ids are unique within a JVM. However, these expression ids will not be _globally_ unique. This opens the potential for id collisions if new expression ids happen to be created inside of tasks rather than on the driver.

There are currently a few cases where tasks allocate expression ids, which happen to be safe because those expressions are never compared to expressions created on the driver. In order to guard against the introduction of invalid comparisons between driver-created and executor-created expression ids, this patch extends `ExprId` to incorporate a UUID to identify the JVM that created the id, which prevents collisions.
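The scheme described above — a per-JVM counter paired with a UUID identifying the JVM that minted the id — can be sketched in Python. These are hypothetical names for illustration; Spark's actual `ExprId` is a Scala case class:

```python
import itertools
import uuid
from dataclasses import dataclass

# Process-wide identifier (standing in for the per-JVM UUID): every
# process gets its own random UUID, so ids minted in different
# processes can never collide even if their counters overlap.
_JVM_ID = uuid.uuid4()
_COUNTER = itertools.count()

@dataclass(frozen=True)
class ExprId:
    id: int
    jvm_id: uuid.UUID = _JVM_ID

def new_expr_id() -> ExprId:
    # The counter alone is only unique within this process; pairing it
    # with the process UUID makes the pair globally unique.
    return ExprId(next(_COUNTER))
```

Two ids minted in the same process share `jvm_id` but differ in `id`; ids minted in another process would carry a different `jvm_id`, so equality comparisons across processes cannot spuriously succeed.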

Author: Josh Rosen <[email protected]>

Closes apache#9093 from JoshRosen/SPARK-11080.
…rasure Issue

As noted in PR apache#9441, implementing `tallSkinnyQR` uncovered a bug with our PySpark `RowMatrix` constructor.  As discussed on the dev list [here](http://apache-spark-developers-list.1001551.n3.nabble.com/K-Means-And-Class-Tags-td10038.html), there appears to be an issue with type erasure with RDDs coming from Java, and by extension from PySpark.  Although we are attempting to construct a `RowMatrix` from an `RDD[Vector]` in [PythonMLlibAPI](https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/mllib/api/python/PythonMLLibAPI.scala#L1115), the `Vector` type is erased, resulting in an `RDD[Object]`.  Thus, when calling Scala's `tallSkinnyQR` from PySpark, we get a Java `ClassCastException` in which an `Object` cannot be cast to a Spark `Vector`.

As noted in the aforementioned dev list thread, this issue was also encountered with `DecisionTrees`, and the fix involved an explicit `retag` of the RDD with a `Vector` type.  `IndexedRowMatrix` and `CoordinateMatrix` do not appear to have this issue, likely because their related helper functions in `PythonMLlibAPI` create the RDDs explicitly from DataFrames with pattern matching, thus preserving the types.

This PR currently contains that retagging fix applied to the `createRowMatrix` helper function in `PythonMLlibAPI`.  This PR blocks apache#9441, so once this is merged, the other can be rebased.

cc holdenk

Author: Mike Dusenberry <[email protected]>

Closes apache#9458 from dusenberrymw/SPARK-11497_PySpark_RowMatrix_Constructor_Has_Type_Erasure_Issue.

(cherry picked from commit 1b82203)
Signed-off-by: Joseph K. Bradley <[email protected]>
… backport

backport apache#10265 to branch 1.5.

When SparkStrategies.BasicOperators's `case BroadcastHint(child) => apply(child)` is hit,
it recursively invokes only BasicOperators.apply on that child.
This gives the other strategies no chance to process the plan,
which can lead to a "No plan" error, so we use planLater to run the child through all strategies.
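The difference between recursing into a single strategy and going through all strategies can be sketched with a toy planner. The names and plan shapes here are hypothetical, not Spark's real API:

```python
# Each strategy maps a logical node (a tuple) to a physical plan, or
# returns None if it does not apply to that node.

def basic_operators(plan, plan_later):
    if plan[0] == "BroadcastHint":
        # The fix: delegate the child to the full planner ("planLater")
        # so every strategy gets a chance. Recursing only into
        # basic_operators itself would fail with "No plan" for any
        # child node this strategy cannot handle (e.g. Aggregate).
        return plan_later(plan[1])
    if plan[0] == "Scan":
        return ("PhysicalScan", plan[1])
    return None

def aggregation(plan, plan_later):
    if plan[0] == "Aggregate":
        return ("PhysicalAggregate", plan_later(plan[1]))
    return None

STRATEGIES = [basic_operators, aggregation]

def plan_later(plan):
    # Run the node through *all* strategies, like Spark's planLater.
    for strategy in STRATEGIES:
        result = strategy(plan, plan_later)
        if result is not None:
            return result
    raise RuntimeError("No plan for %r" % (plan,))
```

With the fix, `plan_later(("BroadcastHint", ("Aggregate", ("Scan", "t"))))` plans the aggregate under the hint; if `basic_operators` instead called itself on the child, the `Aggregate` node would hit the "No plan" error.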

https://issues.apache.org/jira/browse/SPARK-12275

Author: yucai <[email protected]>

Closes apache#10291 from yucai/backport_1.5_no_plan_for_broadcasthint and squashes the following commits:

b09715c [yucai] [SPARK-12275][SQL] No plan for BroadcastHint in some condition - 1.5 backport
…split

String.split accepts a regular expression, so we should escape "." and "|".
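The same pitfall can be illustrated with Python's `re.split`, which, like Java's `String.split`, treats its first argument as a regular expression (unlike Python's plain `str.split`, which is literal):

```python
import re

# Escaped "." splits on literal dots, as intended:
assert re.split(r"\.", "spark.sql.shuffle") == ["spark", "sql", "shuffle"]

# Unescaped "." is a regex wildcard matching every character,
# so nothing but empty strings survive the split:
assert re.split(".", "abc") == ["", "", "", ""]

# re.escape handles metacharacters like "|" generically:
assert re.split(re.escape("|"), "a|b|c") == ["a", "b", "c"]
```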

Author: Shixiong Zhu <[email protected]>

Closes apache#10361 from zsxwing/reg-bug.

(cherry picked from commit 540b5ae)
Signed-off-by: Shixiong Zhu <[email protected]>
…table

Backport apache#9390 and apache#9744 to branch-1.5.

Author: Sun Rui <[email protected]>
Author: Shivaram Venkataraman <[email protected]>

Closes apache#10372 from sun-rui/SPARK-10500-branch-1.5.
…a Source filter API

JIRA: https://issues.apache.org/jira/browse/SPARK-12218

When creating filters for Parquet/ORC, we should not push nested AND expressions partially.
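The rule can be sketched with a toy filter converter. The predicate shapes and names below are hypothetical, not Spark's actual `sources.Filter` API: an AND converts only when both sides convert, because under a surrounding NOT (or OR) pushing just one side of an AND would change which rows survive.

```python
SUPPORTED = {"eq"}  # pretend the source only supports equality pushdown

def convert(pred):
    """Convert a predicate tree to a source filter, or return None if
    any part is unsupported."""
    op = pred[0]
    if op == "and":
        left, right = convert(pred[1]), convert(pred[2])
        if left is None or right is None:
            # Do not push a nested AND partially: inside NOT/OR,
            # NOT(a AND b) is not implied by NOT(a).
            return None
        return ("and", left, right)
    if op == "not":
        child = convert(pred[1])
        return None if child is None else ("not", child)
    if op in SUPPORTED:
        return pred
    return None
```

For example, `NOT(a = 1 AND b > 2)` converts to `None` rather than to the incorrect `NOT(a = 1)`, which would wrongly drop rows where `a = 1` but `b <= 2`.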

Author: Yin Huai <[email protected]>

Closes apache#10362 from yhuai/SPARK-12218.

(cherry picked from commit 41ee7c5)
Signed-off-by: Yin Huai <[email protected]>

Conflicts:
	sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilterSuite.scala
markhamstra added a commit that referenced this pull request Dec 18, 2015
SKIPME merged Apache branch-1.5
@markhamstra markhamstra merged commit 3993929 into alteryx:csd-1.5 Dec 18, 2015