
SKIPME merged Apache branch-1.5 #134

Merged — 8 commits, Dec 18, 2015
Conversation

markhamstra

No description provided.

jerryshao and others added 8 commits December 10, 2015 15:32
… doc

With the merge of [SPARK-8337](https://issues.apache.org/jira/browse/SPARK-8337), the Python API now has the same functionality as the Scala/Java APIs, so this change updates the description to make it more precise.

zsxwing tdas, please review, thanks a lot.

Author: jerryshao <[email protected]>

Closes apache#10246 from jerryshao/direct-kafka-doc-update.

(cherry picked from commit 24d3357)
Signed-off-by: Shixiong Zhu <[email protected]>
…afe cross-JVM comparisons

In the current implementation of named expressions' `ExprIds`, we rely on a per-JVM AtomicLong to ensure that expression ids are unique within a JVM. However, these expression ids will not be _globally_ unique. This opens the potential for id collisions if new expression ids happen to be created inside of tasks rather than on the driver.

There are currently a few cases where tasks allocate expression ids, which happen to be safe because those expressions are never compared to expressions created on the driver. In order to guard against the introduction of invalid comparisons between driver-created and executor-created expression ids, this patch extends `ExprId` to incorporate a UUID to identify the JVM that created the id, which prevents collisions.
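The scheme described above — a per-JVM counter paired with a UUID identifying the JVM that minted the id — can be sketched in Python. These are hypothetical names for illustration; Spark's actual `ExprId` is a Scala case class:

```python
import itertools
import uuid
from dataclasses import dataclass

# Process-wide identifier (standing in for the per-JVM UUID): every
# process gets its own random UUID, so ids minted in different
# processes can never collide even if their counters overlap.
_JVM_ID = uuid.uuid4()
_COUNTER = itertools.count()

@dataclass(frozen=True)
class ExprId:
    id: int
    jvm_id: uuid.UUID = _JVM_ID

def new_expr_id() -> ExprId:
    # The counter alone is only unique within this process; pairing it
    # with the process UUID makes the pair globally unique.
    return ExprId(next(_COUNTER))
```

Two ids minted in the same process share `jvm_id` but differ in `id`; ids minted in another process would carry a different `jvm_id`, so equality comparisons across processes cannot spuriously succeed.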

Author: Josh Rosen <[email protected]>

Closes apache#9093 from JoshRosen/SPARK-11080.
…rasure Issue

As noted in PR apache#9441, implementing `tallSkinnyQR` uncovered a bug with our PySpark `RowMatrix` constructor.  As discussed on the dev list [here](http://apache-spark-developers-list.1001551.n3.nabble.com/K-Means-And-Class-Tags-td10038.html), there appears to be an issue with type erasure with RDDs coming from Java, and by extension from PySpark.  Although we are attempting to construct a `RowMatrix` from an `RDD[Vector]` in [PythonMLlibAPI](https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/mllib/api/python/PythonMLLibAPI.scala#L1115), the `Vector` type is erased, resulting in an `RDD[Object]`.  Thus, when calling Scala's `tallSkinnyQR` from PySpark, we get a Java `ClassCastException` in which an `Object` cannot be cast to a Spark `Vector`.

As noted in the aforementioned dev list thread, this issue was also encountered with `DecisionTrees`, and the fix involved an explicit `retag` of the RDD with a `Vector` type.  `IndexedRowMatrix` and `CoordinateMatrix` do not appear to have this issue, likely because their related helper functions in `PythonMLlibAPI` create the RDDs explicitly from DataFrames with pattern matching, thus preserving the types.

This PR currently contains that retagging fix applied to the `createRowMatrix` helper function in `PythonMLlibAPI`.  This PR blocks apache#9441, so once this is merged, the other can be rebased.

cc holdenk

Author: Mike Dusenberry <[email protected]>

Closes apache#9458 from dusenberrymw/SPARK-11497_PySpark_RowMatrix_Constructor_Has_Type_Erasure_Issue.

(cherry picked from commit 1b82203)
Signed-off-by: Joseph K. Bradley <[email protected]>
… backport

backport apache#10265 to branch 1.5.

When SparkStrategies.BasicOperators's `case BroadcastHint(child) => apply(child)` is hit,
it recursively invokes only BasicOperators.apply on that child.
This gives the other strategies no chance to process the plan,
which can lead to a "No plan" error, so we use planLater to run the child through all strategies.
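The difference between recursing into a single strategy and going through all strategies can be sketched with a toy planner. The names and plan shapes here are hypothetical, not Spark's real API:

```python
# Each strategy maps a logical node (a tuple) to a physical plan, or
# returns None if it does not apply to that node.

def basic_operators(plan, plan_later):
    if plan[0] == "BroadcastHint":
        # The fix: delegate the child to the full planner ("planLater")
        # so every strategy gets a chance. Recursing only into
        # basic_operators itself would fail with "No plan" for any
        # child node this strategy cannot handle (e.g. Aggregate).
        return plan_later(plan[1])
    if plan[0] == "Scan":
        return ("PhysicalScan", plan[1])
    return None

def aggregation(plan, plan_later):
    if plan[0] == "Aggregate":
        return ("PhysicalAggregate", plan_later(plan[1]))
    return None

STRATEGIES = [basic_operators, aggregation]

def plan_later(plan):
    # Run the node through *all* strategies, like Spark's planLater.
    for strategy in STRATEGIES:
        result = strategy(plan, plan_later)
        if result is not None:
            return result
    raise RuntimeError("No plan for %r" % (plan,))
```

With the fix, `plan_later(("BroadcastHint", ("Aggregate", ("Scan", "t"))))` plans the aggregate under the hint; if `basic_operators` instead called itself on the child, the `Aggregate` node would hit the "No plan" error.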

https://issues.apache.org/jira/browse/SPARK-12275

Author: yucai <[email protected]>

Closes apache#10291 from yucai/backport_1.5_no_plan_for_broadcasthint and squashes the following commits:

b09715c [yucai] [SPARK-12275][SQL] No plan for BroadcastHint in some condition - 1.5 backport
…split

String.split accepts a regular expression, so we should escape "." and "|".
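The same pitfall can be illustrated with Python's `re.split`, which, like Java's `String.split`, treats its first argument as a regular expression (unlike Python's plain `str.split`, which is literal):

```python
import re

# Escaped "." splits on literal dots, as intended:
assert re.split(r"\.", "spark.sql.shuffle") == ["spark", "sql", "shuffle"]

# Unescaped "." is a regex wildcard matching every character,
# so nothing but empty strings survive the split:
assert re.split(".", "abc") == ["", "", "", ""]

# re.escape handles metacharacters like "|" generically:
assert re.split(re.escape("|"), "a|b|c") == ["a", "b", "c"]
```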

Author: Shixiong Zhu <[email protected]>

Closes apache#10361 from zsxwing/reg-bug.

(cherry picked from commit 540b5ae)
Signed-off-by: Shixiong Zhu <[email protected]>
…table

Backport apache#9390 and apache#9744 to branch-1.5.

Author: Sun Rui <[email protected]>
Author: Shivaram Venkataraman <[email protected]>

Closes apache#10372 from sun-rui/SPARK-10500-branch-1.5.
…a Source filter API

JIRA: https://issues.apache.org/jira/browse/SPARK-12218

When creating filters for Parquet/ORC, we should not push nested AND expressions partially.
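The rule can be sketched with a toy filter converter. The predicate shapes and names below are hypothetical, not Spark's actual `sources.Filter` API: an AND converts only when both sides convert, because under a surrounding NOT (or OR) pushing just one side of an AND would change which rows survive.

```python
SUPPORTED = {"eq"}  # pretend the source only supports equality pushdown

def convert(pred):
    """Convert a predicate tree to a source filter, or return None if
    any part is unsupported."""
    op = pred[0]
    if op == "and":
        left, right = convert(pred[1]), convert(pred[2])
        if left is None or right is None:
            # Do not push a nested AND partially: inside NOT/OR,
            # NOT(a AND b) is not implied by NOT(a).
            return None
        return ("and", left, right)
    if op == "not":
        child = convert(pred[1])
        return None if child is None else ("not", child)
    if op in SUPPORTED:
        return pred
    return None
```

For example, `NOT(a = 1 AND b > 2)` converts to `None` rather than to the incorrect `NOT(a = 1)`, which would wrongly drop rows where `a = 1` but `b <= 2`.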

Author: Yin Huai <[email protected]>

Closes apache#10362 from yhuai/SPARK-12218.

(cherry picked from commit 41ee7c5)
Signed-off-by: Yin Huai <[email protected]>

Conflicts:
	sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilterSuite.scala
markhamstra added a commit that referenced this pull request Dec 18, 2015
SKIPME merged Apache branch-1.5
@markhamstra markhamstra merged commit 3993929 into alteryx:csd-1.5 Dec 18, 2015