SKIPME merged Apache branch-1.6 #140
Commits on Dec 29, 2015
[SPARK-11394][SQL] Throw IllegalArgumentException for unsupported types in postgresql

If a DataFrame has BYTE types, writing it over JDBC to PostgreSQL throws an exception:

    org.postgresql.util.PSQLException: ERROR: type "byte" does not exist

Author: Takeshi YAMAMURO <[email protected]>
Closes apache#9350 from maropu/FixBugInPostgreJdbc.
(cherry picked from commit 73862a1)
Signed-off-by: Yin Huai <[email protected]>
Commit: 85a8718
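A minimal sketch of the dialect-level behavior this fix implies, using Spark's public `JdbcDialect` extension point (the object below is illustrative, not Spark's actual PostgresDialect): unsupported Catalyst types now fail fast with `IllegalArgumentException` instead of reaching PostgreSQL as a nonexistent `byte` column type.

```scala
import org.apache.spark.sql.jdbc.{JdbcDialect, JdbcType}
import org.apache.spark.sql.types._

object PostgresDialectSketch extends JdbcDialect {
  override def canHandle(url: String): Boolean = url.startsWith("jdbc:postgresql")

  override def getJDBCType(dt: DataType): Option[JdbcType] = dt match {
    // fail fast instead of emitting a DDL type PostgreSQL doesn't know
    case ByteType => throw new IllegalArgumentException(s"Unsupported type in postgresql: $dt")
    case _        => None // defer to Spark's common JDBC type mappings
  }
}
```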
[SPARK-12526][SPARKR] `ifelse`, `when`, `otherwise` unable to take Column as value

`ifelse`, `when`, `otherwise` are unable to take a `Column`-typed S4 object as value. For example:

```r
ifelse(lit(1) == lit(1), lit(2), lit(3))
ifelse(df$mpg > 0, df$mpg, 0)
```

will both fail with

```r
attempt to replicate an object of type 'environment'
```

The PR replaces `ifelse` calls with `if ... else ...` inside the function implementations to avoid the attempt to vectorize (i.e. `rep()`). It remains to be discussed whether we should instead support vectorization in these functions for consistency, because `ifelse` in base R is vectorized, but I cannot foresee any scenarios where these functions would need to be vectorized in SparkR.

For reference, added test cases which trigger failures:

```r
. Error: when(), otherwise() and ifelse() with column on a DataFrame ----------
error in evaluating the argument 'x' in selecting a method for function 'collect':
  error in evaluating the argument 'col' in selecting a method for function 'select':
  attempt to replicate an object of type 'environment'
Calls: when -> when -> ifelse -> ifelse
1: withCallingHandlers(eval(code, new_test_environment), error = capture_calls, message = function(c) invokeRestart("muffleMessage"))
2: eval(code, new_test_environment)
3: eval(expr, envir, enclos)
4: expect_equal(collect(select(df, when(df$a > 1 & df$b > 2, lit(1))))[, 1], c(NA, 1)) at test_sparkSQL.R:1126
5: expect_that(object, equals(expected, label = expected.label, ...), info = info, label = label)
6: condition(object)
7: compare(actual, expected, ...)
8: collect(select(df, when(df$a > 1 & df$b > 2, lit(1))))
Error: Test failures
Execution halted
```

Author: Forest Fang <[email protected]>
Closes apache#10481 from saurfang/spark-12526.
(cherry picked from commit d80cc90)
Signed-off-by: Shivaram Venkataraman <[email protected]>
Commit: c069ffc
Commits on Dec 30, 2015
[SPARK-12300][SQL][PYSPARK] Fix schema inference on local collections

Current schema inference for local Python collections halts as soon as the inferred schema contains no NullTypes. This differs from what happens when we specify a sampling ratio of 1.0 on a distributed collection, and can result in incomplete schema information.

Author: Holden Karau <[email protected]>
Closes apache#10275 from holdenk/SPARK-12300-fix-schmea-inferance-on-local-collections.
(cherry picked from commit d1ca634)
Signed-off-by: Davies Liu <[email protected]>
Commit: 8dc6549
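A language-neutral sketch of the corrected approach (the real change is in PySpark's schema-inference code; the types below are illustrative stand-ins): infer a type per row, then merge across all rows, so fields that look null in early rows can be refined by later ones.

```scala
sealed trait FieldType
case object UnknownType extends FieldType // plays the role of PySpark's NullType
case object IntegerType extends FieldType
case object StringType  extends FieldType

def merge(a: FieldType, b: FieldType): FieldType = (a, b) match {
  case (UnknownType, t) => t
  case (t, UnknownType) => t
  case (x, y) if x == y => x
  case _                => StringType // widen on conflict
}

// Fold over every row's inferred type instead of stopping at the first
// row that carries no unknowns.
def inferColumn(rowTypes: Seq[FieldType]): FieldType =
  rowTypes.foldLeft(UnknownType: FieldType)(merge)
```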
[SPARK-12399] Display correct error message when accessing REST API with an unknown app Id

I got an exception when accessing the below REST API with an unknown application Id.
`http://<server-url>:18080/api/v1/applications/xxx/jobs`
Instead of an exception, I expect the error message "no such app: xxx", similar to what is returned when I access `/api/v1/applications/xxx`.

```
org.spark-project.guava.util.concurrent.UncheckedExecutionException: java.util.NoSuchElementException: no app with key xxx
	at org.spark-project.guava.cache.LocalCache$Segment.get(LocalCache.java:2263)
	at org.spark-project.guava.cache.LocalCache.get(LocalCache.java:4000)
	at org.spark-project.guava.cache.LocalCache.getOrLoad(LocalCache.java:4004)
	at org.spark-project.guava.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4874)
	at org.apache.spark.deploy.history.HistoryServer.getSparkUI(HistoryServer.scala:116)
	at org.apache.spark.status.api.v1.UIRoot$class.withSparkUI(ApiRootResource.scala:226)
	at org.apache.spark.deploy.history.HistoryServer.withSparkUI(HistoryServer.scala:46)
	at org.apache.spark.status.api.v1.ApiRootResource.getJobs(ApiRootResource.scala:66)
```

Author: Carson Wang <[email protected]>
Closes apache#10352 from carsonwang/unknownAppFix.
(cherry picked from commit b244297)
Signed-off-by: Marcelo Vanzin <[email protected]>
Commit: cd86075
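A minimal sketch of the fix's shape, with illustrative names rather than Spark's actual code: unwrap the loading cache's failure and raise a not-found error carrying the expected "no such app" message, instead of letting the wrapped exception surface as a 500.

```scala
class NotFoundException(msg: String) extends RuntimeException(msg)

def withApp[A](appId: String)(load: String => A): A =
  try load(appId)
  catch {
    // Guava's LoadingCache wraps loader failures in a RuntimeException
    case e: RuntimeException if e.getCause.isInstanceOf[NoSuchElementException] =>
      throw new NotFoundException(s"no such app: $appId")
  }
```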
Commits on Jan 3, 2016
[SPARK-12327][SPARKR] fix code for lintr warning for commented code
cc shivaram

Author: felixcheung <[email protected]>
Closes apache#10408 from felixcheung/rcodecomment.
(cherry picked from commit c3d5056)
Signed-off-by: Shivaram Venkataraman <[email protected]>
Commit: 4e9dd16
Commits on Jan 4, 2016
[SPARK-12562][SQL] DataFrame.write.format("text") requires the column name to be called value

Author: Xiu Guo <[email protected]>
Closes apache#10515 from xguo27/SPARK-12562.
(cherry picked from commit 84f8492)
Signed-off-by: Reynold Xin <[email protected]>
Commit: f7a3223
[SPARK-12486] Worker should kill the executors more forcefully if possible

This patch updates the ExecutorRunner's terminate path to use the new Java 8 API to terminate processes more forcefully if possible. Previously, if the executor was unhealthy, it would ignore the destroy() call. Presumably, the new Java API was added to handle cases like this. We could update the termination path in the future to use OS-specific commands for older Java versions.

Author: Nong Li <[email protected]>
Closes apache#10438 from nongli/spark-12486-executors.
(cherry picked from commit 8f65939)
Signed-off-by: Andrew Or <[email protected]>
Commit: cd02038
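A sketch of the termination strategy described above, assuming a Java 8 runtime (the helper is hypothetical, not the ExecutorRunner's exact code): ask the process to exit, and escalate if it does not comply within a grace period.

```scala
import java.util.concurrent.TimeUnit

def terminate(process: Process, graceMillis: Long = 10000L): Int = {
  process.destroy() // cooperative request; an unhealthy JVM may ignore it
  if (!process.waitFor(graceMillis, TimeUnit.MILLISECONDS)) {
    process.destroyForcibly() // Java 8 API: forceful kill (SIGKILL on POSIX)
  }
  process.waitFor() // reap and return the exit code
}
```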
[SPARK-12470] [SQL] Fix size reduction calculation
Also, only allocate the required buffer size.

Author: Pete Robbins <[email protected]>
Closes apache#10421 from robbinspg/master.
(cherry picked from commit b504b6a)
Signed-off-by: Davies Liu <[email protected]>

Conflicts:
	sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/GenerateUnsafeRowJoiner.scala
Commit: b5a1f56
[SPARK-12579][SQL] Force user-specified JDBC driver to take precedence
Spark SQL's JDBC data source allows users to specify an explicit JDBC driver to load (using the `driver` argument), but in the current code it's possible that the user-specified driver will not be used when it comes time to actually create a JDBC connection.

In a nutshell, the problem is that you might have multiple JDBC drivers on the classpath that claim to be able to handle the same subprotocol, so simply registering the user-provided driver class with our `DriverRegistry` and JDBC's `DriverManager` is not sufficient to ensure that it's actually used when creating the JDBC connection.

This patch addresses the issue by first registering the user-specified driver with the DriverManager, then iterating over the driver manager's loaded drivers in order to obtain the correct driver and use it to create a connection (previously, we just called `DriverManager.getConnection()` directly).

If a user did not specify a JDBC driver to use, then we call `DriverManager.getDriver` to figure out the class of the driver to use, then pass that class's name to executors; this guards against corner-case bugs in situations where the driver and executor JVMs might have different sets of JDBC drivers on their classpaths (previously, there was the (rare) potential for `DriverManager.getConnection()` to use different drivers on the driver and executors if the user had not explicitly specified a JDBC driver class and the classpaths were different).

This patch is inspired by a similar patch that I made to the `spark-redshift` library (databricks/spark-redshift#143), which contains its own modified fork of some of Spark's JDBC data source code (for cross-Spark-version compatibility reasons).

Author: Josh Rosen <[email protected]>
Closes apache#10519 from JoshRosen/jdbc-driver-precedence.
(cherry picked from commit 6c83d93)
Signed-off-by: Yin Huai <[email protected]>
Commit: 7f37c1e
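A simplified sketch of the selection logic described above (Spark's real code also has to unwrap its own driver wrapper class; `connect` below is illustrative): pick the registered driver whose class matches the user-supplied `driver` option rather than letting `DriverManager.getConnection` choose any driver that claims the subprotocol.

```scala
import java.sql.{Connection, Driver, DriverManager}
import java.util.Properties
import scala.collection.JavaConverters._

def connect(url: String, userDriverClass: Option[String], props: Properties): Connection = {
  val driver: Driver = userDriverClass match {
    case Some(cls) =>
      DriverManager.getDrivers.asScala
        .find(_.getClass.getName == cls)
        .getOrElse(sys.error(s"Did not find registered driver with class $cls"))
    case None =>
      // also how the driver class name is resolved before shipping it to executors
      DriverManager.getDriver(url)
  }
  driver.connect(url, props)
}
```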
[DOC] Adjust coverage for partitionBy()
This is the related thread: http://search-hadoop.com/m/q3RTtO3ReeJ1iF02&subj=Re+partitioning+json+data+in+spark

Michael suggested fixing the doc. Please review.

Author: tedyu <[email protected]>
Closes apache#10499 from ted-yu/master.
(cherry picked from commit 40d0396)
Signed-off-by: Michael Armbrust <[email protected]>
Commit: 1005ee3
[SPARK-12589][SQL] Fix UnsafeRowParquetRecordReader to properly set the row length

The reader was previously not setting the row length, meaning it was wrong if there were variable-length columns. This problem does not usually manifest, since the value in the column is correct and projecting the row fixes the issue.

Author: Nong Li <[email protected]>
Closes apache#10576 from nongli/spark-12589.
(cherry picked from commit 34de24a)
Signed-off-by: Yin Huai <[email protected]>

Conflicts:
	sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/UnsafeRow.java
Commit: 8ac9198
Commit: 6f4a224
Commits on Jan 5, 2016
[SPARKR][DOC] minor doc update for version in migration guide
Checked that the change is in Spark 1.6.0. cc shivaram

Author: felixcheung <[email protected]>
Closes apache#10574 from felixcheung/rwritemodedoc.
(cherry picked from commit 8896ec9)
Signed-off-by: Shivaram Venkataraman <[email protected]>
Commit: 8950482
[SPARK-12568][SQL] Add BINARY to Encoders
Author: Michael Armbrust <[email protected]>
Closes apache#10516 from marmbrus/datasetCleanup.
(cherry picked from commit 53beddc)
Signed-off-by: Michael Armbrust <[email protected]>
Commit: d9e4438
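Usage sketch for the added encoder (assuming a Spark 1.6 `SQLContext`): an explicitly encoded `Dataset` of raw byte arrays.

```scala
import org.apache.spark.sql.{Encoders, SQLContext}

def binaryDataset(sqlContext: SQLContext) =
  sqlContext.createDataset(Seq(Array[Byte](1, 2, 3), Array[Byte](4)))(Encoders.BINARY)
```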
[SPARK-12647][SQL] Fix o.a.s.sql.execution.ExchangeCoordinatorSuite "determining the number of reducers: aggregate operator"

Change the expected partition sizes.

Author: Pete Robbins <[email protected]>
Closes apache#10599 from robbinspg/branch-1.6.
Commit: 5afa62b
[SPARK-12617][PYSPARK] Clean up the leaked sockets of Py4J

This patch added Py4jCallbackConnectionCleaner to clean the leaked sockets of Py4J every 30 seconds. This is a workaround until Py4J fixes the leak issue in py4j/py4j#187.

Author: Shixiong Zhu <[email protected]>
Closes apache#10579 from zsxwing/SPARK-12617.
(cherry picked from commit 047a31b)
Signed-off-by: Davies Liu <[email protected]>
Commit: f31d0fd
[SPARK-12511][PYSPARK][STREAMING] Make sure PythonDStream.registerSerializer is called only once

There is an issue where Py4J's PythonProxyHandler.finalize blocks forever (py4j/py4j#184). Py4J creates a PythonProxyHandler in Java for "transformer_serializer" when "registerSerializer" is called. If we call "registerSerializer" twice, the second PythonProxyHandler will override the first one, then the first one will be GCed and trigger "PythonProxyHandler.finalize". To avoid that, we should not call "registerSerializer" more than once, so that the "PythonProxyHandler" on the Java side won't be GCed.

Author: Shixiong Zhu <[email protected]>
Closes apache#10514 from zsxwing/SPARK-12511.
(cherry picked from commit 6cfe341)
Signed-off-by: Davies Liu <[email protected]>
Commit: 83fe5cf
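A language-neutral sketch of the guard this fix amounts to (the actual change is on the PySpark streaming side, not in Scala): make registration idempotent so the Java-side proxy is created exactly once and never becomes garbage while still in use.

```scala
import java.util.concurrent.atomic.AtomicBoolean

object SerializerRegistration {
  private val done = new AtomicBoolean(false)

  def registerOnce(register: () => Unit): Unit =
    if (done.compareAndSet(false, true)) {
      register() // runs at most once; later calls are no-ops
    }
}
```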
[SPARK-12450][MLLIB] Un-persist broadcasted variables in KMeans
Author: RJ Nowling <[email protected]>
Closes apache#10415 from rnowling/spark-12450.
(cherry picked from commit 78015a8)
Signed-off-by: Joseph K. Bradley <[email protected]>
Commit: 0afad66
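A simplified sketch of the pattern this commit applies inside MLlib's KMeans (`pointCost` is an illustrative stand-in for the real cost function): broadcast the centers for an iteration, then explicitly release the executor-side copies instead of waiting for GC.

```scala
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD

def iterationCost(
    sc: SparkContext,
    data: RDD[Array[Double]],
    centers: Array[Array[Double]],
    pointCost: (Array[Array[Double]], Array[Double]) => Double): Double = {
  val bcCenters = sc.broadcast(centers)
  val cost = data.map(p => pointCost(bcCenters.value, p)).sum()
  bcCenters.unpersist(blocking = false) // the fix: free the broadcast promptly
  cost
}
```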
[SPARK-12453][STREAMING] Remove explicit dependency on aws-java-sdk
Successfully ran the Kinesis demo on a live, AWS-hosted Kinesis stream against the master and 1.6 branches. For reasons I don't entirely understand, it required a manual merge to 1.5, which I did as shown here: BrianLondon@075c22e. The demo ran successfully on the 1.5 branch as well. According to `mvn dependency:tree` it is still pulling a fairly old version of the aws-java-sdk (1.9.37), but this appears to have fixed the Kinesis regression in 1.5.2.

Author: BrianLondon <[email protected]>
Closes apache#10492 from BrianLondon/remove-only.
(cherry picked from commit ff89975)
Signed-off-by: Sean Owen <[email protected]>
Commit: bf3dca2
Commits on Jan 6, 2016
[SPARK-12393][SPARKR] Add read.text and write.text for SparkR
Add `read.text` and `write.text` for SparkR.

cc sun-rui felixcheung shivaram

Author: Yanbo Liang <[email protected]>
Closes apache#10348 from yanboliang/spark-12393.
(cherry picked from commit d1fea41)
Signed-off-by: Shivaram Venkataraman <[email protected]>
Commit: c3135d0
[SPARK-12006][ML][PYTHON] Fix GMM failure if initialModel is not None
If the initial model passed to GMM is not empty it causes `net.razorvine.pickle.PickleException`. It can be fixed by converting `initialModel.weights` to `list`.

Author: zero323 <[email protected]>
Closes apache#9986 from zero323/SPARK-12006.
(cherry picked from commit fcd013c)
Signed-off-by: Joseph K. Bradley <[email protected]>
Commit: 1756819
[SPARK-12617][PYSPARK] Move Py4jCallbackConnectionCleaner to Streaming
Move Py4jCallbackConnectionCleaner to Streaming because the callback server starts only in StreamingContext.

Author: Shixiong Zhu <[email protected]>
Closes apache#10621 from zsxwing/SPARK-12617-2.
(cherry picked from commit 1e6648d)
Signed-off-by: Shixiong Zhu <[email protected]>
Commit: d821fae
[SPARK-12672][STREAMING][UI] Use the uiRoot function instead of default root path to gain the streaming batch url

Author: huangzhaowei <[email protected]>
Closes apache#10617 from SaintBacchus/SPARK-12672.
Commit: 8f0ead3
Revert "[SPARK-12672][STREAMING][UI] Use the uiRoot function instead …
…of default root path to gain the streaming batch url." This reverts commit 8f0ead3. Will merge apache#10618 instead.
Commit: 39b0a34
Commits on Jan 7, 2016
[SPARK-12016][MLLIB][PYSPARK] Wrap Word2VecModel when loading it in pyspark

JIRA: https://issues.apache.org/jira/browse/SPARK-12016

We should not directly use Word2VecModel in pyspark. We need to wrap it in a Word2VecModelWrapper when loading it in pyspark.

Author: Liang-Chi Hsieh <[email protected]>
Closes apache#10100 from viirya/fix-load-py-wordvecmodel.
(cherry picked from commit b51a4cd)
Signed-off-by: Joseph K. Bradley <[email protected]>
Commit: 11b901b
[SPARK-12673][UI] Add missing uri prepending for job description

Otherwise the URL fails to proxy to the right address in YARN mode. Screenshot: https://cloud.githubusercontent.com/assets/850797/12139632/bbe78ecc-b49c-11e5-8932-94e8b3622a09.png

Author: jerryshao <[email protected]>
Closes apache#10618 from jerryshao/SPARK-12673.
(cherry picked from commit 174e72c)
Signed-off-by: Shixiong Zhu <[email protected]>
Commit: 94af69c
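A minimal sketch of the idea, with illustrative names: render the job-description link through whatever base URI the environment supplies (for example YARN's proxy prefix) instead of a hard-coded root-relative path.

```scala
def jobDescriptionUrl(uiRoot: String, jobId: Int): String =
  s"$uiRoot/jobs/job?id=$jobId" // previously, effectively s"/jobs/job?id=$jobId"
```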
[SPARK-12678][CORE] MapPartitionsRDD clearDependencies
MapPartitionsRDD was keeping a reference to `prev` after a call to `clearDependencies`, which could lead to a memory leak.

Author: Guillaume Poulin <[email protected]>
Closes apache#10623 from gpoulin/map_partition_deps.
(cherry picked from commit b673852)
Signed-off-by: Reynold Xin <[email protected]>
Commit: d061b85
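The shape of the fix, adapted into a self-contained sketch: hold the parent in a `var` and drop the reference when Spark clears dependencies (e.g. after checkpointing), so the old lineage can be garbage collected. (The real class re-resolves its parent through RDD's dependency list, which survives checkpointing; this sketch reads `prev` directly for brevity.)

```scala
import scala.reflect.ClassTag
import org.apache.spark.{Partition, TaskContext}
import org.apache.spark.rdd.RDD

class MapPartitionsSketch[U: ClassTag, T: ClassTag](
    var prev: RDD[T],
    f: Iterator[T] => Iterator[U]) extends RDD[U](prev) {

  override def getPartitions: Array[Partition] = prev.partitions

  override def compute(split: Partition, context: TaskContext): Iterator[U] =
    f(prev.iterator(split, context))

  override def clearDependencies(): Unit = {
    super.clearDependencies()
    prev = null // the one-line fix: stop pinning the parent RDD
  }
}
```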
Revert "[SPARK-12006][ML][PYTHON] Fix GMM failure if initialModel is …
…not None" This reverts commit fcd013c. Author: Yin Huai <[email protected]> Closes apache#10632 from yhuai/pythonStyle. (cherry picked from commit e5cde7a) Signed-off-by: Yin Huai <[email protected]>
Commit: 34effc4
[DOC] fix 'spark.memory.offHeap.enabled' default value to false
Author: zzcclp <[email protected]>
Closes apache#10633 from zzcclp/fix_spark.memory.offHeap.enabled_default_value.
(cherry picked from commit 84e77a1)
Signed-off-by: Reynold Xin <[email protected]>
Commit: 47a58c7
Commit: 13895cb
[SPARK-12006][ML][PYTHON] Fix GMM failure if initialModel is not None
If the initial model passed to GMM is not empty it causes `net.razorvine.pickle.PickleException`. It can be fixed by converting `initialModel.weights` to `list`.

Author: zero323 <[email protected]>
Closes apache#10644 from zero323/SPARK-12006.
(cherry picked from commit 592f649)
Signed-off-by: Joseph K. Bradley <[email protected]>
Commit: 69a885a
[SPARK-12662][SQL] Fix DataFrame.randomSplit to avoid creating overlapping splits

https://issues.apache.org/jira/browse/SPARK-12662

cc yhuai

Author: Sameer Agarwal <[email protected]>
Closes apache#10626 from sameeragarwal/randomsplit.
(cherry picked from commit f194d99)
Signed-off-by: Reynold Xin <[email protected]>
Commit: 017b73e
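A usage-level sketch of the contract the fix restores (Spark 1.6 API; the assertion is illustrative): randomSplit materializes the same plan once per weight, with complementary sample ranges, so the row order must be identical on every pass. The patch makes that order deterministic by sorting within each partition; caching the input was the common user-side workaround.

```scala
import org.apache.spark.sql.SQLContext

def checkNoOverlap(sqlContext: SQLContext): Unit = {
  val df = sqlContext.range(0, 1000)
  val Array(train, test) = df.randomSplit(Array(0.8, 0.2), seed = 42L)
  assert(train.intersect(test).count() == 0) // holds once ordering is deterministic
}
```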
[SPARK-12598][CORE] bug in setMinPartitions
There is a bug in the calculation of `maxSplitSize`. The `totalLen` should be divided by `minPartitions` and not by `files.size`.

Author: Darek Blasiak <[email protected]>
Closes apache#10546 from datafarmer/setminpartitionsbug.
(cherry picked from commit 8346518)
Signed-off-by: Sean Owen <[email protected]>
Commit: 6ef8235
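The corrected calculation, sketched with a worked example (the function name is hypothetical): the target split size must be derived from the requested partition count, not the file count.

```scala
def maxSplitSize(totalLen: Long, minPartitions: Int): Long =
  math.ceil(totalLen * 1.0 / math.max(minPartitions, 1)).toLong

// 10 files totalling 10,000 bytes with minPartitions = 100:
//   old: totalLen / files.size    -> 1,000-byte splits -> ~10 partitions
//   new: totalLen / minPartitions ->   100-byte splits -> ~100 partitions
```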
Commits on Jan 8, 2016
[SPARK-12507][STREAMING][DOCUMENT] Expose closeFileAfterWrite and allowBatching configurations for Streaming

/cc tdas brkyvz

Author: Shixiong Zhu <[email protected]>
Closes apache#10453 from zsxwing/streaming-conf.
(cherry picked from commit c94199e)
Signed-off-by: Tathagata Das <[email protected]>
Commit: a7c3636
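A usage sketch of the settings this commit documents (my reading of the Spark 1.6 streaming configuration; verify the exact names against the docs of your build):

```scala
import org.apache.spark.SparkConf

val conf = new SparkConf()
  // close the write-ahead-log file after each write; needed on stores like
  // S3 that do not support flushing an open file
  .set("spark.streaming.driver.writeAheadLog.closeFileAfterWrite", "true")
  .set("spark.streaming.receiver.writeAheadLog.closeFileAfterWrite", "true")
  // driver-side WAL batching can be disabled if needed
  .set("spark.streaming.driver.writeAheadLog.allowBatching", "false")
```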
[SPARK-12591][STREAMING] Register OpenHashMapBasedStateMap for Kryo (branch 1.6)

Backport of apache#10609 to branch 1.6.

Author: Shixiong Zhu <[email protected]>
Closes apache#10656 from zsxwing/SPARK-12591-branch-1.6.
Commit: 0d96c54
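Context sketch: with `spark.kryo.registrationRequired=true`, Kryo fails fast on any unregistered class, which is how a missing registration like this surfaces. The backport adds the streaming state-map classes to Spark's own Kryo setup; user classes are registered the same general way (the class below is hypothetical):

```scala
import org.apache.spark.SparkConf

case class MyEvent(id: Long, payload: String) // hypothetical user class

val conf = new SparkConf()
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  .set("spark.kryo.registrationRequired", "true") // fail fast on unregistered classes
  .registerKryoClasses(Array(classOf[MyEvent]))
```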
Commit: a77a7c5