SKIPME merged Apache branch-1.6 #126

Merged
merged 31 commits into from
Dec 9, 2015


markhamstra

No description provided.

brkyvz and others added 30 commits December 7, 2015 00:22
…y when Jenkins load is high

We need to make sure that the last entry is indeed the last entry in the queue.

Author: Burak Yavuz <[email protected]>

Closes apache#10110 from brkyvz/batch-wal-test-fix.

(cherry picked from commit 6fd9e70)
Signed-off-by: Tathagata Das <[email protected]>
This PR:
1. Suppresses all known warnings.
2. Cleans up test cases and fixes some errors in them.
3. Fixes errors in HiveContext-related test cases. These test cases were not actually run previously due to a bug in creating TestHiveContext.
4. Supports 'testthat' package version 0.11.0, which prefers that test cases be under 'tests/testthat'.
5. Makes sure the default Hadoop file system is local when running test cases.
6. Turns warnings into errors.

Author: Sun Rui <[email protected]>

Closes apache#10030 from sun-rui/SPARK-12034.

(cherry picked from commit 39d677c)
Signed-off-by: Shivaram Venkataraman <[email protected]>
Currently, the current line is not cleared by Ctrl-C.

After this patch
```
>>> asdfasdf^C
Traceback (most recent call last):
  File "~/spark/python/pyspark/context.py", line 225, in signal_handler
    raise KeyboardInterrupt()
KeyboardInterrupt
```

It's still worse than 1.5 (and before).
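
The traceback above comes from a SIGINT handler that raises `KeyboardInterrupt`. A minimal sketch of that pattern in plain Python follows. The `cancel_all_jobs` callback and the helper names are hypothetical stand-ins for illustration; this is not the code in `pyspark/context.py`.

```python
import signal


def make_sigint_handler(cancel_all_jobs):
    """Build a SIGINT handler that runs cleanup, then interrupts the prompt.

    `cancel_all_jobs` is a hypothetical callback used for illustration only.
    """
    def signal_handler(signum, frame):
        cancel_all_jobs()
        # Raising here surfaces as a KeyboardInterrupt traceback at the prompt.
        raise KeyboardInterrupt()
    return signal_handler


def install(cancel_all_jobs):
    """Register the handler for Ctrl-C. Must be called from the main thread."""
    handler = make_sigint_handler(cancel_all_jobs)
    signal.signal(signal.SIGINT, handler)
    return handler
```

Because the handler raises from inside the signal callback, the interpreter prints a traceback instead of silently clearing the line, which matches the behavior shown in the transcript.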

Author: Davies Liu <[email protected]>

Closes apache#10134 from davies/fix_cltrc.

(cherry picked from commit ef3f047)
Signed-off-by: Davies Liu <[email protected]>
…ner not present

The reason is that TrackStateRDDs generated by trackStateByKey expect the previous batch's TrackStateRDDs to have a partitioner. However, when recovering from DStream checkpoints, the RDDs recovered from RDD checkpoints do not have a partitioner attached to them. This is because RDD checkpoints do not preserve the partitioner (SPARK-12004).

While apache#9983 solves SPARK-12004 by preserving the partitioner through RDD checkpoints, saving or recovering the partitioner may still fail. To be resilient, this PR repartitions the previous state RDD if no partitioner is detected.
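
The fallback described above can be sketched in plain Python. `StateRDD`, `repartition`, and `ensure_partitioned` are hypothetical names for a toy model, not Spark classes; the sketch only shows the "repartition when no partitioner is present" decision.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

Partitioner = Callable[[str], int]


@dataclass
class StateRDD:
    """Toy stand-in for a keyed state RDD (hypothetical, illustration only)."""
    partitions: List[List[Tuple[str, int]]]
    partitioner: Optional[Partitioner] = None


def repartition(rdd, partitioner, num_partitions):
    """Redistribute every record according to `partitioner`."""
    new_parts = [[] for _ in range(num_partitions)]
    for part in rdd.partitions:
        for key, value in part:
            new_parts[partitioner(key) % num_partitions].append((key, value))
    return StateRDD(new_parts, partitioner)


def ensure_partitioned(prev_state, partitioner, num_partitions):
    """Reuse the previous state if it has a partitioner; otherwise reshuffle."""
    if prev_state.partitioner is not None:
        return prev_state
    return repartition(prev_state, partitioner, num_partitions)


# A state RDD recovered from a checkpoint without its partitioner.
recovered = StateRDD([[("a", 1), ("bb", 2)], [("ccc", 3)]])
fixed = ensure_partitioned(recovered, len, 2)
```

The extra shuffle is only paid on the recovery path; state that already carries a partitioner is returned unchanged.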

Author: Tathagata Das <[email protected]>

Closes apache#9988 from tdas/SPARK-11932.

(cherry picked from commit 5d80d8c)
Signed-off-by: Tathagata Das <[email protected]>
https://issues.apache.org/jira/browse/SPARK-11963

Author: Xusen Yin <[email protected]>

Closes apache#9962 from yinxusen/SPARK-11963.

(cherry picked from commit 871e85d)
Signed-off-by: Joseph K. Bradley <[email protected]>
…cala doc

In SPARK-11946 the API for pivot was changed slightly and its doc was updated, but the doc changes were not made for the Python API. This PR updates the Python doc to be consistent.

Author: Andrew Ray <[email protected]>

Closes apache#10176 from aray/sql-pivot-python-doc.

(cherry picked from commit 36282f7)
Signed-off-by: Yin Huai <[email protected]>
Switched from using SQLContext constructor to using getOrCreate, mainly in model save/load methods.

This covers all instances in spark.mllib.  There were no uses of the constructor in spark.ml.
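
As a rough illustration of the get-or-create pattern this change switches to, here is a plain-Python sketch. `Context`, `get_or_create`, and `load_model` are hypothetical names and do not reproduce the Spark API; the point is only that repeated calls share one instance instead of constructing a new one each time.

```python
import threading


class Context:
    """Toy context with a process-wide get-or-create accessor (hypothetical)."""
    _instance = None
    _lock = threading.Lock()

    def __init__(self, name):
        self.name = name

    @classmethod
    def get_or_create(cls, name="default"):
        # The lock keeps two threads from both creating an instance.
        with cls._lock:
            if cls._instance is None:
                cls._instance = cls(name)
            return cls._instance


def load_model(path):
    """Hypothetical load method that reuses the shared context."""
    ctx = Context.get_or_create()
    return {"path": path, "context": ctx}
```

With a constructor call in `load_model`, every load would build a separate context; with the accessor, all loads see the same one.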

CC: mengxr yhuai

Author: Joseph K. Bradley <[email protected]>

Closes apache#10161 from jkbradley/mllib-sqlcontext-fix.

(cherry picked from commit 3e7e05f)
Signed-off-by: Xiangrui Meng <[email protected]>
…ing include_example

Made a new patch containing only the markdown examples moved to the example folder.
Only three Java code examples were not shifted since they contained compilation errors. These classes are:
1) StandardScale 2) NormalizerExample 3) VectorIndexer

Author: Xusen Yin <[email protected]>
Author: somideshmukh <[email protected]>

Closes apache#10002 from somideshmukh/SomilBranch1.33.

(cherry picked from commit 78209b0)
Signed-off-by: Xiangrui Meng <[email protected]>
Add since annotation to ml.classification

Author: Takahashi Hiroshi <[email protected]>

Closes apache#8534 from taishi-oss/issue10259.

(cherry picked from commit 7d05a62)
Signed-off-by: Xiangrui Meng <[email protected]>
…mple code

Add `SQLTransformer` user guide and example code, and make the Scala API doc clearer.

Author: Yanbo Liang <[email protected]>

Closes apache#10006 from yanboliang/spark-11958.

(cherry picked from commit 4a39b5a)
Signed-off-by: Xiangrui Meng <[email protected]>
…means Value

Author: cody koeninger <[email protected]>

Closes apache#10132 from koeninger/SPARK-12103.

(cherry picked from commit 48a9804)
Signed-off-by: Sean Owen <[email protected]>
Author: Jeff Zhang <[email protected]>

Closes apache#10172 from zjffdu/SPARK-12166.

(cherry picked from commit 7081291)
Signed-off-by: Sean Owen <[email protected]>
This reverts PR apache#10002, commit 78209b0.

The original PR wasn't tested on Jenkins before being merged.

Author: Cheng Lian <[email protected]>

Closes apache#10200 from liancheng/revert-pr-10002.

(cherry picked from commit da2012a)
Signed-off-by: Cheng Lian <[email protected]>
Fix commons-collection group ID to commons-collections for version 3.x

Patches earlier PR at apache#9731

Author: Sean Owen <[email protected]>

Closes apache#10198 from srowen/SPARK-11652.2.

(cherry picked from commit e3735ce)
Signed-off-by: Sean Owen <[email protected]>
Checked against Hive: greatest/least should cast their children to the tightest common type,
i.e. `(int, long) => long`, `(int, string) => error`, `(decimal(10,5), decimal(5, 10)) => error`
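
The three cases above can be reproduced with a small plain-Python sketch. The widening order in `NUMERIC_PRECEDENCE` and the function names are assumptions made for illustration, not the analyzer's actual code; types are modeled as plain strings.

```python
# Assumed widening order for non-decimal numeric types, narrowest first.
NUMERIC_PRECEDENCE = ["byte", "short", "int", "long", "float", "double"]


def tightest_common_type(left, right):
    """Return the tightest common type of two type names, or None."""
    if left == right:
        return left
    if left in NUMERIC_PRECEDENCE and right in NUMERIC_PRECEDENCE:
        # The wider of the two numeric types can hold both values.
        return max(left, right, key=NUMERIC_PRECEDENCE.index)
    # No implicit string conversion and no decimal widening in this sketch.
    return None


def coerce_children(types):
    """Fold over the children's types; fail if no common type exists."""
    result = types[0]
    for t in types[1:]:
        result = tightest_common_type(result, t)
        if result is None:
            raise TypeError(f"no tightest common type for {types}")
    return result
```

Here `coerce_children(["int", "long"])` yields `"long"`, while `["int", "string"]` and `["decimal(10,5)", "decimal(5,10)"]` both raise `TypeError`, mirroring the three cases listed.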

Author: Wenchen Fan <[email protected]>

Closes apache#10196 from cloud-fan/type-coercion.

(cherry picked from commit 381f17b)
Signed-off-by: Michael Armbrust <[email protected]>
This PR is to add three more data types into Encoder, including `BigDecimal`, `Date` and `Timestamp`.

marmbrus cloud-fan rxin Could you take a quick look at these three types? Not sure if it can be merged to 1.6. Thank you very much!

Author: gatorsmile <[email protected]>

Closes apache#10188 from gatorsmile/dataTypesinEncoder.

(cherry picked from commit c0b13d5)
Signed-off-by: Michael Armbrust <[email protected]>
… APIs

This PR contains the following updates:

- Created a new private variable `boundTEncoder` that can be shared by multiple functions: `RDD`, `select` and `collect`.
- Replaced all uses of `queryExecution.analyzed` with calls to `logicalPlan`.
- Fixed a few API comments that used wrong class names (e.g., `DataFrame`) or parameter names (e.g., `n`).
- Fixed a few wrong API descriptions (e.g., `mapPartitions`).

marmbrus rxin cloud-fan Could you take a look and check if they are appropriate? Thank you!

Author: gatorsmile <[email protected]>

Closes apache#10184 from gatorsmile/datasetClean.

(cherry picked from commit 5d96a71)
Signed-off-by: Michael Armbrust <[email protected]>
jira: https://issues.apache.org/jira/browse/SPARK-10393

Since the logic of the text processing part has been moved to ML estimators/transformers, replace the related code in LDA Example with the ML pipeline.

Author: Yuhao Yang <[email protected]>
Author: yuhaoyang <[email protected]>

Closes apache#8551 from hhbyyh/ldaExUpdate.

(cherry picked from commit 872a2ee)
Signed-off-by: Joseph K. Bradley <[email protected]>
…unction

Delays application of ResolvePivot until all aggregates are resolved, to prevent problems with UnresolvedFunction, and adds a unit test.

Author: Andrew Ray <[email protected]>

Closes apache#10202 from aray/sql-pivot-unresolved-function.

(cherry picked from commit 4bcb894)
Signed-off-by: Yin Huai <[email protected]>
jira: https://issues.apache.org/jira/browse/SPARK-11605
Check Java compatibility for MLlib for this release.

fix:

1. `StreamingTest.registerStream` needs a Java-friendly interface.

2. `GradientBoostedTreesModel.computeInitialPredictionAndError` and `GradientBoostedTreesModel.updatePredictionError` have Java compatibility issues. Mark them as `developerAPI`.

TBD:
[updated] no fix for now per discussion.
`org.apache.spark.mllib.classification.LogisticRegressionModel`
`public scala.Option<java.lang.Object> getThreshold();` has the wrong return type for Java invocation.
`SVMModel` has a similar issue.

Yet adding a `scala.Option<java.lang.Double> getThreshold()` would result in an overloading error, because the two methods would differ only in return type. Adding a new function with a different name does not seem necessary.

cc jkbradley feynmanliang

Author: Yuhao Yang <[email protected]>

Closes apache#10102 from hhbyyh/javaAPI.

(cherry picked from commit 5cb4695)
Signed-off-by: Joseph K. Bradley <[email protected]>
Documentation regarding the `IndexToString` label transformer with code snippets in Scala/Java/Python.

Author: BenFradet <[email protected]>

Closes apache#10166 from BenFradet/SPARK-12159.

(cherry picked from commit 06746b3)
Signed-off-by: Joseph K. Bradley <[email protected]>
This patch tightens them to `private[memory]`.

Author: Andrew Or <[email protected]>

Closes apache#10182 from andrewor14/memory-visibility.

(cherry picked from commit 9494521)
Signed-off-by: Josh Rosen <[email protected]>
Author: Michael Armbrust <[email protected]>

Closes apache#10060 from marmbrus/docs.

(cherry picked from commit 3959489)
Signed-off-by: Michael Armbrust <[email protected]>
This PR moves pieces of the spark.ml user guide to reflect suggestions in SPARK-8517. It does not introduce new content, as requested.

<img width="192" alt="screen shot 2015-12-08 at 11 36 00 am" src="https://cloud.githubusercontent.com/assets/7594753/11666166/e82b84f2-9d9f-11e5-8904-e215424d8444.png">

Author: Timothy Hunter <[email protected]>

Closes apache#10207 from thunterdb/spark-8517.

(cherry picked from commit 765c67f)
Signed-off-by: Joseph K. Bradley <[email protected]>
…columns in RegressionEvaluator

felixcheung , mengxr

Just added a message to require()

Author: Dominik Dahlem <[email protected]>

Closes apache#9598 from dahlem/ddahlem_regression_evaluator_double_predictions_message_04112015.

(cherry picked from commit a0046e3)
Signed-off-by: Joseph K. Bradley <[email protected]>
…throw Buffer underflow exception

Jira: https://issues.apache.org/jira/browse/SPARK-12222

Deserializing a RoaringBitmap with the Kryo serializer throws a Buffer underflow exception:
```
com.esotericsoftware.kryo.KryoException: Buffer underflow.
	at com.esotericsoftware.kryo.io.Input.require(Input.java:156)
	at com.esotericsoftware.kryo.io.Input.skip(Input.java:131)
	at com.esotericsoftware.kryo.io.Input.skip(Input.java:264)
```

This is caused by a bug in Kryo's `Input.skip(long count)` (EsotericSoftware/kryo#119), which we call in `KryoInputDataInputBridge`.

Instead of upgrading Kryo, this PR bypasses Kryo's `Input.skip(long count)` by directly calling the other `skip` method in Kryo's Input.java (https://github.com/EsotericSoftware/kryo/blob/kryo-2.21/src/com/esotericsoftware/kryo/io/Input.java#L124), i.e. it writes the bug-fixed version of `Input.skip(long count)` in KryoInputDataInputBridge's `skipBytes` method.

For more detail, see apache#9748 (comment).
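
The workaround amounts to reimplementing a long skip on top of the int-sized `skip`. A plain-Python sketch of that idea follows; `ByteInput`, `skip_bytes`, and the `max_chunk` parameter are hypothetical and exist only to illustrate the loop, not to reproduce the Kryo or Spark classes.

```python
INT_MAX = 2**31 - 1


class ByteInput:
    """Toy input exposing only an int-sized skip (hypothetical)."""

    def __init__(self, data):
        self.data = data
        self.position = 0

    def skip(self, count):
        if count > len(self.data) - self.position:
            raise EOFError("Buffer underflow.")
        self.position += count


def skip_bytes(inp, n, max_chunk=INT_MAX):
    """Skip n bytes by repeatedly calling the int-sized skip."""
    remaining = n
    while remaining > 0:
        # Never ask the underlying input for more than one chunk at a time.
        chunk = min(max_chunk, remaining)
        inp.skip(chunk)
        remaining -= chunk
    return n
```

`max_chunk` is exposed here only so the loop can be exercised with small buffers; in the sketch's default configuration a single call handles anything up to `INT_MAX` bytes.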

Author: Fei Wang <[email protected]>

Closes apache#10213 from scwf/patch-1.

(cherry picked from commit 3934562)
Signed-off-by: Davies Liu <[email protected]>
Author: uncleGen <[email protected]>

Closes apache#10023 from uncleGen/1.6-bugfix.

(cherry picked from commit a113216)
Signed-off-by: Sean Owen <[email protected]>
Currently word2vec has the window size hard-coded to 5, but some users may want a different size (for example, when using n-gram input or similar). The user request comes from http://stackoverflow.com/questions/32231975/spark-word2vec-window-size .
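
To show what the window size controls, here is a simplified plain-Python sketch that builds (center, context) training pairs with a fixed symmetric window. `context_pairs` is a hypothetical helper; it is not the MLlib implementation and ignores details such as subsampling or randomized window shrinking.

```python
def context_pairs(tokens, window=5):
    """Return (center, context) pairs using a fixed symmetric window."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


tokens = ["a", "b", "c", "d"]
narrow = context_pairs(tokens, window=1)  # only immediate neighbours
wide = context_pairs(tokens, window=5)    # every other token in the sentence
```

A smaller window keeps only adjacent tokens as context, which is the kind of control the request above asks for.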

Author: Holden Karau <[email protected]>
Author: Holden Karau <[email protected]>

Closes apache#8513 from holdenk/SPARK-10299-word2vec-should-allow-users-to-specify-the-window-size.

(cherry picked from commit 22b9a87)
Signed-off-by: Sean Owen <[email protected]>
markhamstra added a commit that referenced this pull request Dec 9, 2015
SKIPME merged Apache branch-1.6
@markhamstra markhamstra merged commit 7fb05f9 into alteryx:csd-1.6 Dec 9, 2015