Spark 615 map partitions with index callable from java #16

holdenk · 2014-02-27T03:43:18Z

No description provided.

AmplabJenkins · 2014-02-27T03:43:50Z

Merged build triggered.

AmplabJenkins · 2014-02-27T03:43:50Z

Merged build started.

AmplabJenkins · 2014-02-27T03:43:56Z

Merged build triggered.

AmplabJenkins · 2014-02-27T04:13:00Z

Merged build finished.

AmplabJenkins · 2014-02-27T04:13:00Z

All automated tests passed.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/12894/

mateiz · 2014-03-04T05:41:13Z

Hey Holden, wait on this a bit until #17 is merged. Then we'll also want to make sure it works with Java 8 (you'll need to make the class an interface and such).

pwendell · 2014-03-08T19:24:32Z

@holdenk mind bumping this now that #17 is in? You'll have to change extends to with... since the function classes are now interfaces rather than abstract classes.
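
For readers following along, here is a minimal, Spark-free sketch of why the switch to interfaces matters for Java 8 callers. `IndexedPartitionFunction` and the list-based `mapPartitionsWithIndex` below are hypothetical stand-ins written for illustration only; they are not Spark's classes or signatures. The point is just that an interface with a single abstract method can be implemented by a lambda, which an abstract class cannot.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class MapPartitionsWithIndexSketch {

    // Hypothetical stand-in for a function type (NOT Spark's class).
    // As an interface with one abstract method it can be implemented by a
    // Java 8 lambda; an abstract class would require an explicit subclass.
    @FunctionalInterface
    interface IndexedPartitionFunction<T, R> {
        Iterator<R> call(int partitionIndex, Iterator<T> partition);
    }

    // Hypothetical local analogue of mapPartitionsWithIndex: each inner list
    // plays the role of one partition, and f sees the partition's index.
    static <T, R> List<R> mapPartitionsWithIndex(
            List<List<T>> partitions, IndexedPartitionFunction<T, R> f) {
        List<R> out = new ArrayList<>();
        for (int i = 0; i < partitions.size(); i++) {
            Iterator<R> mapped = f.call(i, partitions.get(i).iterator());
            while (mapped.hasNext()) {
                out.add(mapped.next());
            }
        }
        return out;
    }

    // Tags every element with the index of the partition it came from.
    static List<String> demo() {
        List<List<Integer>> partitions =
                Arrays.asList(Arrays.asList(1, 2), Arrays.asList(3));
        return mapPartitionsWithIndex(partitions, (index, elems) -> {
            List<String> tagged = new ArrayList<>();
            while (elems.hasNext()) {
                tagged.add(index + ":" + elems.next());
            }
            return tagged.iterator();
        });
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [0:1, 0:2, 1:3]
    }
}
```

On the Scala side, the same change is what turns `extends` into `with` when a class mixes in one of these function types alongside another parent.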

holdenk · 2014-03-08T19:50:25Z

Sure, I'll give this a shot today :)

AmplabJenkins · 2014-03-08T21:31:27Z

Merged build triggered.

AmplabJenkins · 2014-03-08T21:31:27Z

Merged build started.

AmplabJenkins · 2014-03-08T22:29:46Z

Merged build finished.

AmplabJenkins · 2014-03-08T22:29:46Z

One or more automated tests failed
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/13075/

JoshRosen · 2014-08-24T02:57:21Z

Sorry to necro the oldest open PR, but do you mind closing this now that mapPartitionsWithIndex has been fixed? Thanks!

holdenk added 4 commits February 26, 2014 19:32

Fix Java API for mapPartitionsWithIndex

215a9bf

Check all the values

e2331ed

Add missing class

0d624bf

Use fakeClassTag

4421ecc

holdenk added 7 commits March 8, 2014 12:31

Fix Java API for mapPartitionsWithIndex

8d849a1

Check all the values

958efa4

Add missing class

6ad1a3c

Use fakeClassTag

e64e1ad

It compiles with the Java 8 happy pandas

8bfd3f3

merge

b6a613f

Remove old function

36c7831

holdenk added 7 commits March 11, 2014 14:01

Check all the values

79d1bc1

Add missing class

ec80d7a

Use fakeClassTag

e4962ab

Fix Java API for mapPartitionsWithIndex

f484afc

Add missing class

4eb9c0f

It compiles with the Java 8 happy pandas

96a86c7

Remove old function

df6922a

holdenk closed this Aug 24, 2014

jackylk pushed a commit to jackylk/spark that referenced this pull request Nov 8, 2014

Merge pull request apache#16 from jackylk/reformat

adecb45

reformating

JasonMWhite pushed a commit to JasonMWhite/spark that referenced this pull request Dec 2, 2015

Merge pull request apache#16 from Shopify/statsd-bug

794180b

Fix java.util.MissingFormatArgumentException in statsd module

AnthonyTruchet added a commit to AnthonyTruchet/spark that referenced this pull request Dec 12, 2016

Merge pull request apache#16 from AnthonyTruchet/dev-tools

4d7c891

Fix dev tools and add some new, Criteo specific ones.

icexelloss added a commit to icexelloss/spark that referenced this pull request Apr 28, 2017

Implement Arrow column writers

bdba357

Move column writers to Arrow.scala Add support for more types; Switch to arrow NullableVector closes apache#16

sven0726 pushed a commit to sven0726/spark that referenced this pull request Dec 3, 2018

Merge pull request apache#16 from gf53520/mofidyHiveContext

f8ba94c

Modify hiveContext permission

hn5092 added a commit to hn5092/spark that referenced this pull request Apr 25, 2019

apache#16 upgrade parquet version

9cb059b

upgrade spark version to 2.4.1-kylin-r5

hn5092 added a commit to hn5092/spark that referenced this pull request Jul 17, 2019

apache#16 upgrade parquet version

3c08c51

upgrade spark version to 2.4.1-kylin-r5

hn5092 added a commit to hn5092/spark that referenced this pull request Jul 18, 2019

apache#16 release 2.4.1-kylin-r11

b71eea3

SirOibaf added a commit to SirOibaf/spark that referenced this pull request Jun 11, 2020

[HOPSWORKS-1499] Bump Hops version to 2.8.2.9 (apache#16)

49f9fca

ringtail added a commit to ringtail/spark that referenced this pull request Jan 21, 2021

Merge pull request apache#16 from ringtail/feature/proxy-user

3319ae1

Feature/proxy user

redsanket pushed a commit to redsanket/spark that referenced this pull request Feb 16, 2021

Merge pull request apache#16 from bzhang02/cleanup_hbaseread

543c8ba

[YSPARK-1523] Cleanup hbaseread.py

risyomei pushed a commit to risyomei/spark that referenced this pull request Jun 26, 2023

VINITUS-351: backport SPARK-38992 (apache#16)

aaca493
