[SPARK-6876] [PySpark] [SQL] add DataFrame na.replace in pyspark #6003

Closed
adrian-wang wants to merge 5 commits from adrian-wang/pynareplace


adrian-wang
Contributor

No description provided.

/**
* Convert java map of K, V into Map of K, V (for calling API with varargs)
*/
def toMap[K, V](jm: java.util.Map[K, V]): Map[K, V] = {
Contributor

toScalaMap?

Contributor Author

yes

Contributor

Sorry, I wasn't clear. I meant: let's rename this to toScalaMap.

@AmplabJenkins

Merged build triggered.

@AmplabJenkins

Merged build started.

@SparkQA

SparkQA commented May 8, 2015

Test build #32215 has started for PR 6003 at commit 08a07ad.

@SparkQA

SparkQA commented May 8, 2015

Test build #32215 has finished for PR 6003 at commit 08a07ad.

  • This patch fails Python style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Merged build finished. Test FAILed.

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/32215/
Test FAILed.

@AmplabJenkins

Merged build triggered.

@AmplabJenkins

Merged build started.

@SparkQA

SparkQA commented May 8, 2015

Test build #32218 has started for PR 6003 at commit 484af9e.

@SparkQA

SparkQA commented May 8, 2015

Test build #32218 has finished for PR 6003 at commit 484af9e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Merged build finished. Test PASSed.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/32218/
Test PASSed.

@@ -1182,6 +1256,13 @@ def _to_seq(sc, cols, converter=None):
    return sc._jvm.PythonUtils.toSeq(cols)


def _to_map(sc, jm):
Contributor

Rename this to _to_scala_map.

@AmplabJenkins

Merged build triggered.

@AmplabJenkins

Merged build started.

@SparkQA

SparkQA commented May 11, 2015

Test build #32381 has started for PR 6003 at commit 04209b9.

@SparkQA

SparkQA commented May 11, 2015

Test build #32381 has finished for PR 6003 at commit 04209b9.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Merged build finished. Test PASSed.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/32381/
Test PASSed.

@rxin
Contributor

rxin commented May 11, 2015

Thanks. I finally had time to review this in detail. A few things we need:

  1. There are a lot of branches for checking the various input types. We should test those as well in tests.py (not in docstring tests).
  2. We should support the case where to_replace is just a dict, like Pandas does.
  3. Ideally, we should also support the case where to_replace is a tuple and value is just a single value. The semantics are that every value in to_replace is replaced with value.
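
A minimal sketch of the input shapes requested above, written as plain Python with no Spark dependency. The helper name `normalize_replacement`, its error messages, and the choice of `ValueError` are assumptions made for illustration; this is not the code in this PR. It only shows how a dict, a list or tuple paired with a single value, and two parallel sequences could all be reduced to one old-to-new mapping.

```python
def normalize_replacement(to_replace, value=None):
    """Reduce the supported argument shapes to a single {old: new} dict.

    Hypothetical helper for illustration only; not the PR's implementation.
    """
    scalar_types = (bool, int, float, str)

    # Case 1: a dict already is the mapping, so `value` is ignored.
    if isinstance(to_replace, dict):
        return dict(to_replace)

    # A scalar to_replace is treated as a one-element list.
    if isinstance(to_replace, scalar_types):
        to_replace = [to_replace]

    if not isinstance(to_replace, (list, tuple)):
        raise ValueError("to_replace should be a scalar, list, tuple, or dict")

    # Case 2: many old values, one new value; every old value maps to it.
    if isinstance(value, scalar_types):
        return {old: value for old in to_replace}

    # Case 3: parallel sequences must line up one-to-one.
    if not isinstance(value, (list, tuple)):
        raise ValueError("value should be a scalar, list, or tuple")
    if len(to_replace) != len(value):
        raise ValueError("to_replace and value should have the same length")
    return dict(zip(to_replace, value))
```

Under these assumptions, `normalize_replacement((1, 2, 3), 0)` yields `{1: 0, 2: 0, 3: 0}`, which matches the tuple-plus-single-value semantics described in item 3.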

|null| null|null|
+----+------+----+
"""
if not isinstance(to_replace, (float, int, long, basestring, list, tuple)):
Contributor

The error handling and input type checking are pretty nice!

@AmplabJenkins

Build triggered.

@AmplabJenkins

Build started.

@SparkQA

SparkQA commented May 12, 2015

Test build #32467 has started for PR 6003 at commit 2bb3b23.

@rxin
Contributor

rxin commented May 12, 2015

LGTM. Can you update it so it can merge cleanly with master?

@SparkQA

SparkQA commented May 12, 2015

Test build #32467 has finished for PR 6003 at commit 2bb3b23.

  • This patch fails Python style tests.
  • This patch does not merge cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Build finished. Test FAILed.

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/32467/
Test FAILed.

@AmplabJenkins

Merged build triggered.

@AmplabJenkins

Merged build started.

@SparkQA

SparkQA commented May 12, 2015

Test build #32469 has started for PR 6003 at commit 4a148f7.

@SparkQA

SparkQA commented May 12, 2015

Test build #32469 has finished for PR 6003 at commit 4a148f7.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Merged build finished. Test FAILed.

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/32469/
Test FAILed.

@AmplabJenkins

Merged build triggered.

@AmplabJenkins

Merged build started.

@SparkQA

SparkQA commented May 12, 2015

Test build #32488 has started for PR 6003 at commit 672efba.

@SparkQA

SparkQA commented May 12, 2015

Test build #32488 has finished for PR 6003 at commit 672efba.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Merged build finished. Test PASSed.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/32488/
Test PASSed.

@rxin
Contributor

rxin commented May 12, 2015

Thanks. Merging in!

asfgit pushed a commit that referenced this pull request May 12, 2015
Author: Daoyuan Wang <[email protected]>

Closes #6003 from adrian-wang/pynareplace and squashes the following commits:

672efba [Daoyuan Wang] remove py2.7 feature
4a148f7 [Daoyuan Wang] to_replace support dict, value support single value, and add full tests
9e232e7 [Daoyuan Wang] rename scala map
af0268a [Daoyuan Wang] remove na
63ac579 [Daoyuan Wang] add na.replace in pyspark

(cherry picked from commit d86ce84)
Signed-off-by: Reynold Xin <[email protected]>
@asfgit asfgit closed this in d86ce84 May 12, 2015
jeanlyn pushed a commit to jeanlyn/spark that referenced this pull request May 28, 2015
jeanlyn pushed a commit to jeanlyn/spark that referenced this pull request Jun 12, 2015
nemccarthy pushed a commit to nemccarthy/spark that referenced this pull request Jun 19, 2015
4 participants