Standardize binary operations between int and str columns #1828

Merged
merged 17 commits into from
Oct 8, 2020

Contributor

@xinrong-meng xinrong-meng commented Oct 6, 2020

Notes

  • New pandas-only test cases are included so that reviewers can see pandas' behavior.
    They are commented out and will be removed after review, before the PR is merged.
  • This PR does not cover all binary operations; only +, -, *, /, //, and % are in scope.

Proposal

Make the behavior of the binary operations (+, -, *, /, //, %) between int and str columns consistent with the corresponding pandas behavior.

  • Standardize binary operations as follows:

    +: raise TypeError between an int column and a str column (or string literal)
    *: act as Spark SQL repeat between an int column (or int literal) and a str column; raise TypeError if a string literal is involved
    -, /, //, % (modulo): raise TypeError if a str column (or string literal) is involved

  • Add def repeat(col, n): in databricks/koalas/spark/functions.py

    The repeat function defined in the Scala API only accepts an integer literal as its second parameter,
    but the internal StringRepeat expression accepts an IntegerType expression as its second parameter.

    To pass an int column as the second parameter, we call the function through callUDF.
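
The rules proposed above can be read as a small decision table. The following is a minimal pure-Python sketch of that table, not the code in databricks/koalas/base.py; the function name `resolve` and the operand-kind constants are hypothetical and exist only for illustration. Combinations that the proposal does not mention are reported as "default" rather than guessed.

```python
# Hypothetical operand kinds; these names are not part of Koalas.
INT_COL = "int column"
INT_LIT = "int literal"
STR_COL = "str column"
STR_LIT = "str literal"


def resolve(op: str, left: str, right: str) -> str:
    """Return "repeat", "default", or raise TypeError, following the proposal."""
    kinds = {left, right}
    has_str = bool(kinds & {STR_COL, STR_LIT})
    has_int = bool(kinds & {INT_COL, INT_LIT})

    if op in ("-", "/", "//", "%"):
        # Any str column or string literal is rejected.
        if has_str:
            raise TypeError(f"'{op}' is not supported with a str operand")
        return "default"

    if op == "+":
        # An int column combined with a str column or string literal is rejected.
        if INT_COL in kinds and has_str:
            raise TypeError("'+' is not supported between int and str operands")
        return "default"

    if op == "*":
        # A string literal is rejected outright.
        if STR_LIT in kinds:
            raise TypeError("'*' is not supported with a string literal")
        # A str column times an int column or int literal behaves like repeat.
        if STR_COL in kinds and has_int:
            return "repeat"
        return "default"

    raise ValueError(f"operator {op!r} is outside the scope of this sketch")
```

For example, `resolve("*", STR_COL, INT_COL)` returns "repeat", while `resolve("+", INT_COL, STR_LIT)` raises TypeError.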

Test

databricks/koalas/tests/test_dataframe.py
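
For reference, the pandas behavior that the proposal aims to match can be checked directly. This is a small sketch using pandas only, not the test code added to test_dataframe.py; the helper `raises_type_error` and the sample values are made up for illustration, and the str Series is created with an explicit object dtype as an assumption about how the strings are stored.

```python
import pandas as pd

ints = pd.Series([1, 2, 3])
strs = pd.Series(["a", "b", "c"], dtype=object)


def raises_type_error(fn) -> bool:
    """Return True if calling fn() raises TypeError, False otherwise."""
    try:
        fn()
    except TypeError:
        return True
    return False


# "*" between a str Series and an int Series repeats each string elementwise.
repeated = (strs * ints).tolist()

# The remaining operators in scope are expected to raise TypeError.
rejected = {
    "+": raises_type_error(lambda: ints + strs),
    "-": raises_type_error(lambda: ints - strs),
    "/": raises_type_error(lambda: ints / strs),
    "//": raises_type_error(lambda: ints // strs),
    "%": raises_type_error(lambda: ints % strs),
}

print(repeated)
print(rejected)
```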

Resolves #1819

@itholic
Contributor

itholic commented Oct 7, 2020

Just FYI: you can run the formatter and the linter with dev/reformat and dev/lint-python to perform static analysis. dev/reformat automatically rearranges your code if needed :)

@xinrong-meng
Contributor Author

@itholic Thank you for reminding me!

@xinrong-meng xinrong-meng marked this pull request as ready for review October 7, 2020 22:56
@xinrong-meng xinrong-meng changed the title [WIP] Standardize binary operations between int and str columns Standardize binary operations between int and str columns Oct 7, 2020
@xinrong-meng xinrong-meng changed the title Standardize binary operations between int and str columns Standardize binary operations between numeric and str columns Oct 7, 2020
@xinrong-meng xinrong-meng changed the title Standardize binary operations between numeric and str columns Standardize binary operations between int and str columns Oct 7, 2020
Contributor

@itholic itholic left a comment

Otherwise, looks fine to me.

Collaborator

@ueshin ueshin left a comment

LGTM.

@ueshin
Collaborator

ueshin commented Oct 8, 2020

Thanks! Merging.

@ueshin ueshin merged commit 6c8f4be into databricks:master Oct 8, 2020
@xinrong-meng
Contributor Author

@ueshin Thank you for merging!

@itholic
Contributor

itholic commented Oct 9, 2020

Nice 👍

Successfully merging this pull request may close these issues.

Operations on Columns of Differing Types
3 participants