Standardize binary operations between int and str columns #1828

xinrong-meng · 2020-10-06T20:30:01Z

Notes

Test cases that exercise pandas only are added so that PR reviewers can see pandas' behavior.
They are commented out and will be removed after review, before the PR is merged.
This PR doesn't aim to cover all binary operations; only +, -, *, /, //, and % are in scope.

Proposal

Make the behavior of binary operations (+, -, *, /, //, %) between int and str columns consistent with the corresponding pandas behavior.

Standardize binary operations as follows:

+: raise TypeError between an int column and a str column (or string literal).
*: act as Spark SQL repeat between an int column (or int literal) and a str column; raise TypeError if a string literal is involved.
-, /, //, % (modulo): raise TypeError if a str column (or string literal) is involved.
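
The rules above can be read as a small decision table. The following is only a sketch of that table in plain Python, not the code added to databricks/koalas/base.py: `Operand`, `resolve`, and the error message are hypothetical names made up for illustration, and the sketch assumes exactly one int operand and one str operand, at least one of which is a column.

```python
from typing import NamedTuple

ARITHMETIC_OPS = ("+", "-", "*", "/", "//", "%")


class Operand(NamedTuple):
    dtype: str        # "int" or "str"
    is_literal: bool  # True for a Python scalar, False for a column


def resolve(op: str, left: Operand, right: Operand) -> str:
    """Return "repeat" if the int/str operation is allowed, else raise TypeError."""
    if op not in ARITHMETIC_OPS:
        raise ValueError("operator not covered by this sketch: %r" % op)
    if {left.dtype, right.dtype} != {"int", "str"}:
        raise ValueError("this sketch only models one int and one str operand")
    if left.is_literal and right.is_literal:
        raise ValueError("at least one operand must be a column")

    str_operand = left if left.dtype == "str" else right
    # "*" is the only operator that succeeds, and only when the str side is a column.
    if op == "*" and not str_operand.is_literal:
        return "repeat"
    # Hypothetical message; the wording in the PR may differ.
    raise TypeError("%s is not supported between int and str operands" % op)


# An int literal times a str column maps to Spark SQL repeat.
print(resolve("*", Operand("int", True), Operand("str", False)))  # prints "repeat"
```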

Add a repeat(col, n) function to databricks/koalas/spark/functions.py.

The repeat function defined in the Scala API only accepts an integer literal as its second parameter.
However, the internal StringRepeat expression accepts an IntegerType expression as its second parameter.

To pass an int column as the second parameter, we take advantage of callUDF.
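
For readers who want to try a per-row repeat count outside Koalas, here is a minimal sketch that goes through Spark SQL text instead of callUDF. It is not the implementation added by this PR. It assumes that the SQL `repeat` function accepts an integer column as its second argument, which is what the StringRepeat note above suggests; `repeat_sql` and `with_repeated` are hypothetical helper names, while `pyspark.sql.functions.expr` and `DataFrame.withColumn` are existing PySpark calls.

```python
def repeat_sql(str_col: str, n_col: str) -> str:
    """Return Spark SQL text that repeats `str_col` `n_col` times per row."""
    # Backticks quote the column names so that names with spaces or dots still parse.
    return "repeat(`{}`, `{}`)".format(str_col, n_col)


def with_repeated(df, str_col: str, n_col: str, out_col: str = "repeated"):
    """Add `out_col` to the Spark DataFrame `df` (hypothetical helper)."""
    # Imported lazily so that this module can be loaded without PySpark installed.
    from pyspark.sql import functions as F

    return df.withColumn(out_col, F.expr(repeat_sql(str_col, n_col)))
```

With a Spark DataFrame `df` that has a str column `s` and an int column `n`, `with_repeated(df, "s", "n")` would add a column holding `s` repeated `n` times in each row, under the assumption stated above.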

Test

databricks/koalas/tests/test_dataframe.py

Resolves #1819

itholic · 2020-10-07T04:14:44Z

Just FYI: You can run the formatter and linter using dev/reformat and dev/lint-python to perform static analysis. dev/reformat automatically rearranges your code if needed :)

xinrong-meng · 2020-10-07T19:13:58Z

@itholic Thank you for reminding me!

itholic

Otherwise, looks fine to me.

ueshin

LGTM.

ueshin · 2020-10-08T18:22:13Z

Thanks! merging.

xinrong-meng · 2020-10-08T19:04:51Z

@ueshin Thank you for merging!

itholic · 2020-10-09T06:25:04Z

Nice 👍

xinrong-meng added 5 commits October 6, 2020 11:27

TypeError when 'numeric + str'

e224b26

Tests for +, - between int and str

4cbe659

+, - between int and str

582faa4

Tests for /(//) between int and str

469e447

/(//) between int and str

0d98731

* between int and str

024240f

xinrong-meng force-pushed the opOnDiffType branch from 6b9a8f5 to 024240f October 7, 2020 19:12

xinrong-meng added 5 commits October 7, 2020 12:35

mod between int and str

1724b78

Disable tests for pandas only

c9c28a9

- involves str column should raise TypeError

90ef3f2

* involves str literal should raise TypeError

a64f420

Optimize modulo error message

3d7905c

xinrong-meng marked this pull request as ready for review October 7, 2020 22:56

xinrong-meng changed the title ~~[WIP] Standardize binary operations between int and str columns~~ Standardize binary operations between int and str columns Oct 7, 2020

xinrong-meng requested a review from ueshin October 7, 2020 23:01

xinrong-meng changed the title ~~Standardize binary operations between int and str columns~~ Standardize binary operations between numeric and str columns Oct 7, 2020

ueshin reviewed Oct 7, 2020

ueshin requested review from itholic and HyukjinKwon October 7, 2020 23:28

ueshin reviewed Oct 7, 2020

ueshin reviewed Oct 7, 2020

Resolve comments on non-tests

70375fc

xinrong-meng changed the title ~~Standardize binary operations between numeric and str columns~~ Standardize binary operations between int and str columns Oct 7, 2020

Add positive tests

9ed863e

itholic approved these changes Oct 8, 2020

itholic reviewed Oct 8, 2020

xinrong-meng added 2 commits October 8, 2020 08:57

Single if statements

9ad1d23

Resolve comments on test

3017df6

xinrong-meng added 2 commits October 8, 2020 09:13

Remove pandas-only test

490f0b6

Period in TypeError message

86a7f45

xinrong-meng requested a review from ueshin October 8, 2020 16:39

ueshin approved these changes Oct 8, 2020

ueshin merged commit 6c8f4be into databricks:master Oct 8, 2020
