
[SPARK-49609][PYTHON][CONNECT] Add API compatibility check between Classic and Connect #48085

@itholic commented Sep 12, 2024

### What changes were proposed in this pull request?

This PR proposes to add API compatibility check between Classic and Connect.

This PR also includes updating both APIs to the same signature.

### Why are the changes needed?

APIs supported on both Spark Connect and Spark Classic should guarantee the same signature, such as argument and return types.

For example, the test fails when an API's signature is mismatched:

```
Signature mismatch in Column method 'dropFields'
Classic: (self, *fieldNames: str) -> pyspark.sql.column.Column
Connect: (self, *fieldNames: 'ColumnOrName') -> pyspark.sql.column.Column
<Signature (self, *fieldNames: 'ColumnOrName') -> pyspark.sql.column.Column> != <Signature (self, *fieldNames: str) -> pyspark.sql.column.Column>

Expected :<Signature (self, *fieldNames: str) -> pyspark.sql.column.Column>
Actual   :<Signature (self, *fieldNames: 'ColumnOrName') -> pyspark.sql.column.Column>
```
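The kind of check described here can be sketched with the standard-library `inspect` module. The classes below are illustrative stand-ins, not the actual PySpark implementation, which iterates over the real Classic and Connect API surfaces:

```python
import inspect


# Illustrative stand-ins for the Classic and Connect Column classes.
class ClassicColumn:
    def dropFields(self, *fieldNames: str) -> "Column":
        ...


class ConnectColumn:
    def dropFields(self, *fieldNames: "ColumnOrName") -> "Column":
        ...


def signatures_match(classic_cls, connect_cls, method_name):
    """Compare parameter names, kinds, and annotations of one method."""
    classic_sig = inspect.signature(getattr(classic_cls, method_name))
    connect_sig = inspect.signature(getattr(connect_cls, method_name))
    return classic_sig == connect_sig
```

Here `signatures_match(ClassicColumn, ConnectColumn, "dropFields")` returns `False` because the `fieldNames` annotations differ (`str` vs `'ColumnOrName'`), mirroring the failure output above.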

### Does this PR introduce _any_ user-facing change?

No, it is a test to prevent future API behavior inconsistencies between Classic and Connect.

### How was this patch tested?

Added UTs.

### Was this patch authored or co-authored using generative AI tooling?

No.

itholic commented Sep 12, 2024

cc @HyukjinKwon @hvanhovell

@xinrong-meng

I'm wondering if we should mention the parameter renaming (fix) in the user-facing change section of the PR description.

```python
from pyspark.testing.sqlutils import ReusedSQLTestCase


class ConnectCompatibilityTestsMixin:
```

@itholic can we add some tests to verify that this keeps functioning? Basically we need a couple of broken signatures.
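A self-test along these lines, feeding the checker deliberately broken signatures and asserting that it complains, might look like the following sketch (the helper and class names are illustrative, not the actual PR code):

```python
import inspect
import unittest


def check_signatures(classic_cls, connect_cls, method_name):
    """Raise AssertionError if the two classes disagree on a method's signature."""
    classic_sig = inspect.signature(getattr(classic_cls, method_name))
    connect_sig = inspect.signature(getattr(connect_cls, method_name))
    assert classic_sig == connect_sig, (
        f"Signature mismatch in method '{method_name}': "
        f"{classic_sig} != {connect_sig}"
    )


class ConnectCompatibilitySelfTest(unittest.TestCase):
    def test_detects_broken_signature(self):
        # Deliberately mismatched pair: the parameter annotations differ.
        class Classic:
            def f(self, x: int) -> int:
                ...

        class Connect:
            def f(self, x: str) -> int:
                ...

        with self.assertRaises(AssertionError):
            check_signatures(Classic, Connect, "f")
```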

github-actions bot added the BUILD label Sep 23, 2024
@HyukjinKwon

Let's create an umbrella JIRA and cover other APIs, like functions.

@itholic itholic closed this in 982028e Sep 24, 2024
@itholic

itholic commented Sep 24, 2024

Merged to master. Let me create an umbrella JIRA to address the rest of the tasks.

Thanks all for the review!

itholic pushed a commit that referenced this pull request Sep 25, 2024
…ependencies are not found

### What changes were proposed in this pull request?

This PR is a followup of #48085 that skips the compatibility tests if Spark Connect dependencies are not installed.

### Why are the changes needed?

To recover the PyPy3 build https://github.com/apache/spark/actions/runs/11016544408/job/30592416115 which does not have PyArrow installed.

### Does this PR introduce _any_ user-facing change?

No, test-only.

### How was this patch tested?

Manually.

### Was this patch authored or co-authored using generative AI tooling?

No

Closes #48239 from HyukjinKwon/SPARK-49609-followup.

Authored-by: Hyukjin Kwon <[email protected]>
Signed-off-by: Haejoon Lee <[email protected]>
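The skip described in this follow-up is a common pattern: probe for the optional imports and skip the whole test case when they are missing. A minimal sketch, assuming PyArrow is the dependency being probed (PySpark has its own guard utilities for this; the message text is illustrative):

```python
import unittest

# Probe for the optional dependency; PyArrow is the one missing from
# the PyPy3 build mentioned above.
try:
    import pyarrow  # noqa: F401

    have_connect_deps = True
except ImportError:
    have_connect_deps = False


@unittest.skipIf(not have_connect_deps, "Spark Connect dependencies not installed")
class ConnectCompatibilityTests(unittest.TestCase):
    def test_signatures(self):
        ...  # compatibility checks run only when the imports succeed
```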
HyukjinKwon added a commit that referenced this pull request Sep 26, 2024
…when connect dependencies not installed

### What changes were proposed in this pull request?

This PR is a followup of #48085 that skips the connect import which requires Connect dependencies.

### Why are the changes needed?

To recover the PyPy3 build https://github.com/apache/spark/actions/runs/11035779484/job/30652736098 which does not have PyArrow installed.

### Does this PR introduce _any_ user-facing change?

No, test-only.

### How was this patch tested?

Manually.

### Was this patch authored or co-authored using generative AI tooling?

No

Closes #48259 from HyukjinKwon/SPARK-49609-followup2.

Authored-by: Hyukjin Kwon <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
attilapiros pushed a commit to attilapiros/spark that referenced this pull request Oct 4, 2024 (…assic and Connect)
attilapiros pushed a commit to attilapiros/spark that referenced this pull request Oct 4, 2024 (…ependencies are not found)
attilapiros pushed a commit to attilapiros/spark that referenced this pull request Oct 4, 2024 (…when connect dependencies not installed)