[SPARK-49609][PYTHON][CONNECT] Add API compatibility check between Classic and Connect #48085
Conversation
I'm wondering if we should mention the parameter renaming (fix) in the user-facing change section of the PR description.
```python
from pyspark.testing.sqlutils import ReusedSQLTestCase


class ConnectCompatibilityTestsMixin:
```
@itholic can we add some tests to verify that this keeps functioning? Basically we need a couple of broken signatures.
Let's create an umbrella JIRA and cover other APIs, like functions.
Merged to master. Let me create an umbrella JIRA to address the rest of the tasks. Thanks all for the review!
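For reference, the kind of broken-signature test requested above could be sketched with a plain `inspect.signature` comparison. The classes and the `check_signature` helper below are hypothetical stand-ins, not the PR's actual test code (which inspects the real Classic and Connect classes); they merely reproduce the `dropFields` mismatch quoted in the PR description.

```python
import inspect


# Hypothetical stand-ins reproducing the 'dropFields' annotation mismatch
# described in this PR; the real check compares the actual pyspark classes.
class ClassicColumn:
    def dropFields(self, *fieldNames: str) -> "ClassicColumn":
        return self


class ConnectColumn:
    # The parameter annotation intentionally differs from ClassicColumn's.
    def dropFields(self, *fieldNames: "ColumnOrName") -> "ClassicColumn":
        return self


def check_signature(classic_cls, connect_cls, method: str) -> None:
    """Raise AssertionError if the two implementations' signatures differ."""
    classic_sig = inspect.signature(getattr(classic_cls, method))
    connect_sig = inspect.signature(getattr(connect_cls, method))
    if classic_sig != connect_sig:
        raise AssertionError(
            f"Signature mismatch in Column method {method!r}\n"
            f"Classic: {classic_sig}\n"
            f"Connect: {connect_sig}"
        )
```

Calling `check_signature(ClassicColumn, ConnectColumn, "dropFields")` raises with a message in the same shape as the failure shown in the PR description, while identical signatures pass silently.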
…ependencies are not found

### What changes were proposed in this pull request?

This PR is a followup of #48085 that skips the compatibility tests if Spark Connect dependencies are not installed.

### Why are the changes needed?

To recover the PyPy3 build https://github.com/apache/spark/actions/runs/11016544408/job/30592416115 which does not have PyArrow installed.

### Does this PR introduce _any_ user-facing change?

No, test-only.

### How was this patch tested?

Manually.

### Was this patch authored or co-authored using generative AI tooling?

No

Closes #48239 from HyukjinKwon/SPARK-49609-followup.

Authored-by: Hyukjin Kwon <[email protected]>
Signed-off-by: Haejoon Lee <[email protected]>
…when connect dependencies not installed

### What changes were proposed in this pull request?

This PR is a followup of #48085 that skips the connect import which requires Connect dependencies.

### Why are the changes needed?

To recover the PyPy3 build https://github.com/apache/spark/actions/runs/11035779484/job/30652736098 which does not have PyArrow installed.

### Does this PR introduce _any_ user-facing change?

No, test-only.

### How was this patch tested?

Manually.

### Was this patch authored or co-authored using generative AI tooling?

No

Closes #48259 from HyukjinKwon/SPARK-49609-followup2.

Authored-by: Hyukjin Kwon <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
…assic and Connect

### What changes were proposed in this pull request?

This PR proposes to add an API compatibility check between Classic and Connect. This PR also includes updating both APIs to the same signature.

### Why are the changes needed?

APIs supported on both Spark Connect and Spark Classic should guarantee the same signature, such as argument and return types. For example, the test fails when the signature of an API is mismatched:

```
Signature mismatch in Column method 'dropFields'
Classic: (self, *fieldNames: str) -> pyspark.sql.column.Column
Connect: (self, *fieldNames: 'ColumnOrName') -> pyspark.sql.column.Column
<Signature (self, *fieldNames: 'ColumnOrName') -> pyspark.sql.column.Column> != <Signature (self, *fieldNames: str) -> pyspark.sql.column.Column>
Expected :<Signature (self, *fieldNames: str) -> pyspark.sql.column.Column>
Actual   :<Signature (self, *fieldNames: 'ColumnOrName') -> pyspark.sql.column.Column>
```

### Does this PR introduce _any_ user-facing change?

No, it is a test to prevent future API behavior inconsistencies between Classic and Connect.

### How was this patch tested?

Added UTs.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes apache#48085 from itholic/SPARK-49609.

Authored-by: Haejoon Lee <[email protected]>
Signed-off-by: Haejoon Lee <[email protected]>
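A whole-class check in the same spirit can be sketched by walking the shared public methods of the two implementations with `inspect.getmembers` and comparing their signatures. Again, this is illustrative only: `ClassicAPI`, `ConnectAPI`, and `find_signature_mismatches` are hypothetical names, not the PR's implementation.

```python
import inspect


# Illustrative stand-ins for the Classic and Connect classes; the real check
# in this PR inspects the actual pyspark.sql implementations.
class ClassicAPI:
    def select(self, *cols: str) -> "ClassicAPI":
        return self

    def limit(self, num: int) -> "ClassicAPI":
        return self


class ConnectAPI:
    def select(self, *cols: str) -> "ClassicAPI":
        return self

    # 'limit' annotates num differently on purpose, to be caught below.
    def limit(self, num: "int") -> "ClassicAPI":
        return self


def find_signature_mismatches(classic_cls, connect_cls):
    """Return names of shared public methods whose signatures differ."""
    mismatches = []
    for name, member in inspect.getmembers(classic_cls, inspect.isfunction):
        if name.startswith("_"):
            continue  # skip private/dunder methods
        counterpart = getattr(connect_cls, name, None)
        if counterpart is None:
            continue  # only compare methods both implementations expose
        if inspect.signature(member) != inspect.signature(counterpart):
            mismatches.append(name)
    return mismatches
```

Here `find_signature_mismatches(ClassicAPI, ConnectAPI)` reports only `limit`, since `select` matches on both sides; a test built this way can then assert the mismatch list is empty.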