[SPARK-32142][SQL][TESTS] Keep the original tests and codes to avoid potential conflicts in dev #28955
cc @viirya, @dbtsai, @MaxGekk, @cloud-fan
Force-pushed from ffe1583 to 7a36dd3.
@@ -501,38 +508,37 @@ abstract class ParquetFilterSuite extends QueryTest with ParquetTest with Shared
}

val data = (1 to 4).map(i => Tuple1(Option(i.b)))
import testImplicits._
withNestedDataFrame(data.toDF()) { case (inputDF, colName, resultFun) =>
I didn't review them line by line, assuming they just remove the outer `withNestedDataFrame`.
Yup
withSQLConf(SQLConf.DATETIME_JAVA8API_ENABLED.key -> java8Api.toString) { | ||
withSQLConf( | ||
SQLConf.DATETIME_JAVA8API_ENABLED.key -> java8Api.toString, | ||
SQLConf.LEGACY_PARQUET_REBASE_MODE_IN_WRITE.key -> "CORRECTED") { |
There's one diff here.
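The snippet above wraps the test body in `withSQLConf`, which sets SQL configs only for the duration of a block and restores them afterwards. A minimal, Spark-free sketch of that loan pattern, assuming a plain mutable map stands in for the session conf; `ConfSketch`, `withConf`, and the `example.*` keys are hypothetical names, not Spark APIs:

```scala
import scala.collection.mutable

// Hypothetical stand-in for a session configuration; this does NOT use
// Spark's SQLConf or withSQLConf.
object ConfSketch {
  val conf: mutable.Map[String, String] = mutable.Map.empty

  // Sets each key for the duration of `body`, then restores the previous
  // value (or unsets the key if it was absent), even if `body` throws.
  def withConf[T](pairs: (String, String)*)(body: => T): T = {
    // Remember what each key held before we touched it.
    val previous = pairs.map { case (k, _) => k -> conf.get(k) }
    pairs.foreach { case (k, v) => conf(k) = v }
    try body
    finally previous.foreach {
      case (k, Some(old)) => conf(k) = old
      case (k, None)      => conf.remove(k)
    }
  }
}
```

Passing several pairs in one call, as the new code does with the Java 8 API flag and the rebase mode, keeps the nesting depth at one level instead of two.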
protected def withParquetDataFrame(df: DataFrame, testVectorized: Boolean = true) | ||
(f: DataFrame => Unit): Unit = { | ||
withTempPath { file => | ||
withSQLConf(SQLConf.LEGACY_PARQUET_REBASE_MODE_IN_WRITE.key -> "CORRECTED") { |
and here.
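`withParquetDataFrame` above runs inside `withTempPath`, which hands the body a scratch location and cleans it up afterwards. A minimal sketch of such a helper using only `java.nio.file`, assuming the behaviour is "give the body a path that does not exist yet, then delete whatever was written there"; `TempPathSketch` and its methods are hypothetical, not Spark's implementation:

```scala
import java.io.File
import java.nio.file.{Files, Path}
import java.util.Comparator

object TempPathSketch {
  // Deletes `f` and, if it is a directory, everything under it.
  // Reverse path order visits children before their parent directories.
  def deleteRecursively(f: File): Unit = {
    if (f.exists()) {
      val stream = Files.walk(f.toPath)
      try stream.sorted(Comparator.reverseOrder[Path]()).forEach((p: Path) => Files.delete(p))
      finally stream.close()
    }
  }

  // Hands `body` a fresh path that does not exist yet (so a writer can
  // create it), and removes whatever was written there afterwards.
  def withTempPath[T](body: File => T): T = {
    val dir = Files.createTempDirectory("sketch").toFile.getCanonicalFile
    dir.delete() // reserve a unique name, but leave the path free for the writer
    try body(dir)
    finally deleteRecursively(dir)
  }
}
```

Cleanup happens in `finally`, so a failing assertion inside the body still leaves no files behind.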
Thanks for minimizing the diff in the tests. After this gets merged, I will minimize the test diff in #28761.
withTempPath { file => | ||
millisData.map(i => Tuple1(Timestamp.valueOf(i))).toDF | ||
.write.format(dataSourceName).save(file.getCanonicalPath) | ||
readParquetFile(file.getCanonicalPath) { df => |
From 2 lines to 4 lines? This looks like an exception. Is this inevitable?
Yup. I couldn't find a better way without having another method.
Thank you for refactoring. It looks neater. I guess you are assuming a backport to your internal branch, but Apache Spark will not backport this to `branch-3.0`, and this only adds an additional commit. So `minimize the diff` as a follow-up to the existing commits doesn't make sense for Apache Spark.
In short, this is just a normal commit that refactors for future PRs. So, please remove `minimizes the diff` from the title and PR description. It's not a benefit to the Apache Spark `master` branch (as-is) because the commit log always grows monotonically.
Also, we had better use a new JIRA ID because all of those (SPARK-25556, SPARK-17636, SPARK-31026, SPARK-31060) were already shipped as part of 3.0.0. Otherwise, we will lose traceability for this improvement commit, because it will not land on `branch-3.0`.
Test build #124647 has finished for PR 28955 at commit
retest this please
Test build #124690 has finished for PR 28955 at commit
retest this please
Oh sure @dongjoon-hyun. Let's use a new JIRA ID. But just to give you a bit more context, I said "minimize the diff" because it minimizes the diff at #28761 (comment), and helps in case other code matches this. I was thinking about backporting this, @dongjoon-hyun, to remove the unnecessary diff when you backport. It's a test-only PR, so I guess it's fine to backport. For example, you can backport a test from This isn't related to any internal-branch stuff :-). It's just from #28761 (comment).
BTW @dbtsai, let's consider blocking a PR even when the comments are on tests, particularly when a release is close. It seems this can be an issue in this case, and I definitely want to avoid situations like the current one, which complicate backporting and keeping other code in sync.
Thank you for updating.
Test build #124700 has finished for PR 28955 at commit
…potential conflicts in dev
Closes #28955 from HyukjinKwon/SPARK-25556-followup.
Authored-by: HyukjinKwon <[email protected]>
Signed-off-by: HyukjinKwon <[email protected]>
(cherry picked from commit 8194d9e)
Signed-off-by: HyukjinKwon <[email protected]>
Thank you guys. Merged to master and branch-3.0.
What changes were proposed in this pull request?
This PR proposes to partially revert the tests and some code from #27728 without changing any behaviour.
Most of the test changes are brought back to their state before #27728 by combining `withNestedDataFrame` and `withParquetDataFrame`. Basically, it addresses the comment at #27728 (comment), and my own comment in another PR at #28761 (comment).
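Judging from the review snippet's `(inputDF, colName, resultFun)` parameters, the outer `withNestedDataFrame` presumably runs the test body once per column layout (for example flat vs. nested), and the inner `withParquetDataFrame` round-trips each input through Parquet. A Spark-free sketch of how two such loan-pattern helpers compose, assuming case classes stand in for DataFrames and an in-memory copy stands in for the Parquet round trip; every name here (`NestedSketch`, `withNestedData`, `withRoundTrip`, `valueAt`) is hypothetical:

```scala
object NestedSketch {
  // Hypothetical row shapes: a value at the top level, or inside a one-field struct.
  sealed trait Row
  final case class Flat(value: Int) extends Row
  final case class Nested(inner: Flat) extends Row

  // Runs `runTest` once per shape of the same data, passing the rows, the
  // column path a filter would target, and a function that wraps an expected
  // value into that shape (the analogue of `resultFun`).
  def withNestedData(data: Seq[Int])(runTest: (Seq[Row], String, Int => Row) => Unit): Unit = {
    val variants: Seq[(Seq[Row], String, Int => Row)] = Seq(
      (data.map(Flat(_)), "value", (i: Int) => Flat(i)),
      (data.map(i => Nested(Flat(i))), "inner.value", (i: Int) => Nested(Flat(i)))
    )
    variants.foreach { case (rows, col, wrap) => runTest(rows, col, wrap) }
  }

  // Hypothetical stand-in for "write to a temp Parquet path and read it back":
  // here the rows are simply copied into a new List.
  def withRoundTrip(rows: Seq[Row])(f: Seq[Row] => Unit): Unit = f(rows.toList)

  // Extracts the leaf value regardless of the row's shape.
  def valueAt(row: Row): Int = row match {
    case Flat(v)         => v
    case Nested(Flat(v)) => v
  }
}
```

The test body is written once and inherits both dimensions (layout and storage round trip) from the nesting, which is what lets the tests read as they did before the outer loop was threaded through every helper.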
Why are the changes needed?
For maintenance purposes, to avoid potential conflicts during backports, and in case other code needs to stay in sync with this.
Does this PR introduce any user-facing change?
No, dev-only.
How was this patch tested?
Manually tested.