[SPARK-32142][SQL][TESTS] Keep the original tests and codes to avoid potential conflicts in dev #28955

HyukjinKwon · 2020-06-30T12:00:01Z

What changes were proposed in this pull request?

This PR proposes to partially revert the tests and some of the code changed in #27728 without touching any behaviours.

Most of the test changes are restored to their state before #27728 by combining withNestedDataFrame and withParquetDataFrame.

Basically, it addresses the comments at #27728 (comment), and my own comment in another PR at #28761 (comment).
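For readers unfamiliar with these helpers, the combination described above can be pictured as two loan-pattern functions applied one inside the other. The following is only a minimal sketch in plain Scala: `Frame`, `withNestedFrames`, `withStoredFrame` and `visitedColumns` are hypothetical stand-ins invented for illustration, not Spark's actual test utilities, and the "nested" variant is simulated with a prefixed column name.

```scala
object NestedHelpersSketch {
  // Hypothetical stand-in for a DataFrame: a column name plus its rows.
  final case class Frame(column: String, rows: Seq[Int])

  // In the spirit of withNestedDataFrame: runs the body once for the flat
  // input and once for a variant whose column sits under a struct-like prefix.
  def withNestedFrames(input: Frame)(body: (Frame, String) => Unit): Unit = {
    val nestedName = s"a.${input.column}"
    val variants = Seq(
      (input, input.column),
      (input.copy(column = nestedName), nestedName))
    variants.foreach { case (frame, colName) => body(frame, colName) }
  }

  // In the spirit of withParquetDataFrame: here it only lends the frame to the
  // body; a real helper would write the data out and read it back first.
  def withStoredFrame(frame: Frame)(body: Frame => Unit): Unit = body(frame)

  // Combining both helpers: every variant goes through the storage helper.
  def visitedColumns(input: Frame): Seq[String] = {
    val seen = scala.collection.mutable.ArrayBuffer.empty[String]
    withNestedFrames(input) { (frame, colName) =>
      withStoredFrame(frame) { stored =>
        assert(stored.column == colName)
        seen += colName
      }
    }
    seen.toSeq
  }
}
```

Calling `NestedHelpersSketch.visitedColumns` with a single-column frame visits the flat column first and the prefixed one second, which is the shape of test body the combined helper is meant to keep short.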

Why are the changes needed?

For maintenance purposes and to avoid potential conflicts during backports, and also in case other code needs to be matched with this.

Does this PR introduce any user-facing change?

No, dev-only.

How was this patch tested?

Manually tested.

HyukjinKwon · 2020-06-30T12:00:29Z

cc @viirya, @dbtsai, @MaxGekk, @cloud-fan

cloud-fan · 2020-06-30T14:46:49Z

...e/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilterSuite.scala

@@ -501,38 +508,37 @@ abstract class ParquetFilterSuite extends QueryTest with ParquetTest with Shared
    }

    val data = (1 to 4).map(i => Tuple1(Option(i.b)))
-    import testImplicits._
-    withNestedDataFrame(data.toDF()) { case (inputDF, colName, resultFun) =>


I didn't review them line by line, assuming they just remove the outer withNestedDataFrame.

HyukjinKwon · 2020-06-30T14:48:22Z

...e/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilterSuite.scala

-      withSQLConf(SQLConf.DATETIME_JAVA8API_ENABLED.key -> java8Api.toString) {
+      withSQLConf(
+        SQLConf.DATETIME_JAVA8API_ENABLED.key -> java8Api.toString,
+        SQLConf.LEGACY_PARQUET_REBASE_MODE_IN_WRITE.key -> "CORRECTED") {


There's one diff here.
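As background for this diff, a helper that accepts several key/value pairs typically sets all of them, runs the body, and then restores the previous values. The sketch below shows that pattern in plain Scala under stated assumptions: `ConfSketch`, `conf` and `withConf` are hypothetical names, and the in-memory map only stands in for a session configuration. It is not Spark's `withSQLConf` implementation.

```scala
object ConfSketch {
  import scala.collection.mutable

  // Hypothetical in-memory stand-in for a session configuration.
  val conf: mutable.Map[String, String] = mutable.Map.empty

  // Sets every pair, runs the body, then restores the earlier values
  // (removing keys that were unset before), even if the body throws.
  def withConf[T](pairs: (String, String)*)(body: => T): T = {
    val previous = pairs.map { case (key, _) => key -> conf.get(key) }
    pairs.foreach { case (key, value) => conf(key) = value }
    try body
    finally previous.foreach {
      case (key, Some(old)) => conf(key) = old
      case (key, None) => conf.remove(key)
    }
  }
}
```

With this shape, adding a second pair to an existing call (as the diff above does for the rebase mode) changes only the argument list, not the body of the test.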

HyukjinKwon · 2020-06-30T14:48:31Z

sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetTest.scala

-  protected def withParquetDataFrame(df: DataFrame, testVectorized: Boolean = true)
-      (f: DataFrame => Unit): Unit = {
-    withTempPath { file =>
-      withSQLConf(SQLConf.LEGACY_PARQUET_REBASE_MODE_IN_WRITE.key -> "CORRECTED") {


viirya

Thanks for minimizing the diff in test. After this gets merged, I will minimize the test diff in #28761.

dongjoon-hyun · 2020-06-30T16:07:01Z

...e/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilterSuite.scala

+          withTempPath { file =>
+            millisData.map(i => Tuple1(Timestamp.valueOf(i))).toDF
+              .write.format(dataSourceName).save(file.getCanonicalPath)
+            readParquetFile(file.getCanonicalPath) { df =>


From 2 lines to 4 lines? This looks like an exception. Is this inevitable?

Yup. I couldn't find a better way without having another method.
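The extra lines come from spelling out the write and the read separately inside a temporary path. A rough sketch of that write-then-read shape, using only the JDK file API, might look as follows. `TempPathSketch`, `readLines` and `roundTrip` are hypothetical names, and this `withTempPath` is a simplified illustration rather than the helper used in the Spark test suites.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path}

object TempPathSketch {
  // Lends a path that does not exist yet and cleans it up afterwards.
  def withTempPath[T](body: Path => T): T = {
    val dir = Files.createTempDirectory("sketch")
    val path = dir.resolve("data.txt")
    try body(path)
    finally {
      Files.deleteIfExists(path)
      Files.deleteIfExists(dir)
    }
  }

  // Hypothetical reader helper: loads the file and lends its lines to the body.
  def readLines[T](path: Path)(body: Seq[String] => T): T = {
    val lines = Files.readAllLines(path, StandardCharsets.UTF_8)
    body(lines.toArray(new Array[String](0)).toSeq)
  }

  // Explicit write step followed by an explicit read step, as in the diff above.
  def roundTrip(values: Seq[String]): Seq[String] =
    withTempPath { path =>
      Files.write(path, values.mkString("\n").getBytes(StandardCharsets.UTF_8))
      readLines(path)(lines => lines)
    }
}
```

Folding the write and read into one call would need an additional helper method, which is the trade-off mentioned in the reply above.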

dongjoon-hyun

Thank you for refactoring. It looks neater. I guess you are assuming a backport to your internal branch, but Apache Spark will not backport this to branch-3.0, so this only adds an additional commit. So minimizing the diff as a follow-up to the existing commits doesn't make sense for Apache Spark.

In short, this is just a normal commit doing refactoring for future PRs. So please remove "minimizes the diff" from the title and PR description. That's not a benefit to the Apache Spark master branch (as is) because the commit log always grows monotonically.

Also, we had better use a new JIRA ID because all of those (SPARK-25556, SPARK-17636, SPARK-31026, SPARK-31060) have already shipped as part of 3.0.0. Otherwise, we will lose traceability for this improvement commit because it will not land on branch-3.0.

SparkQA · 2020-06-30T19:33:09Z

Test build #124647 has finished for PR 28955 at commit 7a36dd3.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

viirya · 2020-06-30T19:48:27Z

retest this please

SparkQA · 2020-06-30T22:22:31Z

Test build #124690 has finished for PR 28955 at commit 7a36dd3.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

maropu · 2020-06-30T23:30:05Z

retest this please

HyukjinKwon · 2020-07-01T01:16:25Z

Oh sure @dongjoon-hyun. Let's use a new JIRA ID. But just to give you a bit more context, I said "minimize the diff" because it will minimize the diff at #28761 (comment), and in case other code needs to match.

I was thinking about backporting this, @dongjoon-hyun, to remove the unnecessary diff when you backport. It's a test-only PR so I guess it's fine to backport. For example, you can backport a test from master to branch-2.4 at https://github.com/apache/spark/blob/branch-2.4/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilterSuite.scala#L423-L447

This isn't related to any internal branch stuff :-). It's just from #28761 (comment).

HyukjinKwon · 2020-07-01T01:30:36Z

BTW @dbtsai, let's consider blocking a PR even when the comments are only about tests, in particular when a release is close. It seems it can be an issue in this case, and I definitely want to avoid a situation like the current one, which complicates backporting and matching with other code.

dongjoon-hyun · 2020-07-01T04:10:20Z

Thank you for updating.

SparkQA · 2020-07-01T05:04:23Z

Test build #124700 has finished for PR 28955 at commit 7a36dd3.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

…potential conflicts in dev

### What changes were proposed in this pull request?

This PR proposes to partially revert the tests and some of the code changed in #27728 without touching any behaviours.

Most of the test changes are restored to their state before #27728 by combining `withNestedDataFrame` and `withParquetDataFrame`.

Basically, it addresses the comments at #27728 (comment), and my own comment in another PR at #28761 (comment).

### Why are the changes needed?

For maintenance purposes and to avoid potential conflicts during backports, and also in case other code needs to be matched with this.

### Does this PR introduce _any_ user-facing change?

No, dev-only.

### How was this patch tested?

Manually tested.

Closes #28955 from HyukjinKwon/SPARK-25556-followup.

Authored-by: HyukjinKwon <[email protected]>
Signed-off-by: HyukjinKwon <[email protected]>
(cherry picked from commit 8194d9e)
Signed-off-by: HyukjinKwon <[email protected]>

HyukjinKwon · 2020-07-01T05:15:43Z

Thank you guys. Merged to master and branch-3.0.

probot-autolabeler bot added the SQL label Jun 30, 2020

Avoid changing test utils and minimise the diff

7a36dd3

HyukjinKwon force-pushed the SPARK-25556-followup branch from ffe1583 to 7a36dd3 Compare June 30, 2020 12:03

cloud-fan reviewed Jun 30, 2020

cloud-fan approved these changes Jun 30, 2020

HyukjinKwon commented Jun 30, 2020

viirya approved these changes Jun 30, 2020

viirya reviewed Jun 30, 2020

dongjoon-hyun reviewed Jun 30, 2020

maropu approved these changes Jun 30, 2020

HyukjinKwon changed the title ~~[SPARK-25556][SPARK-17636][SPARK-31026][SPARK-31060][SQL][FOLLOW-UP] Avoids changing test utils and minimizes the diff~~ [SPARK-25556][SPARK-17636][SPARK-31026][SPARK-31060][SQL][FOLLOW-UP] Keep the original tests and codes to avoid potential conflicts Jul 1, 2020

HyukjinKwon changed the title ~~[SPARK-25556][SPARK-17636][SPARK-31026][SPARK-31060][SQL][FOLLOW-UP] Keep the original tests and codes to avoid potential conflicts in dev~~ [SPARK-32142] Keep the original tests and codes to avoid potential conflicts in dev Jul 1, 2020

HyukjinKwon changed the title ~~[SPARK-32142] Keep the original tests and codes to avoid potential conflicts in dev~~ [SPARK-32142][SQL][TESTS] Keep the original tests and codes to avoid potential conflicts in dev Jul 1, 2020

HyukjinKwon closed this in 8194d9e Jul 1, 2020

HyukjinKwon deleted the SPARK-25556-followup branch July 27, 2020 07:44
