Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-32142][SQL][TESTS] Keep the original tests and codes to avoid potential conflicts in dev #28955

Closed
wants to merge 1 commit into from

Conversation

HyukjinKwon
Copy link
Member

@HyukjinKwon HyukjinKwon commented Jun 30, 2020

What changes were proposed in this pull request?

This PR proposes to partially reverts back in the tests and some codes at #27728 without touching any behaivours.

Most of changes in tests are back before #27728 by combining withNestedDataFrame and withParquetDataFrame.

Basically, it addresses the comments #27728 (comment), and my own comment in another PR at #28761 (comment)

Why are the changes needed?

For maintenance purpose and to avoid a potential conflicts during backports. And also in case when other codes are matched with this.

Does this PR introduce any user-facing change?

No, dev-only.

How was this patch tested?

Manually tested.

@HyukjinKwon
Copy link
Member Author

cc @viirya, @dbtsai, @MaxGekk, @cloud-fan

@@ -501,38 +508,37 @@ abstract class ParquetFilterSuite extends QueryTest with ParquetTest with Shared
}

val data = (1 to 4).map(i => Tuple1(Option(i.b)))
import testImplicits._
withNestedDataFrame(data.toDF()) { case (inputDF, colName, resultFun) =>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't review them one line by one line, assuming they just remove the outer withNestedDataFrame

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup

withSQLConf(SQLConf.DATETIME_JAVA8API_ENABLED.key -> java8Api.toString) {
withSQLConf(
SQLConf.DATETIME_JAVA8API_ENABLED.key -> java8Api.toString,
SQLConf.LEGACY_PARQUET_REBASE_MODE_IN_WRITE.key -> "CORRECTED") {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's one diff here.

protected def withParquetDataFrame(df: DataFrame, testVectorized: Boolean = true)
(f: DataFrame => Unit): Unit = {
withTempPath { file =>
withSQLConf(SQLConf.LEGACY_PARQUET_REBASE_MODE_IN_WRITE.key -> "CORRECTED") {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and here.

Copy link
Member

@viirya viirya left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for minimizing the diff in test. After this gets merged, I will minimize the test diff in #28761.

withTempPath { file =>
millisData.map(i => Tuple1(Timestamp.valueOf(i))).toDF
.write.format(dataSourceName).save(file.getCanonicalPath)
readParquetFile(file.getCanonicalPath) { df =>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From 2 lines to 4 lines? This looks like an exception. Is this inevitable?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup. I couldn't find a better way without having another method.

Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for refactoring. Looks neater. I guess you are assuming a backporting to your internal branch, but Apache Spark will not backport this to branch-3.0 and this only adds additional commit. So, minimize the diff as a follow-up for the existing commits doesn't make sense to Apache Spark.

In short, this is just a normal commit doing refactoring for the future PRs. So, please remove minimizes the diff from the title and PR description. That's not a benefit to Apache Spark master branch (AS-IS) because the commit log grows monotonically always.

Also, we had better use a new JIRA ID because all of those(SPARK-25556, SPARK-17636, SPARK-31026 , SPARK-31060) are already shipped as a part of 3.0.0. Otherwise, we will lose a traceability for this improvement commit because this will not land on branch-3.0.

@SparkQA
Copy link

SparkQA commented Jun 30, 2020

Test build #124647 has finished for PR 28955 at commit 7a36dd3.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@viirya
Copy link
Member

viirya commented Jun 30, 2020

retest this please

@SparkQA
Copy link

SparkQA commented Jun 30, 2020

Test build #124690 has finished for PR 28955 at commit 7a36dd3.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@maropu
Copy link
Member

maropu commented Jun 30, 2020

retest this please

@HyukjinKwon
Copy link
Member Author

HyukjinKwon commented Jul 1, 2020

Oh sure @dongjoon-hyun. Let's use a new JIRA ID. But just to give you a bit of more contexts, I said "minimize the diff" because it will minimize the diff at #28761 (comment), and if other codes match.

I was thinking about backporting this, @dongjoon-hyun to remove the unnecessary diff when you backport. It's a test-only PR so I guess it's fine to backport. For example, you can backport a test from master to branch-2.4 at https://github.com/apache/spark/blob/branch-2.4/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilterSuite.scala#L423-L447

This isn't related to any internal branch stuff :-). it's just from #28761 (comment).

@HyukjinKwon HyukjinKwon changed the title [SPARK-25556][SPARK-17636][SPARK-31026][SPARK-31060][SQL][FOLLOW-UP] Avoids changing test utils and minimizes the diff [SPARK-25556][SPARK-17636][SPARK-31026][SPARK-31060][SQL][FOLLOW-UP] Keep the original tests and codes to avoid potential conflicts Jul 1, 2020
@HyukjinKwon HyukjinKwon changed the title [SPARK-25556][SPARK-17636][SPARK-31026][SPARK-31060][SQL][FOLLOW-UP] Keep the original tests and codes to avoid potential conflicts [SPARK-25556][SPARK-17636][SPARK-31026][SPARK-31060][SQL][FOLLOW-UP] Keep the original tests and codes to avoid potential conflicts in dev Jul 1, 2020
@HyukjinKwon
Copy link
Member Author

BTW @dbtsai, let's consider to block a PR even when the comments are from tests in particular when the releases are close. Seems like it can be an issue in this case, and I definitely want to avoid such current situation that complicates backporting and matching with other codes.

@HyukjinKwon HyukjinKwon changed the title [SPARK-25556][SPARK-17636][SPARK-31026][SPARK-31060][SQL][FOLLOW-UP] Keep the original tests and codes to avoid potential conflicts in dev [SPARK-32142] Keep the original tests and codes to avoid potential conflicts in dev Jul 1, 2020
@HyukjinKwon HyukjinKwon changed the title [SPARK-32142] Keep the original tests and codes to avoid potential conflicts in dev [SPARK-32142][SQL][TESTS] Keep the original tests and codes to avoid potential conflicts in dev Jul 1, 2020
@dongjoon-hyun
Copy link
Member

Thank you for updating.

@SparkQA
Copy link

SparkQA commented Jul 1, 2020

Test build #124700 has finished for PR 28955 at commit 7a36dd3.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

HyukjinKwon added a commit that referenced this pull request Jul 1, 2020
…potential conflicts in dev

### What changes were proposed in this pull request?

This PR proposes to partially reverts back in the tests and some codes at #27728 without touching any behaivours.

Most of changes in tests are back before #27728 by combining `withNestedDataFrame` and `withParquetDataFrame`.

Basically, it addresses the comments #27728 (comment), and my own comment in another PR at #28761 (comment)

### Why are the changes needed?

For maintenance purpose and to avoid a potential conflicts during backports. And also in case when other codes are matched with this.

### Does this PR introduce _any_ user-facing change?

No, dev-only.

### How was this patch tested?

Manually tested.

Closes #28955 from HyukjinKwon/SPARK-25556-followup.

Authored-by: HyukjinKwon <[email protected]>
Signed-off-by: HyukjinKwon <[email protected]>
(cherry picked from commit 8194d9e)
Signed-off-by: HyukjinKwon <[email protected]>
@HyukjinKwon
Copy link
Member Author

Thank you guys. Merged to master and branch-3.0.

@HyukjinKwon HyukjinKwon deleted the SPARK-25556-followup branch July 27, 2020 07:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants