[SPARK-35220][SQL] DayTimeIntervalType/YearMonthIntervalType show different between Hive SerDe and row format delimited #32335

Closed
wants to merge 2 commits into from


AngersZhuuuu
Contributor

What changes were proposed in this pull request?

DayTimeIntervalType/YearMonthIntervalType values are displayed differently between Hive SerDe and row format delimited.
This PR adds a test and opens the discussion.

I see two possible directions for this problem:

  1. Leave the behavior as it is and add an item to the migration guide that explains it.
  2. Since we should not change the Hive SerDe behavior, change the row format delimited path in Spark so that it casts DayTimeIntervalType/YearMonthIntervalType using HIVE_STYLE.
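
To make the second direction concrete, here is a minimal, Spark-free Scala sketch of rendering the same interval value in two styles. All names (`IntervalStyles`, `AnsiStyle`, `HiveStyle`, `yearMonthToString`, `dayTimeToString`) are hypothetical and are not Spark's API. The sketch assumes the Hive-like style is the bare value without the `INTERVAL '...'` wrapper, and it ignores fractional seconds, so it may not match the exact output of either code path.

```scala
// Hypothetical sketch; these names are illustrative and not part of Spark.
object IntervalStyles {
  sealed trait Style
  case object AnsiStyle extends Style // wrapped form, e.g. INTERVAL '0-10' YEAR TO MONTH
  case object HiveStyle extends Style // assumed bare form, e.g. 0-10

  // A year-month interval is modeled here as a total number of months.
  def yearMonthToString(totalMonths: Int, style: Style): String = {
    val sign = if (totalMonths < 0) "-" else ""
    val abs = math.abs(totalMonths.toLong)
    val body = s"$sign${abs / 12}-${abs % 12}"
    style match {
      case AnsiStyle => s"INTERVAL '$body' YEAR TO MONTH"
      case HiveStyle => body
    }
  }

  // A day-time interval is modeled here as whole seconds (no fractional part).
  def dayTimeToString(totalSeconds: Long, style: Style): String = {
    val sign = if (totalSeconds < 0) "-" else ""
    val abs = math.abs(totalSeconds)
    val days = abs / 86400
    val rem = abs % 86400
    val hours = rem / 3600
    val minutes = (rem % 3600) / 60
    val seconds = rem % 60
    val body = f"$sign$days $hours%02d:$minutes%02d:$seconds%02d"
    style match {
      case AnsiStyle => s"INTERVAL '$body' DAY TO SECOND"
      case HiveStyle => body
    }
  }
}
```

With these definitions, a one-day interval and a ten-month interval rendered with `AnsiStyle` give the same strings as the expected row in the added test, while `HiveStyle` yields only the quoted part.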

Why are the changes needed?

To add a unit test that covers this difference.

Does this PR introduce any user-facing change?

No

How was this patch tested?

Added a unit test.

…ifferent between hive SerDe and row format delimited
@github-actions github-actions bot added the SQL label Apr 25, 2021
@AngersZhuuuu
Contributor Author

Gentle ping @cloud-fan @MaxGekk @maropu

@AngersZhuuuu AngersZhuuuu changed the title [SPARK-35220][SQL] DayTimeIntervalType/YearMonthIntervalString show different between Hive SerDe and row format delimited [SPARK-35220][SQL] DayTimeIntervalType/YearMonthIntervalType show different between Hive SerDe and row format delimited Apr 25, 2021
@SparkQA

SparkQA commented Apr 25, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42441/

@SparkQA

SparkQA commented Apr 25, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42441/

@SparkQA

SparkQA commented Apr 25, 2021

Test build #137920 has finished for PR 32335 at commit ab94c21.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Apr 25, 2021

Test build #137921 has finished for PR 32335 at commit 9f1bd02.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Member

Merged to master.

@AngersZhuuuu
Contributor Author

@HyukjinKwon Since you just merged this, I think I should open a follow-up PR to document this behavior in the migration guide. Is that OK?

@HyukjinKwon
Member

please go ahead.

|FROM v
|""".stripMargin),
identity,
Row("INTERVAL '1 00:00:00' DAY TO SECOND", "INTERVAL '0-10' YEAR TO MONTH") :: Nil)
Contributor

so the spark-sql shell and df.show have different formats for intervals?

Contributor Author

> so the spark-sql shell and df.show have different formats for intervals?

Yes, they have this problem too, since spark-sql follows the Hive format. What should I do next?

Contributor

Is interval format the only difference between hive format and spark cast?

Contributor Author

> Is interval format the only difference between hive format and spark cast?

Yes, the difference is ANSI_STYLE versus HIVE_STYLE.

Contributor

@cloud-fan cloud-fan Apr 26, 2021

Maybe we should have a new Expression ToHiveString and use it in df.show and TRANSFORM, so that they are consistent.
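
As a rough illustration of that idea, the Scala sketch below routes two output paths through one shared conversion function so that they cannot drift apart. It does not use Spark; `SharedRendering`, `YearMonth`, `toHiveString`, `renderShow`, and `renderTransform` are hypothetical names, and the bare `years-months` form is an assumption about the Hive-style output, not a statement of what the proposed expression would produce.

```scala
// Hypothetical sketch: a single conversion function shared by two output paths.
object SharedRendering {
  // Stand-in for a year-month interval value; not Spark's type.
  final case class YearMonth(totalMonths: Int)

  // The only place that decides how a value is turned into a string.
  def toHiveString(value: Any): String = value match {
    case null => "NULL"
    case YearMonth(m) =>
      val sign = if (m < 0) "-" else ""
      val abs = math.abs(m.toLong)
      s"$sign${abs / 12}-${abs % 12}"
    case other => other.toString
  }

  // Stand-in for a table-like display path.
  def renderShow(row: Seq[Any]): String =
    row.map(toHiveString).mkString("|", "|", "|")

  // Stand-in for a delimited script-transform path.
  def renderTransform(row: Seq[Any]): String =
    row.map(toHiveString).mkString("\t")
}
```

Because both renderers call `toHiveString`, a change to the interval format shows up in both outputs at once.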

Contributor Author

> Maybe we should have a new Expression ToHiveString and use it in df.show and TRANSFORM, so that they are consistent.

Yes, I created a ticket: https://issues.apache.org/jira/browse/SPARK-35228
