[SPARK-35112][SQL] Support Cast string to day-second interval #32271

Closed
wants to merge 15 commits into from

Conversation

AngersZhuuuu
Contributor

@AngersZhuuuu AngersZhuuuu commented Apr 21, 2021

What changes were proposed in this pull request?

Support casting strings to day-second intervals.

Why are the changes needed?

So that users can cast a day-second interval string to DayTimeIntervalType.

Does this PR introduce any user-facing change?

No

How was this patch tested?

Added unit tests.

@AngersZhuuuu
Contributor Author

FYI @MaxGekk

@github-actions github-actions bot added the SQL label Apr 21, 2021
@AngersZhuuuu AngersZhuuuu changed the title from "Spark 35112" to "[SPARK-35112]][SQL] Support Cast string to day-second interval" Apr 21, 2021
Member

@MaxGekk MaxGekk left a comment


Take into account days in safeFromDayTimeString()

@SparkQA

SparkQA commented Apr 21, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42255/

@SparkQA

SparkQA commented Apr 21, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42255/

@MaxGekk MaxGekk changed the title from "[SPARK-35112]][SQL] Support Cast string to day-second interval" to "[SPARK-35112][SQL] Support Cast string to day-second interval" Apr 21, 2021
@SparkQA

SparkQA commented Apr 21, 2021

Test build #137728 has finished for PR 32271 at commit b41fd25.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Apr 28, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42565/

@SparkQA

SparkQA commented Apr 28, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42565/

@SparkQA

SparkQA commented Apr 28, 2021

Test build #138046 has finished for PR 32271 at commit a82bfee.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AngersZhuuuu
Contributor Author

@MaxGekk I ran into a case:

IntervalUtils.fromDayTimeString("2:03:04") throws an exception:

[info]   Cause: java.lang.IllegalArgumentException: requirement failed: Interval string must match day-time format of '^(?<sign>[+|-])?(?<day>\d+) (?<hour>\d{1,2}):(?<minute>\d{1,2}):(?<second>(\d{1,2})(\.(\d{1,9}))?)$': 2:03:04, set spark.sql.legacy.fromDayTimeString.enabled to true to restore the behavior before Spark 3.0.

It's a bug, right? The comment on IntervalUtils.fromDayTimeString says:

  /**
   * Parse dayTime string in form: [-]d HH:mm:ss.nnnnnnnnn and [-]HH:mm:ss.nnnnnnnnn
   *
   * adapted from HiveIntervalDayTime.valueOf
   */
  def fromDayTimeString(s: String): CalendarInterval = {
    fromDayTimeString(s, DAY, SECOND)
  }
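
For context, the pattern quoted in the exception requires a day field followed by a space, which would explain why "2:03:04" is rejected even though the doc comment lists the [-]HH:mm:ss.nnnnnnnnn form. A minimal sketch, assuming the regex is exactly the one printed in the message; `DayFieldSketch`, `relaxed` and `accepts` are hypothetical names and not part of IntervalUtils:

```scala
import scala.util.matching.Regex

object DayFieldSketch {
  // Copied from the exception message: the day field and the space after it are mandatory.
  val strict: Regex =
    """^(?<sign>[+|-])?(?<day>\d+) (?<hour>\d{1,2}):(?<minute>\d{1,2}):(?<second>(\d{1,2})(\.(\d{1,9}))?)$""".r

  // Hypothetical variant (an assumption, not Spark code): the day field is optional.
  val relaxed: Regex =
    """^(?<sign>[+|-])?(?:(?<day>\d+) )?(?<hour>\d{1,2}):(?<minute>\d{1,2}):(?<second>(\d{1,2})(\.(\d{1,9}))?)$""".r

  // True only when the whole string matches the pattern.
  def accepts(r: Regex, s: String): Boolean = r.pattern.matcher(s).matches()
}
```

With these definitions, `accepts(strict, "2:03:04")` is false while `accepts(strict, "0 2:03:04")` is true, so either the doc comment or the pattern would need to change for the two to agree.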

@AngersZhuuuu AngersZhuuuu changed the title from "[SPARK-35112][SQL] Support Cast string to day-second interval" to "[WIP][SPARK-35112][SQL] Support Cast string to day-second interval" Apr 29, 2021
@SparkQA

SparkQA commented Apr 29, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42598/

@SparkQA

SparkQA commented Apr 29, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42598/

@SparkQA

SparkQA commented Apr 29, 2021

Kubernetes integration test unable to build dist.

exiting with code: 1
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42600/

@SparkQA

SparkQA commented Apr 29, 2021

Test build #138078 has finished for PR 32271 at commit 35dd9f4.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Apr 29, 2021

Test build #138080 has finished for PR 32271 at commit daa9675.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AngersZhuuuu
Contributor Author

Current test failure:

Incorrect evaluation (codegen off): cast(cast(INTERVAL '106751991 04:00:54.775807' DAY TO SECOND as string) as day-time interval), actual: 14454775807, expected: 9223372036854775807 (ExpressionEvalHelper.scala:209)
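
The actual value looks like the time-of-day part alone, with the day field dropped, while the expected value equals Long.MaxValue. A quick arithmetic check, assuming microsecond precision; the object and helper names below are hypothetical and only illustrate the numbers in the failure message:

```scala
object MaxIntervalSketch {
  val MicrosPerSecond: Long = 1000000L
  val MicrosPerMinute: Long = 60L * MicrosPerSecond
  val MicrosPerHour: Long = 60L * MicrosPerMinute
  val MicrosPerDay: Long = 24L * MicrosPerHour

  // Microseconds contributed by the HH:mm:ss.ffffff part only.
  def timeMicros(hours: Long, minutes: Long, seconds: Long, micros: Long): Long =
    hours * MicrosPerHour + minutes * MicrosPerMinute + seconds * MicrosPerSecond + micros

  // Total microseconds including days; the exact variants throw on overflow
  // instead of silently wrapping around.
  def totalMicros(days: Long, hours: Long, minutes: Long, seconds: Long, micros: Long): Long =
    Math.addExact(
      Math.multiplyExact(days, MicrosPerDay),
      timeMicros(hours, minutes, seconds, micros))
}
```

Here `timeMicros(4, 0, 54, 775807)` gives 14454775807, the "actual" value in the failure, and `totalMicros(106751991, 4, 0, 54, 775807)` gives 9223372036854775807, the "expected" value.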

@SparkQA

SparkQA commented Apr 30, 2021

Test build #138105 has finished for PR 32271 at commit 6ff0522.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@MaxGekk
Member

MaxGekk commented Apr 30, 2021

@AngersZhuuuu Is it still WIP? Please, ping me when it is ready.

@AngersZhuuuu AngersZhuuuu changed the title from "[WIP][SPARK-35112][SQL] Support Cast string to day-second interval" to "[SPARK-35112][SQL] Support Cast string to day-second interval" May 1, 2021
@AngersZhuuuu
Contributor Author

@MaxGekk It can be reviewed now, but the current method does not seem very efficient. Should I add a new method that calculates the total microseconds separately?

@AngersZhuuuu
Contributor Author

@MaxGekk I think it's ready for review now, but I'm not sure how to use
dayTimePattern(DAY -> SECOND) in a match case.

When I write

val regex = dayTimePattern(DAY-> SECOND)

intervalStr match {
    case dayTimePattern(sign, day, hour, minute, secondAndMicro)
.......
}

It doesn't match. Do you know how to make this match? If it can be done, we can simplify the code further.
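
Two things may be at play here. First, a Scala extractor pattern needs the Regex value itself, so if dayTimePattern is a lookup table of regexes (an assumption), the looked-up value has to be bound to a val such as `regex` and the case has to use that val. Second, the number of variables in the case must equal the number of capture groups, and the pattern printed in the exception earlier in this thread has eight of them because the `second` group contains nested groups. A minimal sketch; `DayTimeMatchSketch`, `nested`, `flat` and `fiveGroups` are hypothetical names, and `flat` is an illustrative rewrite rather than Spark code:

```scala
import scala.util.matching.Regex

object DayTimeMatchSketch {
  // Pattern from the exception message: 5 named groups plus 3 nested ones = 8 capture groups.
  val nested: Regex =
    """^(?<sign>[+|-])?(?<day>\d+) (?<hour>\d{1,2}):(?<minute>\d{1,2}):(?<second>(\d{1,2})(\.(\d{1,9}))?)$""".r

  // Same shape with the inner groups made non-capturing, leaving exactly 5 groups.
  val flat: Regex =
    """^([+|-])?(\d+) (\d{1,2}):(\d{1,2}):(\d{1,2}(?:\.\d{1,9})?)$""".r

  // Matches only when the regex has exactly five capture groups and the input fits it.
  // A missing optional group (for example, no sign) is bound to null.
  def fiveGroups(r: Regex, s: String): Option[(String, String, String, String, String)] =
    s match {
      case r(sign, day, hour, minute, second) => Some((sign, day, hour, minute, second))
      case _ => None
    }
}
```

With `nested`, the five-variable case never fires even for valid input; with `flat`, "-1 2:03:04.5" yields the sign, day, hour, minute and second-with-fraction strings.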

@SparkQA

SparkQA commented May 1, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42644/

@SparkQA

SparkQA commented May 1, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42644/

@SparkQA

SparkQA commented May 1, 2021

Test build #138123 has finished for PR 32271 at commit 46fff9c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented May 1, 2021

Kubernetes integration test unable to build dist.

exiting with code: 1
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42648/

@SparkQA

SparkQA commented May 1, 2021

Test build #138127 has finished for PR 32271 at commit 50addb4.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Member

@MaxGekk MaxGekk left a comment


LGTM in general except a few comments.

@@ -1775,6 +1775,48 @@ class CastSuite extends CastSuiteBase {
    }
  }

  test("SPARK-35112: Cast string to day-time interval") {
    checkEvaluation(cast(Literal.create("0 0:0:0"), DayTimeIntervalType), 0L)
    checkEvaluation(cast(Literal.create("INTERVAL '0 0:0:0' DAY TO SECOND"),
Member


Could you also check INTERVAL in lower case with surrounding spaces:

Suggested change
- checkEvaluation(cast(Literal.create("INTERVAL '0 0:0:0' DAY TO SECOND"),
+ checkEvaluation(cast(Literal.create(" interval '0 0:0:0' Day TO second "),
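
One way such input could be accepted, sketched under the assumption that the input is trimmed and the INTERVAL ... DAY TO SECOND wrapper is matched case-insensitively. `QuotedIntervalSketch` and `unwrap` are hypothetical names, and the actual change may handle this differently:

```scala
object QuotedIntervalSketch {
  // (?i) makes the keywords case-insensitive; \s+ tolerates any run of whitespace.
  private val quoted = """(?i)^interval\s+'(.*)'\s+day\s+to\s+second$""".r

  // Returns the text between the quotes, or None when the wrapper is absent.
  def unwrap(s: String): Option[String] = s.trim match {
    case quoted(inner) => Some(inner)
    case _ => None
  }
}
```

Both "INTERVAL '0 0:0:0' DAY TO SECOND" and " interval '0 0:0:0' Day TO second " unwrap to "0 0:0:0" here, while a bare "0 0:0:0" returns None and would go through the unquoted path.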

Member


@AngersZhuuuu Your force push reverted the changes I approved. Please, revert them back.

Contributor Author


@AngersZhuuuu Your force push reverted the changes I approved. Please, revert them back.

Updated

Member

@MaxGekk MaxGekk left a comment


LGTM, waiting for test results.

Member

@MaxGekk MaxGekk left a comment


Please, address comments.

@SparkQA

SparkQA commented May 1, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42653/

@SparkQA

SparkQA commented May 1, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42653/

@SparkQA

SparkQA commented May 1, 2021

Test build #138132 has finished for PR 32271 at commit 4f8fc78.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AngersZhuuuu
Contributor Author

Please, address comments.

Sorry for the delay. Yesterday's run failed, so I reverted and rechecked, but something urgent interrupted my work.

@SparkQA

SparkQA commented May 2, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42660/

@SparkQA

SparkQA commented May 2, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42660/

Member

@MaxGekk MaxGekk left a comment


+1, LGTM. Merging to master.
Thank you, @AngersZhuuuu .

@MaxGekk MaxGekk closed this in caa46ce May 2, 2021
@SparkQA

SparkQA commented May 2, 2021

Test build #138139 has finished for PR 32271 at commit 4e2608c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@@ -150,6 +150,58 @@ object IntervalUtils {
    }
  }

  private val unquotedDaySecondPattern =
    "([+|-])?(\\d+) (\\d{1,2}):(\\d{1,2}):(\\d{1,2})(\\.\\d{1,9})?"
  private val quotedDaySecondPattern = (s"^$unquotedDaySecondPattern$$").r
Contributor


unquotedDaySecondRegex?

Contributor Author


unquotedDaySecondRegex?

Changed this in #32444
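
A note on the last line of the hunk in this thread: inside an s-interpolated string, `$$` is an escaped literal `$`, so the expression wraps the unquoted pattern in `^` and `$` anchors. A small self-contained check; `AnchorSketch` is a hypothetical name, and the pattern string is copied from the diff:

```scala
object AnchorSketch {
  val unquotedDaySecondPattern: String =
    "([+|-])?(\\d+) (\\d{1,2}):(\\d{1,2}):(\\d{1,2})(\\.\\d{1,9})?"

  // "$$" inside s"..." yields a single "$", the end-of-input anchor.
  val anchored: String = s"^$unquotedDaySecondPattern$$"

  // Compiled form; it accepts only strings that are a bare day-time value.
  val regex = anchored.r
}
```

The anchored regex accepts "1 2:03:04.567" as a whole string and rejects input with extra leading text such as "x 1 2:03:04".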
