Skip to content

Commit

Permalink
Support to_utc_timestamp [databricks] (#10144)
Browse files Browse the repository at this point in the history
* Support to_utc_timestamp

Signed-off-by: Haoyang Li <[email protected]>

* add tests

Signed-off-by: Haoyang Li <[email protected]>

* perf tests

Signed-off-by: Haoyang Li <[email protected]>

* claen up

Signed-off-by: Haoyang Li <[email protected]>

* claen up

Signed-off-by: Haoyang Li <[email protected]>

* address comment

Signed-off-by: Haoyang Li <[email protected]>

---------

Signed-off-by: Haoyang Li <[email protected]>
  • Loading branch information
thirtiseven authored Jan 8, 2024
1 parent d4ec3e9 commit 328a514
Show file tree
Hide file tree
Showing 8 changed files with 253 additions and 84 deletions.
1 change: 1 addition & 0 deletions docs/additional-functionality/advanced_configs.md
Original file line number Diff line number Diff line change
Expand Up @@ -365,6 +365,7 @@ Name | SQL Function(s) | Description | Default Value | Notes
<a name="sql.expression.TimeAdd"></a>spark.rapids.sql.expression.TimeAdd| |Adds interval to timestamp|true|None|
<a name="sql.expression.ToDegrees"></a>spark.rapids.sql.expression.ToDegrees|`degrees`|Converts radians to degrees|true|None|
<a name="sql.expression.ToRadians"></a>spark.rapids.sql.expression.ToRadians|`radians`|Converts degrees to radians|true|None|
<a name="sql.expression.ToUTCTimestamp"></a>spark.rapids.sql.expression.ToUTCTimestamp|`to_utc_timestamp`|Render the input timestamp in UTC|true|None|
<a name="sql.expression.ToUnixTimestamp"></a>spark.rapids.sql.expression.ToUnixTimestamp|`to_unix_timestamp`|Returns the UNIX timestamp of the given time|true|None|
<a name="sql.expression.TransformKeys"></a>spark.rapids.sql.expression.TransformKeys|`transform_keys`|Transform keys in a map using a transform function|true|None|
<a name="sql.expression.TransformValues"></a>spark.rapids.sql.expression.TransformValues|`transform_values`|Transform values in a map using a transform function|true|None|
Expand Down
174 changes: 121 additions & 53 deletions docs/supported_ops.md
Original file line number Diff line number Diff line change
Expand Up @@ -6638,7 +6638,7 @@ are limited.
<td> </td>
<td> </td>
<td> </td>
<td><em>PS<br/>Only timezones equivalent to UTC are supported</em></td>
<td><em>PS<br/>Only non-DST(Daylight Savings Time) timezones are supported</em></td>
<td> </td>
<td> </td>
<td> </td>
Expand Down Expand Up @@ -15441,6 +15441,74 @@ are limited.
<td> </td>
</tr>
<tr>
<td rowSpan="3">ToUTCTimestamp</td>
<td rowSpan="3">`to_utc_timestamp`</td>
<td rowSpan="3">Render the input timestamp in UTC</td>
<td rowSpan="3">None</td>
<td rowSpan="3">project</td>
<td>timestamp</td>
<td> </td>
<td> </td>
<td> </td>
<td> </td>
<td> </td>
<td> </td>
<td> </td>
<td> </td>
<td><em>PS<br/>UTC is only supported TZ for TIMESTAMP</em></td>
<td> </td>
<td> </td>
<td> </td>
<td> </td>
<td> </td>
<td> </td>
<td> </td>
<td> </td>
<td> </td>
</tr>
<tr>
<td>timezone</td>
<td> </td>
<td> </td>
<td> </td>
<td> </td>
<td> </td>
<td> </td>
<td> </td>
<td> </td>
<td> </td>
<td><em>PS<br/>Only non-DST(Daylight Savings Time) timezones are supported</em></td>
<td> </td>
<td> </td>
<td> </td>
<td> </td>
<td> </td>
<td> </td>
<td> </td>
<td> </td>
</tr>
<tr>
<td>result</td>
<td> </td>
<td> </td>
<td> </td>
<td> </td>
<td> </td>
<td> </td>
<td> </td>
<td> </td>
<td><em>PS<br/>UTC is only supported TZ for TIMESTAMP</em></td>
<td> </td>
<td> </td>
<td> </td>
<td> </td>
<td> </td>
<td> </td>
<td> </td>
<td> </td>
<td> </td>
</tr>
<tr>
<td rowSpan="3">ToUnixTimestamp</td>
<td rowSpan="3">`to_unix_timestamp`</td>
<td rowSpan="3">Returns the UNIX timestamp of the given time</td>
Expand Down Expand Up @@ -15645,6 +15713,32 @@ are limited.
<td> </td>
</tr>
<tr>
<th>Expression</th>
<th>SQL Functions(s)</th>
<th>Description</th>
<th>Notes</th>
<th>Context</th>
<th>Param/Output</th>
<th>BOOLEAN</th>
<th>BYTE</th>
<th>SHORT</th>
<th>INT</th>
<th>LONG</th>
<th>FLOAT</th>
<th>DOUBLE</th>
<th>DATE</th>
<th>TIMESTAMP</th>
<th>STRING</th>
<th>DECIMAL</th>
<th>NULL</th>
<th>BINARY</th>
<th>CALENDAR</th>
<th>ARRAY</th>
<th>MAP</th>
<th>STRUCT</th>
<th>UDT</th>
</tr>
<tr>
<td rowSpan="4">UnaryMinus</td>
<td rowSpan="4">`negative`</td>
<td rowSpan="4">Negate a numeric value</td>
Expand Down Expand Up @@ -15735,32 +15829,6 @@ are limited.
<td> </td>
</tr>
<tr>
<th>Expression</th>
<th>SQL Functions(s)</th>
<th>Description</th>
<th>Notes</th>
<th>Context</th>
<th>Param/Output</th>
<th>BOOLEAN</th>
<th>BYTE</th>
<th>SHORT</th>
<th>INT</th>
<th>LONG</th>
<th>FLOAT</th>
<th>DOUBLE</th>
<th>DATE</th>
<th>TIMESTAMP</th>
<th>STRING</th>
<th>DECIMAL</th>
<th>NULL</th>
<th>BINARY</th>
<th>CALENDAR</th>
<th>ARRAY</th>
<th>MAP</th>
<th>STRUCT</th>
<th>UDT</th>
</tr>
<tr>
<td rowSpan="4">UnaryPositive</td>
<td rowSpan="4">`positive`</td>
<td rowSpan="4">A numeric value with a + in front of it</td>
Expand Down Expand Up @@ -16018,6 +16086,32 @@ are limited.
<td> </td>
</tr>
<tr>
<th>Expression</th>
<th>SQL Functions(s)</th>
<th>Description</th>
<th>Notes</th>
<th>Context</th>
<th>Param/Output</th>
<th>BOOLEAN</th>
<th>BYTE</th>
<th>SHORT</th>
<th>INT</th>
<th>LONG</th>
<th>FLOAT</th>
<th>DOUBLE</th>
<th>DATE</th>
<th>TIMESTAMP</th>
<th>STRING</th>
<th>DECIMAL</th>
<th>NULL</th>
<th>BINARY</th>
<th>CALENDAR</th>
<th>ARRAY</th>
<th>MAP</th>
<th>STRUCT</th>
<th>UDT</th>
</tr>
<tr>
<td rowSpan="2">Upper</td>
<td rowSpan="2">`upper`, `ucase`</td>
<td rowSpan="2">String uppercase operator</td>
Expand Down Expand Up @@ -16112,32 +16206,6 @@ are limited.
<td> </td>
</tr>
<tr>
<th>Expression</th>
<th>SQL Functions(s)</th>
<th>Description</th>
<th>Notes</th>
<th>Context</th>
<th>Param/Output</th>
<th>BOOLEAN</th>
<th>BYTE</th>
<th>SHORT</th>
<th>INT</th>
<th>LONG</th>
<th>FLOAT</th>
<th>DOUBLE</th>
<th>DATE</th>
<th>TIMESTAMP</th>
<th>STRING</th>
<th>DECIMAL</th>
<th>NULL</th>
<th>BINARY</th>
<th>CALENDAR</th>
<th>ARRAY</th>
<th>MAP</th>
<th>STRUCT</th>
<th>UDT</th>
</tr>
<tr>
<td rowSpan="3">WindowExpression</td>
<td rowSpan="3"> </td>
<td rowSpan="3">Calculates a return value for every input row of a table based on a group (or "window") of rows</td>
Expand Down
53 changes: 32 additions & 21 deletions integration_tests/src/main/python/date_time_test.py
Original file line number Diff line number Diff line change
@@ -1,4 +1,4 @@
# Copyright (c) 2020-2023, NVIDIA CORPORATION.
# Copyright (c) 2020-2024, NVIDIA CORPORATION.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
Expand Down Expand Up @@ -286,41 +286,52 @@ def test_unsupported_fallback_to_unix_timestamp(data_gen):
spark, [("a", data_gen), ("b", string_gen)], length=10).selectExpr(
"to_unix_timestamp(a, b)"),
"ToUnixTimestamp")

supported_timezones = ["Asia/Shanghai", "UTC", "UTC+0", "UTC-0", "GMT", "GMT+0", "GMT-0", "EST", "MST", "VST"]
unsupported_timezones = ["PST", "NST", "AST", "America/Los_Angeles", "America/New_York", "America/Chicago"]

@pytest.mark.parametrize('time_zone', ["Asia/Shanghai", "UTC", "UTC+0", "UTC-0", "GMT", "GMT+0", "GMT-0"], ids=idfn)
@pytest.mark.parametrize('data_gen', [timestamp_gen], ids=idfn)
@tz_sensitive_test
@pytest.mark.parametrize('time_zone', supported_timezones, ids=idfn)
@allow_non_gpu(*non_utc_allow)
def test_from_utc_timestamp(data_gen, time_zone):
def test_from_utc_timestamp(time_zone):
assert_gpu_and_cpu_are_equal_collect(
lambda spark: unary_op_df(spark, data_gen).select(f.from_utc_timestamp(f.col('a'), time_zone)))
lambda spark: unary_op_df(spark, timestamp_gen).select(f.from_utc_timestamp(f.col('a'), time_zone)))

@allow_non_gpu('ProjectExec')
@pytest.mark.parametrize('time_zone', ["PST", "NST", "AST", "America/Los_Angeles", "America/New_York", "America/Chicago"], ids=idfn)
@pytest.mark.parametrize('data_gen', [timestamp_gen], ids=idfn)
@tz_sensitive_test
def test_from_utc_timestamp_unsupported_timezone_fallback(data_gen, time_zone):
@pytest.mark.parametrize('time_zone', unsupported_timezones, ids=idfn)
def test_from_utc_timestamp_unsupported_timezone_fallback(time_zone):
assert_gpu_fallback_collect(
lambda spark: unary_op_df(spark, data_gen).select(f.from_utc_timestamp(f.col('a'), time_zone)),
lambda spark: unary_op_df(spark, timestamp_gen).select(f.from_utc_timestamp(f.col('a'), time_zone)),
'FromUTCTimestamp')

@pytest.mark.parametrize('time_zone', ["UTC", "Asia/Shanghai", "EST", "MST", "VST"], ids=idfn)
@pytest.mark.parametrize('data_gen', [timestamp_gen], ids=idfn)
@tz_sensitive_test
@allow_non_gpu('ProjectExec')
def test_unsupported_fallback_from_utc_timestamp():
time_zone_gen = StringGen(pattern="UTC")
assert_gpu_fallback_collect(
lambda spark: gen_df(spark, [("a", timestamp_gen), ("tzone", time_zone_gen)]).selectExpr(
"from_utc_timestamp(a, tzone)"),
'FromUTCTimestamp')

@allow_non_gpu(*non_utc_allow)
def test_from_utc_timestamp_supported_timezones(data_gen, time_zone):
@pytest.mark.parametrize('time_zone', supported_timezones, ids=idfn)
def test_to_utc_timestamp(time_zone):
assert_gpu_and_cpu_are_equal_collect(
lambda spark: unary_op_df(spark, data_gen).select(f.from_utc_timestamp(f.col('a'), time_zone)))
lambda spark: unary_op_df(spark, timestamp_gen).select(f.to_utc_timestamp(f.col('a'), time_zone)))

@allow_non_gpu('ProjectExec')
@pytest.mark.parametrize('time_zone', unsupported_timezones, ids=idfn)
@pytest.mark.parametrize('data_gen', [timestamp_gen], ids=idfn)
def test_unsupported_fallback_from_utc_timestamp(data_gen):
time_zone_gen = StringGen(pattern="UTC")
def test_to_utc_timestamp_unsupported_timezone_fallback(data_gen, time_zone):
assert_gpu_fallback_collect(
lambda spark: gen_df(spark, [("a", data_gen), ("tzone", time_zone_gen)]).selectExpr(
"from_utc_timestamp(a, tzone)"),
'FromUTCTimestamp')
lambda spark: unary_op_df(spark, data_gen).select(f.to_utc_timestamp(f.col('a'), time_zone)),
'ToUTCTimestamp')

@allow_non_gpu('ProjectExec')
def test_unsupported_fallback_to_utc_timestamp():
time_zone_gen = StringGen(pattern="UTC")
assert_gpu_fallback_collect(
lambda spark: gen_df(spark, [("a", timestamp_gen), ("tzone", time_zone_gen)]).selectExpr(
"to_utc_timestamp(a, tzone)"),
'ToUTCTimestamp')

@allow_non_gpu('ProjectExec')
@pytest.mark.parametrize('data_gen', [long_gen], ids=idfn)
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -1798,10 +1798,21 @@ object GpuOverrides extends Logging {
ExprChecks.binaryProject(TypeSig.TIMESTAMP, TypeSig.TIMESTAMP,
("timestamp", TypeSig.TIMESTAMP, TypeSig.TIMESTAMP),
("timezone", TypeSig.lit(TypeEnum.STRING)
.withPsNote(TypeEnum.STRING, "Only timezones equivalent to UTC are supported"),
.withPsNote(TypeEnum.STRING,
"Only non-DST(Daylight Savings Time) timezones are supported"),
TypeSig.lit(TypeEnum.STRING))),
(a, conf, p, r) => new FromUTCTimestampExprMeta(a, conf, p, r)
),
expr[ToUTCTimestamp](
"Render the input timestamp in UTC",
ExprChecks.binaryProject(TypeSig.TIMESTAMP, TypeSig.TIMESTAMP,
("timestamp", TypeSig.TIMESTAMP, TypeSig.TIMESTAMP),
("timezone", TypeSig.lit(TypeEnum.STRING)
.withPsNote(TypeEnum.STRING,
"Only non-DST(Daylight Savings Time) timezones are supported"),
TypeSig.lit(TypeEnum.STRING))),
(a, conf, p, r) => new ToUTCTimestampExprMeta(a, conf, p, r)
),
expr[Pmod](
"Pmod",
// Decimal support disabled https://github.com/NVIDIA/spark-rapids/issues/7553
Expand Down
Loading

0 comments on commit 328a514

Please sign in to comment.