
[SPARK-32608][SQL] Script Transform ROW FORMAT DELIMIT value should format value #29428

Closed

wants to merge 5 commits

Conversation

AngersZhuuuu
Contributor

What changes were proposed in this pull request?

For SQL

SELECT TRANSFORM(a, b, c)
  ROW FORMAT DELIMITED
  FIELDS TERMINATED BY ','
  LINES TERMINATED BY '\n'
  NULL DEFINED AS 'null'
  USING 'cat' AS (a, b, c)
  ROW FORMAT DELIMITED
  FIELDS TERMINATED BY ','
  LINES TERMINATED BY '\n'
  NULL DEFINED AS 'NULL'
FROM testData

The correct behavior:

TOK_TABLEROWFORMATFIELD should be , but is actually ','

TOK_TABLEROWFORMATLINES should be \n but is actually '\n'
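In other words, the parser was keeping the surrounding single quotes of the delimiter tokens instead of the delimiter characters themselves. A minimal sketch of the intended quote-stripping behavior (the object and method names here are illustrative, not Spark's; ParserUtils does the equivalent slice on the ANTLR token text):

```scala
object DelimiterFormat {
  // Strip the single quotes that wrap a parsed delimiter token,
  // so ',' becomes , and '\n' stays as the two characters \n.
  // Values without wrapping quotes are returned unchanged.
  def stripQuotes(raw: String): String =
    if (raw.length >= 2 && raw.head == '\'' && raw.last == '\'') {
      raw.slice(1, raw.length - 1)
    } else {
      raw
    }

  def main(args: Array[String]): Unit = {
    assert(stripQuotes("','") == ",")
    assert(stripQuotes("plain") == "plain")
    println("ok")
  }
}
```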

Why are the changes needed?

Fix string value format

Does this PR introduce any user-facing change?

No

How was this patch tested?

Added UT

@AngersZhuuuu
Contributor Author

AngersZhuuuu commented Aug 14, 2020

FYI @maropu @cloud-fan

@SparkQA

SparkQA commented Aug 14, 2020

Test build #127430 has finished for PR 29428 at commit b4d816e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@@ -330,4 +331,44 @@ class SparkSqlParserSuite extends AnalysisTest {
assertEqual("ADD FILE /path with space/abc.txt", AddFileCommand("/path with space/abc.txt"))
assertEqual("ADD JAR /path with space/abc.jar", AddJarCommand("/path with space/abc.jar"))
}

test("SPARK-32608: script transform with row format delimit") {
assertEqual(
Member

Could you add end-2-end tests, too?

Contributor Author

Could you add end-2-end tests, too?

Added in BasicScriptTransformationExecSuite

df.createTempView("v")

// input/output same delimit
val query1 = sql(
Member

nit: could you inline this query in line 369?

Contributor Author

Done: inlined the query at line 369, and also extracted decimalToString.

Member

@maropu left a comment

@SparkQA

SparkQA commented Aug 14, 2020

Test build #127444 has finished for PR 29428 at commit 65f69ba.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Aug 14, 2020

Test build #127455 has finished for PR 29428 at commit 0a6c574.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AngersZhuuuu
Contributor Author

retest this please

@SparkQA

SparkQA commented Aug 14, 2020

Test build #127462 has finished for PR 29428 at commit 0a6c574.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AngersZhuuuu
Contributor Author

@maropu Extracted the entry method as a common util method.

@@ -38,6 +38,7 @@ import org.apache.spark.sql.types.{IntegerType, LongType, StringType, StructType
* defined in the Catalyst module.
*/
class SparkSqlParserSuite extends AnalysisTest {
import org.apache.spark.sql.catalyst.dsl.expressions._
Member

Why did you put this import here instead of the top?

Contributor Author

Why did you put this import here instead of the top?

Copied from PlanParserSuite.
Should I move this line to the top in PlanParserSuite in PR #29414?

Member

Ur, I see. It's okay as it is.

@@ -83,6 +83,11 @@ object ParserUtils {
node.getText.slice(1, node.getText.size - 1)
}

/** Collect the entries if any. */
def entry(key: String, value: Token): Seq[(String, String)] = {
  Option(value).toSeq.map(x => key -> string(x))
}
Member

Ah, I see. This update looks okay.
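The pattern being approved above is easy to demonstrate in isolation: a possibly-null row-format token becomes zero or one (key, value) entries, with the wrapping quotes stripped from the value. This is a self-contained sketch, not Spark's code; `Token` here is a minimal stand-in for the ANTLR token type that ParserUtils actually receives:

```scala
// Minimal stand-in for the ANTLR token type used by the parser.
final case class Token(text: String) {
  // Mimics ParserUtils.string: drop the surrounding quotes.
  def unquoted: String = text.slice(1, text.length - 1)
}

object RowFormatEntries {
  // A null token contributes no entry; a present token contributes
  // exactly one (key, unquoted value) pair.
  def entry(key: String, value: Token): Seq[(String, String)] =
    Option(value).toSeq.map(x => key -> x.unquoted)

  def main(args: Array[String]): Unit = {
    assert(entry("TOK_TABLEROWFORMATFIELD", Token("','")) ==
      Seq("TOK_TABLEROWFORMATFIELD" -> ","))
    assert(entry("TOK_TABLEROWFORMATLINES", null).isEmpty)
    println("ok")
  }
}
```

Using `Option(value)` rather than an explicit null check is what makes the helper safe to call for clauses the user did not specify.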

@SparkQA

SparkQA commented Aug 16, 2020

Test build #127487 has finished for PR 29428 at commit de41b19.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AngersZhuuuu
Contributor Author

cc @cloud-fan

@cloud-fan
Contributor

good catch! merging to master

@cloud-fan
Contributor

@AngersZhuuuu can you open a new PR for 3.0?

@cloud-fan cloud-fan closed this in 03e2de9 Aug 19, 2020
@AngersZhuuuu
Contributor Author

@AngersZhuuuu can you open a new PR for 3.0?

Sure

@viirya
Member

viirya commented Aug 23, 2020

@AngersZhuuuu The test "org.apache.spark.sql.hive.execution.HiveScriptTransformationSuite.SPARK-32608: Script Transform ROW FORMAT DELIMIT value should format value" fails under the hive-1.2 profile in the master and branch-3.0 branches. Can you look at it?

@AngersZhuuuu
Contributor Author

@AngersZhuuuu The test "org.apache.spark.sql.hive.execution.HiveScriptTransformationSuite.SPARK-32608: Script Transform ROW FORMAT DELIMIT value should format value" fails under the hive-1.2 profile in the master and branch-3.0 branches. Can you look at it?

Checking

@viirya
Member

viirya commented Aug 23, 2020

@AngersZhuuuu Thanks. BTW, my PR accidentally caused a compilation error for the hive-1.2 profile; I'm reverting it in #29519 first, so you can debug and fix the failed test.

@AngersZhuuuu
Contributor Author

@AngersZhuuuu Thanks. BTW, my PR accidentally caused a compilation error for the hive-1.2 profile; I'm reverting it in #29519 first, so you can debug and fix the failed test.

Can you show me a link about this UT failure in hive-1.2?

@AngersZhuuuu
Contributor Author

@viirya
Member

viirya commented Aug 23, 2020

Thanks @AngersZhuuuu

@AngersZhuuuu
Contributor Author

Thanks @AngersZhuuuu

See #29520

viirya pushed a commit that referenced this pull request Aug 23, 2020
…ansform ROW FORMAT DELIMIT value should format value

### What changes were proposed in this pull request?
As mentioned by viirya in #29428 (comment),
fix a bug in the UT: in script transformation no-serde mode, the output of decimal is the same in both hive-1.2 and hive-2.3.

### Why are the changes needed?
Fix the UT.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Existing UT.

Closes #29520 from AngersZhuuuu/SPARK-32608-FOLLOW.

Authored-by: angerszhu <[email protected]>
Signed-off-by: Liang-Chi Hsieh <[email protected]>
viirya pushed a commit that referenced this pull request Aug 23, 2020
…pt Transform ROW FORMAT DELIMIT value should format value

### What changes were proposed in this pull request?
As mentioned by viirya in #29428 (comment),
fix a bug in the UT: in script transformation no-serde mode, the output of decimal is the same in both hive-1.2 and hive-2.3.

### Why are the changes needed?
Fix the UT.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Existing UT.

Closes #29521 from AngersZhuuuu/SPARK-32608-3.0-FOLLOW-UP.

Authored-by: angerszhu <[email protected]>
Signed-off-by: Liang-Chi Hsieh <[email protected]>
5 participants