
[SPARK-32608][SQL] Script Transform ROW FORMAT DELIMIT value should format value #29428

Closed

wants to merge 5 commits

Conversation

AngersZhuuuu
Contributor

What changes were proposed in this pull request?

For SQL

SELECT TRANSFORM(a, b, c)
  ROW FORMAT DELIMITED
  FIELDS TERMINATED BY ','
  LINES TERMINATED BY '\n'
  NULL DEFINED AS 'null'
  USING 'cat' AS (a, b, c)
  ROW FORMAT DELIMITED
  FIELDS TERMINATED BY ','
  LINES TERMINATED BY '\n'
  NULL DEFINED AS 'NULL'
FROM testData

The correct behavior:

TOK_TABLEROWFORMATFIELD should be , but is actually ','

TOK_TABLEROWFORMATLINES should be \n but is actually '\n'
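In other words, the parser was keeping the surrounding single quotes of the delimiter tokens instead of the delimiter characters themselves. A minimal sketch of the intended quote-stripping behavior (the object and method names here are illustrative, not Spark's; ParserUtils does the equivalent slice on the ANTLR token text):

```scala
object DelimiterFormat {
  // Strip the single quotes that wrap a parsed delimiter token,
  // so ',' becomes , and '\n' stays as the two characters \n.
  // Values without wrapping quotes are returned unchanged.
  def stripQuotes(raw: String): String =
    if (raw.length >= 2 && raw.head == '\'' && raw.last == '\'') {
      raw.slice(1, raw.length - 1)
    } else {
      raw
    }

  def main(args: Array[String]): Unit = {
    assert(stripQuotes("','") == ",")
    assert(stripQuotes("plain") == "plain")
    println("ok")
  }
}
```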

Why are the changes needed?

Fix string value format

Does this PR introduce any user-facing change?

No

How was this patch tested?

Added UT

@AngersZhuuuu
Contributor Author

AngersZhuuuu commented Aug 14, 2020

FYI @maropu @cloud-fan

@SparkQA

SparkQA commented Aug 14, 2020

Test build #127430 has finished for PR 29428 at commit b4d816e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@@ -330,4 +331,44 @@ class SparkSqlParserSuite extends AnalysisTest {
assertEqual("ADD FILE /path with space/abc.txt", AddFileCommand("/path with space/abc.txt"))
assertEqual("ADD JAR /path with space/abc.jar", AddJarCommand("/path with space/abc.jar"))
}

test("SPARK-32608: script transform with row format delimit") {
assertEqual(
Member

Could you add end-2-end tests, too?

Contributor Author

Could you add end-2-end tests, too?

Added in BasicScriptTransformationExecSuite

df.createTempView("v")

// input/output same delimit
val query1 = sql(
Member

nit: could you inline this query in line 369?

Contributor Author

Done: inlined the query at line 369, and also extracted decimalToString.

Member

@maropu left a comment

@SparkQA

SparkQA commented Aug 14, 2020

Test build #127444 has finished for PR 29428 at commit 65f69ba.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Aug 14, 2020

Test build #127455 has finished for PR 29428 at commit 0a6c574.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AngersZhuuuu
Contributor Author

retest this please

@SparkQA

SparkQA commented Aug 14, 2020

Test build #127462 has finished for PR 29428 at commit 0a6c574.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AngersZhuuuu
Contributor Author

@maropu Extracted the entry method as a common util method.

@@ -38,6 +38,7 @@ import org.apache.spark.sql.types.{IntegerType, LongType, StringType, StructType
* defined in the Catalyst module.
*/
class SparkSqlParserSuite extends AnalysisTest {
import org.apache.spark.sql.catalyst.dsl.expressions._
Member

Why did you put this import here instead of the top?

Contributor Author

Why did you put this import here instead of the top?

Copied from PlanParserSuite.
Should I move this line to the top in PlanParserSuite in PR #29414?

Member

Ur, I see. It's okay as it is.

@@ -83,6 +83,11 @@ object ParserUtils {
node.getText.slice(1, node.getText.size - 1)
}

/** Collect the entries if any. */
def entry(key: String, value: Token): Seq[(String, String)] = {
  Option(value).toSeq.map(x => key -> string(x))
}
Member

Ah, I see. This update looks okay.
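The pattern being approved above is easy to demonstrate in isolation: a possibly-null row-format token becomes zero or one (key, value) entries, with the wrapping quotes stripped from the value. This is a self-contained sketch, not Spark's code; `Token` here is a minimal stand-in for the ANTLR token type that ParserUtils actually receives:

```scala
// Minimal stand-in for the ANTLR token type used by the parser.
final case class Token(text: String) {
  // Mimics ParserUtils.string: drop the surrounding quotes.
  def unquoted: String = text.slice(1, text.length - 1)
}

object RowFormatEntries {
  // A null token contributes no entry; a present token contributes
  // exactly one (key, unquoted value) pair.
  def entry(key: String, value: Token): Seq[(String, String)] =
    Option(value).toSeq.map(x => key -> x.unquoted)

  def main(args: Array[String]): Unit = {
    assert(entry("TOK_TABLEROWFORMATFIELD", Token("','")) ==
      Seq("TOK_TABLEROWFORMATFIELD" -> ","))
    assert(entry("TOK_TABLEROWFORMATLINES", null).isEmpty)
    println("ok")
  }
}
```

Using `Option(value)` rather than an explicit null check is what makes the helper safe to call for clauses the user did not specify.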

@SparkQA

SparkQA commented Aug 16, 2020

Test build #127487 has finished for PR 29428 at commit de41b19.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AngersZhuuuu
Contributor Author

cc @cloud-fan

@cloud-fan
Contributor

good catch! merging to master

@cloud-fan
Contributor

@AngersZhuuuu can you open a new PR for 3.0?

@cloud-fan cloud-fan closed this in 03e2de9 Aug 19, 2020
@AngersZhuuuu
Contributor Author

@AngersZhuuuu can you open a new PR for 3.0?

Sure

@viirya
Member

viirya commented Aug 23, 2020

@AngersZhuuuu The test "org.apache.spark.sql.hive.execution.HiveScriptTransformationSuite.SPARK-32608: Script Transform ROW FORMAT DELIMIT value should format value" fails under the hive-1.2 profile in the master and branch-3.0 branches. Can you look at it?

@AngersZhuuuu
Contributor Author

@AngersZhuuuu The test "org.apache.spark.sql.hive.execution.HiveScriptTransformationSuite.SPARK-32608: Script Transform ROW FORMAT DELIMIT value should format value" fails under the hive-1.2 profile in the master and branch-3.0 branches. Can you look at it?

Checking

@viirya
Member

viirya commented Aug 23, 2020

@AngersZhuuuu Thanks. BTW, my PR accidentally caused a compilation error for the hive-1.2 profile; I'm reverting it in #29519 first, so you can debug and fix the failed test.

@AngersZhuuuu
Contributor Author

@AngersZhuuuu Thanks. BTW, my PR accidentally caused a compilation error for the hive-1.2 profile; I'm reverting it in #29519 first, so you can debug and fix the failed test.

Can you show me a link about this UT failure in hive-1.2?

@AngersZhuuuu
Contributor Author

@viirya
Member

viirya commented Aug 23, 2020

Thanks @AngersZhuuuu

@AngersZhuuuu
Contributor Author

Thanks @AngersZhuuuu

See #29520

viirya pushed a commit that referenced this pull request Aug 23, 2020
…ansform ROW FORMAT DELIMIT value should format value

### What changes were proposed in this pull request?
As mentioned by viirya in #29428 (comment),
fix a bug in the UT: in script transformation no-serde mode, the output of decimal is the same in both hive-1.2 and hive-2.3.

### Why are the changes needed?
Fix the UT.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Existing UT.

Closes #29520 from AngersZhuuuu/SPARK-32608-FOLLOW.

Authored-by: angerszhu <[email protected]>
Signed-off-by: Liang-Chi Hsieh <[email protected]>
viirya pushed a commit that referenced this pull request Aug 23, 2020
…pt Transform ROW FORMAT DELIMIT value should format value

### What changes were proposed in this pull request?
As mentioned by viirya in #29428 (comment),
fix a bug in the UT: in script transformation no-serde mode, the output of decimal is the same in both hive-1.2 and hive-2.3.

### Why are the changes needed?
Fix the UT.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Existing UT.

Closes #29521 from AngersZhuuuu/SPARK-32608-3.0-FOLLOW-UP.

Authored-by: angerszhu <[email protected]>
Signed-off-by: Liang-Chi Hsieh <[email protected]>
5 participants