[SPARK-32106][SQL]Implement SparkScriptTransformationExec in sql/core #29085

Closed · wants to merge 43 commits

Conversation

@AngersZhuuuu (Contributor) commented Jul 13, 2020

What changes were proposed in this pull request?

Implement script transformation in the sql/core module:

  • Rename hive/execution/ScriptTransformationExec to hive/execution/HiveScriptTransformationExec
  • Implement SparkScriptTransformationExec based on BaseScriptTransformationExec
  • Implement SparkScriptTransformationWriterThread based on BaseScriptTransformationWriterThread for writing data
  • Add rule SparkScripts to support converting a script LogicalPlan to a SparkPlan in Spark SQL (without Hive mode)
  • Build a common suite BaseScriptTransformationSuite
  • Add SparkScriptTransformationSuite

This PR offers very limited support for the ROW FORMAT DELIMITED format - it does not rely on Hive's SerDe classes:

  • Input: we treat all data as strings; each value is cast to a string and passed to the script, so the script always processes data as strings.
  • Output: we read the script's output as strings and build a fieldWriter per output type to convert each string back to the corresponding data type. Array/Map/Struct and UserDefinedType are not supported yet.
```scala
private lazy val fieldWriters: Seq[String => Any] = output.map { attr =>
  val converter = CatalystTypeConverters.createToCatalystConverter(attr.dataType)
  attr.dataType match {
    case StringType => (data: String) => converter(data)
    case ByteType => (data: String) => converter(data.toByte)
    case IntegerType => (data: String) => converter(data.toInt)
    case ShortType => (data: String) => converter(data.toShort)
    case LongType => (data: String) => converter(data.toLong)
    case FloatType => (data: String) => converter(data.toFloat)
    case DoubleType => (data: String) => converter(data.toDouble)
    case _: DecimalType => (data: String) => converter(BigDecimal(data))
    case DateType if conf.datetimeJava8ApiEnabled => (data: String) =>
      converter(DateTimeUtils.stringToDate(
        UTF8String.fromString(data),
        DateTimeUtils.getZoneId(conf.sessionLocalTimeZone))
        .map(DateTimeUtils.daysToLocalDate).orNull)
    case DateType => (data: String) =>
      converter(DateTimeUtils.stringToDate(
        UTF8String.fromString(data),
        DateTimeUtils.getZoneId(conf.sessionLocalTimeZone))
        .map(DateTimeUtils.toJavaDate).orNull)
    case TimestampType if conf.datetimeJava8ApiEnabled => (data: String) =>
      converter(DateTimeUtils.stringToTimestamp(
        UTF8String.fromString(data),
        DateTimeUtils.getZoneId(conf.sessionLocalTimeZone))
        .map(DateTimeUtils.microsToInstant).orNull)
    case TimestampType => (data: String) =>
      converter(DateTimeUtils.stringToTimestamp(
        UTF8String.fromString(data),
        DateTimeUtils.getZoneId(conf.sessionLocalTimeZone))
        .map(DateTimeUtils.toJavaTimestamp).orNull)
    case CalendarIntervalType => (data: String) =>
      converter(IntervalUtils.stringToInterval(UTF8String.fromString(data)))
    case _: DataType => (data: String) => converter(data)
  }
}
```

With the current ScriptTransformation and the default serde, complex data types (Map/Array/Struct) and Spark's special data types (TimestampType/CalendarIntervalType) are not supported either.

After this PR, @alfozan will raise a follow-up PR to add two native SerDe classes (SimpleSerDe for ROW FORMAT DELIMITED and DelimitedJSONSerDe for the JSON variant). That PR will add support for complex data types and Spark's own special types.

One more thing to explain: in the current code, the way we choose between the default and a specific serde is not final; it needs more discussion in the follow-up PR that adds Spark's own serde.

Why are the changes needed?

Support running scripts in Spark SQL without Hive.

Does this PR introduce any user-facing change?

Users can use script transformation without Hive support.

How was this patch tested?

Added unit tests.

@AngersZhuuuu (Contributor, Author)

cc @cloud-fan @maropu @HyukjinKwon

@cloud-fan (Contributor)

can you explain the serde part? How can we do script transformation in sql/core without the hive serde lib?

@AngersZhuuuu (Contributor, Author)

can you explain the serde part? How can we do script transformation in sql/core without the hive serde lib?

In most cases we don't use scripts with a serde, so we can implement script transformation in sql/core first, with only the default format.
You can see in the code that when inputSerdeClass and outputSerdeClass are empty, the script path doesn't rely on Hive.

In our product we don't run transform scripts with a serde.
We can discuss whether we need to support Spark's own serde in sql/core.

@AngersZhuuuu (Contributor, Author)

@alfozan Hi alfozan, I know Facebook uses script transformation a lot. In your case, do you use script transformation with a serde?

```scala
/**
 * The wrapper class of Spark script transformation input and output schema properties
 */
case class SparkScriptIOSchema (
```
Contributor:

Why is this class so big while it doesn't support hive serde?

Contributor Author:

Why is this class so big while it doesn't support hive serde?

For this, I think we should change it after we decide whether to implement serde support for script transformation in sql/core.

@alfozan commented Jul 14, 2020

The implementation here offers very limited support for the ROW FORMAT DELIMITED format - it does not rely on Hive's SerDe classes.

A complete implementation (SerDe classes for ROW FORMAT DELIMITED) can be added later and will live in the same folder.
#29085 (comment)

@cloud-fan (Contributor)

At least we should define how to convert catalyst values to strings, right? UnsafeArray.toString just gives you a meaningless binary string.

@AngersZhuuuu (Contributor, Author)

At least we should define how to convert catalyst values to strings, right? UnsafeArray.toString just gives you a meaningless binary string.

So we need to handle the string format in BaseScriptTransformationWriterThread.processRowsWithoutSerde() to make sure that, for each data type, the final data is written correctly.

SparkQA commented Jul 13, 2020

Test build #125768 has finished for PR 29085 at commit dfcec3c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Jul 13, 2020

Test build #125775 has finished for PR 29085 at commit a693722.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan (Contributor)

Can we use Cast to turn catalyst value to string and pass to the script?

@AngersZhuuuu (Contributor, Author)

Can we use Cast to turn catalyst value to string and pass to the script?

Nice advice! Updated.
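
A minimal sketch of what this Cast-based input conversion could look like; the wiring below is illustrative, not the PR's exact code:

```scala
import org.apache.spark.sql.catalyst.expressions.{Cast, Expression}
import org.apache.spark.sql.types.StringType

// Wrap each input expression in a Cast to StringType before feeding the
// script, instead of relying on toString of internal (unsafe) values.
def castInputsToString(
    input: Seq[Expression],
    timeZoneId: Option[String]): Seq[Expression] = {
  input.map {
    case e if e.dataType == StringType => e
    case e => Cast(e, StringType, timeZoneId)
  }
}
```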

@alfozan commented Jul 14, 2020

@alfozan Hi alfozan, I know Facebook uses script transformation a lot. In your case, do you use script transformation with a serde?

@AngersZhuuuu Yes, we implemented two native SerDe classes (SimpleSerDe for ROW FORMAT DELIMITED and DelimitedJSONSerDe for the JSON variant), so there's no longer a dependency on Hive's SerDes. I'd be happy to create a PR after this one with the SerDe classes.

For more: see https://www.slideshare.net/databricks/powering-custom-apps-at-facebook-using-spark-script-transformation slide 34


```scala
import spark.implicits._

var noSerdeIOSchema: BaseScriptTransformIOSchema = _
```
Contributor:

can we make it a val or def and ask the child to override it?
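
A minimal sketch of that shape, assuming the names from the quoted code; the constructor wiring is illustrative:

```scala
// Sketch only: the base suite declares the schema as an abstract def...
trait BaseScriptTransformationSuiteSketch {
  protected def noSerdeIOSchema: BaseScriptTransformIOSchema
}

// ...and each concrete suite provides the value it needs.
class SparkScriptTransformationSuiteSketch(schema: BaseScriptTransformIOSchema)
    extends BaseScriptTransformationSuiteSketch {
  override protected def noSerdeIOSchema: BaseScriptTransformIOSchema = schema
}
```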

```scala
    child: SparkPlan,
    ioschema: BaseScriptTransformIOSchema): BaseScriptTransformationExec = {
  scriptType.toUpperCase(Locale.ROOT) match {
    case "SPARK" => new SparkScriptTransformationExec(
```
Contributor:

Instead of asking the child to override scriptType, we should just ask the child to implement a method that creates a BaseScriptTransformationExec.

Then we can move the base suite to sql/core.
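
A rough sketch of the suggested shape; the method name and parameters are assumptions based on the quoted code:

```scala
import org.apache.spark.sql.catalyst.expressions.{Attribute, Expression}
import org.apache.spark.sql.execution.SparkPlan

// Each child suite builds its concrete exec node, so the scriptType string
// dispatch ("SPARK" vs "HIVE") disappears and the base suite can live in
// sql/core.
trait BaseScriptTransformationSuiteShape {
  def createScriptTransformationExec(
      input: Seq[Expression],
      script: String,
      output: Seq[Attribute],
      child: SparkPlan,
      ioschema: BaseScriptTransformIOSchema): BaseScriptTransformationExec
}
```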

Contributor Author:

Instead of asking the child to override scriptType, we should just ask the child to implement a method that creates a BaseScriptTransformationExec.

Then we can move the base suite to sql/core.

There are inheritance conflicts between SparkPlanTest and SQLTestUtils.

Contributor Author:

cc @cloud-fan When moving BaseScriptTransformationSuite to sql/core, there are method conflicts like the one below. What's the best way to solve this?

```
[error] /Users/angerszhu/Documents/project/AngersZhu/spark/sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveScriptTransformationSuite.scala:31: overriding method spark in trait SharedSparkSessionBase of type => org.apache.spark.sql.SparkSession;
[error]  value spark in trait TestHiveSingleton of type org.apache.spark.sql.SparkSession has weaker access privileges; it should be public
[error] class HiveScriptTransformationSuite extends BaseScriptTransformationSuite with TestHiveSingleton {
[error]       ^
```
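
A hedged sketch of one way to resolve it: Scala forbids overriding a public member with a weaker-access one, so the `spark` member in `TestHiveSingleton` would have to become public to match `SharedSparkSessionBase` (illustrative; the exact fix may differ):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.hive.test.TestHive

trait TestHiveSingleton {
  // previously protected; public access satisfies both this trait and
  // SharedSparkSessionBase when a suite mixes in both
  implicit def spark: SparkSession = TestHive.sparkSession
}
```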

@cloud-fan (Contributor)

What's the behavior of Hive if the script transformation doesn't specify a serde? Does Hive pick a default serde, or does it define the non-serde behavior separately?

@AngersZhuuuu (Contributor, Author)

@AngersZhuuuu Yes, we implemented two native SerDe classes (SimpleSerDe for ROW FORMAT DELIMITED and DelimitedJSONSerDe for the JSON variant), so there's no longer a dependency on Hive's SerDes. I'd be happy to create a PR after this one with the SerDe classes.

Yea, that's what I want to do next; glad to hear that you will share your code.

SparkQA commented Jul 14, 2020

Test build #125827 has finished for PR 29085 at commit 5bfa669.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AngersZhuuuu (Contributor, Author)

What's the behavior of Hive if the script transformation doesn't specify a serde? Does Hive pick a default serde, or does it define the non-serde behavior separately?

In the current code, when we don't specify a serde with TRANSFORM, it uses LazySimpleSerDe:

```scala
case null =>
  // Use default (serde) format.
  val name = conf.getConfString("hive.script.serde",
    "org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe")
  val props = Seq("field.delim" -> "\t")
  val recordHandler = Option(conf.getConfString(configKey, defaultConfigValue))
  (Nil, Option(name), props, recordHandler)
```

This means that only when you specify a wrong serde, so that ScriptTransformationExec can't find the corresponding serde class, does it execute the no-serde path:

```scala
if (inputSerde == null) {
  processRowsWithoutSerde()
} else {
```

```scala
if (outputSerde == null) {
  val prevLine = curLine
  curLine = reader.readLine()
  if (!ioschema.schemaLess) {
    new GenericInternalRow(
      prevLine.split(ioschema.outputRowFormatMap("TOK_TABLEROWFORMATFIELD"))
        .map(CatalystTypeConverters.convertToCatalyst))
  } else {
    new GenericInternalRow(
      prevLine.split(ioschema.outputRowFormatMap("TOK_TABLEROWFORMATFIELD"), 2)
        .map(CatalystTypeConverters.convertToCatalyst))
  }
} else {
```

As this #29085 (comment) comment notes, without a serde we can't interpret the input data strings correctly; for the same reason we can't handle the output data either, so this PR adds a wrapper method to convert strings to the corresponding data types.

In this #29085 (comment) Jenkins result you can see the output data type problem.

@AngersZhuuuu (Contributor, Author)

What's the behavior of Hive if the script transformation doesn't specify a serde? Does Hive pick a default serde, or does it define the non-serde behavior separately?

In the current PR I just add a temporary method and wait for @alfozan's Spark serde. As far as I know, DelimitedJSONSerDe can handle complex data types such as Array, Map, and Struct.

@cloud-fan (Contributor)

So eventually we won't need to use Cast to convert catalyst values to strings? There will always be a serde (default or user-specified).

@AngersZhuuuu (Contributor, Author)

So eventually we won't need to use Cast to convert catalyst values to strings? There will always be a serde (default or user-specified).

Yes, but at this step we should keep doing it so the data and UTs stay correct; we need both the Cast and the wrapper.

SparkQA commented Jul 14, 2020

Test build #125840 has finished for PR 29085 at commit ec754e2.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Jul 22, 2020

Test build #126326 has finished for PR 29085 at commit 4615733.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Jul 22, 2020

Test build #126329 has finished for PR 29085 at commit 08d97c8.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Jul 22, 2020

Test build #126333 has finished for PR 29085 at commit 7916d72.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Jul 22, 2020

Test build #126340 has finished for PR 29085 at commit a769aa7.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Jul 22, 2020

Test build #126345 has finished for PR 29085 at commit d93f7fa.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

```
@@ -744,8 +744,29 @@ class AstBuilder(conf: SQLConf) extends SqlBaseBaseVisitor[AnyRef] with Logging
     selectClause.hints.asScala.foldRight(withWindow)(withHints)
   }

  // Decode and input/output format.
  type Format = (Seq[(String, String)], Option[String], Seq[(String, String)], Option[String])
```
Member:

Format -> ScriptIOFormat? Then, could you make the comment above clearer?

Contributor Author:

Done
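
The renamed alias with a clearer comment could look like this (a sketch, not the exact committed text):

```scala
// Decode the script's input/output format:
// (row format properties, serde class name, serde properties, record reader/writer class).
type ScriptIOFormat =
  (Seq[(String, String)], Option[String], Seq[(String, String)], Option[String])
```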

```
@@ -1031,4 +1031,96 @@ class PlanParserSuite extends AnalysisTest {
    assertEqual("select a, b from db.c;;;", table("db", "c").select('a, 'b))
    assertEqual("select a, b from db.c; ;; ;", table("db", "c").select('a, 'b))
  }

  test("SPARK-32106: TRANSFORM without serde") {
```
Member:

TRANSFORM without serde -> TRANSFORM plan?

Member:

Also, could you check ROW FORMAT SERDE, too?

Contributor Author:

Also, could you check ROW FORMAT SERDE, too?

Added a UT.

```scala
object SparkScripts extends Strategy {
  def apply(plan: LogicalPlan): Seq[SparkPlan] = plan match {
    case logical.ScriptTransformation(input, script, output, child, ioschema)
        if ioschema.inputSerdeClass.isEmpty && ioschema.outputSerdeClass.isEmpty =>
```

Contributor Author:

Do we need to check this here? It seems it has already been checked in https://github.com/apache/spark/pull/29085/files#diff-9847f5cef7cf7fbc5830fbc6b779ee10R783-R784 ?

Yea, it's not needed now.

```sql
NULL DEFINED AS 'NULL'
FROM t;

```
Member:

Remove the blank line.

Contributor Author:

Done

```sql
NULL DEFINED AS 'NULL'
FROM t;

-- SPARK-31937 transform with defined row format delimit
```
Member:

Is this JIRA related to this query? I read it, but I'm not sure about the relationship. What kind of exception does this query throw?

Contributor Author:

Is this JIRA related to this query? I read it, but I'm not sure about the relationship. What kind of exception does this query throw?

It tests support for Array/Map/Struct. I'll remove it now and add it back in that PR:

```scala
import org.apache.spark.sql.execution._
import org.apache.spark.sql.hive.HiveInspectors
import org.apache.spark.sql.hive.HiveShim._
import org.apache.spark.sql.types.DataType
import org.apache.spark.util.{CircularBuffer, RedirectThread, Utils}
import org.apache.spark.sql.types.{DataType, StringType}
```
Member:

StringType not used.

Contributor Author:

Done

```scala
  TaskContext.get(),
  hadoopConf
)
private def initSerDe(
```
Member:

Sorry for the confusion, but on second thought, it's better to pull the hive-serde related functions out of HiveScriptTransformationExec and create a companion object holding them, for readability: maropu@972775b. WDYT?

Contributor Author:

Sorry for the confusion, but on second thought, it's better to pull the hive-serde related functions out of HiveScriptTransformationExec and create a companion object holding them, for readability: maropu@972775b. WDYT?

Agree; make ScriptTransformationExec only handle data processing.
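
A minimal, self-contained sketch of the pattern being agreed on here, with illustrative names (not the real HiveScriptTransformationExec members): the operator class keeps only the row-processing loop, while one-time setup helpers move to a companion object.

```scala
// Illustrative only: placeholder names for the companion-object pattern.
class ScriptTransformRunner(command: Seq[String]) {
  // the instance keeps only the row-processing loop
  def processRows(rows: Iterator[String]): Iterator[String] = rows.map(_.trim)
}

object ScriptTransformRunner {
  // one-time setup/parsing helpers are pulled out here for readability
  def fromScript(script: String): ScriptTransformRunner =
    new ScriptTransformRunner(script.split("\\s+").toSeq)
}
```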

```scala
}

test("SPARK-30973: TRANSFORM should wait for the termination of the script (with serde)") {
test("SPARK-32106: TRANSFORM supports complex data types end to end (hive serde) ") {
```
Member:

Remove the space at the end.

Contributor Author:

Done

```
@@ -1063,6 +1063,9 @@ private[hive] trait HiveInspectors {
    case DateType => dateTypeInfo
    case TimestampType => timestampTypeInfo
    case NullType => voidTypeInfo
    case dt =>
      throw new AnalysisException("TRANSFORM with hive serde does not support " +
```
Member:

I think HiveInspectors is not related to TRANSFORM, so could you make the error message more general?

Contributor Author:

Done

```scala
  }
}

test("TRANSFORM doesn't support ArrayType/MapType/StructType as output data type (no serde)") {
```
Member:

SPARK-32106:

Contributor Author:

Done

@AngersZhuuuu (Contributor, Author)

All done

```scala
case udt: UserDefinedType[_] =>
  wrapperConvertException(data => udt.deserialize(data), converter)
case dt =>
  throw new SparkException("TRANSFORM without serde does not support " +
```
Member:

nit: TRANSFORM -> s"$nodeName...

Contributor Author:

nit: TRANSFORM -> s"$nodeName...

Done

```
@@ -1063,6 +1063,9 @@ private[hive] trait HiveInspectors {
    case DateType => dateTypeInfo
    case TimestampType => timestampTypeInfo
    case NullType => voidTypeInfo
    case dt =>
      throw new AnalysisException("HiveInspectors does not support convert " +
```
Member:

nit: s"${dt.catalogString}" cannot be converted to Hive TypeInfo"

Contributor Author:

nit: s"${dt.catalogString}" cannot be converted to Hive TypeInfo"

Same reason as #29085 (comment).

```scala
  wrapperConvertException(data => udt.deserialize(data), converter)
case dt =>
  throw new SparkException("TRANSFORM without serde does not support " +
    s"${dt.getClass.getSimpleName} as output data type")
```
Member:

dt.getClass.getSimpleName -> dt.catalogString

@AngersZhuuuu (Contributor, Author) commented Jul 23, 2020

dt.getClass.getSimpleName -> dt.catalogString

It's not general: for ArrayType it will show array<int>, for StructType it will show struct<string, int, etc.>.
WDYT?

@maropu (Member) commented Jul 23, 2020

Looks almost okay now, so could you split this PR into pieces? I think it's somewhat big for review. For example:

  1. A further refactoring PR for HiveScriptTransformationExec and BaseScriptTransformationExec, just like SPARK-32105
  2. A PR to improve the test coverage of HiveScriptTransformationExec
  3. Then, a PR to implement Spark-native TRANSFORM

WDYT?

@AngersZhuuuu (Contributor, Author)

Looks almost okay now, so could you split this PR into pieces? I think it's somewhat big for review. For example:

  1. A further refactoring PR for HiveScriptTransformationExec and BaseScriptTransformationExec, just like SPARK-32105
  2. A PR to improve the test coverage of HiveScriptTransformationExec
  3. Then, a PR to implement Spark-native TRANSFORM

WDYT?

Yea, I'll raise the PRs one by one.

SparkQA commented Jul 23, 2020

Test build #126372 has finished for PR 29085 at commit be80c27.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Jul 23, 2020

Test build #126376 has finished for PR 29085 at commit 03d3409.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Jul 23, 2020

Test build #126373 has finished for PR 29085 at commit 7f3cff8.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@alfozan commented Jul 24, 2020

3. Then, PR to implement Spark-native TRANSFORM

As mentioned above (adding Spark-native SerDes), I'll open a PR once part 1 is merged.

@cloud-fan (Contributor)

retest this please

SparkQA commented Jul 31, 2020

Test build #126818 has finished for PR 29085 at commit 03d3409.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

cloud-fan pushed a commit that referenced this pull request Aug 10, 2020
### What changes were proposed in this pull request?

This PR comes from the comment: #29085 (comment)

- Extract a common script IOSchema, `ScriptTransformationIOSchema`
- Avoid repeated checks by extracting the process-output-row methods `createOutputIteratorWithoutSerde` and `createOutputIteratorWithSerde`
- Add a default no-serde IO schema, `ScriptTransformationIOSchema.defaultIOSchema` (sketched below)
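
A hedged sketch of what the default no-serde IO schema might look like; the field names are assumptions based on the description above, not verified against the merged code:

```scala
object ScriptTransformationIOSchema {
  // Tab-delimited fields and no serde classes: the classic no-serde default.
  val defaultIOSchema: ScriptTransformationIOSchema = ScriptTransformationIOSchema(
    inputRowFormat = Seq("TOK_TABLEROWFORMATFIELD" -> "\t"),
    outputRowFormat = Seq("TOK_TABLEROWFORMATFIELD" -> "\t"),
    inputSerdeClass = None,
    outputSerdeClass = None,
    inputSerdeProps = Seq.empty,
    outputSerdeProps = Seq.empty,
    recordReaderClass = None,
    recordWriterClass = None,
    schemaLess = false
  )
}
```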

### Why are the changes needed?
Refactor code

### Does this PR introduce _any_ user-facing change?
NO

### How was this patch tested?
NO

Closes #29199 from AngersZhuuuu/spark-32105-followup.

Authored-by: angerszhu <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
@HyukjinKwon (Member)

cc @wangyum too

@github-actions

We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!

@github-actions github-actions bot added the Stale label Dec 14, 2020
@github-actions github-actions bot closed this Dec 15, 2020
maropu pushed a commit that referenced this pull request Dec 22, 2020
### What changes were proposed in this pull request?

 * Implement `SparkScriptTransformationExec` based on `BaseScriptTransformationExec`
 * Implement `SparkScriptTransformationWriterThread` based on `BaseScriptTransformationWriterThread` for writing data
 * Add rule `SparkScripts` to support converting a script LogicalPlan to a SparkPlan in Spark SQL (without Hive mode)
 * Add `SparkScriptTransformationSuite` to test Spark-specific cases
 * Add tests in `SQLQueryTestSuite`

And we will close #29085.

### Why are the changes needed?
Support using script transform without Hive.

### Does this PR introduce _any_ user-facing change?
Users can use script transformation without Hive in no-serde mode. For example:

**default no serde**
```
SELECT TRANSFORM(a, b, c)
USING 'cat' AS (a int, b string, c long)
FROM testData
```
**no serde with a specified ROW FORMAT DELIMITED**
```
SELECT TRANSFORM(a, b, c)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
COLLECTION ITEMS TERMINATED BY '\u0002'
MAP KEYS TERMINATED BY '\u0003'
LINES TERMINATED BY '\n'
NULL DEFINED AS 'null'
USING 'cat' AS (a, b, c)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
COLLECTION ITEMS TERMINATED BY '\u0004'
MAP KEYS TERMINATED BY '\u0005'
LINES TERMINATED BY '\n'
NULL DEFINED AS 'NULL'
FROM testData
```

### How was this patch tested?
Added UT

Closes #29414 from AngersZhuuuu/SPARK-32106-MINOR.

Authored-by: angerszhu <[email protected]>
Signed-off-by: Takeshi Yamamuro <[email protected]>