
[Bug] [Connector-V2] HiveSource Connector read orc file format table error in Spark Engine #2837

Closed
2 of 3 tasks
Tracked by #2793
EricJoy2048 opened this issue Sep 21, 2022 · 0 comments · Fixed by #2845
@EricJoy2048 (Member)
Search before asking

  • I had searched in the issues and found no similar issues.

What happened

The test table schema and test data can be found in #2793.

Running the test job fails with the error below.

SeaTunnel Version

dev

SeaTunnel Config

env {
  # You can set engine configuration here
  source.parallelism = 3
  job.name="test_hiveorc_source_to_console"
}

source {
  # This is an example input plugin, intended **only to test and demonstrate the feature input plugin**

  Hive {
    table_name = "test_hive.test_hive_source_orc"
    metastore_uri = "thrift://ctyun7:9083"
  }

  # If you would like more information about how to configure SeaTunnel and see the full list of input plugins,
  # please go to https://seatunnel.apache.org/docs/flink/configuration/source-plugins/Fake
}

transform {

}

sink {
  # Choose the Console sink plugin to output data to the console
  Console {
  }

  # If you would like more information about how to configure SeaTunnel and see the full list of output plugins,
  # please go to https://seatunnel.apache.org/docs/flink/configuration/sink-plugins/Console
}

Running Command

sh start-seatunnel-spark-connector-v2.sh --config ../config/spark_hiveorc_to_console.conf --deploy-mode client --master local

Error Exception

22/09/21 21:07:54 INFO hive.metastore: Trying to connect to metastore with URI thrift://ctyun7:9083
22/09/21 21:07:54 INFO hive.metastore: Connected to metastore.
22/09/21 21:07:55 INFO impl.OrcCodecPool: Got brand-new codec ZLIB
Exception in thread "main" java.lang.NoSuchMethodError: org.apache.orc.Reader.close()V
        at org.apache.seatunnel.connectors.seatunnel.file.source.reader.OrcReadStrategy.getSeaTunnelRowTypeInfo(OrcReadStrategy.java:125)
        at org.apache.seatunnel.connectors.seatunnel.file.hdfs.source.HdfsFileSource.prepare(HdfsFileSource.java:70)
        at org.apache.seatunnel.connectors.seatunnel.hive.source.HiveSource.prepare(HiveSource.java:82)
        at org.apache.seatunnel.core.starter.spark.execution.SourceExecuteProcessor.initializePlugins(SourceExecuteProcessor.java:78)
        at org.apache.seatunnel.core.starter.spark.execution.AbstractPluginExecuteProcessor.<init>(AbstractPluginExecuteProcessor.java:49)
        at org.apache.seatunnel.core.starter.spark.execution.SourceExecuteProcessor.<init>(SourceExecuteProcessor.java:49)
        at org.apache.seatunnel.core.starter.spark.execution.SparkExecution.<init>(SparkExecution.java:48)
        at org.apache.seatunnel.core.starter.spark.command.SparkApiTaskExecuteCommand.execute(SparkApiTaskExecuteCommand.java:53)
        at org.apache.seatunnel.core.starter.Seatunnel.run(Seatunnel.java:40)
        at org.apache.seatunnel.core.starter.spark.SeatunnelSpark.main(SeatunnelSpark.java:34)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
        at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:849)
        at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:167)
        at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:195)
        at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
        at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:924)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:933)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
22/09/21 21:07:56 INFO spark.SparkContext: Invoking stop() from shutdown hook
22/09/21 21:07:56 INFO server.AbstractConnector: Stopped Spark@66c9697e{HTTP/1.1,[http/1.1]}{0.0.0.0:4040}
22/09/21 21:07:56 INFO ui.SparkUI: Stopped Spark web UI at http://ctyun9:4040
22/09/21 21:07:56 INFO spark.MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
22/09/21 21:07:56 INFO memory.MemoryStore: MemoryStore cleared
22/09/21 21:07:56 INFO storage.BlockManager: BlockManager stopped
22/09/21 21:07:56 INFO storage.BlockManagerMaster: BlockManagerMaster stopped
22/09/21 21:07:56 INFO scheduler.OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
22/09/21 21:07:56 INFO spark.SparkContext: Successfully stopped SparkContext
22/09/21 21:07:56 INFO util.ShutdownHookManager: Shutdown hook called
22/09/21 21:07:56 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-c7c49a7f-014d-4ad9-b0a4-a1ee61b00457
22/09/21 21:07:56 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-bf3a9487-0333-412f-8b30-ff899ec46ec8
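A `NoSuchMethodError` like the one above means the calling class (here `OrcReadStrategy`) was compiled against a version of `org.apache.orc.Reader` that declares `close()`, but the `orc-core` jar actually loaded at runtime (e.g. one bundled with Spark 2.4.3) does not, i.e. a classpath/version conflict rather than a coding bug. As an illustration only (the class and method names `MethodProbe` / `hasNoArgMethod` are not part of SeaTunnel), such a mismatch can be probed at runtime with reflection:

```java
// Sketch: check whether a class visible on the current classpath declares
// a public no-argument method. On the failing cluster you would probe
// "org.apache.orc.Reader" and "close"; java.io.StringReader is used below
// only so the example is self-contained and runnable anywhere.
public class MethodProbe {

    // Returns true if className is loadable and exposes a public
    // no-argument method called methodName (inherited methods count).
    static boolean hasNoArgMethod(String className, String methodName) {
        try {
            Class.forName(className).getMethod(methodName);
            return true;
        } catch (ClassNotFoundException | NoSuchMethodException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(hasNoArgMethod("java.io.StringReader", "close")); // true
        System.out.println(hasNoArgMethod("java.io.StringReader", "flush")); // false
    }
}
```

If the probe reports that `org.apache.orc.Reader.close()` is missing in the job's JVM, the fix is to align the ORC dependency versions (or shade one of them), which is presumably what the linked PR addresses.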

Flink or Spark Version

Spark version: 2.4.3, Scala version: 2.11.12
Hadoop version: Hadoop 2.10.2

Java or Scala Version

JDK 1.8

Screenshots

No response

Are you willing to submit PR?

  • Yes, I am willing to submit a PR!

Code of Conduct
