[Test] [Test Hive Connector V2] Test the Hive Source Connector V2 and record the problems encountered in the test #2793

Closed · 12 tasks done
EricJoy2048 opened this issue on Sep 19, 2022 · 6 comments

EricJoy2048 (Member) commented on Sep 19, 2022:

SeaTunnel version: dev
Hadoop version: Hadoop 2.10.2
Flink version: 1.12.7
Spark version: 2.4.3, scala version 2.11.12

Problems found that need to be fixed

Hive Source Connector

1. Test text file format table

create table test_hive_source(
    test_tinyint     TINYINT,
    test_smallint    SMALLINT,
    test_int         INT,
    test_bigint      BIGINT,
    test_boolean     BOOLEAN,
    test_float       FLOAT,
    test_double      DOUBLE,
    test_string      STRING,
    test_binary      BINARY,
    test_timestamp   TIMESTAMP,
    test_decimal     DECIMAL(8,2),
    test_char        CHAR(64),
    test_varchar     VARCHAR(64),
    test_date        DATE,
    test_array       ARRAY<INT>,
    test_map         MAP<STRING, FLOAT>,
    test_struct      STRUCT<street:STRING, city:STRING, state:STRING, zip:INT>
)
PARTITIONED BY (test_par1 STRING, test_par2 STRING);
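
To sanity-check the DDL before loading data, standard Hive commands can be used. With no STORED AS clause, the table above should default to TEXTFILE (assuming the cluster's hive.default.fileformat has not been changed), so the InputFormat shown should be TextInputFormat:

-- Show columns, partition keys, and storage information (InputFormat/OutputFormat).
DESCRIBE FORMATTED test_hive_source;

-- List the partitions; they appear only after the inserts below have run.
SHOW PARTITIONS test_hive_source;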

-- Insert 10 rows into partition (test_par1='par1', test_par2='par2').
-- A constant SELECT like the one below inserts one row per execution,
-- so it was run 10 times to reach 10 rows.
insert into table test_hive_source partition(test_par1='par1', test_par2='par2') select 
1 as test_tinyint,
1 as test_smallint,
100 as test_int,
40000000000 as test_bigint,
true as test_boolean,
1.01 as test_float,
1.002 as test_double,
'gan' as test_string,
'DataDataData' as test_binary,
current_timestamp() as test_timestamp,
83.2 as test_decimal,
'char64' as test_char,
'varchar64' as test_varchar,
cast(substring(from_unixtime(unix_timestamp(cast('2016-01-01' as string), 'yyyy-MM-dd')),1,10) as date) as test_date,
array(1,2) as test_array,
map("name",cast('1.11' as float),"age",cast('1.11' as float)) as test_map,
NAMED_STRUCT('street', 'London', 'city','W1a9JF','state','Finished','zip', 123) as test_struct;
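
As a side note, the test_date expression above takes a long route through unix_timestamp/from_unixtime; assuming a reasonably recent Hive, a direct cast of the literal should produce the same DATE value:

-- Likely equivalent, simpler forms of the test_date expression:
SELECT cast('2016-01-01' as date);
-- to_date() also works, but note it returns STRING before Hive 2.1 and DATE from 2.1 on.
SELECT to_date('2016-01-01');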

-- Insert 10 rows into partition (test_par1='par1', test_par2='par2_1')
-- by re-selecting the 10 existing rows.
insert into table test_hive_source partition(test_par1='par1', test_par2='par2_1') select 
test_tinyint,
test_smallint,
test_int,
test_bigint,
test_boolean,
test_float,
test_double,
test_string,
test_binary,
test_timestamp,
test_decimal,
test_char,
test_varchar,
test_date,
test_array,
test_map,
test_struct from test_hive_source;

-- Insert 20 rows into partition (test_par1='par1_1', test_par2='par2_2')
-- by re-selecting the 20 existing rows.
insert into table test_hive_source partition(test_par1='par1_1', test_par2='par2_2') select 
test_tinyint,
test_smallint,
test_int,
test_bigint,
test_boolean,
test_float,
test_double,
test_string,
test_binary,
test_timestamp,
test_decimal,
test_char,
test_varchar,
test_date,
test_array,
test_map,
test_struct from test_hive_source;

Total rows: 40

rows    test_par1    test_par2
10      par1         par2
10      par1         par2_1
20      par1_1       par2_2
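
This distribution can be verified directly in Hive with a partition-wise count:

-- Should return the three partitions above with counts 10, 10, and 20.
SELECT count(*), test_par1, test_par2
FROM test_hive_source
GROUP BY test_par1, test_par2;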

1.1 Test in Flink Engine

1.1.1 Job Config File

env {
  # You can set flink configuration here
  execution.parallelism = 3
  job.name="test_hive_source_to_console"
}

source {
  # This is an example input plugin, used only to test and demonstrate the input plugin feature

  Hive {
    table_name = "test_hive.test_hive_source"
    metastore_uri = "thrift://ctyun7:9083"
  }

  # If you would like to get more information about how to configure SeaTunnel and see the full list of input plugins,
  # please go to https://seatunnel.apache.org/docs/flink/configuration/source-plugins/Fake
}

transform {

}

sink {
  # Choose the stdout output plugin to output data to the console
  Console {
  }

  # If you would like to get more information about how to configure SeaTunnel and see the full list of output plugins,
  # please go to https://seatunnel.apache.org/docs/flink/configuration/sink-plugins/Console
}

1.1.2 Submit Job Command

sh start-seatunnel-flink-connector-v2.sh --config ../config/flink_hive_to_console.conf 

1.1.3 Check Result in Flink Job log

Some fields in the data are lost; this can be fixed in the future (see #2473).
The row count is correct.


1.2 Test in Spark Engine

1.2.1 Job Config File

env {
  # You can set spark configuration here
  source.parallelism = 3
  job.name="test_hive_source_to_console"
}

source {
  # This is an example input plugin, used only to test and demonstrate the input plugin feature

  Hive {
    table_name = "test_hive.test_hive_source"
    metastore_uri = "thrift://ctyun7:9083"
  }

  # If you would like to get more information about how to configure SeaTunnel and see the full list of input plugins,
  # please go to https://seatunnel.apache.org/docs/flink/configuration/source-plugins/Fake
}

transform {

}

sink {
  # Choose the stdout output plugin to output data to the console
  Console {
  }

  # If you would like to get more information about how to configure SeaTunnel and see the full list of output plugins,
  # please go to https://seatunnel.apache.org/docs/flink/configuration/sink-plugins/Console
}

1.2.2 Submit Job Command (--deploy-mode client --master local)

sh start-seatunnel-spark-connector-v2.sh --config ../config/spark_hive_to_console.conf --deploy-mode client --master local

1.2.3 Check data result

Some fields in the data are lost; this can be fixed in the future.
The row count is correct.


1.2.4 Submit Job Command (--deploy-mode client --master yarn)

sh start-seatunnel-spark-connector-v2.sh --config ../config/spark_hive_to_console.conf --deploy-mode client --master yarn

1.2.5 Check data result


2. Test ORC file format table

create table test_hive_source_orc(
    test_tinyint     TINYINT,
    test_smallint    SMALLINT,
    test_int         INT,
    test_bigint      BIGINT,
    test_boolean     BOOLEAN,
    test_float       FLOAT,
    test_double      DOUBLE,
    test_string      STRING,
    test_binary      BINARY,
    test_timestamp   TIMESTAMP,
    test_decimal     DECIMAL(8,2),
    test_char        CHAR(64),
    test_varchar     VARCHAR(64),
    test_date        DATE,
    test_array       ARRAY<INT>,
    test_map         MAP<STRING, FLOAT>,
    test_struct      STRUCT<street:STRING, city:STRING, state:STRING, zip:INT>
)
PARTITIONED BY (test_par1 STRING, test_par2 STRING)
STORED AS ORCFILE;

The test data is the same as for the text file format table.
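
One way to populate it (a sketch; the issue does not say exactly how the data was loaded) is a dynamic-partition copy from the text table:

-- Allow both partition columns to be filled dynamically.
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

-- Copy all 40 rows from the text table, preserving the partition layout;
-- the dynamic partition columns must come last in the SELECT list.
INSERT INTO TABLE test_hive_source_orc PARTITION (test_par1, test_par2)
SELECT
    test_tinyint, test_smallint, test_int, test_bigint, test_boolean,
    test_float, test_double, test_string, test_binary, test_timestamp,
    test_decimal, test_char, test_varchar, test_date, test_array,
    test_map, test_struct,
    test_par1, test_par2
FROM test_hive_source;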

2.1 Test in Flink engine

2.1.1 Job Config File

env {
  # You can set flink configuration here
  execution.parallelism = 3
  job.name="test_hiveorc_source_to_console"
}

source {
  # This is an example input plugin, used only to test and demonstrate the input plugin feature

  Hive {
    table_name = "test_hive.test_hive_source_orc"
    metastore_uri = "thrift://ctyun7:9083"
  }

  # If you would like to get more information about how to configure SeaTunnel and see the full list of input plugins,
  # please go to https://seatunnel.apache.org/docs/flink/configuration/source-plugins/Fake
}

transform {

}

sink {
  # Choose the stdout output plugin to output data to the console
  Console {
  }

  # If you would like to get more information about how to configure SeaTunnel and see the full list of output plugins,
  # please go to https://seatunnel.apache.org/docs/flink/configuration/sink-plugins/Console
}

2.1.2 Submit Job Command

sh start-seatunnel-flink-connector-v2.sh --config ../config/flink_hiveorc_to_console.conf 

2.1.3 Check Data Result

The fields in the data are correct.
The row count is correct.

2.2 Test in Spark Engine

2.2.1 Job Config File

env {
  # You can set spark configuration here
  source.parallelism = 3
  job.name="test_hiveorc_source_to_console"
}

source {
  # This is an example input plugin, used only to test and demonstrate the input plugin feature

  Hive {
    table_name = "test_hive.test_hive_source_orc"
    metastore_uri = "thrift://ctyun7:9083"
  }

  # If you would like to get more information about how to configure SeaTunnel and see the full list of input plugins,
  # please go to https://seatunnel.apache.org/docs/flink/configuration/source-plugins/Fake
}

transform {

}

sink {
  # Choose the stdout output plugin to output data to the console
  Console {
  }

  # If you would like to get more information about how to configure SeaTunnel and see the full list of output plugins,
  # please go to https://seatunnel.apache.org/docs/flink/configuration/sink-plugins/Console
}

2.2.2 Submit Job Command

sh start-seatunnel-spark-connector-v2.sh --config ../config/spark_hiveorc_to_console.conf --deploy-mode client --master local

2.2.3 Check Data Result

3. Test Parquet file format table

create table test_hive_source_parquet(
    test_tinyint     TINYINT,
    test_smallint    SMALLINT,
    test_int         INT,
    test_bigint      BIGINT,
    test_boolean     BOOLEAN,
    test_float       FLOAT,
    test_double      DOUBLE,
    test_string      STRING,
    test_binary      BINARY,
    test_timestamp   TIMESTAMP,
    test_decimal     DECIMAL(8,2),
    test_char        CHAR(64),
    test_varchar     VARCHAR(64),
    test_date        DATE,
    test_array       ARRAY<INT>,
    test_map         MAP<STRING, FLOAT>,
    test_struct      STRUCT<street:STRING, city:STRING, state:STRING, zip:INT>
)
PARTITIONED BY (test_par1 STRING, test_par2 STRING)
STORED AS PARQUET;
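
The same dynamic-partition copy shown for the ORC table can be reused here with test_hive_source_parquet as the target. Before re-running the jobs, the declared storage format can be double-checked:

-- The printed DDL should reference the Parquet SerDe and
-- MapredParquetInputFormat as the InputFormat.
SHOW CREATE TABLE test_hive_source_parquet;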
john8628 (Contributor) commented:

Please assign it to me, I will try it.

EricJoy2048 (Member, Author) commented:

> Please assign it to me, I will try it.

Which issue do you want to work on?

I am doing the test now and will record all the problems found in the test and add a todo list here. You can take whichever issue you want to handle.

john8628 (Contributor) commented:

> Which issue do you want to work on?

I want to work on #2792, but it seems too difficult for me.

EricJoy2048 (Member, Author) commented:

> I want to work on #2792, but it seems too difficult for me.

I have already fixed #2792. You can look at the other tasks; we have many issues marked as good first issue that need help.

EricJoy2048 changed the title from "[Test] [Test Hive Connector V2] Test the Hive Connector V2 and record the problems encountered in the test" to "[Test] [Test Hive Connector V2] Test the Hive Source Connector V2 and record the problems encountered in the test" on Sep 20, 2022
EricJoy2048 self-assigned this and unassigned john8628 on Sep 22, 2022
EricJoy2048 added this to the 2.2.0 milestone on Sep 23, 2022
github-actions bot commented:

This issue has been automatically marked as stale because it has not had recent activity for 30 days. It will be closed in the next 7 days if no further activity occurs.

github-actions bot added the stale label on Nov 19, 2022

github-actions bot commented:

This issue has been closed because it has not received a response for too long. You can reopen it if you encounter similar problems in the future.
