
update to new #8

Merged
merged 52 commits on May 21, 2022

Conversation

fengjian428
Owner

Tips

What is the purpose of the pull request

(For example: This pull request adds quick-start document.)

Brief change log

(for example:)

  • Modify AnnotationLocation checkstyle rule in checkstyle.xml

Verify this pull request

(Please pick one of the following options)

This pull request is a trivial rework / code cleanup without any test coverage.

(or)

This pull request is already covered by existing tests, such as (please describe tests).

(or)

This change added tests and can be verified as follows:

(example:)

  • Added integration tests for end-to-end.
  • Added HoodieClientWriteTest to verify the change.
  • Manually verified the change by running a job locally.

Committer checklist

  • Has a corresponding JIRA in PR title & commit

  • Commit message is descriptive of the change

  • CI is green

  • Necessary doc changes done or have another open PR

  • For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.

qianchutao and others added 30 commits May 5, 2022 09:33
Add GitHub actions tasks to run spark sql UTs under spark 3.1 and 3.2.
…ontinuous mode (#5073)

- Added a postWriteTerminationStrategy to deltastreamer continuous mode. It can be enabled by setting the appropriate termination strategy via DeltastreamerConfig.postWriteTerminationStrategyClass; if it is not set, continuous mode runs forever.
- Added one concrete implementation, NoNewDataTerminationStrategy, which shuts down the deltastreamer if there is no new data to consume from the source for N consecutive rounds.
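The commit above describes a strategy that stops the deltastreamer after N consecutive rounds without new data. A minimal sketch of that counting logic, assuming a hypothetical single-method callback interface (not necessarily Hudi's actual PostWriteTerminationStrategy API):

```java
import java.util.Optional;

// Hypothetical callback: the deltastreamer would invoke this after each write round.
interface TerminationStrategy {
    boolean shouldShutdown(Optional<Long> recordsWrittenInRound);
}

// Shuts down after N consecutive rounds with no new data, mirroring the idea of
// NoNewDataTerminationStrategy described in the commit message above.
class NoNewDataStrategy implements TerminationStrategy {
    private final int maxEmptyRounds;
    private int consecutiveEmptyRounds = 0;

    NoNewDataStrategy(int maxEmptyRounds) {
        this.maxEmptyRounds = maxEmptyRounds;
    }

    @Override
    public boolean shouldShutdown(Optional<Long> recordsWrittenInRound) {
        if (recordsWrittenInRound.orElse(0L) == 0L) {
            consecutiveEmptyRounds++;       // another round without new data
        } else {
            consecutiveEmptyRounds = 0;     // reset the counter once data flows again
        }
        return consecutiveEmptyRounds >= maxEmptyRounds;
    }
}
```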
* [MINOR] Fixing close for HoodieCatalog's test
#5526)

* [HUDI-4053] Flaky ITTestHoodieDataSource.testStreamWriteBatchReadOptimized

Co-authored-by: xicm <[email protected]>
…#5462)

- Avoid using a UDF for the key generator for SimpleKeyGen and NonPartitionedKeyGen.
- Fixed the NonPartitioned key generator to fetch the record key directly from the Row rather than going through GenericRecord.
- Other minor fixes around using static values instead of looking up a hashmap.
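To illustrate the optimization described in the commit above (reading the key straight from the Spark Row instead of first converting it to an Avro GenericRecord), here is a rough sketch; the field name and helper class are illustrative, not Hudi's actual SimpleKeyGenerator code:

```java
import org.apache.spark.sql.Row;

class RowKeyExtractor {
    private final String recordKeyField; // e.g. "uuid"; illustrative field name

    RowKeyExtractor(String recordKeyField) {
        this.recordKeyField = recordKeyField;
    }

    // Fetch the record key directly from the Row, avoiding the
    // Row -> GenericRecord conversion (and any UDF wrapping it).
    String getRecordKey(Row row) {
        int pos = row.fieldIndex(recordKeyField); // resolved per row here; could be cached once
        if (row.isNullAt(pos)) {
            throw new IllegalStateException("Record key field is null: " + recordKeyField);
        }
        return String.valueOf(row.get(pos));
    }
}
```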
…5497)

- getDataSize has non-trivial overhead in the current ParquetWriter impl, requiring traversal of already-composed column groups in memory. Instead, we can sample these calls to getDataSize to amortize their cost.

Co-authored-by: sivabalan <[email protected]>
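The commit above amortizes the cost of ParquetWriter#getDataSize by consulting it only occasionally. A minimal sketch of that sampling idea, using a hypothetical size-check wrapper rather than Hudi's actual writer code:

```java
import org.apache.parquet.hadoop.ParquetWriter;

class SampledSizeChecker {
    private final ParquetWriter<?> writer;
    private final long maxFileSizeBytes;
    private final long sampleEveryNRecords;   // e.g. check the real size once every 1000 writes
    private long recordsSinceStart = 0;
    private long lastSampledSize = 0;

    SampledSizeChecker(ParquetWriter<?> writer, long maxFileSizeBytes, long sampleEveryNRecords) {
        this.writer = writer;
        this.maxFileSizeBytes = maxFileSizeBytes;
        this.sampleEveryNRecords = sampleEveryNRecords;
    }

    // Returns true while the file is still under the size limit. getDataSize() is only
    // invoked every N records; in between we reuse the last sampled value.
    boolean canWrite() {
        if (recordsSinceStart++ % sampleEveryNRecords == 0) {
            lastSampledSize = writer.getDataSize();  // the expensive call being amortized
        }
        return lastSampledSize < maxFileSizeBytes;
    }
}
```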
…g Hoodie Writing Efficiency. (#5562)


Co-authored-by: yuezhang <[email protected]>
…s. Added delete partition support to integ tests (#5501)

- Added pure immutable test yamls to integ test framework. Added SparkBulkInsertNode as part of it.
- Added delete_partition support to integ test framework using spark-datasource.
- Added a single yaml to test all non-core write operations (insert overwrite, insert overwrite table and delete partitions)
- Added tests for 4 concurrent spark datasource writers (multi-writer tests).
- Fixed readme w/ sample commands for multi-writer.
* [HUDI-3336][HUDI-FLINK]Support custom hadoop config for flink
#5545)

* [HUDI-4078][HUDI-FLINK]BootstrapOperator contains the pending compaction files
* [HUDI-3336][HUDI-FLINK]Support custom hadoop config for flink
- Add configurations in HoodieHBaseIndexConfig.java to support Kerberos HBase connections.

Co-authored-by: xicm <[email protected]>
#4480)

 1. Basic write path (insert/upsert) implementation
 2. Adapt simple bucket index
dongkelun and others added 22 commits May 16, 2022 23:26
* [HUDI-3654] Preparations for hudi metastore.

Co-authored-by: gengxiaoyu <[email protected]>
#5564)

* [HUDI-4087] Support dropping RO and RT table in DropHoodieTableCommand

* Set hoodie.query.as.ro.table in serde properties
…pe (#5620)

* Unify clustering/compaction related procedures' output type

* Address review comments
…nitTable (#5617)

No need to #sync actively because the table instance is instantiated freshly;
its view manager has empty view instances, and the fs view will be synced lazily when
it is requested.
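The reasoning in the commit above is essentially lazy initialization: skip the eager #sync because the view is built the first time it is requested. A small generic sketch of that pattern (not Hudi's actual file-system view code):

```java
import java.util.function.Supplier;

// Lazily builds (or syncs) an expensive view only when it is first requested,
// which is why an eager sync right after creating a fresh table instance is unnecessary.
class LazyView<T> {
    private final Supplier<T> loader;
    private volatile T view;

    LazyView(Supplier<T> loader) {
        this.loader = loader;
    }

    T get() {
        if (view == null) {                 // fast path once the view is materialized
            synchronized (this) {
                if (view == null) {
                    view = loader.get();    // the "sync" happens here, on first request
                }
            }
        }
        return view;
    }
}
```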
…ka connector is used in Hudi (#5626)

* HUDI-4119 The first read result is incorrect when the Flink upsert-kafka connector is used in Hudi

Co-authored-by: aliceyyan <[email protected]>
@fengjian428 fengjian428 merged commit ee3559c into fengjian428:master May 21, 2022