
update to new #8

Merged
merged 52 commits on May 21, 2022

Conversation

fengjian428
Owner

Tips

What is the purpose of the pull request

(For example: This pull request adds quick-start document.)

Brief change log

(for example:)

  • Modify AnnotationLocation checkstyle rule in checkstyle.xml

Verify this pull request

(Please pick one of the following options)

This pull request is a trivial rework / code cleanup without any test coverage.

(or)

This pull request is already covered by existing tests, such as (please describe tests).

(or)

This change added tests and can be verified as follows:

(example:)

  • Added integration tests for end-to-end.
  • Added HoodieClientWriteTest to verify the change.
  • Manually verified the change by running a job locally.

Committer checklist

  • Has a corresponding JIRA in PR title & commit

  • Commit message is descriptive of the change

  • CI is green

  • Necessary doc changes done or have another open PR

  • For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.

qianchutao and others added 30 commits May 5, 2022 09:33
Add GitHub actions tasks to run spark sql UTs under spark 3.1 and 3.2.
…ontinuous mode (#5073)

- Added a postWriteTerminationStrategy to deltastreamer continuous mode. It can be enabled by setting the appropriate termination strategy via DeltastreamerConfig.postWriteTerminationStrategyClass; if it is not set, continuous mode runs forever.
- Added one concrete implementation, NoNewDataTerminationStrategy, which shuts down the deltastreamer if there is no new data to consume from the source for N consecutive rounds.
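The commit above describes a strategy that stops the deltastreamer after N consecutive rounds without new data. A minimal sketch of that counting logic, assuming a hypothetical single-method callback interface (not necessarily Hudi's actual PostWriteTerminationStrategy API):

```java
import java.util.Optional;

// Hypothetical callback: the deltastreamer would invoke this after each write round.
interface TerminationStrategy {
    boolean shouldShutdown(Optional<Long> recordsWrittenInRound);
}

// Shuts down after N consecutive rounds with no new data, mirroring the idea of
// NoNewDataTerminationStrategy described in the commit message above.
class NoNewDataStrategy implements TerminationStrategy {
    private final int maxEmptyRounds;
    private int consecutiveEmptyRounds = 0;

    NoNewDataStrategy(int maxEmptyRounds) {
        this.maxEmptyRounds = maxEmptyRounds;
    }

    @Override
    public boolean shouldShutdown(Optional<Long> recordsWrittenInRound) {
        if (recordsWrittenInRound.orElse(0L) == 0L) {
            consecutiveEmptyRounds++;       // another round without new data
        } else {
            consecutiveEmptyRounds = 0;     // reset the counter once data flows again
        }
        return consecutiveEmptyRounds >= maxEmptyRounds;
    }
}
```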
* [MINOR] Fixing close for HoodieCatalog's test
#5526)

* [HUDI-4053] Flaky ITTestHoodieDataSource.testStreamWriteBatchReadOptimized

Co-authored-by: xicm <[email protected]>
…#5462)

- Avoid using a UDF for the key generator for SimpleKeyGen and NonPartitionedKeyGen.
- Fixed the NonPartitioned key generator to fetch the record key directly from the Row rather than going through GenericRecord.
- Other minor fixes around using static values instead of looking up a hashmap.
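To illustrate the optimization described in the commit above (reading the key straight from the Spark Row instead of first converting it to an Avro GenericRecord), here is a rough sketch; the field name and helper class are illustrative, not Hudi's actual SimpleKeyGenerator code:

```java
import org.apache.spark.sql.Row;

class RowKeyExtractor {
    private final String recordKeyField; // e.g. "uuid"; illustrative field name

    RowKeyExtractor(String recordKeyField) {
        this.recordKeyField = recordKeyField;
    }

    // Fetch the record key directly from the Row, avoiding the
    // Row -> GenericRecord conversion (and any UDF wrapping it).
    String getRecordKey(Row row) {
        int pos = row.fieldIndex(recordKeyField); // resolved per row here; could be cached once
        if (row.isNullAt(pos)) {
            throw new IllegalStateException("Record key field is null: " + recordKeyField);
        }
        return String.valueOf(row.get(pos));
    }
}
```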
…5497)

- getDataSize has non-trivial overhead in the current ParquetWriter impl, requiring traversal of already-composed column groups in memory. Instead, we can sample these calls to getDataSize to amortize their cost.

Co-authored-by: sivabalan <[email protected]>
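The commit above amortizes the cost of ParquetWriter#getDataSize by consulting it only occasionally. A minimal sketch of that sampling idea, using a hypothetical size-check wrapper rather than Hudi's actual writer code:

```java
import org.apache.parquet.hadoop.ParquetWriter;

class SampledSizeChecker {
    private final ParquetWriter<?> writer;
    private final long maxFileSizeBytes;
    private final long sampleEveryNRecords;   // e.g. check the real size once every 1000 writes
    private long recordsSinceStart = 0;
    private long lastSampledSize = 0;

    SampledSizeChecker(ParquetWriter<?> writer, long maxFileSizeBytes, long sampleEveryNRecords) {
        this.writer = writer;
        this.maxFileSizeBytes = maxFileSizeBytes;
        this.sampleEveryNRecords = sampleEveryNRecords;
    }

    // Returns true while the file is still under the size limit. getDataSize() is only
    // invoked every N records; in between we reuse the last sampled value.
    boolean canWrite() {
        if (recordsSinceStart++ % sampleEveryNRecords == 0) {
            lastSampledSize = writer.getDataSize();  // the expensive call being amortized
        }
        return lastSampledSize < maxFileSizeBytes;
    }
}
```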
…g Hoodie Writing Efficiency. (#5562)


Co-authored-by: yuezhang <[email protected]>
…s. Added delete partition support to integ tests (#5501)

- Added pure immutable test yamls to integ test framework. Added SparkBulkInsertNode as part of it.
- Added delete_partition support to integ test framework using spark-datasource.
- Added a single yaml to test all non-core write operations (insert overwrite, insert overwrite table and delete partitions)
- Added tests for 4 concurrent spark datasource writers (multi-writer tests).
- Fixed readme w/ sample commands for multi-writer.
* [HUDI-3336][HUDI-FLINK]Support custom hadoop config for flink
#5545)

* [HUDI-4078][HUDI-FLINK]BootstrapOperator contains the pending compaction files
* [HUDI-3336][HUDI-FLINK]Support custom hadoop config for flink
- Add configurations in HoodieHBaseIndexConfig.java to support Kerberos HBase connections.

Co-authored-by: xicm <[email protected]>
#4480)

 1. Basic write path (insert/upsert) implementation
 2. Adapt simple bucket index
dongkelun and others added 22 commits May 16, 2022 23:26
* [HUDI-3654] Preparations for hudi metastore.

Co-authored-by: gengxiaoyu <[email protected]>
#5564)

* [HUDI-4087] Support dropping RO and RT table in DropHoodieTableCommand

* Set hoodie.query.as.ro.table in serde properties
…pe (#5620)

* Unify clustering/compaction related procedures' output type

* Address review comments
…nitTable (#5617)

No need to #sync actively because the table instance is instantiated freshly;
its view manager has empty view instances, and the fs view will be synced lazily when
it is requested.
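The reasoning in the commit above is essentially lazy initialization: skip the eager #sync because the view is built the first time it is requested. A small generic sketch of that pattern (not Hudi's actual file-system view code):

```java
import java.util.function.Supplier;

// Lazily builds (or syncs) an expensive view only when it is first requested,
// which is why an eager sync right after creating a fresh table instance is unnecessary.
class LazyView<T> {
    private final Supplier<T> loader;
    private volatile T view;

    LazyView(Supplier<T> loader) {
        this.loader = loader;
    }

    T get() {
        if (view == null) {                 // fast path once the view is materialized
            synchronized (this) {
                if (view == null) {
                    view = loader.get();    // the "sync" happens here, on first request
                }
            }
        }
        return view;
    }
}
```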
…ka connector is used in Hudi (#5626)

* HUDI-4119 The first read result is incorrect when the Flink upsert-kafka connector is used in Hudi

Co-authored-by: aliceyyan <[email protected]>
@fengjian428 fengjian428 merged commit ee3559c into fengjian428:master May 21, 2022