Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[HUDI-2397] Add --enable-sync parameter #3608

Merged
merged 4 commits into from
Sep 13, 2021
Merged

[HUDI-2397] Add --enable-sync parameter #3608

merged 4 commits into from
Sep 13, 2021

Conversation

kination
Copy link
Contributor

@kination kination commented Sep 6, 2021

Tips

What is the purpose of the pull request

HoodieMultiTableDeltaStreamer is just a wrapper on top of HoodieDeltaStreamer and --enable-sync parameter needs to be introduced there so that both the streamer classes are in sync.

Verify this pull request

  • Add enableMetaSync config for HoodieMultiTableDeltaStreamer, to sync with HoodieDeltaStreamer

Committer checklist

  • Has a corresponding JIRA in PR title & commit

  • Commit message is descriptive of the change

  • CI is green

  • Necessary doc changes done or have another open PR

  • For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.

@hudi-bot
Copy link

hudi-bot commented Sep 6, 2021

CI report:

Bot commands @hudi-bot supports the following commands:
  • @hudi-bot run travis re-run the last Travis build
  • @hudi-bot run azure re-run the last Azure build

@kination
Copy link
Contributor Author

kination commented Sep 6, 2021

@pratyakshsharma is enableMetaSync just a replacement of enableHiveSync, or is there some other changes?

if (cfg.enableHiveSync && StringUtils.isNullOrEmpty(tableProperties.getString(DataSourceWriteOptions.HIVE_TABLE().key(), ""))) {
        throw new HoodieException("Hive sync table field not provided!");
      }

=> Is it okay to replace this to enableMetaSync?

@vinothchandar
Copy link
Member

@pratyakshsharma there are quite a few requests around the multi delta streamer. Are you still interested in helping improve it and be point person for all feature contribution? please let me know

@pratyakshsharma
Copy link
Contributor

@pratyakshsharma there are quite a few requests around the multi delta streamer. Are you still interested in helping improve it and be point person for all feature contribution? please let me know

Big yes to that :) . Please let me know if I can help for anything.

@pratyakshsharma
Copy link
Contributor

@djKooks Thank you for raising this. Will check this today.

Copy link
Contributor

@pratyakshsharma pratyakshsharma left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it okay to replace this to enableMetaSync?

Yes we can replace it with enableMetaSync here.

Small comments, rest looks good. :)

@@ -138,7 +139,7 @@ public void testMultiTableExecutionWithKafkaSource() throws IOException {
testUtils.sendMessages(topicName1, Helpers.jsonifyRecords(dataGenerator.generateInsertsAsPerSchema("000", 5, HoodieTestDataGenerator.TRIP_SCHEMA)));
testUtils.sendMessages(topicName2, Helpers.jsonifyRecords(dataGenerator.generateInsertsAsPerSchema("000", 10, HoodieTestDataGenerator.SHORT_TRIP_SCHEMA)));

HoodieMultiTableDeltaStreamer.Config cfg = TestHelpers.getConfig(PROPS_FILENAME_TEST_SOURCE1, dfsBasePath + "/config", JsonKafkaSource.class.getName(), false);
HoodieMultiTableDeltaStreamer.Config cfg = TestHelpers.getConfig(PROPS_FILENAME_TEST_SOURCE1, dfsBasePath + "/config", JsonKafkaSource.class.getName(), false, true);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let us keep enableMetaSync as false where enableHiveSync is also false? Otherwise it might lead to confusion.

@@ -187,7 +188,7 @@ public void testMultiTableExecutionWithParquetSource() throws IOException {
// add only common props. later we can add per table props
String parquetPropsFile = populateCommonPropsAndWriteToFile();

HoodieMultiTableDeltaStreamer.Config cfg = TestHelpers.getConfig(parquetPropsFile, dfsBasePath + "/config", ParquetDFSSource.class.getName(), false,
HoodieMultiTableDeltaStreamer.Config cfg = TestHelpers.getConfig(parquetPropsFile, dfsBasePath + "/config", ParquetDFSSource.class.getName(), false, true,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

@@ -218,7 +219,7 @@ public void testMultiTableExecutionWithParquetSource() throws IOException {

@Test
public void testTableLevelProperties() throws IOException {
HoodieMultiTableDeltaStreamer.Config cfg = TestHelpers.getConfig(PROPS_FILENAME_TEST_SOURCE1, dfsBasePath + "/config", TestDataSource.class.getName(), false);
HoodieMultiTableDeltaStreamer.Config cfg = TestHelpers.getConfig(PROPS_FILENAME_TEST_SOURCE1, dfsBasePath + "/config", TestDataSource.class.getName(), false, true);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

@kination kination marked this pull request as ready for review September 12, 2021 06:02
@kination
Copy link
Contributor Author

@pratyakshsharma thanks for comment. Updated 🙏

@pratyakshsharma
Copy link
Contributor

LGTM. Will merge it once CI passes.

Thank you for your contribution @djKooks

@kination
Copy link
Contributor Author

kination commented Sep 13, 2021

@pratyakshsharma thanks for help~please share if there's other features needs to be improved/fixed 🙏
(will keep look-on the project)

@pratyakshsharma
Copy link
Contributor

pratyakshsharma commented Sep 13, 2021

@pratyakshsharma thanks for help~please share if there's other features needs to be improved/fixed pray
(will keep look-on the project)

Definitely! You can search for jiras with newbie label and pick them up one by one :)

@pratyakshsharma pratyakshsharma merged commit c79017c into apache:master Sep 13, 2021
@kination kination deleted the jung/multi-table-streamer-enablesync branch September 13, 2021 06:36
@vinothchandar
Copy link
Member

@pratyakshsharma This broke master due to a failing test. Please be careful about merging. by ensuring the CI in azure actually passes before you land.

https://dev.azure.com/apache-hudi-ci-org/apache-hudi-ci/_build/results?buildId=2180&view=logs&j=b1544eb9-7ff1-5db9-0187-3e05abf459bc&t=e0ae894b-41c9-5f4b-7ed2-bdf5243b02e7

#3654 I am going to land this soon, which I think fixes this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants