[HUDI-3547] Introduce MaxwellSourcePostProcessor to extract data from Maxwell json string #4987

Merged
3 commits merged on Mar 15, 2022

Conversation

wangxianghu
Contributor

@wangxianghu wangxianghu commented Mar 8, 2022

What is the purpose of the pull request

Introduce MaxwellSourcePostProcessor to extract data from Maxwell json string

Brief change log

Verify this pull request

This change can be verified by
org.apache.hudi.utilities.sources.TestJsonKafkaSourcePostProcessor#testMaxwellJsonKafkaSourcePostProcessor

Committer checklist

  • Has a corresponding JIRA in PR title & commit

  • Commit message is descriptive of the change

  • CI is green

  • Necessary doc changes done or have another open PR

  • For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.

@wangxianghu wangxianghu changed the title from "[HUDI-3547] Introduce MaxwellSourcePostProcessor to extract data from Maxwell json string" to "[HUDI-3547] [WIP] Introduce MaxwellSourcePostProcessor to extract data from Maxwell json string" Mar 8, 2022
@nsivabalan nsivabalan self-assigned this Mar 8, 2022
@wangxianghu wangxianghu marked this pull request as draft March 10, 2022 13:50
@wangxianghu wangxianghu changed the title from "[HUDI-3547] [WIP] Introduce MaxwellSourcePostProcessor to extract data from Maxwell json string" to "[HUDI-3547] Introduce MaxwellSourcePostProcessor to extract data from Maxwell json string" Mar 11, 2022
@wangxianghu wangxianghu marked this pull request as ready for review March 11, 2022 10:34
@wangxianghu
Contributor Author

@hudi-bot run azure

@wangxianghu wangxianghu marked this pull request as draft March 11, 2022 18:56
@wangxianghu wangxianghu force-pushed the HUDI-3547 branch 2 times, most recently from 5dd4d57 to e44cfa5 March 12, 2022 14:11
@wangxianghu wangxianghu marked this pull request as ready for review March 12, 2022 14:12
@wangxianghu wangxianghu force-pushed the HUDI-3547 branch 2 times, most recently from f8502b4 to 0f23ca9 March 12, 2022 14:28
@wangxianghu
Contributor Author

@hudi-bot run azure

@wangxianghu
Contributor Author

Hi @XuQianJin-Stars, could you help review this?

boolean isDelete = record.get(HoodieRecord.HOODIE_IS_DELETED).booleanValue();

assertFalse(isDelete);
assertNull(database);
Contributor

Why is the database null?

Contributor Author

> Why is the database null?

Because the processor extracts the content of the data field as its result, and the database field is not part of data.

Contributor

> > Why is the database null?
>
> Because the processor extracts the content of the data field as its result, and the database field is not part of data.

There is a problem. If Maxwell captures changes from multiple tables in the same database at the same time, how can we tell which database and table a record came from if those fields are not kept in the record?

Contributor Author

We can configure hoodie.deltastreamer.source.json.kafka.post.processor.maxwell.database.regex and hoodie.deltastreamer.source.json.kafka.post.processor.maxwell.table.regex so that only records from the matching database and table are kept.
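
For illustration only, here is a minimal sketch of how two such regex settings could narrow the stream to one database and a family of tables. The two property keys are the ones quoted above; the class name, the accepts method, the example values (shop_db, orders_\d+), and the use of full-match semantics are all assumptions and are not taken from this PR.

```java
import java.util.Properties;
import java.util.regex.Pattern;

// Hypothetical sketch, not the code merged in this PR.
class MaxwellRegexFilterSketch {
    // Property keys as quoted in the review thread.
    static final String DATABASE_REGEX_KEY =
        "hoodie.deltastreamer.source.json.kafka.post.processor.maxwell.database.regex";
    static final String TABLE_REGEX_KEY =
        "hoodie.deltastreamer.source.json.kafka.post.processor.maxwell.table.regex";

    // Returns true only when both names fully match their patterns
    // (full-match semantics are an assumption of this sketch).
    static boolean accepts(String databaseRegex, String tableRegex,
                           String database, String table) {
        return Pattern.matches(databaseRegex, database)
            && Pattern.matches(tableRegex, table);
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        // Made-up example values: one database, tables named orders_<digits>.
        props.setProperty(DATABASE_REGEX_KEY, "shop_db");
        props.setProperty(TABLE_REGEX_KEY, "orders_\\d+");

        String dbRegex = props.getProperty(DATABASE_REGEX_KEY);
        String tableRegex = props.getProperty(TABLE_REGEX_KEY);

        System.out.println(accepts(dbRegex, tableRegex, "shop_db", "orders_01")); // true
        System.out.println(accepts(dbRegex, tableRegex, "shop_db", "customers")); // false
    }
}
```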

Contributor Author

original data -> filter by database and table regex -> extract the data field -> tag as delete or not (with some extra processing for deletes) -> return
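
The flow above can be sketched roughly as follows. This is a simplified illustration and not the merged implementation: the Maxwell event is modelled as an already-parsed Map (with Maxwell's database, table, type, and data fields) rather than a JSON string, the class and method names are hypothetical, the constant is assumed to mirror the value of HoodieRecord.HOODIE_IS_DELETED, and the extra delete handling mentioned above is omitted.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.regex.Pattern;

// Hypothetical sketch of the described flow, not the code merged in this PR.
class MaxwellPipelineSketch {
    // Assumed to mirror the value of HoodieRecord.HOODIE_IS_DELETED.
    static final String IS_DELETED_FIELD = "_hoodie_is_deleted";

    static Optional<Map<String, Object>> process(Map<String, Object> event,
                                                 Pattern databaseRegex,
                                                 Pattern tableRegex) {
        String database = String.valueOf(event.get("database"));
        String table = String.valueOf(event.get("table"));

        // Step 1: drop events whose database or table does not match.
        if (!databaseRegex.matcher(database).matches()
            || !tableRegex.matcher(table).matches()) {
            return Optional.empty();
        }

        // Step 2: keep only the row payload under "data"; envelope fields
        // such as "database" and "table" are not carried over.
        Object data = event.get("data");
        if (!(data instanceof Map)) {
            return Optional.empty();
        }
        Map<String, Object> result = new HashMap<>();
        for (Map.Entry<?, ?> entry : ((Map<?, ?>) data).entrySet()) {
            result.put(String.valueOf(entry.getKey()), entry.getValue());
        }

        // Step 3: tag the record as a delete or not.
        // (Any additional delete-specific processing is left out here.)
        result.put(IS_DELETED_FIELD, "delete".equals(event.get("type")));
        return Optional.of(result);
    }

    public static void main(String[] args) {
        Map<String, Object> row = new HashMap<>();
        row.put("id", 1);

        Map<String, Object> event = new HashMap<>();
        event.put("database", "shop_db");
        event.put("table", "orders");
        event.put("type", "delete");
        event.put("data", row);

        System.out.println(
            process(event, Pattern.compile("shop_db"), Pattern.compile("orders")));
    }
}
```

Because only the content of data is returned, a lookup of database on the result yields null, which is what the assertNull(database) check discussed earlier in this thread relies on.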

Contributor

OK, I see.

@XuQianJin-Stars
Contributor

This PR is very good overall.

// delete
} else if (DELETE.equals(type)) {
// tag this record as delete.
result.put(HoodieRecord.HOODIE_IS_DELETED, true);
Contributor

Can the delete logic be put into its own method?

Contributor Author

> Can the delete logic be put into its own method?

Done.

@XuQianJin-Stars
Contributor

+1 once CI passes.

@wangxianghu
Contributor Author

@hudi-bot run azure

@hudi-bot

CI report:

Bot commands
@hudi-bot supports the following commands:
  • @hudi-bot run azure: re-run the last Azure build

@wangxianghu wangxianghu merged commit 3b59b76 into apache:master Mar 15, 2022
vingov pushed a commit to vingov/hudi that referenced this pull request Apr 3, 2022
… Maxwell json string (apache#4987)

* [HUDI-3547] Introduce MaxwellSourcePostProcessor to extract data from Maxwell json string

* add ut

* Address comment
stayrascal pushed a commit to stayrascal/hudi that referenced this pull request Apr 12, 2022
… Maxwell json string (apache#4987)

* [HUDI-3547] Introduce MaxwellSourcePostProcessor to extract data from Maxwell json string

* add ut

* Address comment
4 participants