[HUDI-3547] Introduce MaxwellSourcePostProcessor to extract data from Maxwell json string #4987
@hudi-bot run azure
Hi @XuQianJin-Stars, could you help review this?
boolean isDelete = record.get(HoodieRecord.HOODIE_IS_DELETED).booleanValue();

assertFalse(isDelete);
assertNull(database);
Why is the database null?
Because the processor extracts the content of the data field as the result, and the database field does not belong to data.
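The answer above can be illustrated with a minimal sketch. It is not the PR's MaxwellSourcePostProcessor: the class name MaxwellDataExtractSketch and the method extractData are hypothetical, and the Maxwell message is modeled as an already-parsed Map instead of a JSON string. It only shows why a lookup of database on the processed record returns null.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only; names here are not taken from the PR.
final class MaxwellDataExtractSketch {

    /** Returns a copy of the nested "data" object; envelope fields are left behind. */
    @SuppressWarnings("unchecked")
    static Map<String, Object> extractData(Map<String, Object> maxwellMessage) {
        Object data = maxwellMessage.get("data");
        if (!(data instanceof Map)) {
            throw new IllegalArgumentException("Maxwell message has no 'data' object");
        }
        return new LinkedHashMap<>((Map<String, Object>) data);
    }

    public static void main(String[] args) {
        Map<String, Object> row = new LinkedHashMap<>();
        row.put("id", 1);
        row.put("name", "a");

        // Envelope fields such as database, table and type sit next to "data".
        Map<String, Object> message = new LinkedHashMap<>();
        message.put("database", "hudi");
        message.put("table", "orders");
        message.put("type", "insert");
        message.put("data", row);

        Map<String, Object> result = extractData(message);
        // "database" lives in the envelope, so it is absent from the result.
        System.out.println(result.get("database"));
        System.out.println(result.get("id"));
    }
}
```

Only the columns of the changed row survive, which is consistent with the assertNull(database) check in the test.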
There is a problem. If Maxwell captures multiple tables in the same database at the same time, how can we distinguish the database and table if they are not kept in the record?
We can configure hoodie.deltastreamer.source.json.kafka.post.processor.maxwell.database.regex and hoodie.deltastreamer.source.json.kafka.post.processor.maxwell.table.regex to filter for the right database and table.
The flow is: original data -> filter by database and table regex -> extract data -> tag as delete or not (with some extra processing for deletes) -> return
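The flow described above could be sketched roughly as follows. This is an illustration under stated assumptions, not the code in this PR: the class MaxwellPipelineSketch and its process method are hypothetical, the message is modeled as a parsed Map, the regexes are applied with a full match, the extra delete processing is omitted, and "_hoodie_is_deleted" is assumed to be the value behind HoodieRecord.HOODIE_IS_DELETED.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;
import java.util.regex.Pattern;

// Hypothetical illustration only; names here are not taken from the PR.
final class MaxwellPipelineSketch {
    // Assumed field name behind HoodieRecord.HOODIE_IS_DELETED.
    static final String IS_DELETED = "_hoodie_is_deleted";

    @SuppressWarnings("unchecked")
    static Optional<Map<String, Object>> process(Map<String, Object> message,
                                                 Pattern databaseRegex,
                                                 Pattern tableRegex) {
        String database = String.valueOf(message.get("database"));
        String table = String.valueOf(message.get("table"));

        // Step 1: drop messages whose database or table does not match.
        if (!databaseRegex.matcher(database).matches()
                || !tableRegex.matcher(table).matches()) {
            return Optional.empty();
        }

        // Step 2: keep only the content of the "data" field.
        Object data = message.get("data");
        if (!(data instanceof Map)) {
            return Optional.empty();
        }
        Map<String, Object> result = new LinkedHashMap<>((Map<String, Object>) data);

        // Step 3: tag the record as a delete or not, then return it.
        result.put(IS_DELETED, "delete".equals(message.get("type")));
        return Optional.of(result);
    }
}
```

Because the filter runs before extraction, the database and table names are still available when the regexes are checked, even though they are absent from the returned record.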
Well, I see.
This PR is very good overall.
// delete
} else if (DELETE.equals(type)) {
  // tag this record as delete.
  result.put(HoodieRecord.HOODIE_IS_DELETED, true);
Can the delete logic be put into a method?
Done.
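As a rough idea of what moving the delete logic into its own method can look like, here is a small sketch. It is not the refactoring actually committed in this PR: MaxwellDeleteSketch, tagByType and processDelete are hypothetical names, the extra delete processing mentioned in the thread is left out, and "_hoodie_is_deleted" is assumed to be the value behind HoodieRecord.HOODIE_IS_DELETED.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only; names here are not taken from the PR.
final class MaxwellDeleteSketch {
    // Assumed field name behind HoodieRecord.HOODIE_IS_DELETED.
    static final String IS_DELETED = "_hoodie_is_deleted";

    /** Dispatches on the Maxwell "type" value and delegates deletes to a helper. */
    static Map<String, Object> tagByType(String type, Map<String, Object> data) {
        Map<String, Object> result = new LinkedHashMap<>(data);
        if ("delete".equals(type)) {
            return processDelete(result);
        }
        // Inserts and updates are kept and explicitly marked as not deleted.
        result.put(IS_DELETED, false);
        return result;
    }

    /** All delete-specific handling lives here, so the branch above stays short. */
    static Map<String, Object> processDelete(Map<String, Object> result) {
        result.put(IS_DELETED, true);
        return result;
    }
}
```

Keeping the delete branch in one method gives any additional delete handling a single place to live.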
+1 once CI succeeds.
… Maxwell json string (apache#4987) * [HUDI-3547] Introduce MaxwellSourcePostProcessor to extract data from Maxwell json string * add ut * Address comment
What is the purpose of the pull request
Introduce MaxwellSourcePostProcessor to extract data from Maxwell json string
Brief change log
Verify this pull request
This change can be verified by
org.apache.hudi.utilities.sources.TestJsonKafkaSourcePostProcessor#testMaxwellJsonKafkaSourcePostProcessor
Committer checklist
Has a corresponding JIRA in PR title & commit
Commit message is descriptive of the change
CI is green
Necessary doc changes done or have another open PR
For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.