[cdc-pipeline-connector][doris] Introduce Doris cdc pipeline DataSink #2729

Closed
wants to merge 26 commits into from

Conversation

JNSimba
Member

@JNSimba JNSimba commented Nov 22, 2023

This closes #2646
Add a DorisDataSink that implements the DataSink interface so that Doris can be used as the sink of a pipeline.

private void applyAddColumnEvent(AddColumnEvent event) throws IOException, IllegalArgumentException {
    TableId tableId = event.tableId();
    List<AddColumnEvent.ColumnWithPosition> addedColumns = event.getAddedColumns();
    for (AddColumnEvent.ColumnWithPosition col : addedColumns) {
Contributor

There are several kinds of ColumnPosition. Does this code handle only the LAST type?

Member Author

Yes, because Doris does not support adding a value column in the middle of the Key columns.
In addition, data is imported in JSON format by default, so the order of the columns does not affect data quality, and the new value column can simply be appended to the end.
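To illustrate the reply above, here is a minimal sketch of treating every added column as an append, whatever position the upstream event requested. All types here (`AppendColumnSketch`, `Position`, `NewColumn`) are hypothetical stand-ins, not the classes from this PR, and the generated statement carries no FIRST/AFTER clause on the assumption that Doris then places the value column at the end.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: hypothetical stand-in types, not the classes from this PR. */
class AppendColumnSketch {

    /** Hypothetical mirror of the position kinds raised in the review. */
    enum Position { FIRST, LAST, BEFORE, AFTER }

    /** Hypothetical, simplified description of a column to add. */
    static final class NewColumn {
        final String name;
        final String dorisType;
        final Position position;

        NewColumn(String name, String dorisType, Position position) {
            this.name = name;
            this.dorisType = dorisType;
            this.position = position;
        }
    }

    /**
     * Builds one ADD COLUMN statement per new column. The requested position is
     * deliberately not read: every column is added without a FIRST/AFTER clause.
     */
    static List<String> buildAddColumnDdl(String database, String table, List<NewColumn> columns) {
        List<String> statements = new ArrayList<>();
        for (NewColumn column : columns) {
            statements.add(String.format(
                    "ALTER TABLE `%s`.`%s` ADD COLUMN `%s` %s",
                    database, table, column.name, column.dorisType));
        }
        return statements;
    }
}
```

Because rows are sent as JSON keyed by column name, a consumer of this sketch would not need the physical column order to match the upstream table.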

wudi added 5 commits November 23, 2023 14:33
# Conflicts:
#	flink-cdc-common/src/main/java/com/ververica/cdc/common/schema/Schema.java
@JNSimba
Member Author

JNSimba commented Nov 23, 2023

Addressed the comments. PTAL @lvyanquan

@gtk96
Contributor

gtk96 commented Nov 24, 2023

@JNSimba Run 'mvn spotless:apply' to fix these violations.

@JNSimba
Member Author

JNSimba commented Nov 24, 2023

> @JNSimba Run 'mvn spotless:apply' to fix these violations.

Thanks for the reminder, this has been fixed. @gtk96

@lvyanquan
Contributor

#2734 removes GenericRecordData and adds BinaryRecordData, so we can no longer get Objects directly from RecordData and need to use a FieldGetter instead. I've raised a PR that gives an example of using it in ValuesDataSink.
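The FieldGetter approach mentioned in this comment can be pictured roughly as follows. This is a self-contained sketch with hypothetical stand-in types (`BinaryRow`, `FieldType`, and a local `FieldGetter` interface); it does not use the real RecordData API, and the fixed 8-byte slot layout is an assumption made purely for illustration.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: hypothetical stand-ins, not the flink-cdc classes. */
class FieldGetterSketch {

    /** Hypothetical subset of field types. */
    enum FieldType { INT, BIGINT }

    /** Hypothetical binary row: each field lives in a fixed 8-byte slot. */
    static final class BinaryRow {
        private final ByteBuffer buffer;

        BinaryRow(int arity) {
            this.buffer = ByteBuffer.allocate(arity * 8);
        }

        void setInt(int pos, int value) { buffer.putInt(pos * 8, value); }
        void setLong(int pos, long value) { buffer.putLong(pos * 8, value); }
        int getInt(int pos) { return buffer.getInt(pos * 8); }
        long getLong(int pos) { return buffer.getLong(pos * 8); }
    }

    /** Reads one field from a row as a boxed Java object. */
    interface FieldGetter {
        Object getField(BinaryRow row);
    }

    /** Picks the typed accessor once, based on the declared type and position. */
    static FieldGetter createFieldGetter(FieldType type, int pos) {
        switch (type) {
            case INT:
                return row -> row.getInt(pos);
            case BIGINT:
                return row -> row.getLong(pos);
            default:
                throw new IllegalArgumentException("Unsupported type: " + type);
        }
    }

    /** Builds getters from the schema, then extracts every field of the row. */
    static List<Object> readAll(BinaryRow row, List<FieldType> schema) {
        List<Object> values = new ArrayList<>();
        for (int i = 0; i < schema.size(); i++) {
            values.add(createFieldGetter(schema.get(i), i).getField(row));
        }
        return values;
    }
}
```

In a real sink the getters would normally be created once per table schema and reused for every record, rather than rebuilt per row as `readAll` does here for brevity.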

/** A serializer for Event to Tuple2<String, byte[]> */
public class DorisEventSerializer implements DorisRecordSerializer<Event> {
    private ObjectMapper objectMapper = new ObjectMapper();
    private Map<TableId, Schema> schemaMaps = new HashMap<>();
Contributor

One tricky problem is that we need to keep schemaMaps in state in order to recover from failure, so we would have to add a subclass of DorisWriter or DorisBatchWriter that overrides the snapshotState method.
What do you think?

Contributor

Sorry, please disregard this comment. As discussed before, a CreateTableEvent is always sent before any DataChangeEvent, so we don't need to handle this situation.

Member Author

Yes, my understanding is that the CreateTableEvent is resent even after a restart.
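The conclusion of this thread can be sketched as follows: the serializer keeps schemas only in memory and refills the map from the CreateTableEvent that precedes the data. The event classes below are hypothetical simplifications (a string table id, column names only), not the real flink-cdc event types, and the sketch relies on the ordering guarantee stated above.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Illustrative only: hypothetical stand-ins for the pipeline event types. */
class SchemaCacheSketch {

    /** Marker interface for the two hypothetical event kinds. */
    interface Event {}

    static final class CreateTableEvent implements Event {
        final String tableId;
        final List<String> columnNames;

        CreateTableEvent(String tableId, List<String> columnNames) {
            this.tableId = tableId;
            this.columnNames = columnNames;
        }
    }

    static final class DataChangeEvent implements Event {
        final String tableId;
        final List<Object> values;

        DataChangeEvent(String tableId, List<Object> values) {
            this.tableId = tableId;
            this.values = values;
        }
    }

    /** In-memory only; not checkpointed, so it is empty after a restart. */
    private final Map<String, List<String>> schemaMaps = new HashMap<>();

    /** Returns a column-name to value map for data events, or null for schema events. */
    Map<String, Object> serialize(Event event) {
        if (event instanceof CreateTableEvent) {
            CreateTableEvent create = (CreateTableEvent) event;
            schemaMaps.put(create.tableId, create.columnNames);
            return null;
        }
        DataChangeEvent change = (DataChangeEvent) event;
        List<String> columns = schemaMaps.get(change.tableId);
        if (columns == null) {
            // Would only happen if the ordering guarantee were violated.
            throw new IllegalStateException("No schema seen yet for table " + change.tableId);
        }
        Map<String, Object> row = new LinkedHashMap<>();
        for (int i = 0; i < columns.size(); i++) {
            row.put(columns.get(i), change.values.get(i));
        }
        return row;
    }
}
```

A freshly constructed `SchemaCacheSketch` models a restarted task: it has no schemas until the resent CreateTableEvent arrives, which is why no extra state handling is needed as long as that event always comes first.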

# Conflicts:
#	flink-cdc-common/src/main/java/com/ververica/cdc/common/data/GenericStringData.java
wudi added 2 commits November 28, 2023 21:52
# Conflicts:
#	flink-cdc-common/src/main/java/com/ververica/cdc/common/utils/SchemaUtils.java
wudi added 4 commits November 29, 2023 16:30
# Conflicts:
#	flink-cdc-common/src/main/java/com/ververica/cdc/common/utils/SchemaUtils.java
@JNSimba
Member Author

JNSimba commented Nov 29, 2023

Addressed the comments. PTAL @lvyanquan

@lvyanquan
Contributor

lvyanquan commented Nov 30, 2023

Thanks for your contribution. Overall it looks good to me. I left some comments; could you also clean up the commit messages?

@JNSimba
Member Author

JNSimba commented Nov 30, 2023

Addressed the comments. PTAL @lvyanquan

@JNSimba
Member Author

JNSimba commented Dec 1, 2023

Addressed the comments. PTAL @lvyanquan

@JNSimba JNSimba changed the title from "[WIP][3.0][cdc-runtime] add DorisDataSink for Pipeline" to "[cdc-pipeline-connector][doris] Introduce Doris cdc pipeline DataSink" on Dec 2, 2023
@leonardBang
Contributor

Resolved by 4abd86a

@leonardBang leonardBang closed this Dec 5, 2023
Successfully merging this pull request may close these issues.

[flink-cdc-pipeline-connectors] Add Implementation of DataSink In Doris
4 participants