Kafka sink: Optimize DML with column type changes that don't change data #8095

Closed
dveeden opened this issue Jan 17, 2023 · 8 comments
Labels
area/ticdc Issues or PRs related to TiCDC. component/sink Sink component. type/feature Issues about a new feature

Comments

@dveeden
Contributor

dveeden commented Jan 17, 2023

Is your feature request related to a problem?

Setup

TiUP Playground with TiCDC

tiup playground --ticdc 1 --tiflash 0 --without-monitor nightly

Kafka

export CONFLUENT_HOME=/home/dvaneeden/confluent-7.3.1
confluent local services start

Changefeed

tiup cdc:nightly cli changefeed create \
--sink-uri="kafka://127.0.0.1:9092/test?protocol=avro" \
--schema-registry="http://127.0.0.1:8081"

Consumer

cd $CONFLUENT_HOME
./bin/kafka-avro-console-consumer --topic test \
--bootstrap-server 127.0.0.1:9092 \
--property schema.registry.url=http://127.0.0.1:8081 \
--from-beginning

Table

CREATE TABLE t1 (id INT PRIMARY KEY AUTO_INCREMENT, c1 CHAR(5));
INSERT INTO t1(c1) VALUES ("foo"),("bar"),("baz");

Tests

ALTER TABLE t1 MODIFY COLUMN c1 CHAR(6);

Result: No consumer output

ALTER TABLE t1 MODIFY COLUMN c1 VARCHAR(6);

Result:

{"id":1,"c1":{"string":"foo"}}
{"id":2,"c1":{"string":"bar"}}
{"id":3,"c1":{"string":"baz"}}

ALTER TABLE t1 MODIFY COLUMN c1 VARCHAR(7);

Result: No consumer output

The problem here is that changing the column type from CHAR to VARCHAR results in all rows being updated in the sink, which for large tables can be problematic.
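To gauge how many rows a given ALTER re-sends, one option is to save the consumer output to a file and count the records. The sketch below is a minimal helper for that; `count_records` is a hypothetical name, and it assumes each record is printed as a single line starting with `{`, as in the output above.

```shell
# Count console-consumer records (JSON lines) read from stdin.
# ASSUMPTION: one record per line, each starting with "{"; any other
# lines (for example log output) are not counted.
count_records() {
  # grep -c exits non-zero when the count is 0, so mask that status.
  grep -c '^{' || true
}

# Example input: the three records shown after the CHAR -> VARCHAR change.
printf '%s\n' \
  '{"id":1,"c1":{"string":"foo"}}' \
  '{"id":2,"c1":{"string":"bar"}}' \
  '{"id":3,"c1":{"string":"baz"}}' | count_records
```

Running the count before and after each ALTER would show whether that statement caused existing rows to be emitted again.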

Describe the feature you'd like

Handle changes between CHAR and VARCHAR the same way as length changes for these columns, without re-sending the full table to the sink.

Related: pingcap/tidb#40574

Describe alternatives you've considered

No response

Teachability, Documentation, Adoption, Migration Strategy

No response

@dveeden dveeden added the type/feature Issues about a new feature label Jan 17, 2023
@dveeden
Contributor Author

dveeden commented Jan 17, 2023

/component sink
/area ticdc

@ti-chi-bot ti-chi-bot added component/sink Sink component. area/ticdc Issues or PRs related to TiCDC. labels Jan 17, 2023
@keweishang

In general, whenever DDL is applied to a table, we'd like for TiCDC to NOT re-send the existing rows from the table. Essentially, we want TiCDC to ignore DDL.

@bb7133
Member

bb7133 commented Jan 17, 2023

Thanks for reporting this.

This is already planned, and some prerequisite tasks are done (pingcap/tidb#39159): with txn_source, we are able to filter the transactions written by ALTER TABLE ... MODIFY and ignore them in TiCDC.

@bb7133
Member

bb7133 commented Jan 17, 2023

> Essentially, we want TiCDC to ignore DDL.

Hi @keweishang, could you explain a bit more why you want to "ignore DDL"? If you mean the DDL statement itself, that might not be a good idea, since the schema change would then not be replicated.

@keweishang

keweishang commented Jan 18, 2023

Hi @bb7133, we just don't want Kafka to receive all the existing rows again. After the DDL is applied and new rows are inserted into the table, we want Kafka to receive the new rows with the new schema. Also, we're not really interested in the DDL event itself. Debezium sends the DDL event with the DDL statement to a separate Kafka topic, but we never really used it.

@asddongmen
Contributor

asddongmen commented Jan 18, 2023

> Hi @bb7133, we just don't want Kafka to receive all the existing rows again. After the DDL is applied and new rows are inserted into the table, we want Kafka to receive the new rows with the new schema. Also, we're not really interested in the DDL event itself. Debezium sends the DDL event with the DDL statement to a separate Kafka topic, but we never really used it.

@keweishang TiCDC already supports this feature, FYI: https://docs.pingcap.com/tidb/dev/ticdc-filter#event-filter-rules

@nongfushanquan
Contributor

/assign @hi-rustin

@Rustin170506 Rustin170506 removed their assignment Jan 3, 2024
@asddongmen
Contributor

asddongmen commented Jan 20, 2024

Duplicate of: #8686
This feature is already supported in: v7.1.0, v6.5.3.
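For anyone scripting an upgrade check, the version list above could be encoded roughly as follows. `version_has_feature` is a hypothetical helper, not part of TiCDC. It assumes a `major.minor[.patch]` version string and treats 6.6.x and 7.0.x as not having the change, since only v6.5.3 and v7.1.0 are named here.

```shell
# Return 0 if the given TiCDC version is v6.5.3 or later on the 6.5 line,
# or v7.1.0 or later; return 1 otherwise.
# ASSUMPTION: 6.6.x and 7.0.x are treated as lacking the feature because
# the thread lists only v6.5.3 and v7.1.0.
version_has_feature() {
  v=${1#v}        # drop a leading "v"
  v=${v%%-*}      # drop any "-suffix" such as "-nightly"
  major=${v%%.*}
  rest=${v#*.}
  minor=${rest%%.*}
  patch=${rest#*.}
  # With no patch component, the expansion leaves $rest unchanged.
  if [ "$patch" = "$rest" ]; then patch=0; fi

  if [ "$major" -gt 7 ]; then return 0; fi
  if [ "$major" -eq 7 ] && [ "$minor" -ge 1 ]; then return 0; fi
  if [ "$major" -eq 6 ] && [ "$minor" -eq 5 ] && [ "$patch" -ge 3 ]; then return 0; fi
  return 1
}

if version_has_feature "v6.5.3"; then
  echo "v6.5.3: feature available"
fi
```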
