// This is a long term goal.
Scenario Description
TiDB -> TiCDC -> MySQL/TiDB/Kafka/External Storage
In the case of:
The downstream system is experiencing slowness or is short of resources (the provisioned resources are not enough and it needs to scale up/out)
The upstream TiDB encounters a throughput peak (5x or 10x the normal throughput) for a while (10 minutes or half an hour), and the downstream sinking speed cannot catch up with it
The current TiCDC behavior is to fetch all upstream data changes as soon as possible and cache them in TiCDC. If the sinking speed cannot catch up with the speed at which data changes are produced, TiCDC consumes a lot of resources (memory + sorter disk) to cache the newly produced data changes, and they pile up more and more. This is a huge risk for TiCDC in terms of stability: it may cause OOM and disk-out-of-space issues for TiCDC, and when sorter compaction cannot keep up with the upstream data change speed, TiCDC itself slows down.
Back Pressure Mechanism
If TiCDC can pull/fetch new data changes according to the capability of the downstream consumers, then when the downstream system experiences temporary slowness, TiCDC slows down the speed at which it fetches new data changes. TiCDC then does not need to hold so many data changes, which results in predictable resource consumption and improves the stability of TiCDC.
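As an illustration, here is a minimal Go sketch of such a pull loop gated by a back pressure quota. The names (pullOne, flush, replicate) and the token-bucket shape are assumptions made for this example, not TiCDC's actual interfaces: a fetch token is taken before pulling a change and returned only after the sink has flushed it, so a slow downstream automatically slows the puller down.

```go
// backpressure_sketch.go — a minimal sketch, not the real TiCDC code.
package main

import (
	"context"
	"time"
)

// event stands in for one row change pulled from upstream.
type event struct{ commitTs uint64 }

// pullOne and flush are hypothetical placeholders for "fetch a change from
// the upstream kv client" and "write it to MySQL/Kafka/storage".
func pullOne(ctx context.Context) (event, error) { return event{}, nil }

func flush(ctx context.Context, e event) error {
	time.Sleep(10 * time.Millisecond) // pretend the downstream is slow
	return nil
}

// replicate pulls new data changes only while the in-flight quota permits.
// A token is taken before each fetch and returned only after the sink has
// flushed the change, so a slow downstream throttles the fetch speed instead
// of letting unflushed changes pile up in TiCDC memory / sorter disk.
func replicate(ctx context.Context, maxInFlight int) error {
	quota := make(chan struct{}, maxInFlight)  // back pressure tokens
	unflushed := make(chan event, maxInFlight) // changes waiting for the sink

	// Sink worker: flush changes and give the tokens back.
	go func() {
		for e := range unflushed {
			if flush(ctx, e) == nil {
				<-quota
			}
		}
	}()

	for {
		select {
		case <-ctx.Done():
			close(unflushed)
			return ctx.Err()
		case quota <- struct{}{}: // blocks once maxInFlight changes are unflushed
		}
		e, err := pullOne(ctx)
		if err != nil {
			return err
		}
		unflushed <- e
	}
}

func main() {
	ctx, cancel := context.WithTimeout(context.Background(), time.Second)
	defer cancel()
	_ = replicate(ctx, 1024)
}
```

With a fixed maxInFlight, the resources held by unflushed changes are bounded no matter how far the downstream falls behind.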
Preparations
The upstream TiDB/TiKV should be able to hold incremental data changes that have not been consumed yet. Compared to using the current TiKV MVCC mechanism to store these incremental data changes, it would be easier to achieve such a back pressure mechanism for TiCDC if upstream TiDB introduced a txn/redo log to store time-series data changes.
Replication Lag Monitoring
Both before and after this back pressure mechanism is introduced, the replication lag is expected to increase when there is peak throughput or the downstream is slow. Users should monitor the replication lag and take action, such as scaling the downstream up/out, to resolve the issue.
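For reference, the lag itself can be derived from a changefeed's checkpoint TSO. The Go sketch below assumes the standard TiDB TSO layout (physical milliseconds in the high bits, an 18-bit logical counter in the low bits); the function names are illustrative, not an existing TiCDC API.

```go
// lag_sketch.go — a minimal sketch of deriving replication lag from a
// checkpoint TSO; not tied to any specific TiCDC API.
package main

import (
	"fmt"
	"time"
)

// TiDB TSO layout: the high bits carry a physical timestamp in milliseconds,
// the low 18 bits carry a logical counter.
const physicalShiftBits = 18

// checkpointTime converts a checkpoint TSO into wall-clock time.
func checkpointTime(checkpointTs uint64) time.Time {
	physicalMs := int64(checkpointTs >> physicalShiftBits)
	return time.UnixMilli(physicalMs)
}

// replicationLag is "now minus the checkpoint's physical time"; this is the
// value users would watch and alert on, e.g. scale the downstream up/out
// when it keeps growing.
func replicationLag(checkpointTs uint64) time.Duration {
	return time.Since(checkpointTime(checkpointTs))
}

func main() {
	// 446708000000000000 is a made-up example TSO value, not real data.
	fmt.Println(replicationLag(446708000000000000))
}
```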