// This is a long term goal.
Scenario Description
TiDB -> TiCDC -> MySQL/TiDB/Kafka/External Storage
In the case of:
The downstream system is experiencing slowness or is short of resources (the provisioned resources are not enough and it needs to scale up/out)
The upstream TiDB encounters a throughput peak (5x or 10x the normal throughput) for a while (10 minutes or half an hour), and the downstream sinking speed cannot catch up with it
The current TiCDC behavior is to fetch all upstream data changes as soon as possible and cache them in TiCDC. If the sinking speed cannot catch up with the speed at which data changes are produced, TiCDC consumes a lot of resources (memory + sorter disk) to cache the newly produced data changes, and they pile up more and more. This is a huge risk for TiCDC in terms of stability: it may cause OOM and disk-out-of-space issues for TiCDC, and when sorter compaction cannot keep up with the upstream data change speed, TiCDC itself slows down.
Back Pressure Mechanism
If TiCDC can pull/fetch new data changes according to the capability of the downstream consumers, then when the downstream system experiences temporary slowness, TiCDC slows down the speed at which it fetches new data changes. TiCDC then does not need to hold so many data changes, which results in predictable resource consumption and improves the stability of TiCDC.
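As an illustration, here is a minimal Go sketch of such a pull loop gated by a back pressure quota. The names (pullOne, flush, replicate) and the token-bucket shape are assumptions made for this example, not TiCDC's actual interfaces: a fetch token is taken before pulling a change and returned only after the sink has flushed it, so a slow downstream automatically slows the puller down.

```go
// backpressure_sketch.go — a minimal sketch, not the real TiCDC code.
package main

import (
	"context"
	"time"
)

// event stands in for one row change pulled from upstream.
type event struct{ commitTs uint64 }

// pullOne and flush are hypothetical placeholders for "fetch a change from
// the upstream kv client" and "write it to MySQL/Kafka/storage".
func pullOne(ctx context.Context) (event, error) { return event{}, nil }

func flush(ctx context.Context, e event) error {
	time.Sleep(10 * time.Millisecond) // pretend the downstream is slow
	return nil
}

// replicate pulls new data changes only while the in-flight quota permits.
// A token is taken before each fetch and returned only after the sink has
// flushed the change, so a slow downstream throttles the fetch speed instead
// of letting unflushed changes pile up in TiCDC memory / sorter disk.
func replicate(ctx context.Context, maxInFlight int) error {
	quota := make(chan struct{}, maxInFlight)  // back pressure tokens
	unflushed := make(chan event, maxInFlight) // changes waiting for the sink

	// Sink worker: flush changes and give the tokens back.
	go func() {
		for e := range unflushed {
			if flush(ctx, e) == nil {
				<-quota
			}
		}
	}()

	for {
		select {
		case <-ctx.Done():
			close(unflushed)
			return ctx.Err()
		case quota <- struct{}{}: // blocks once maxInFlight changes are unflushed
		}
		e, err := pullOne(ctx)
		if err != nil {
			return err
		}
		unflushed <- e
	}
}

func main() {
	ctx, cancel := context.WithTimeout(context.Background(), time.Second)
	defer cancel()
	_ = replicate(ctx, 1024)
}
```

With a fixed maxInFlight, the resources held by unflushed changes are bounded no matter how far the downstream falls behind.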
Preparations
The upstream TiDB/TiKV should be able to hold incremental data changes that have not been consumed yet. Compared to using the current TiKV MVCC mechanism to store these incremental data changes, it would be easier to achieve such a back pressure mechanism for TiCDC if upstream TiDB introduced a txn/redo log to store time-series data changes.
Replication Lag Monitoring
Both before and after this back pressure mechanism is introduced, the replication lag is expected to increase when there is peak throughput or the downstream is slow. Users should monitor the replication lag and take action, such as scaling the downstream up/out, to resolve the issue.
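For reference, the lag itself can be derived from a changefeed's checkpoint TSO. The Go sketch below assumes the standard TiDB TSO layout (physical milliseconds in the high bits, an 18-bit logical counter in the low bits); the function names are illustrative, not an existing TiCDC API.

```go
// lag_sketch.go — a minimal sketch of deriving replication lag from a
// checkpoint TSO; not tied to any specific TiCDC API.
package main

import (
	"fmt"
	"time"
)

// TiDB TSO layout: the high bits carry a physical timestamp in milliseconds,
// the low 18 bits carry a logical counter.
const physicalShiftBits = 18

// checkpointTime converts a checkpoint TSO into wall-clock time.
func checkpointTime(checkpointTs uint64) time.Time {
	physicalMs := int64(checkpointTs >> physicalShiftBits)
	return time.UnixMilli(physicalMs)
}

// replicationLag is "now minus the checkpoint's physical time"; this is the
// value users would watch and alert on, e.g. scale the downstream up/out
// when it keeps growing.
func replicationLag(checkpointTs uint64) time.Duration {
	return time.Since(checkpointTime(checkpointTs))
}

func main() {
	// 446708000000000000 is a made-up example TSO value, not real data.
	fmt.Println(replicationLag(446708000000000000))
}
```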