Feature overview
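To let users tune the size of each transaction that the Load stage writes to the target database, otter manager now supports configuring the otter node parameter
com.alibaba.otter.node.etl.load.loader.db.DbLoadAction.batchSize
defined in otter/node/etl/src/main/java/com/alibaba/otter/node/etl/load/loader/db/DbLoadAction.java (line 103 at commit 7f80d17).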
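As a rough illustration of what a per-transaction row cap means, the sketch below splits the rows of one Load batch into consecutive chunks of at most batchSize rows. In this simplified model, each chunk would be written in one transaction. The class and method names are hypothetical, and this is not otter's actual DbLoadAction code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not otter's DbLoadAction: models how a row cap per
// transaction splits one Load batch into transaction-sized chunks.
public final class LoadBatchSketch {

    private LoadBatchSketch() {
    }

    /**
     * Splits rows into consecutive chunks of at most batchSize rows.
     * In this model, each returned chunk corresponds to one transaction.
     */
    public static <T> List<List<T>> partition(List<T> rows, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> chunks = new ArrayList<>();
        for (int start = 0; start < rows.size(); start += batchSize) {
            int end = Math.min(start + batchSize, rows.size());
            // Copy the subList view so each chunk is independent of the source list.
            chunks.add(new ArrayList<>(rows.subList(start, end)));
        }
        return chunks;
    }

    public static void main(String[] args) {
        List<Integer> rows = new ArrayList<>();
        for (int i = 0; i < 120; i++) {
            rows.add(i);
        }
        // With the old hardcoded value of 50, 120 rows are split into 3 chunks.
        for (List<Integer> chunk : partition(rows, 50)) {
            System.out.println(chunk.size());
        }
    }
}
```

Running main prints 50, 50 and 20: three transactions for 120 rows. With batchSize = 10000 the same rows would fit into a single chunk.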
Example manager UI (screenshots):
Adding a pipeline
Viewing pipeline information
Background
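Our use case involves a table that receives sudden bursts of heavy writes (INSERT and DELETE) and must be synchronized through otter. During these bursts (3,000+ rows/second), we observed synchronization lag.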
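Reviewing the select and load logs showed the following:
Because the pipeline had no field-transformation or other rules configured in the E (extract) and T (transform) stages, we inferred that the bottleneck was writing to the target database in the L (load) stage.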
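Inspecting the target table's schema revealed a large number of indexes. We also found that
com.alibaba.otter.node.etl.load.loader.db.DbLoadAction.batchSize
was hardcoded to 50
(i.e., the Load stage writes at most 50 rows per transaction). We therefore suspected that frequent small write transactions on the target database, each paying overhead such as index maintenance, were limiting write throughput. After increasing this value, throughput improved substantially.
Before the change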
batchSize = 50: when synchronization lagged (indicating that Load was saturated), throughput was about 60,000 rows/minute.
Figure: batchSize-50-throughput
Figure: batchSize-50-lag
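A quick back-of-the-envelope check of the "before" numbers, assuming every transaction is full (exactly 50 rows): 60,000 rows/minute is 1,000 rows/second, or about 20 commits per second, while a 3,000 rows/second burst would need about 60 commits per second at this transaction size. The class and method names are illustrative only.

```java
// Illustrative arithmetic only; assumes every transaction carries exactly batchSize rows.
public final class CommitRateSketch {

    private CommitRateSketch() {
    }

    public static double rowsPerSecond(double rowsPerMinute) {
        return rowsPerMinute / 60.0;
    }

    public static double commitsPerSecond(double rowsPerSecond, int batchSize) {
        return rowsPerSecond / batchSize;
    }

    public static void main(String[] args) {
        // Saturated Load throughput observed with batchSize = 50.
        double observed = rowsPerSecond(60_000);
        System.out.println(observed);                        // 1000.0
        System.out.println(commitsPerSecond(observed, 50));  // 20.0
        // Commit rate that a 3,000 rows/second burst would require at batchSize = 50.
        System.out.println(commitsPerSecond(3_000, 50));     // 60.0
    }
}
```

If the target database can only sustain a roughly fixed number of commits per second, throughput in this model scales with rows per commit, which is the reasoning behind raising batchSize. Real commit cost also depends on indexes, row size and the database's durability settings.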
After the change
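batchSize = 10000. We also increased the consumption batch size so that each batch contains enough rows to fill a transaction. When synchronization lagged, throughput was about 350,000 rows/minute; at a throughput of 88,000 rows/minute, no lag occurred.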
Figure: batchSize-10000-throughput
Figure: batchSize-10000-lag
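Why the consumption batch size also had to grow: a transaction cannot contain more rows than the batch it is built from, so in a simple model the effective rows per transaction is the smaller of the two limits. The sketch below assumes that model. consumedBatchRows is a hypothetical stand-in for the number of rows one consumed batch actually carries, not an otter parameter name, and the sample sizes 1,000 and 20,000 are made up for illustration.

```java
// Hypothetical model: rows per transaction are capped both by batchSize and by
// the number of rows available in the batch that was consumed.
public final class EffectiveTxSizeSketch {

    private EffectiveTxSizeSketch() {
    }

    public static int effectiveRowsPerTx(int batchSize, int consumedBatchRows) {
        if (batchSize <= 0 || consumedBatchRows < 0) {
            throw new IllegalArgumentException("invalid sizes");
        }
        return Math.min(batchSize, consumedBatchRows);
    }

    public static void main(String[] args) {
        // batchSize raised to 10000, but a small consumed batch still limits each transaction.
        System.out.println(effectiveRowsPerTx(10_000, 1_000));  // 1000
        // After enlarging the consumed batch, the batchSize cap becomes the limit.
        System.out.println(effectiveRowsPerTx(10_000, 20_000)); // 10000
    }
}
```

In this model, raising batchSize alone has no effect once the consumed batch is the smaller limit, which matches the decision to raise both settings together.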
Recommended configuration
If your problem and target database structure are similar, you can make comparable adjustments. Keep the following in mind: