-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: masetr-slave maybe inconsistent due to bgsave may provide a wrong binlog offset #2818
fix: masetr-slave maybe inconsistent due to bgsave may provide a wrong binlog offset #2818
Conversation
WalkthroughThe recent changes enhance the control flow of database locking in the Changes
Poem
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (invoked as PR comments)
Additionally, you can add CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files selected for processing (1)
- src/pika_command.cc (1 hunks)
Additional comments not posted (3)
src/pika_command.cc (3)
871-874
: Ensure the correctness of conditional locking logic.The conditional acquisition of the shared lock based on
IsSuspend()
improves performance by avoiding unnecessary locking. However, verify that theIsSuspend()
method accurately reflects the command's suspension state to prevent potential data races or deadlocks.
882-884
: Ensure the correctness of conditional unlocking logic.The conditional release of the shared lock based on
IsSuspend()
ensures that the lock is only released if it was acquired. Verify that theIsSuspend()
method accurately reflects the command's suspension state to prevent potential data races or deadlocks.
875-875
: Verify the removal of the DEFER block for unlocking.The removal of the DEFER block for unlocking the database lock indicates a shift towards more explicit control of the lock's lifecycle. Ensure that this change does not introduce any potential issues with lock management and that the explicit control is correctly implemented.
附1: 2 原版 第二次: 第三次: 参数: 1亿负载,benchmark 20线程数: 1 本PR: 2 原版代码: |
Co-authored-by: cheniujh <[email protected]>
Co-authored-by: cheniujh <[email protected]>
Co-authored-by: cheniujh <[email protected]>
该PR修复了 Issue #2805
问题重述:
Pika做bgsave时会给DB::dbs_rw_这把大锁上写锁,而所有命令执行时都会在链路上给这把dbs_rw_上读锁。所以bgsave上了写锁以后,相当于暂时阻写来打快照,在bgsave对dbs_rw_的写锁范围内,做了两件事:1 打快照 2 获取此时的最新binlog offset(从节点全量传输完快照之后才知道该从哪里开始续传binlog)
问题在于:在命令执行链路上的读锁,其范围只覆盖了DoCommand(),也就是写DB,Dobinlog()是在该读锁放开以后采取执行的。这就意味着:当bgsave处成功独占了dbs_rw_之后,写链路上可能还有未执行完的DoBinlog(也就是还有Binlog正在写入),那么bgsave此时获取的binlog offset实际上是一个偏前的位置,按照正确预期bgsave应该等到这些Binlog完全落盘了才能去获取Binlog offset,因为pika的binlog目前不幂等,得到一个偏前的位置,会在后续造成Slave重复消费某部分Binlog,进而导致主从不一致。
解决方案:
本PR将命令执行链路上针对dbs_rw_的读锁修改了覆盖范围,让他将DoCommand,Dobinlog都覆盖进去。换而言之,一条普通命令在执行前会先获取dbs_rw_的读锁,之后在完成了WriteDB, WriteBInlog之后才会放开读锁。这就能确保当bgsave能给dbs_rw_加上独占的写锁时,之前的所有请求/命令对应的Binlog也已经落盘,这样获取到的Binlog Offset就是准确的。
另外由于调整了锁范围,所以进行了性能测试对比(本PR vs 原代码),大致结论为:
1 每次负载500W请求,共测试4次,平均值如下:
原代码:QPS为153401, P99为3.279ms
本PR:QPS为152590, P99为3.359ms
结论:本PR在500万请求量下,QPS降幅在0.08%,P99的增幅在2%左右,未见明显性能下降。
2 每次负载1亿请求,共测试1次,具体如下:
原代码:QPS为144849, P99为3.615ms
本PR:QPS为145694, P99为3.335ms
结论:本PR在1亿请求量下未引起性能下降。
PS:
./redis-benchmark -h * -p 9221 -t set -n 100000000/5000000 -r 10000000000000 -d 512 -c 300 --threads 20
This PR fixes Issue #2805.
Problem Restatement:
When Pika performs
bgsave
, it acquires a write lock on theDB::dbs_rw_
mutex, while all command executions acquire a read lock on the samedbs_rw_
mutex. Consequently, whenbgsave
acquires the write lock, it temporarily blocks writes to take a snapshot. Within the scope ofbgsave
's write lock ondbs_rw_
, two tasks are performed: 1) taking a snapshot and 2) obtaining the latest binlog offset (to determine where to resume binlog transmission after the slave completes the snapshot).The issue lies in the fact that the read lock on the command execution path only covers
DoCommand()
, which involves writing to the DB.DoBinlog()
is executed after the read lock is released. This means that whenbgsave
successfully acquires the exclusive write lock ondbs_rw_
, there may still be pendingDoBinlog
operations (i.e., binlogs still being written). Thus, the binlog offset obtained bybgsave
is prematurely captured. Ideally,bgsave
should wait for all pending binlogs to be fully written before capturing the binlog offset. Since Pika's binlog is currently non-idempotent, obtaining a premature binlog offset can lead to the slave repeatedly consuming certain portions of the binlog, resulting in master-slave inconsistencies.Solution:
This PR modifies the read lock coverage on the command execution path for
dbs_rw_
. It extends the read lock to encompass bothDoCommand
andDoBinlog
. In other words, a normal command will first acquire a read lock ondbs_rw_
, and it will only release this read lock after completing bothWriteDB
andWriteBinlog
. This ensures that whenbgsave
acquires the exclusive write lock ondbs_rw_
, all previous requests/commands and their corresponding binlogs have been fully written. As a result, the binlog offset obtained will be accurate.