-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: successive exec of flushdb may cause delete old db fail #2790
fix: successive exec of flushdb may cause delete old db fail #2790
Conversation
2 add an atomic flag for flushdb 3 add callback function when deleted old path 4 revised some ugly code
WalkthroughRecent updates introduce more robust control and error handling for database flush operations, including new methods and improved logging. Key changes involve adding callback support for directory purges, enhancing flushing logic to append timestamps for conflict resolution, and better tracking the success of operations. These modifications ensure smoother and more informative administrative commands and transactions in the system. Changes
Sequence Diagram(s)sequenceDiagram
participant User
participant PikaAdmin
participant PikaDB
participant PikaServer
User->>PikaAdmin: Initiate FlushallCmd
PikaAdmin->>PikaDB: Call FlushAllWithoutLock
PikaDB-->>PikaAdmin: Return success/fail status
PikaAdmin->>PikaAdmin: Update flushall_succeed_ flag
PikaAdmin->>PikaServer: Call DoBinlog if flushall_succeed_
PikaServer-->>PikaAdmin: Binlog operation status
PikaAdmin->>User: Return operation result
User->>PikaServer: Request directory purge with callback
PikaServer->>PikaServer: Schedule PurgeDirTask
PikaServer->>PikaServer: Execute DoPurgeDirWithCallBack
PikaServer-->>User: Confirm purge completion
Poem
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (invoked as PR comments)
Additionally, you can add CodeRabbit Configration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files selected for processing (7)
- include/pika_admin.h (2 hunks)
- include/pika_db.h (2 hunks)
- include/pika_server.h (1 hunks)
- src/pika_admin.cc (5 hunks)
- src/pika_db.cc (3 hunks)
- src/pika_server.cc (2 hunks)
- src/pika_transaction.cc (1 hunks)
Additional comments not posted (25)
include/pika_db.h (1)
151-153
: LGTM!The new method
ResetIsFlushingDBToFalse
correctly sets theis_flushing_db_
flag to false.src/pika_transaction.cc (1)
Line range hint
63-67
:
Ensure proper error handling forFlushdbCmd
.The
DoWithoutLock
method now returns a boolean indicating success or failure. Ensure that the failure case is handled appropriately.include/pika_server.h (1)
205-205
: LGTM! Verify callback usage.The new method
PurgeDirWithCallBack
correctly adds functionality for directory purging with a callback. Ensure that the callback function is used correctly in all relevant places.src/pika_db.cc (3)
47-47
: LGTM!The
is_flushing_db_
flag is correctly initialized tofalse
in the constructor.
201-205
: LGTM! Verify the correctness of the logic.The logic to handle the
is_flushing_db_
flag in theFlushDBWithoutLock
method ensures that only one flush operation can occur at a time. Proper logging is included for failure cases.
228-230
: LGTM! Verify callback usage.The usage of the callback function to reset the
is_flushing_db_
flag in theFlushDBWithoutLock
method is correct and ensures that the flag is reset after the directory is purged.include/pika_admin.h (9)
188-188
: Change return type tobool
forFlushAllWithoutLock
.Changing the return type to
bool
is a good practice to indicate success or failure. Ensure that this method is properly tested and that the return values are correctly handled in the calling code.
189-189
: AddDoBinlog
method.The addition of the
DoBinlog
method is noted. Ensure that this method is correctly implemented and integrated with the rest of the class.
193-193
: Change return type tobool
forDoWithoutLock
.Changing the return type to
bool
is a good practice to indicate success or failure. Ensure that this method is properly tested and that the return values are correctly handled in the calling code.
194-194
: AddClear
method.The addition of the
Clear
method is noted. This method resets theflushall_succeed_
flag, which is crucial for maintaining the state of the command. Ensure that this method is correctly called where necessary.
196-196
: Addflushall_succeed_
flag.The addition of the
flushall_succeed_
flag is noted. This flag is essential for tracking the success of the flush operation. Ensure that this flag is correctly set and checked in the relevant methods.
212-212
: AddDoBinlog
method.The addition of the
DoBinlog
method is noted. Ensure that this method is correctly implemented and integrated with the rest of the class.
213-213
: Change return type tobool
forDoWithoutLock
.Changing the return type to
bool
is a good practice to indicate success or failure. Ensure that this method is properly tested and that the return values are correctly handled in the calling code.
217-220
: AddClear
method.The addition of the
Clear
method is noted. This method resets theflush_succeed_
flag and clears thedb_name_
, which is crucial for maintaining the state of the command. Ensure that this method is correctly called where necessary.
222-222
: Addflush_succeed_
flag.The addition of the
flush_succeed_
flag is noted. This flag is essential for tracking the success of the flush operation. Ensure that this flag is correctly set and checked in the relevant methods.src/pika_server.cc (3)
813-816
: Verify callback behavior.The
PurgeDirWithCallBack
function schedules a directory purge operation with a callback. Ensure that the callback function does not introduce side effects or unexpected behavior.
51-57
: Ensure callback execution under all circumstances.The
DoPurgeDirWithCallBack
function calls the callback after the directory deletion. Ensure that the callback is executed even ifpstd::DeleteDir
fails.
44-49
: Verify appropriate usage of callback.The
PurgeDirArgsWithCallBackArgs
struct encapsulates the arguments for directory purge operations with a callback. Ensure that thestd::function<void()>
callback is used appropriately and doesn't cause memory leaks or dangling references.src/pika_admin.cc (7)
495-495
: Initialization offlushall_succeed_
The initialization of the
flushall_succeed_
flag tofalse
is a good practice to ensure the success state is correctly tracked.
514-524
: Usage offlushall_succeed_
flagThe update to use the
flushall_succeed_
flag to track the success of the flush operation is appropriate. Ensure that the flag is correctly set and checked to communicate the operation's success.
533-536
: Check forflushall_succeed_
flagThe check for the
flushall_succeed_
flag before clearing the cache is a good practice to prevent unnecessary cache clearing in case of a flush failure.
543-554
: Return type change tobool
The change to return a
bool
indicating the success of the flush operation allows for better error handling and communication of the operation's success.
557-570
: Return type change tobool
The change to return a
bool
indicating the success of the flush operation allows for better error handling and communication of the operation's success. The function also updates the cache if the flush operation is successful.
571-574
: Addition ofDoBinlog
functionThe addition of the
DoBinlog
function ensures that binlog operations are handled correctly after a successful flush operation.
Line range hint
578-646
:
Updates toFlushdbCmd
classThe updates to the
FlushdbCmd
class, including the use of a success flag andbool
return type for theDoWithoutLock
method, ensure consistency in handling flush operations across different commands. These changes improve error handling and communication.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files selected for processing (2)
- src/pika_db.cc (1 hunks)
- src/pika_server.cc (2 hunks)
Files skipped from review as they are similar to previous changes (2)
- src/pika_db.cc
- src/pika_server.cc
a4a1d2b
to
86578e2
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files selected for processing (3)
- src/pika_admin.cc (5 hunks)
- src/pika_db.cc (1 hunks)
- src/pika_server.cc (2 hunks)
Files skipped from review as they are similar to previous changes (3)
- src/pika_admin.cc
- src/pika_db.cc
- src/pika_server.cc
remove debug log
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files selected for processing (1)
- src/pika_admin.cc (5 hunks)
Files skipped from review as they are similar to previous changes (1)
- src/pika_admin.cc
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files selected for processing (1)
- src/pika_admin.cc (5 hunks)
Files skipped from review as they are similar to previous changes (1)
- src/pika_admin.cc
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files selected for processing (1)
- src/pika_admin.cc (5 hunks)
Files skipped from review as they are similar to previous changes (1)
- src/pika_admin.cc
please add test |
Yeah, please add a test. |
* 1 add return flag for flushdb * removed duplicated logic --------- Co-authored-by: cheniujh <[email protected]>
…mFoundation#2790) * 1 add return flag for flushdb 2 add an atomic flag for flushdb 3 add callback function when deleted old path 4 revised some ugly code * add timestamp for deleting suffix when flushdb * remove debug remove debug log * change a return value to fixed "false" * add failed reason to user resp * removed duplicated logic --------- Co-authored-by: cheniujh <[email protected]>
…mFoundation#2790) * 1 add return flag for flushdb * removed duplicated logic --------- Co-authored-by: cheniujh <[email protected]>
该PR修复了 Issue #2698
问题原因:
Pika的flushdb, 删除旧实例的目录是异步执行,client这边直接返回ok,但如果之前的旧目录还没有删除完毕,新到达的flushdb会实际上执行失败。 如果用户短时间内提交好几条flushdb, 其实就第一条会成功,后面几条很有可能会失败(尤其是之前旧DB目录比较大的情况下),然后pika是不管flushdb是否成功都返回了ok。Issue中gdb抓取到的异常,是由于旧的db_path_deleting目录还存在(异步执行删除),所以下一条flushdb试图执行rename逻辑时(将db_path,rename为db_path_deleting)会发现db_path_deleting目录已经存在,所以rename失败,但是原代码中没有对这种失败的处理,尽管db目录没有清除掉,但是照样会写binlog,会清空缓存,还会告知用户flushdb已经成功, 这是完全错误的。
问题修复:
1 给flushdb增加了is_flushing_db标志位和异步回调函数,这样下一条flushdb如果在前一个flushdb的异步任务(删除旧db目录)没有执行完之前,就会执行失败。(更换了一种修复方式,直接在rename旧db时,在后缀deleteing后面加上时间戳,这样即使有多个flushdb连续达到,也不会因为旧目录没有删除掉而触发rename失败)
2 给flushdb链路增加了返回值,如果flushdb失败,会告知用户执行失败,而不是无脑返回OK
3 如果flushdb失败了,就不写binlog,也不去清空缓存
4 对于flushall命令,但凡有某个db flush失败,都返回告知用户
This PR fixes Issue #2698.
Cause of the Problem:
The issue arises from Pika's
flush
operation, where the deletion of the old instance directory is performed asynchronously. The client side immediately receives anOK
, but if the old directory has not been fully deleted, any newflushdb
commands will actually fail. If a user submits severalflushdb
commands in quick succession, only the first one is likely to succeed. The subsequent ones will most likely fail (particularly if the old DB directory is quite large). Pika returnedOK
regardless of whether theflushdb
succeeded. The exception captured bygdb
in the issue is due to the olddb_path_deleting
directory still existing (as the deletion is asynchronous). Consequently, the nextflushdb
tries to execute the rename operation (renamingdb_path
todb_path_deleting
), but finds that thedb_path_deleting
directory already exists and thus the rename fails. The original code didn't handle this failure; the binlog was still written, the cache was cleared, and the user was informed thatflushdb
was successful despite the fact that the DB directory was not cleaned up.Problem Fix:
A flagis_flushing_db
and a callback function were added to theflushdb
. This ensures that any subsequentflushdb
commands will fail if the asynchronous task of deleting the old DB directory from a previousflushdb
has not completed.(An alternative fix was implemented: by appending a timestamp to the suffix of
db_path_deleting
during the rename operation. Therefore, even if multipleflushdb
commands arrive consecutively, the rename will not fail due to the old directory not being deleted.)flushdb
now returns a value. Ifflushdb
fails, it will inform the user of the failure instead of blindly returningOK
.If
flushdb
fails, neither the binlog is written nor is the cache cleared.For the
flushall
command, if anyflushdb
fails, the user is informed of the failure.Summary by CodeRabbit
New Features
Bug Fixes
Improvements