-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: db cannot resume after disk has no space #1765
fix: db cannot resume after disk has no space #1765
Conversation
# If db has a background error, it will try to manually call resume() to resume db if satisfy the least free disk to resume. | ||
# Its default value is 600 seconds. | ||
#manually-resume-interval : 600 | ||
|
||
# This window-size determines the amount of data that can be transmitted in a single synchronization process. | ||
# [Tip] In the scenario of high network latency. Increasing this size can improve synchronization efficiency. | ||
# Its default value is 9000. the [maximum] value is 90000. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Based on the provided code patch, here are some observations and suggestions:
-
The added configurations
least-free-disk-resume-size
andmanually-resume-interval
seem appropriate in terms of functionality. However, it's important to ensure that these values are suitable for your specific use case and environment. -
For the
least-free-disk-resume-size
configuration, the value of 268435456 bytes corresponds to approximately 256MB, as mentioned in the comment. If this value aligns with your requirements, there doesn't appear to be any bug or issue. -
Similarly, for the
manually-resume-interval
configuration, the default value of 600 seconds (10 minutes) seems reasonable. Again, make sure it suits your needs. -
Reviewing the rest of the code patch, there don't appear to be any bugs or risks. However, it's difficult to provide a comprehensive review without seeing the context or the surrounding code.
-
If you have any specific concerns or further details regarding the code, please provide them for a more thorough review.
Remember to thoroughly test the changes and monitor the system after applying the patch to ensure it behaves as expected in your specific setup.
本 pr 已经 amd64 arch 之上的 ubuntu 20.04 OS 测试通过 |
uint64_t free_size = disk_info.f_bsize * disk_info.f_bfree; | ||
struct timeval now; | ||
gettimeofday(&now, nullptr); | ||
if (last_check_resume_time_.tv_sec == 0 || now.tv_sec - last_check_resume_time_.tv_sec >= interval) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
在这个 if 上面加上英文注释,体现 ”第一次检验 或者 距离上次 check 时间超过 interval“ 这个 condition
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
第二,我建议先检测下当前磁盘总体使用情况,整体空闲率低于某个阈值【如 30%】时,才进来执行 check 操作,以减少加锁次数
@@ -1487,6 +1490,46 @@ void PikaServer::AutoKeepAliveRSync() { | |||
} | |||
} | |||
|
|||
void PikaServer::AutoResumeDB() { | |||
struct statfs disk_info; | |||
int ret = statfs(g_pika_conf->db_path().c_str(), &disk_info); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see it uses statfs
(deprecated) on the other one. I perfer to use statvfs
based on https://www.mkssoftware.com/docs/man3/statfs.3.asp
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
# If db has a background error, it will try to manually call resume() to resume db if satisfy the least free disk to resume. | ||
# Its default value is 60 seconds. | ||
#manually-resume-interval : 60 | ||
|
||
# This window-size determines the amount of data that can be transmitted in a single synchronization process. | ||
# [Tip] In the scenario of high network latency. Increasing this size can improve synchronization efficiency. | ||
# Its default value is 9000. the [maximum] value is 90000. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here are some observations and suggestions for the code patch:
-
The commented out section "#compact-interval :" indicates that the configuration option is not being used. If it's necessary, make sure to uncomment it and provide a value.
-
In the added configurations related to disk usage and resume, ensure that the values provided are appropriate for your system requirements and environment.
-
The comment "[NOTICE]: least-free-disk-resume-size should not be smaller than write-buffer-size!" highlights an important consideration. Make sure to set the "least-free-disk-resume-size" value accordingly to avoid potential issues.
-
Review the default values for "min-check-resume-ratio" (0.7) and "manually-resume-interval" (60 seconds) to confirm if they align with your desired behavior.
-
In the last comment about "window-size," consider reviewing the current value (9000) and determining if increasing it would be beneficial for your specific network latency scenario.
-
Overall, code reviews often require context and cannot be done solely based on the provided code snippet. It's recommended to review the entire codebase, including dependencies and integrated components, to identify potential bugs, risks, and improvement opportunities more accurately.
Remember to thoroughly test any changes made and consider seeking feedback from other developers familiar with the project for a comprehensive review.
* add auto resume * update resume code * add conf code * format code * add min-check-resume-ratio
* add auto resume * update resume code * add conf code * format code * add min-check-resume-ratio
Close #1150