Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: db cannot resume after disk has no space #1765

Merged
merged 7 commits into from
Jul 26, 2023
Merged

fix: db cannot resume after disk has no space #1765

merged 7 commits into from
Jul 26, 2023

Conversation

Yangsx-1
Copy link
Contributor

Close #1150

# If db has a background error, it will try to manually call resume() to resume db if satisfy the least free disk to resume.
# Its default value is 600 seconds.
#manually-resume-interval : 600

# This window-size determines the amount of data that can be transmitted in a single synchronization process.
# [Tip] In the scenario of high network latency. Increasing this size can improve synchronization efficiency.
# Its default value is 9000. the [maximum] value is 90000.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on the provided code patch, here are some observations and suggestions:

  1. The added configurations least-free-disk-resume-size and manually-resume-interval seem appropriate in terms of functionality. However, it's important to ensure that these values are suitable for your specific use case and environment.

  2. For the least-free-disk-resume-size configuration, the value of 268435456 bytes corresponds to approximately 256MB, as mentioned in the comment. If this value aligns with your requirements, there doesn't appear to be any bug or issue.

  3. Similarly, for the manually-resume-interval configuration, the default value of 600 seconds (10 minutes) seems reasonable. Again, make sure it suits your needs.

  4. Reviewing the rest of the code patch, there don't appear to be any bugs or risks. However, it's difficult to provide a comprehensive review without seeing the context or the surrounding code.

  5. If you have any specific concerns or further details regarding the code, please provide them for a more thorough review.

Remember to thoroughly test the changes and monitor the system after applying the patch to ensure it behaves as expected in your specific setup.

@AlexStocks AlexStocks requested a review from a user July 19, 2023 13:20
src/pika_server.cc Outdated Show resolved Hide resolved
@AlexStocks
Copy link
Collaborator

AlexStocks commented Jul 19, 2023

本 pr 已经 amd64 arch 之上的 ubuntu 20.04 OS 测试通过

conf/pika.conf Outdated Show resolved Hide resolved
uint64_t free_size = disk_info.f_bsize * disk_info.f_bfree;
struct timeval now;
gettimeofday(&now, nullptr);
if (last_check_resume_time_.tv_sec == 0 || now.tv_sec - last_check_resume_time_.tv_sec >= interval) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

在这个 if 上面加上英文注释,体现 ”第一次检验 或者 距离上次 check 时间超过 interval“ 这个 condition

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

第二,我建议先检测下当前磁盘总体使用情况,整体空闲率低于某个阈值【如 30%】时,才进来执行 check 操作,以减少加锁次数

conf/pika.conf Outdated Show resolved Hide resolved
@@ -1487,6 +1490,46 @@ void PikaServer::AutoKeepAliveRSync() {
}
}

void PikaServer::AutoResumeDB() {
struct statfs disk_info;
int ret = statfs(g_pika_conf->db_path().c_str(), &disk_info);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see it uses statfs(deprecated) on the other one. I perfer to use statvfs based on https://www.mkssoftware.com/docs/man3/statfs.3.asp

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@infdahai Could you please help me review the code? #1822 I have made relevant changes to your suggestion

# If db has a background error, it will try to manually call resume() to resume db if satisfy the least free disk to resume.
# Its default value is 60 seconds.
#manually-resume-interval : 60

# This window-size determines the amount of data that can be transmitted in a single synchronization process.
# [Tip] In the scenario of high network latency. Increasing this size can improve synchronization efficiency.
# Its default value is 9000. the [maximum] value is 90000.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here are some observations and suggestions for the code patch:

  1. The commented out section "#compact-interval :" indicates that the configuration option is not being used. If it's necessary, make sure to uncomment it and provide a value.

  2. In the added configurations related to disk usage and resume, ensure that the values provided are appropriate for your system requirements and environment.

  3. The comment "[NOTICE]: least-free-disk-resume-size should not be smaller than write-buffer-size!" highlights an important consideration. Make sure to set the "least-free-disk-resume-size" value accordingly to avoid potential issues.

  4. Review the default values for "min-check-resume-ratio" (0.7) and "manually-resume-interval" (60 seconds) to confirm if they align with your desired behavior.

  5. In the last comment about "window-size," consider reviewing the current value (9000) and determining if increasing it would be beneficial for your specific network latency scenario.

  6. Overall, code reviews often require context and cannot be done solely based on the provided code snippet. It's recommended to review the entire codebase, including dependencies and integrated components, to identify potential bugs, risks, and improvement opportunities more accurately.

Remember to thoroughly test any changes made and consider seeking feedback from other developers familiar with the project for a comprehensive review.

@AlexStocks AlexStocks merged commit 5511c42 into OpenAtomFoundation:unstable Jul 26, 2023
@Yangsx-1 Yangsx-1 deleted the add-capacity-check branch July 27, 2023 07:53
bigdaronlee163 pushed a commit to bigdaronlee163/pika that referenced this pull request Jun 8, 2024
* add auto resume

* update resume code

* add conf code

* format code

* add min-check-resume-ratio
cheniujh pushed a commit to cheniujh/pika that referenced this pull request Sep 24, 2024
* add auto resume

* update resume code

* add conf code

* format code

* add min-check-resume-ratio
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

pika 实例硬盘容量满,清理数据后,pika仍然保持错误的状态,需要重启才能恢复
4 participants