
fix: add a user-friendly repl metric "repl_connect_status" to the response of the info command #2656


@cheniujh (Collaborator) commented May 17, 2024

Since #2638 was merged (it changed the processing of the TrySync response from asynchronous to synchronous), a slave may stay in the WaitReply state for a while; in some extreme scenarios the WaitReply state can last as long as 1-2 minutes. While the slave is in the WaitReply state, the metric "master_link_status" (fetched by the info command) reports down, which might trigger monitoring alerts set up by operations personnel.
A more granular metric is therefore needed so that users and operations personnel can tell what is actually going on when master_link_status is down. This PR adds the following monitoring metric for that purpose:
metric name: repl_connect_status
value range: {no_connect, try_to_incr_sync, try_to_full_sync, syncing_full, connecting, connected, error}

For how to use this value, see Discussion #2689.
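As an illustration of the motivation above, a monitoring script could consult repl_connect_status before firing an alert on master_link_status:down. The C++ sketch below is hypothetical and is not the usage recommended in Discussion #2689. It assumes that the INFO reply uses Redis-style `key:value` lines and that both fields appear in the replica's INFO output. It also treats try_to_incr_sync, try_to_full_sync, syncing_full and connecting as "sync in progress, do not page". The names `ParseInfo` and `ShouldAlert` are invented for this example.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <sstream>
#include <string>

// Hypothetical helper: parse "key:value" lines of an INFO reply into a map.
// Section headers such as "# Replication" and blank lines are skipped.
std::map<std::string, std::string> ParseInfo(const std::string& info) {
  std::map<std::string, std::string> fields;
  std::istringstream in(info);
  std::string line;
  while (std::getline(in, line)) {
    if (!line.empty() && line.back() == '\r') line.pop_back();  // strip CRLF
    auto pos = line.find(':');
    if (pos == std::string::npos || line[0] == '#') continue;
    fields[line.substr(0, pos)] = line.substr(pos + 1);  // split on first ':'
  }
  return fields;
}

// Hypothetical alert policy (an assumption, not Pika's own logic): a "down"
// master link only alerts when the replica is not in an in-progress state.
bool ShouldAlert(const std::string& info) {
  static const std::set<std::string> kInProgress = {
      "try_to_incr_sync", "try_to_full_sync", "syncing_full", "connecting"};
  const auto fields = ParseInfo(info);
  auto link = fields.find("master_link_status");
  if (link == fields.end()) return false;  // no replication link to check
  if (link->second == "up") return false;
  auto status = fields.find("repl_connect_status");
  // Servers without the new field: fall back to alerting on "down".
  if (status == fields.end()) return true;
  // Alert on error, no_connect, or any value not known to be transient.
  return kInProgress.count(status->second) == 0;
}
```

With this policy, a replica reporting `master_link_status:down` together with `repl_connect_status:try_to_full_sync` would not page anyone, while `repl_connect_status:error` would. Falling back to the old behaviour when the field is missing keeps the check safe against servers that predate this PR.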

@github-actions github-actions bot added ☢️ Bug Something isn't working ✏️ Feature New feature or request labels May 17, 2024
@cheniujh cheniujh added 3.5.4 4.0.0 and removed ☢️ Bug Something isn't working labels May 17, 2024
@cheniujh cheniujh added 3.5.5 and removed 3.5.4 labels May 17, 2024
@baerwang baerwang merged commit af3be67 into OpenAtomFoundation:unstable May 17, 2024
21 checks passed
chenbt-hz pushed a commit to chenbt-hz/pika that referenced this pull request Jun 3, 2024
bigdaronlee163 pushed a commit to bigdaronlee163/pika that referenced this pull request Jun 8, 2024
@cheniujh cheniujh deleted the add_info_metric_repl_connect_status branch June 24, 2024 03:21
cheniujh added a commit to cheniujh/pika that referenced this pull request Sep 24, 2024
Labels
3.5.5 4.0.0 ✏️ Feature New feature or request
4 participants