metrics, observer: purge backend metrics when backend is down for too long #585

djshow832 · 2024-07-02T10:52:25Z

What problem does this PR solve?

Issue Number: close #582

Problem Summary:
Backend metrics are never cleared after a backend goes down. In auto-scaling workloads, backends change frequently, so the number of metric series keeps growing.

What is changed and how it works:

Purge a backend's metrics after it has been down for over 2 hours.
Remove the status label from BackendStatusGauge. BackendStatusGauge is not used in Grafana, so removing the label is safe.

Check List

Tests

Unit test
Integration test
Manual test (add detailed scripts or steps below)
No code

Steps:

Change the threshold from 2 hours to 2 minutes and build TiProxy
Create a cluster with tiup playground v8.1.0 --db=2 --tiproxy=1 --tiproxy.version=v1.1.0 --tiflash=0
Run sysbench to generate metrics
Scale in the TiDB instance 127.0.0.1:4001
Check the metrics immediately; 127.0.0.1:4001 still exists:

curl -L 127.1:3080/metrics | grep -c 127.0.0.1:4001
204

Check the metrics after a few minutes; 127.0.0.1:4001 has disappeared:

curl -L 127.1:3080/metrics | grep -c 127.0.0.1:4001
0

Check Grafana: the metrics of 127.0.0.1:4001 exist while it is up and disappear after it goes down.

Notable changes

Has configuration change
Has HTTP API interfaces change
Has tiproxyctl change
Other user behavior changes

Release note

Please refer to Release Notes Language Style Guide to write a quality release note.

- Purge backend metrics when the backend is down for too long

codecov-commenter · 2024-07-02T10:57:29Z

Codecov Report

Attention: Patch coverage is 93.50649% with 5 lines in your changes missing coverage. Please review.

Please upload report for BASE (main@1ace159). Learn more about missing BASE report.

Files	Patch %	Lines
pkg/balance/observer/backend_observer.go	70.58%	4 Missing and 1 partial ⚠️

Additional details and impacted files

@@           Coverage Diff           @@
##             main     #585   +/-   ##
=======================================
  Coverage        ?   67.85%           
=======================================
  Files           ?       76           
  Lines           ?     6953           
  Branches        ?        0           
=======================================
  Hits            ?     4718           
  Misses          ?     1899           
  Partials        ?      336

Flag	Coverage Δ
unit	`67.85% <93.50%> (?)`


ti-chi-bot · 2024-07-03T02:52:09Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: xhebox


Needs approval from an approver in each of these files:

~~OWNERS~~ [xhebox]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

ti-chi-bot · 2024-07-03T02:52:09Z

[LGTM Timeline notifier]

Timeline:

2024-07-03 02:52:09.788811941 +0000 UTC m=+1379256.274300774: ☑️ agreed by xhebox.

purge backend metrics

b689616

ti-chi-bot bot requested review from bb7133 and xhebox July 2, 2024 10:52

ti-chi-bot bot added the size/L label Jul 2, 2024

xhebox approved these changes Jul 3, 2024


ti-chi-bot bot added lgtm approved labels Jul 3, 2024

ti-chi-bot bot merged commit e4b7832 into pingcap:main Jul 3, 2024
5 checks passed

djshow832 deleted the del_label branch July 3, 2024 08:03
