Skip to content
This repository has been archived by the owner on Jun 23, 2022. It is now read-only.

feat: add perf-counter for backup request #419

Merged
merged 7 commits into from
Mar 19, 2020

Conversation

levy5307
Copy link
Contributor

@levy5307 levy5307 commented Mar 18, 2020

Add a perf-counter to statistics the qps of backup request for each app.

Related issue: apache/incubator-pegasus#251

@@ -166,6 +170,8 @@ void replica::on_client_read(dsn::message_ex *request)
response_client_read(request, ERR_INVALID_STATE);
return;
}
} else {
_counter_table_level_backup_request_qps->increment();
Copy link
Contributor

@foreverneverer foreverneverer Mar 18, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't (or can't) distinguish request types(get, multi_get..)?

Copy link
Contributor Author

@levy5307 levy5307 Mar 18, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we need to distinguish get/multi_get, we should add the perf-counter in pegasus, not rdsn. But in pegasus, we can't get whether the request is backup request or not.
There is still one way to implement it, just like the way table level latency use. I think it is too complicated.

Copy link
Contributor

@foreverneverer foreverneverer Mar 18, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, if can implement, we should consider whether to distinguish request types, and I think it is necessary.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why it is necessary? I think what we need is the total qps of backup request(which means the added server load), but the request type is not very important.

@neverchanje
Copy link
Contributor

neverchanje commented Mar 19, 2020

This new counter is still not enough. If the purpose is to prevent backup_request_delay_ms configured too low to actually reduce the tail latency, we need to observe the "additional load" at table-level.

So on collector, we can aggregate the backup request QPS of all servers. This QPS can be compared with the read-QPS on primary, to calculate the "backup request ratio". If the ratio is larger than 50%, we can suggest the user reduce the delay time.

For now, the collector collects metrics only on primaries. We still need some works on the collector.

@levy5307
Copy link
Contributor Author

This new counter is still not enough. If the purpose is to prevent backup_request_delay_ms configured too low to actually reduce the tail latency, we need to observe the "additional load" at table-level.

So on collector, we can aggregate the backup request QPS of all servers. This QPS can be compared with the read-QPS on primary, to calculate the "backup request ratio". If the ratio is larger than 50%, we can suggest the user reduce the delay time.

For now, the collector collects metrics only on primaries. We still need some works on the collector.

Can falcon do these work?

@neverchanje
Copy link
Contributor

@levy5307 No such method currently.

@levy5307 levy5307 merged commit 2ed6d36 into XiaoMi:master Mar 19, 2020
@neverchanje neverchanje added the type/perf-counter PR that made modification on perf-counter, which should be noted in release note. label May 14, 2020
@levy5307 levy5307 deleted the backup-request-perf-counter branch May 26, 2020 06:03
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
2.0.0 type/perf-counter PR that made modification on perf-counter, which should be noted in release note.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants