feat(collector): add statistics for partition hotspot #444

Merged
merged 165 commits into from
Feb 17, 2020


Smityz
Contributor

@Smityz Smityz commented Dec 16, 2019

Add statistics for partition hotspot

1 Background

At present, Pegasus only exposes the raw monitoring values related to hotspots, so fault detection and troubleshooting are done manually, which is cumbersome for operations and maintenance. To present the data more clearly, we have added a hotspot detection module that scores the values relevant to hotspot issues and reports the scores to falcon, reducing the operations and maintenance workload.

The module collects partition-level information in real time, calculates a hotspot value for each partition with the selected algorithm, and reports that value to falcon. By watching how the hotspot value of each partition changes over time, operators can locate a hotspot more quickly.

This work is also one part of the solution to the hot-data read and write problem.

2 Algorithm Framework

2.1 Relationship

In info_collector.cpp, the relevant data of each app is collected at the interval configured by _app_stat_interval_seconds (10s by default).

2.2 Data Structure

2.2.1 Database Level

A map<string, hotspot_calculator *> named _hotspot_calculator_store holds the hotspot_calculator pointer of every app. While traversing the apps, the collector uses each app's calculator, with its algorithm and data, to perform the computation.
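As a rough illustration of the per-app store described above, the sketch below looks up a calculator by app name and creates one on first use. The calculator_store wrapper, the stub hotspot_calculator constructor, and the use of std::unique_ptr instead of raw pointers are assumptions made for this example; they are not taken from the code in this PR.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <string>

// Stand-in for the real hotspot_calculator; only what this sketch needs.
struct hotspot_calculator
{
    hotspot_calculator(const std::string &name, int count)
        : app_name(name), partition_count(count)
    {
    }
    std::string app_name;
    int partition_count;
};

// Hypothetical wrapper around the per-app store (not part of the PR).
class calculator_store
{
public:
    // Returns the calculator for app_name, creating it on first use.
    hotspot_calculator *get(const std::string &app_name, int partition_count)
    {
        assert(partition_count > 0);
        auto it = _store.find(app_name);
        if (it == _store.end()) {
            std::unique_ptr<hotspot_calculator> created(
                new hotspot_calculator(app_name, partition_count));
            it = _store.emplace(app_name, std::move(created)).first;
        }
        return it->second.get();
    }

    std::size_t size() const { return _store.size(); }

private:
    std::map<std::string, std::unique_ptr<hotspot_calculator>> _store;
};
```

Calling get() twice with the same app name returns the same object, so the history accumulated for an app survives across collection rounds.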

2.2.2 App Level

Each app has one hotspot_calculator:

class hotspot_calculator
{
private:
    const std::string _app_name;
    std::vector<::dsn::perf_counter_wrapper> _points;
    std::queue<std::vector<hotspot_partition_data>> _app_data;
    std::unique_ptr<hotspot_policy> _policy;
    static const int kMaxQueueSize = 100;
};

_app_name is the name of the app, _policy is the algorithm selected for hotspot detection, and _points holds the hotspot values that are calculated and reported through perf_counter. _app_data stores the historical data of each partition in this app. A queue is used to bound memory: it keeps at most 100 historical samples (kMaxQueueSize).
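The bounded history could be maintained along the following lines. This is a minimal sketch under assumptions: the bounded_history class and its method names are hypothetical, and the PR may evict old samples differently.

```cpp
#include <cstddef>
#include <queue>
#include <utility>
#include <vector>

struct hotspot_partition_data
{
    double total_qps;
    double total_cu;
};

// Hypothetical helper: keeps only the most recent max_size samples.
class bounded_history
{
public:
    explicit bounded_history(std::size_t max_size) : _max_size(max_size) {}

    // Appends one sample (one entry per partition), dropping the oldest
    // samples first if the queue is already full.
    void push(std::vector<hotspot_partition_data> sample)
    {
        while (!_data.empty() && _data.size() >= _max_size) {
            _data.pop();
        }
        _data.push(std::move(sample));
    }

    std::size_t size() const { return _data.size(); }
    const std::vector<hotspot_partition_data> &oldest() const { return _data.front(); }
    const std::vector<hotspot_partition_data> &newest() const { return _data.back(); }

private:
    std::size_t _max_size;
    std::queue<std::vector<hotspot_partition_data>> _data;
};
```

With a limit of 100 and the default 10s collection interval, such a queue would cover roughly the last 1000 seconds of samples.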

2.2.3 Partition Level

std::queue<std::vector<hotspot_partition_data>> _app_data;
Each vector holds the data of all partitions of this app for one collection round, and each hotspot_partition_data holds the data of a single partition:

double total_qps;
double total_cu;

Currently, our algorithm uses these two per-partition values in its calculation.

2.2.4 Class Diagram

(Class diagram image; not reproduced here.)

2.3 Demo

for (const auto &app_rows : all_rows) {
    // hotspot_calculator is to detect hotspots
    hotspot_calculator *hotspot_calculator =
        get_hotspot_calculator(app_rows.first, app_rows.second.size());
    hotspot_calculator->aggregate(app_rows.second);
    // new policy can be designed by strategy pattern in hotspot_partition_data.h
    hotspot_calculator->start_alg();
}
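The comment in the demo mentions the strategy pattern. A sketch of what a pluggable policy might look like follows. The analyse() signature, the qps_ratio_policy class, and the scoring rule (partition QPS divided by the mean QPS of the app) are all assumptions for illustration; the PR description does not state the actual algorithm.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

struct hotspot_partition_data
{
    double total_qps;
    double total_cu;
};

// Hypothetical strategy interface; the real signature may differ.
class hotspot_policy
{
public:
    virtual ~hotspot_policy() = default;
    // Returns one score per partition of the latest sample.
    virtual std::vector<double>
    analyse(const std::vector<hotspot_partition_data> &latest) const = 0;
};

// Assumed example policy: score = partition QPS / mean QPS across the app.
class qps_ratio_policy : public hotspot_policy
{
public:
    std::vector<double>
    analyse(const std::vector<hotspot_partition_data> &latest) const override
    {
        std::vector<double> scores(latest.size(), 0.0);
        double sum = 0.0;
        for (const auto &p : latest) {
            sum += p.total_qps;
        }
        // With no partitions or no traffic, every score stays 0.
        if (latest.empty() || sum <= 0.0) {
            return scores;
        }
        const double mean = sum / static_cast<double>(latest.size());
        for (std::size_t i = 0; i < latest.size(); ++i) {
            scores[i] = latest[i].total_qps / mean;
        }
        return scores;
    }
};
```

Because the calculator only holds a std::unique_ptr<hotspot_policy>, a different scoring rule can be swapped in by adding another subclass without touching the collection loop.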

2.4 New perf-counter:

# `app_name` and `partition_index` depend on the actual app and partition
app.pegasus*app.stat.hotspots@`app_name`.`partition_index`
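For reference, a counter name following this pattern could be composed as below. The helper name hotspot_counter_name is hypothetical, and how the PR actually registers the counter with perf_counter is not shown here.

```cpp
#include <string>

// Builds the per-partition counter name following the pattern above.
// Hypothetical helper, not part of the PR.
std::string hotspot_counter_name(const std::string &app_name, int partition_index)
{
    return "app.pegasus*app.stat.hotspots@" + app_name + "." +
           std::to_string(partition_index);
}
```

For an app named temp, partition 3 would be reported as app.pegasus*app.stat.hotspots@temp.3.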

acelyc111
acelyc111 previously approved these changes Feb 17, 2020
@acelyc111 acelyc111 merged commit 695b366 into apache:master Feb 17, 2020
@neverchanje neverchanje mentioned this pull request Mar 31, 2020
@neverchanje neverchanje added type/config-change Added or modified configuration that should be noted on release note of new version. and removed type/config-change Added or modified configuration that should be noted on release note of new version. labels Mar 31, 2020

5 participants