*: add HTTP API to generate TiDB metric profile (#18272) #18531
Signed-off-by: ti-srebot <[email protected]>
LGTM
LGTM
cherry-pick #18272 to release-4.0
What problem does this PR solve?
TiDB has thousands of metrics, and it is hard for users to know which metrics are related. This PR generates a metric profile SVG that describes the total time consumption of related metrics in a specified time range.
Below is an example:
The `start` and `end` parameters specify the time range; the time format is RFC3339 (`2006-01-02T15:04:05Z07:00`).
The profile.svg looks like the following:
From the profile SVG above, it is easy to see the time consumption ratio of each metric and the relationships between metrics.
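The `start` and `end` values described above can be produced with Go's `time.RFC3339` layout, which is the same `2006-01-02T15:04:05Z07:00` format. The following is a minimal sketch, not the PR's implementation: the function name `profileQuery` and the check that `start` precedes `end` are assumptions made for illustration, and the API path is deliberately left out because it is not stated here.

```go
package main

import (
	"fmt"
	"net/url"
	"time"
)

// profileQuery is a hypothetical helper: it validates two RFC3339
// timestamps and encodes them as the start and end query parameters.
// The ordering check is an assumption, not something the PR specifies.
func profileQuery(start, end string) (string, error) {
	s, err := time.Parse(time.RFC3339, start)
	if err != nil {
		return "", fmt.Errorf("invalid start: %w", err)
	}
	e, err := time.Parse(time.RFC3339, end)
	if err != nil {
		return "", fmt.Errorf("invalid end: %w", err)
	}
	if !s.Before(e) {
		return "", fmt.Errorf("start %s is not before end %s", start, end)
	}
	q := url.Values{}
	q.Set("start", s.Format(time.RFC3339))
	q.Set("end", e.Format(time.RFC3339))
	// Encode sorts keys and percent-escapes the colons in the timestamps.
	return q.Encode(), nil
}

func main() {
	q, err := profileQuery("2020-07-14T05:50:00Z", "2020-07-14T06:00:00Z")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(q)
}
```

The resulting string can be appended after `?` to the profile API URL of a TiDB status server.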
Explain
The style, colors, and format of this profile are very similar to those of a Go pprof profile.
Node
`tidb_execute`: the metric name. `tidb_execute` is the time spent on TiDB execution.
`20924.55s (30.51%) of 64938.09s (94.68%)`:
`20924.55s (30.51%)` means `tidb_execute` itself consumes 20924.55 seconds, which is 30.51% of the total time.
`64938.09s (94.68%)` means `tidb_execute` and all its children consume 64938.09 seconds, which is 94.68% of the total time. So all the children of `tidb_execute` consume 64938.09 - 20924.55 = 44013.54 seconds.
Maybe we can add more information about the metric to this node, such as `avg_P99`, `avg_P80`, `total_count`, ...
Edge
The edge from `tidb_txn_cmd` to `tidb_txn_cmd.get` is labeled `17311.60s`, which means 17311.60 seconds were consumed after this edge.
You may notice that some edges, such as the edge from `tidb_txn_cmd` to `tidb_kv_request`, are drawn as dotted lines with no label indicating how much time the child consumed. This is because a child such as `tidb_kv_request` may have multiple parents: I only know the total time consumed by `tidb_kv_request`; I don't know the time consumed after the dotted edge.
Attention
Currently, the total time is that of `tidb_query`, so `tidb_query` and its children together account for 100% of the total time.
Sometimes you may find that a node such as `tidb_kv_request`, together with its children, consumes more time than the `tidb_query` total. This is because one `tidb_query` may issue many kv requests, and those kv requests may be executed concurrently.
A metric whose total value is less than 0.01% of the total time is not displayed in the profile SVG.
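The node label arithmetic and the 0.01% display threshold described above can be sketched as follows. This is an illustration under stated assumptions, not the PR's code: the `node` type and the function names are hypothetical, the total of 68587 seconds is an assumed value chosen to be consistent with the percentages quoted in the Node section, and applying the threshold to the cumulative time is also an assumption.

```go
package main

import "fmt"

// node is a hypothetical in-memory form of one profile node.
type node struct {
	name string
	self float64 // seconds consumed by the metric itself
	cum  float64 // seconds consumed by the metric and all its children
}

// childrenTime is the time consumed by all children of n.
func childrenTime(n node) float64 { return n.cum - n.self }

// percent expresses v as a percentage of the total time.
func percent(v, total float64) float64 { return v / total * 100 }

// label renders the node text in the pprof-like form used by the profile.
func label(n node, total float64) string {
	return fmt.Sprintf("%.2fs (%.2f%%) of %.2fs (%.2f%%)",
		n.self, percent(n.self, total), n.cum, percent(n.cum, total))
}

// visible reports whether n would be drawn: metrics below 0.01% of the
// total time are hidden. Using cum for the comparison is an assumption.
func visible(n node, total float64) bool {
	return n.cum >= total*0.0001
}

func main() {
	const total = 68587.0 // assumed tidb_query total, in seconds
	exec := node{name: "tidb_execute", self: 20924.55, cum: 64938.09}
	tiny := node{name: "hypothetical_tiny_metric", self: 1, cum: 5}

	fmt.Println(label(exec, total))
	fmt.Printf("children: %.2fs\n", childrenTime(exec))
	fmt.Println(visible(exec, total), visible(tiny, total))
}
```

Because kv requests can run concurrently, a node's cumulative time may exceed `total`, in which case `percent` simply returns a value above 100.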
Examples in different workloads
What is changed and how it works?
Each node in the profile has a corresponding metric. For example, the metric for `tidb_query` is `tidb_server_handle_query_duration_seconds`.
Related changes
pingcap/docs / pingcap/docs-cn:
Check List
Tests
Side effects
Release note