Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

executor: add runtime information for point-get executor #18666

Merged
merged 27 commits into from
Jul 28, 2020

Conversation

crazycs520
Copy link
Contributor

@crazycs520 crazycs520 commented Jul 17, 2020

What problem does this PR solve?

As the title said.

Here is some example of explain analyze select * from t where a=1;

Normally

+-------------+---------+---------+------+------------------------+----------------------------------------------------------+---------------+--------+------+
| id          | estRows | actRows | task | access object          | execution info                                           | operator info | memory | disk |
+-------------+---------+---------+------+------------------------+----------------------------------------------------------+---------------+--------+------+
| Point_Get_1 | 1.00    | 1       | root | table:t, index:idx1(a) | time:699.459µs, loops:2, Get_rpc:{num:2, time:662.459µs} |               | N/A    | N/A  |
+-------------+---------+---------+------+------------------------+----------------------------------------------------------+---------------+--------+------+

when point-get meet lock:

+-------------+---------+---------+------+------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------+---------------+--------+------+
| id          | estRows | actRows | task | access object          | execution info                                                                                                                                    | operator info | memory | disk |
+-------------+---------+---------+------+------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------+---------------+--------+------+
| Point_Get_1 | 1.00    | 1       | root | table:t, index:idx1(a) | time:4.520115944s, loops:2, Get_rpc:{num:3, time:666.149µs},ResolveLock_rpc:{num:1, time:4.519421685s},txnNotFound_backoff:{num:16, time:4510 ms} |               | N/A    | N/A  |
+-------------+---------+---------+------+------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------+---------------+--------+------+

when tikv restart:

+-------------+---------+---------+------+------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------+---------------+--------+------+
| id          | estRows | actRows | task | access object          | execution info                                                                                                                                  | operator info | memory | disk |
+-------------+---------+---------+------+------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------+---------------+--------+------+
| Point_Get_1 | 1.00    | 1       | root | table:t, index:idx1(a) | time:9.051671089s, loops:2, Get_rpc:{num:11, time:8.020006147s},tikvRPC_backoff:{num:9, time:7998 ms},regionMiss_backoff:{num:10, time:1012 ms} |               | N/A    | N/A  |
+-------------+---------+---------+------+------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------+---------------+--------+------+

Of course, the slow-log will also record the point-get runtime information too:

select plan from information_schema.`SLOW_QUERY` where time > "2020-07-17 14:20:00" and query like "select% t%" order by query_time desc limit 1\G
***************************[ 1. row ]***************************
plan |  id              task    estRows operator info           actRows execution info                                                                                                                                  memory  disk
        Point_Get_1     root    1       table:t, index:idx1(a)  1       time:5.233083269s, loops:2, Get_rpc:{num:9, time:4.957755414s},regionMiss_backoff:{num:8, time:256 ms},tikvRPC_backoff:{num:7, time:4944 ms}    N/A     N/A

Related changes

  • Need to cherry-pick to the release branch

Check List

Tests

  • Manual test (add detailed scripts or steps below)

Side effects

  • Performance regression
    • Consumes more CPU

Release note

  • Add executor runtime information for point-get.

@crazycs520 crazycs520 added the sig/execution SIG execution label Jul 17, 2020
@crazycs520 crazycs520 requested a review from a team as a code owner July 17, 2020 07:03
@crazycs520 crazycs520 requested review from SunRunAway and removed request for a team July 17, 2020 07:03
Signed-off-by: crazycs520 <[email protected]>
@crazycs520 crazycs520 changed the title executor: add executor runtime information for point-get executor: add runtime information for point-get executor Jul 17, 2020
@codecov
Copy link

codecov bot commented Jul 17, 2020

Codecov Report

Merging #18666 into master will not change coverage.
The diff coverage is n/a.

@@             Coverage Diff             @@
##             master     #18666   +/-   ##
===========================================
  Coverage   79.3677%   79.3677%           
===========================================
  Files           544        544           
  Lines        148229     148229           
===========================================
  Hits         117646     117646           
  Misses        21192      21192           
  Partials       9391       9391           

@zz-jason
Copy link
Member

Get_rpc:{num:3, time:666.149µs},ResolveLock_rpc:{num:1, time:4.519421685s}

What does time mean, is it the average RPC time?

@crazycs520
Copy link
Contributor Author

crazycs520 commented Jul 21, 2020

@zz-jason

Get_rpc:{num:3, time:666.149µs},ResolveLock_rpc:{num:1, time:4.519421685s}

time means the total time that spent on xxx_RPC.
In the upper example, it means the Get_rpc total_count is 3, and total time is 666.149us.
the ResolveLock_rpc total_count is 1, and total time is 4.519421685s.

@qw4990 qw4990 self-requested a review July 21, 2020 07:02
@zz-jason
Copy link
Member

@zz-jason

Get_rpc:{num:3, time:666.149µs},ResolveLock_rpc:{num:1, time:4.519421685s}

time means the total time that spent on xxx_RPC.
In the upper example, it means the Get_rpc total_count is 3, and total time is 666.149us.
the ResolveLock_rpc total_count is 1, and total time is 4.519421685s.

How about:

Get{num_rpc:3, total_time:666.149us}, ResolveLock:{num_rpc:1, total_time:4.519421685s}

@crazycs520
Copy link
Contributor Author

@zz-jason

How about:

Get{num_rpc:3, total_time:666.149us}, ResolveLock:{num_rpc:1, total_time:4.519421685s}

Great, look good to me.

Copy link
Contributor

@qw4990 qw4990 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And also please add some tests for this. Integration tests like integration_test.go can be easily added for this case.

Copy link
Contributor

@qw4990 qw4990 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@ti-srebot ti-srebot added the status/LGT1 Indicates that a PR has LGTM 1. label Jul 27, 2020
@qw4990
Copy link
Contributor

qw4990 commented Jul 27, 2020

/merge

@ti-srebot
Copy link
Contributor

@qw4990 Oops! This PR requires at least 2 LGTMs to merge. The current number of LGTM is 1.

Copy link
Contributor

@lzmhhh123 lzmhhh123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@crazycs520 what about batch point-get executor?

@crazycs520
Copy link
Contributor Author

@lzmhhh123 I will add batch-point-get in the next PR, this PR is large enough.

Copy link
Contributor

@lzmhhh123 lzmhhh123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@ti-srebot ti-srebot added status/LGT2 Indicates that a PR has LGTM 2. and removed status/LGT1 Indicates that a PR has LGTM 1. labels Jul 27, 2020
@lzmhhh123
Copy link
Contributor

/merge

@ti-srebot ti-srebot added the status/can-merge Indicates a PR has been approved by a committer. label Jul 27, 2020
@ti-srebot
Copy link
Contributor

/run-all-tests

@ti-srebot
Copy link
Contributor

@crazycs520 merge failed.

@crazycs520
Copy link
Contributor Author

/rebuild

@crazycs520
Copy link
Contributor Author

/build

@crazycs520
Copy link
Contributor Author

/rebuild

@crazycs520
Copy link
Contributor Author

/rebuild

@crazycs520
Copy link
Contributor Author

/run-unit-test
/run-check_dev_2

@crazycs520 crazycs520 merged commit 8b19d67 into pingcap:master Jul 28, 2020
ti-srebot pushed a commit to ti-srebot/tidb that referenced this pull request Jul 28, 2020
@ti-srebot
Copy link
Contributor

cherry pick to release-4.0 in PR #18817

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
sig/execution SIG execution status/can-merge Indicates a PR has been approved by a committer. status/LGT2 Indicates that a PR has LGTM 2.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants