Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for MAX_EXECUTION_TIME. #10541

Merged
merged 21 commits into from
Jun 24, 2019

Conversation

db-storage
Copy link
Contributor

@db-storage db-storage commented May 20, 2019

  • This feature is request by Feature 7008.
  • Add support for system variable "session.max_execution_time" and "global.max_execution_time".
  • Support hint in Select Statement linke: "SELECT /*+ MAX_EXECUTION_TIME(1000) */ * FROM t1".
  • Add some unit test for the above(all tests passed).

What problem does this PR solve?

Fixes #7008

What is changed and how it works?

  • If the max_execution_time is valid for select statement(by system variable "session.max_execution_time" or "global.max_execution_time" or hint), then check if max_execution_time has been exceeded each time before getting new chunk, and return error if it happens.

Check List

Tests

  • Unit test

With ingested delay, in order to make query time exceeds max_execution_time.

  • Integration test

`mysql> select /+ MAX_EXECUTION_TIME(100)/ * from litong where SLEEP(2) and exists (select * from test3 where test3.num>=litong.num);
ERROR 1907 (HY000): Query execution was interrupted, max_execution_time exceeded.
mysql> select * from litong where SLEEP(2);
Empty set (4.00 sec)

mysql> select * from litong;
+-----+------+
| id | num |
+-----+------+
| 100 | 28 |
| 101 | 28 |
+-----+------+
2 rows in set (0.00 sec)`

Code changes

  • Has interface methods change

Side effects

Related changes

  • Need to cherry-pick to the release branch
  • Need to update the documentation
  • Need to be included in the release note

@CLAassistant
Copy link

CLAassistant commented May 20, 2019

CLA assistant check
All committers have signed the CLA.

@codecov
Copy link

codecov bot commented May 20, 2019

Codecov Report

Merging #10541 into master will decrease coverage by 0.1956%.
The diff coverage is 89.8305%.

@@               Coverage Diff                @@
##             master     #10541        +/-   ##
================================================
- Coverage   81.3393%   81.1436%   -0.1957%     
================================================
  Files           419        419                
  Lines         90002      89153       -849     
================================================
- Hits          73207      72342       -865     
- Misses        11546      11563        +17     
+ Partials       5249       5248         -1

@shenli shenli added the contribution This PR is from a community contributor. label May 20, 2019
@jackysp jackysp added the sig/execution SIG execution label May 20, 2019
server/conn.go Outdated
//MaxExecDuration() return 0 if there is no limit
if rs.MaxExecDuration().Nanoseconds() > 0 &&
time.Now().After(rs.StartExecTime().Add(rs.MaxExecDuration())) {
return errors.New("Query execution was interrupted, max_execution_time exceeded")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there an standard error code for this type of error in MySQL?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the execution is blocked at line 1311 for a long time, it may exceed the maxExecDuration.

Copy link
Contributor Author

@db-storage db-storage May 21, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there an standard error code for this type of error in MySQL?
Error code 1907 added to parser, refer to PR

Copy link
Contributor Author

@db-storage db-storage May 21, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the execution is blocked at line 1311 for a long time, it may exceed the maxExecDuration.
Fixed by closing result set and context on timeout.

Copy link
Member

@jackysp jackysp May 21, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is better to use this predefined error in parser repo,

OK, I will add and use the error code.
By the way, should all published Mysql error codes be imported in batch?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Basically, it will be defined when it is used.

@qw4990 qw4990 self-requested a review May 21, 2019 05:32
@breezewish
Copy link
Member

I think this should also be pushed down to TiKV to support per-statement max execution time limit.

server/conn.go Outdated
//MaxExecDuration() return 0 if there is no limit
if rs.MaxExecDuration().Nanoseconds() > 0 &&
time.Now().After(rs.StartExecTime().Add(rs.MaxExecDuration())) {
return errors.New("Query execution was interrupted, max_execution_time exceeded")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But this cannot work for cases like HashAgg which consumes all of its child's input and finally returns result?

server/conn.go Outdated Show resolved Hide resolved
@qw4990
Copy link
Contributor

qw4990 commented May 22, 2019

I think this should also be pushed down to TiKV to support per-statement max execution time limit.

We can do this later after finishing MaxExecTime in TiDB.

@db-storage
Copy link
Contributor Author

db-storage commented May 23, 2019

code changed per review comments, please review again.

@qw4990
Copy link
Contributor

qw4990 commented May 29, 2019

@db-storage please fix CI problems and resolve conflicts.

@jackysp
Copy link
Member

jackysp commented May 29, 2019

So many unrelated changes in this PR, maybe need a squash.

@db-storage
Copy link
Contributor Author

@db-storage please fix CI problems and resolve conflicts.

The last committed code depends on Error Code defined in parser, refer to PR.

I will resolve conflicts and do a squash.

@db-storage db-storage force-pushed the support_max_excution_time branch 3 times, most recently from b9f7706 to 4f272e8 Compare May 29, 2019 07:29
@tiancaiamao
Copy link
Contributor

This commit use context cancel to implement MAX_EXECUTION_TIME.

Cancel context is the way we used to implement kill xxx in the past, it has some drawbacks:

  • all executor must handle the cancel signal... if some executor don't obey the rule, cancel can't work
  • handle cancel signal makes the code complex
  • as cancel makes the code complex... it's more likely to cause goroutine leak
  • cancel will not kill 2PC (known issue)

In commit #9844, we turn to use executor.Close
I think it's better to keep just one implementation, we should clean up the cancel way later.
Caller should use Kill() to terminate the executation of a session

In this PR #10350, there is a background goroutine that monitoring all slow queries, maybe it's better to wait the PR to be merged and implement MAX_EXECUTION_TIME after that.

@db-storage
Copy link
Contributor Author

db-storage commented May 29, 2019

This commit use context cancel to implement MAX_EXECUTION_TIME.

Cancel context is the way we used to implement kill xxx in the past, it has some drawbacks:

  • all executor must handle the cancel signal... if some executor don't obey the rule, cancel can't work
  • handle cancel signal makes the code complex
  • as cancel makes the code complex... it's more likely to cause goroutine leak
  • cancel will not kill 2PC (known issue)

recordSet.Close() is called too when the execution is timed out.
The MAX_EXECUTION_TIME hint is permitted only for SELECT statements. Refer to mysql doc. So there should be no 2PC involved.

In commit #9844, we turn to use executor.Close
I think it's better to keep just one implementation, we should clean up the cancel way later.
Caller should use Kill() to terminate the executation of a session

I can't find executor.Close in commit #9844,did you mean 'resultSet.Close'?

In this PR #10350, there is a background goroutine that monitoring all slow queries, maybe it's better to wait the PR to be merged and implement MAX_EXECUTION_TIME after that.

In order to use the same go routine? when will PR #10350 be merged?

@tiancaiamao
Copy link
Contributor

I can't find executor.Close in commit #9844,did you mean 'resultSet.Close'?

Yes, 'resultSet.Close'

In order to use the same go routine? when will PR #10350 be merged?

In order to make the code easier to maintain ... add a monitoring goroutine is a more general solution @XuHuaiyu will take charge of that PR and I guess it won't be too long get merged

@db-storage db-storage force-pushed the support_max_excution_time branch 2 times, most recently from a1f7962 to 2f9d152 Compare May 29, 2019 12:01
@shenli
Copy link
Member

shenli commented Jun 9, 2019

@db-storage Please resolve the conflicts.

- Use existing slow query handler to monitor registered queries;
- Don't use resultSet's CancelFunc, use Close only;
- Add error errMaxExecTimeExceeded;
- Modify writeChunk, don't send EOF if execution timed out.
@db-storage
Copy link
Contributor Author

@db-storage Please resolve the conflicts.

Conflicts solved.
Rebased code with PR #10350 merged.
Use the same handler to monitor queries with max_execution_time(by hint of system variable).
In addition to the original UT cases, added test from mysql client.

`mysql> select /+ MAX_EXECUTION_TIME(100)/ * from litong where SLEEP(2) and exists (select * from test3 where test3.num>=litong.num);
ERROR 1907 (HY000): Query execution was interrupted, max_execution_time exceeded.
mysql> select * from litong where SLEEP(2);
Empty set (4.00 sec)

mysql> select * from litong;
+-----+------+
| id | num |
+-----+------+
| 100 | 28 |
| 101 | 28 |
+-----+------+
2 rows in set (0.00 sec)`

@tiancaiamao
Copy link
Contributor

PTAL @qw4990 @XuHuaiyu

@@ -334,7 +334,8 @@ const (
DefTiDBUseFastAnalyze = false
DefTiDBSkipIsolationLevelCheck = false
DefTiDBWaitTableSplitFinish = false
DefTiDBExpensiveQueryTimeThreshold = 60 // 60s
DefTiDBExpensiveQueryTimeThreshold = 100 // 100 ms
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

100ms is too small as the default value for ExpensiveQueryTimeThreshold, it makes TiDB print a log of logs.
I think 60s may be a better default value here.

And note that the check interval of the ticker is not necessary to be ExpensiveQueryTimeThreshold / 2 because the check interval value is shared with MAX_EXECUTION_TIME check interval, maybe 100ms is a good choice.

It means the MAX_EXECUTION_TIME will be less accurate if the check interval is 100ms. That's acceptable as the code is simple enough in my opinion. We can polish the code after this feature is merged, adopting a dynamic check interval.

…rval in ExpensiveQueryTimeThreshold; 'make all' passed
@tiancaiamao
Copy link
Contributor

LGTM

PTAL @XuHuaiyu @qw4990

@tiancaiamao tiancaiamao added the status/LGT1 Indicates that a PR has LGTM 1. label Jun 22, 2019
@@ -682,6 +686,12 @@ func (s *SessionVars) SetSystemVar(name string, val string) error {
if isAutocommit {
s.SetStatusFlag(mysql.ServerStatusInTrans, false)
}
case MaxExecutionTime:
timeoutMS := tidbOptInt64(val, 0)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about use tidbOptPositiveInt32.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Copy link
Contributor

@crazycs520 crazycs520 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@crazycs520
Copy link
Contributor

/run-all-tests

@crazycs520 crazycs520 added status/LGT2 Indicates that a PR has LGTM 2. and removed status/LGT1 Indicates that a PR has LGTM 1. labels Jun 24, 2019
@crazycs520
Copy link
Contributor

/run-all-tests

@coderplay
Copy link
Contributor

I think this should also be pushed down to TiKV to support per-statement max execution time limit.

We can do this later after finishing MaxExecTime in TiDB.

Did we? @zhangjinpeng1987

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
contribution This PR is from a community contributor. sig/execution SIG execution status/LGT2 Indicates that a PR has LGTM 2.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Feature Request: Support MAX_EXECUTION_TIME