scheduler pending commands is too much and Tikv server timeout #9283

Closed
Hanyu2015 opened this issue Feb 12, 2019 · 23 comments
Labels
type/question: The issue belongs to a question.
type/stale: This issue has not been updated for a long time.

Comments

@Hanyu2015

On only one TiKV node, the number of scheduler pending commands is too high and keeps growing, and then data can no longer be inserted into the database. After I killed that TiKV process, the same thing happened on another TiKV node. When I execute 'select ... from ..', some tables return the error 'Tikv server timeout', but other tables do not.

@Hanyu2015
Author

In addition, one scheduler worker on that TiKV node is at 100% CPU continuously.

@eurekaka
Contributor

eurekaka commented Feb 12, 2019

@Hanyu2015 Thanks for the report. Could you please paste the monitoring metrics of the busy TiKV server? Also, please describe the topology of your cluster.

@eurekaka added the type/question label Feb 12, 2019
@Hanyu2015
Author

3 PD, 3 TiDB, and 18 TiKV. Which monitoring metrics do you need? @eurekaka

@eurekaka
Contributor

@Hanyu2015 Could you please export your Grafana TiKV dashboard to a PDF and attach it here, so we can check whether this is a hot-region problem?

@eurekaka
Contributor

@hicqu please take a look at this problem.

@Hanyu2015
Author

@eurekaka
tikv.pdf

@eurekaka
Contributor

@hicqu FYI

@Hanyu2015
Author

I found that SQL cannot run against the table sessions, which is the table that causes this issue; the error is 'ERROR 9002 (HY000): TIKV server timeout[try again later]'. The TiKV node whose scheduler pending commands keep growing has one scheduler worker at 100% CPU continuously, while its other 3 scheduler workers are idle. I think many commands on this TiKV node wait for the busy worker instead of running on the idle ones. Also, tidb.log contains errors like 'send tikv request error: rpc error: code = DeadlineExceeded desc = context deadline exceeded'.

@Hanyu2015
Author

1400248308
I think I have found what leads to this problem: in this TiDB log, the conflictTS is an unreasonable value. After I rebuilt the table whose table ID is 346, the problem was resolved. I think this is a bug. @eurekaka

@eurekaka
Contributor

@Hanyu2015

  • How do you know the conflictTS is unreasonable?
  • By rebuilding the table, do you mean dropping the table and recreating it?

@jackysp PTAL

@Hanyu2015
Author

In the TiDB log I uploaded, conflictTS=18446744073709551615, which is far greater than startTS.
Yes, I dropped the table and recreated it.
@eurekaka
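
For readers following along, the reported value can be checked with a short sketch. It shows that 18446744073709551615 is the largest unsigned 64-bit integer. It then splits the value under the assumption that TiDB timestamps follow PD's TSO layout (physical milliseconds in the high bits, an 18-bit logical counter in the low bits). The helper names `split_tso` and `is_max_uint64` are hypothetical, introduced only for illustration, and nothing here explains why TiDB logged this value.

```python
from typing import Tuple

MAX_UINT64 = 2**64 - 1

# Assumption: timestamps use the PD TSO layout, with the physical
# wall-clock time in milliseconds stored above 18 logical-counter bits.
LOGICAL_BITS = 18


def split_tso(ts: int) -> Tuple[int, int]:
    """Return (physical_ms, logical) for a TSO-style timestamp."""
    return ts >> LOGICAL_BITS, ts & ((1 << LOGICAL_BITS) - 1)


def is_max_uint64(ts: int) -> bool:
    """True when ts is the largest value an unsigned 64-bit field can hold."""
    return ts == MAX_UINT64


# Value quoted from the TiDB log in this thread.
conflict_ts = 18446744073709551615

print(is_max_uint64(conflict_ts))
physical_ms, logical = split_tso(conflict_ts)
print(physical_ms, logical)
```

If the layout assumption holds, every bit of both fields is set, so the physical part lies far in the future. That is consistent with the observation that conflictTS is far greater than any real startTS.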

@jackysp
Member

jackysp commented Feb 28, 2019

Thanks for your report, @Hanyu2015 !
PTAL @disksing

@shenli
Member

shenli commented Mar 1, 2019

@Hanyu2015 Could you show me the result of select tidb_version()?

@Hanyu2015
Author

v2.1.2-1-g8ba8096 @shenli

@shenli
Member

shenli commented Mar 1, 2019

Could you send us the log of all the tidb-servers? We want to see the logs between 2019-02-28 07:00 ~ 08:00.

@jackysp
Member

jackysp commented Mar 1, 2019

Hi @Hanyu2015 ! There is one thing to confirm: do you only use SQL to read and write data, without bypassing TiDB? That is, nothing reads or writes through TiKV's KV API directly.

@Hanyu2015
Author

Only SQL is used. @jackysp

@Hanyu2015
Author

@jackysp
Member

jackysp commented Mar 1, 2019

Many thanks @Hanyu2015 !
It seems the log upload failed.

@Hanyu2015
Author

Hanyu2015 commented Mar 1, 2019

tidblog.zip
@jackysp

@jackysp
Member

jackysp commented Mar 1, 2019

Got it!
Thanks!

@lysu
Contributor

lysu commented Mar 1, 2019

@Hanyu2015 Hi, the current log file seems to have been generated by grep '2019-02-28 07:00', so it lost the stack trace lines after 'dispatch error:', which are printed without a date and time. Could you send us the result of grep -A90 '2019-02-28 07:00' tidb2.log instead? Thanks!
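
To illustrate why the -A option matters here, the following is a minimal sketch that mimics grep's trailing-context behaviour in Python. The function `grep_after` and the sample log lines are hypothetical, and the real TiDB log format may differ. The point is only that a plain substring match drops continuation lines that carry no timestamp, while trailing context keeps them.

```python
from typing import Iterable, List


def grep_after(lines: Iterable[str], needle: str, after: int) -> List[str]:
    """Keep lines containing `needle` plus up to `after` lines following
    each match, roughly like `grep -A<after>` without group separators."""
    kept: List[str] = []
    remaining = 0  # trailing context lines still owed after the last match
    for line in lines:
        if needle in line:
            kept.append(line)
            remaining = after
        elif remaining > 0:
            kept.append(line)
            remaining -= 1
    return kept


# Hypothetical log excerpt, for illustration only.
sample = [
    "2019-02-28 07:00:01 [error] dispatch error:",
    "goroutine 1 [running]:",
    "    main.example()",
    "2019-02-28 08:15:00 [info] unrelated line",
]

plain = grep_after(sample, "2019-02-28 07:00", 0)
with_context = grep_after(sample, "2019-02-28 07:00", 2)
print(len(plain), len(with_context))
```

With zero context lines only the timestamped line survives. With two context lines the untimestamped stack lines that follow it are kept as well.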

@jebter

jebter commented Jun 5, 2024

The issue has not been updated for too long, so I will close it. If there are any updates, you can reopen it.

@jebter closed this as completed Jun 5, 2024
@jebter added the type/stale label Jun 18, 2024
6 participants