TiDB ERROR: unexpected resolve err: retryable:\"Txn(Mvcc(TxnLockNotFound... #26404

Closed
Tammyxia opened this issue Jul 21, 2021 · 3 comments
Labels
severity/moderate sig/transaction SIG:Transaction type/bug The issue is confirmed as a bug.

Comments

@Tammyxia

Bug Report

Please answer these questions before submitting your issue. Thanks!

1. Minimal reproduce step (Required)

  • Prepare the workload in a TiDB cluster: bin/go-ycsb load mysql -P ./workloads/oncall2929 -p mysql.host=172.16.6.24 -p mysql.port=4000 -p operationcount=5000000 -p autocommit=1 --threads=200
  • Run the workload to emulate one of our customers: bin/go-ycsb run mysql -P ./workloads/oncall2929 -p mysql.host=172.16.6.24 -p mysql.port=4000 -p operationcount=5000000 -p droppartitioninterval=2400 --threads=200 ---> the issue reproduces 100% of the time at this step.
  • The workload is "insert ... on duplicate key" plus "drop/create partition".

2. What did you expect to see? (Required)

No errors.

3. What did you see instead (Required)

  • Many repeated errors of the form "unexpected resolve err: retryable:\"Txn(Mvcc(TxnLockNotFound xxx" in the TiDB log.

  • After decreasing to --threads=100, no errors occur.

4. What is your TiDB version? (Required)

Release Version: v4.0.14
Edition: Community
Git Commit Hash: a3baab4
Git Branch: heads/refs/tags/v4.0.14
UTC Build Time: 2021-07-15 06:54:45
GoVersion: go1.13
Race Enabled: false
TiKV Min Version: v3.0.0-60965b006877ca7234adaced7890d7b029ed1306
Check Table Before Drop: false

@Tammyxia Tammyxia added type/bug The issue is confirmed as a bug. severity/major labels Jul 21, 2021
@jebter jebter added the sig/transaction SIG:Transaction label Jul 21, 2021
@cfzjywxk
Contributor

The error log in this test scenario is triggered by the following sequence of conditions.

  • txn1 is a pessimistic transaction.
  • txn1 executes a pessimistic query that conflicts with some other queries.
  • A pessimistic retry happens and the set of keys to be locked changes, so a different set of keys may need to be pessimistically locked.
  • An async pessimistic rollback is triggered for the keys that are no longer needed. Because it is asynchronous, it may still be in progress, and until it finishes the unnecessary pessimistic locks are left behind.
  • txn1 keeps processing and finishes committing the primary key lock, at which point the transaction is considered committed.
  • The secondary prewrite locks of txn1 are committed asynchronously.
  • txn2 conflicts with a prewrite lock of txn1 and tries to resolve it. After checking txn1's status, txn2 uses a resolve commit to process the prewrite lock of txn1.
  • The resolve requests do not use lite mode, so they scan the whole region looking for the locks of txn1.
  • The leftover pessimistic lock is picked up by this whole-region lock scan.
  • The async pessimistic rollback finishes, so the leftover pessimistic lock is released.
  • The resolve process continues. Since txn1 is already committed, it tries to commit the leftover pessimistic lock.
  • Because the async pessimistic rollback released the lock before the resolve commit reached that key, an error is reported.
  • The resolve response returns the error.
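The interleaving above can be illustrated with a small toy model. This is only a sketch under simplifying assumptions, not TiKV code: the `LockTable` class, its methods, the `TxnLockNotFound` exception class, and the key names `stale` and `secondary` are all hypothetical, and the "async" rollback is simply placed between the scan and the commit to force the race.

```python
# Toy model of the race described above. NOT TiKV code: every name here
# (LockTable, simulate, the keys) is hypothetical and chosen for illustration.

class TxnLockNotFound(Exception):
    """Raised when a commit targets a key that holds no lock for the txn."""


class LockTable:
    def __init__(self):
        # key -> (start_ts, lock_type); lock_type is "pessimistic" or "prewrite"
        self.locks = {}

    def lock(self, key, start_ts, lock_type):
        self.locks[key] = (start_ts, lock_type)

    def scan_locks(self, start_ts):
        # Non-lite resolve: scan the whole region for every lock of the txn.
        return [k for k, (ts, _) in self.locks.items() if ts == start_ts]

    def pessimistic_rollback(self, key, start_ts):
        # Only removes the lock if it is still a pessimistic lock of this txn.
        if self.locks.get(key) == (start_ts, "pessimistic"):
            del self.locks[key]

    def commit(self, key, start_ts):
        lock = self.locks.get(key)
        if lock is None or lock[0] != start_ts:
            raise TxnLockNotFound(key)
        del self.locks[key]


def simulate():
    region = LockTable()
    txn1 = 100  # start_ts of txn1 (arbitrary)

    # A pessimistic lock left over from before the retry, plus a secondary
    # prewrite lock. The primary is assumed to be committed already.
    region.lock("stale", txn1, "pessimistic")
    region.lock("secondary", txn1, "prewrite")

    # txn2 meets the prewrite lock and scans the region for txn1's locks.
    to_commit = region.scan_locks(txn1)

    # The async pessimistic rollback finishes between the scan and the commit.
    region.pessimistic_rollback("stale", txn1)

    errors = []
    for key in to_commit:
        try:
            region.commit(key, txn1)
        except TxnLockNotFound as exc:
            errors.append(str(exc))
    return to_commit, errors
```

In this model the scan returns both keys, but only the commit of the leftover pessimistic key fails, which mirrors the lock-not-found error in the report.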

The logs contain many entries like the following:

[2021/07/21 10:57:15.346 +08:00] [WARN] [txn.rs:696] ["commit a pessimistic lock with Lock type"] [commit_ts=426465357906837536] [start_ts=426465357893730346] [key=7480000000000004FF3B5F72800000006FFF0D79AD0000000000FA]
[2021/07/21 11:17:02.885 +08:00] [INFO] [txn.rs:717] ["txn conflict (lock not found)"] [commit_ts=426465669255266307] [start_ts=426465668246012000] [key=7480000000000004FF425F698000000000FF0000010400000000FF744EDB910419A9F6FF0000000000000000FC]

This means that some of the leftover pessimistic locks were committed by the resolve first (the WARN entry), while others were pessimistically rolled back first (the INFO "lock not found" entry).
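To see how often each outcome occurred in a given log file, the bracketed messages can be tallied. The helper below is a minimal sketch that assumes every entry follows the `[time] [LEVEL] [file:line] ["message"] [key=value]...` layout of the two lines quoted above; `tally_messages` is a hypothetical name, not an existing tool.

```python
import re
from collections import Counter

# One bracketed field: either a double-quoted string or any run without "]".
FIELD = re.compile(r'\[("(?:[^"\\]|\\.)*"|[^\]]*)\]')


def tally_messages(lines):
    """Count log entries by message text (the fourth bracketed field)."""
    counts = Counter()
    for line in lines:
        fields = FIELD.findall(line)
        if len(fields) >= 4:
            counts[fields[3].strip('"')] += 1
    return counts
```

Passing an open log file (for example `tally_messages(open(path))`, with `path` supplied by the reader) yields a count per message, so the "commit a pessimistic lock with Lock type" and "txn conflict (lock not found)" entries can be compared directly.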

The logging needs to be improved: ERROR logs should be printed only if the failing keys are not of pessimistic type, and the resolve process can continue if the failing key is of pessimistic type.
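The suggested behaviour could look roughly like the sketch below. It only illustrates the idea in this comment and is not the change made in the actual fix: `resolve_locks`, `LockNotFound`, and the `(key, lock_type)` pairs are hypothetical, and the sketch assumes the scan records each lock's type at the time it was seen.

```python
import logging

logger = logging.getLogger("resolve")


class LockNotFound(Exception):
    """Hypothetical error for a commit whose lock has already disappeared."""


def resolve_locks(scanned_locks, commit):
    """Commit each scanned lock, tolerating missing pessimistic locks.

    scanned_locks: iterable of (key, lock_type) pairs captured by the scan.
    commit: callable(key) that raises LockNotFound if the lock is gone.
    Returns (committed, skipped). Re-raises for non-pessimistic locks.
    """
    committed, skipped = [], []
    for key, lock_type in scanned_locks:
        try:
            commit(key)
        except LockNotFound:
            if lock_type != "pessimistic":
                # A missing prewrite lock is unexpected: report it loudly.
                logger.error("lock not found for key %s", key)
                raise
            # A missing pessimistic lock was most likely rolled back already.
            logger.info("pessimistic lock on %s already released, skipping", key)
            skipped.append(key)
        else:
            committed.append(key)
    return committed, skipped
```

With this shape, a pessimistic lock that was rolled back in the meantime no longer aborts the resolve or produces an ERROR entry, while a missing prewrite lock still does.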

@youjiali1995
Contributor

Fixed by tikv/tikv#10652.
