Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

tidb log print "connection running loop panic [err="Trying to start aggressive locking while it's already started"] " when run tpcc and import into #53540

Closed
Lily2025 opened this issue May 24, 2024 · 3 comments · Fixed by #53698

Comments

@Lily2025
Copy link

Lily2025 commented May 24, 2024

Bug Report

Please answer these questions before submitting your issue. Thanks!

1. Minimal reproduce step (Required)

1、run tpcc and import into
tidb.tar.gz

2. What did you expect to see? (Required)

no panic

3. What did you see instead (Required)

tidb log print “connection running loop panic

[2024/05/24 10:49:13.008 +08:00] [ERROR] [conn.go:1013] ["connection running loop panic"] [conn=2594177614] [session_alias=] [lastSQL=ROLLBACK] [err="Trying to start aggressive locking while it's already started"] [stack="github.com/pingcap/tidb/pkg/server.(*clientConn).Run.func1
	/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/pkg/server/conn.go:1016
runtime.gopanic
	/usr/local/go/src/runtime/panic.go:914
github.com/pingcap/tidb/pkg/executor.(*ExecStmt).Exec.func1
	/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/pkg/executor/adapter.go:452
runtime.gopanic
	/usr/local/go/src/runtime/panic.go:914
github.com/tikv/client-go/v2/txnkv/transaction.(*KVTxn).StartAggressiveLocking
	/go/pkg/mod/github.com/tikv/client-go/[email protected]/txnkv/transaction/txn.go:919
github.com/pingcap/tidb/pkg/store/driver/txn.(*tikvTxn).StartFairLocking
	/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/pkg/store/driver/txn/txn_driver.go:422
github.com/pingcap/tidb/pkg/session.(*LazyTxn).StartFairLocking
	/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/pkg/session/txn.go:495
github.com/pingcap/tidb/pkg/sessiontxn/isolation.(*basePessimisticTxnContextProvider).OnPessimisticStmtStart
	/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/pkg/sessiontxn/isolation/base.go:683
github.com/pingcap/tidb/pkg/session.(*txnManager).OnPessimisticStmtStart
	/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/pkg/session/txnmanager.go:228
github.com/pingcap/tidb/pkg/executor.(*ExecStmt).handlePessimisticDML
	/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/pkg/executor/adapter.go:998
github.com/pingcap/tidb/pkg/executor.(*ExecStmt).handleNoDelay
	/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/pkg/executor/adapter.go:791
github.com/pingcap/tidb/pkg/executor.(*ExecStmt).Exec
	/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/pkg/executor/adapter.go:559
github.com/pingcap/tidb/pkg/session.runStmt
	/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/pkg/session/session.go:2285
github.com/pingcap/tidb/pkg/session.(*session).ExecuteStmt
	/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/pkg/session/session.go:2146
github.com/pingcap/tidb/pkg/server.(*TiDBContext).ExecuteStmt
	/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/pkg/server/driver_tidb.go:294
github.com/pingcap/tidb/pkg/server.(*clientConn).handleStmt
	/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/pkg/server/conn.go:2021
github.com/pingcap/tidb/pkg/server.(*clientConn).handleQuery
	/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/pkg/server/conn.go:1774
github.com/pingcap/tidb/pkg/server.(*clientConn).dispatch
	/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/pkg/server/conn.go:1348
github.com/pingcap/tidb/pkg/server.(*clientConn).Run
	/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/pkg/server/conn.go:1114
github.com/pingcap/tidb/pkg/server.(*Server).onConn
	/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/pkg/server/server.go:739"]
[2024/05/24 10:49:13.008 +08:00] [ERROR] [terror.go:324] ["encountered error"] [error="trying to rollback transaction when aggressive locking is pending"] [stack="github.com/pingcap/tidb/pkg/parser/terror.Log
	/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/pkg/parser/terror/terror.go:324
github.com/pingcap/tidb/pkg/session.(*session).RollbackTxn
	/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/pkg/session/session.go:941
github.com/pingcap/tidb/pkg/session.(*session).Close
	/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/pkg/session/session.go:2516
github.com/pingcap/tidb/pkg/server.(*TiDBContext).Close
	/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/pkg/server/driver_tidb.go:313
github.com/pingcap/tidb/pkg/server.closeConn.func1
	/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/pkg/server/conn.go:400
sync.(*Once).doSlow
	/usr/local/go/src/sync/once.go:74
sync.(*Once).Do
	/usr/local/go/src/sync/once.go:65
github.com/pingcap/tidb/pkg/server.closeConn
	/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/pkg/server/conn.go:380
github.com/pingcap/tidb/pkg/server.(*clientConn).Close
	/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/pkg/server/conn.go:373
github.com/pingcap/tidb/pkg/server.(*clientConn).Run.func1
	/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/pkg/server/conn.go:1023
runtime.gopanic
	/usr/local/go/src/runtime/panic.go:914
github.com/pingcap/tidb/pkg/executor.(*ExecStmt).Exec.func1
	/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/pkg/executor/adapter.go:452
runtime.gopanic
	/usr/local/go/src/runtime/panic.go:914
github.com/tikv/client-go/v2/txnkv/transaction.(*KVTxn).StartAggressiveLocking
	/go/pkg/mod/github.com/tikv/client-go/[email protected]/txnkv/transaction/txn.go:919
github.com/pingcap/tidb/pkg/store/driver/txn.(*tikvTxn).StartFairLocking
	/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/pkg/store/driver/txn/txn_driver.go:422
github.com/pingcap/tidb/pkg/session.(*LazyTxn).StartFairLocking
	/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/pkg/session/txn.go:495
github.com/pingcap/tidb/pkg/sessiontxn/isolation.(*basePessimisticTxnContextProvider).OnPessimisticStmtStart
	/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/pkg/sessiontxn/isolation/base.go:683
github.com/pingcap/tidb/pkg/session.(*txnManager).OnPessimisticStmtStart
	/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/pkg/session/txnmanager.go:228
github.com/pingcap/tidb/pkg/executor.(*ExecStmt).handlePessimisticDML
	/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/pkg/executor/adapter.go:998
github.com/pingcap/tidb/pkg/executor.(*ExecStmt).handleNoDelay
	/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/pkg/executor/adapter.go:791
github.com/pingcap/tidb/pkg/executor.(*ExecStmt).Exec
	/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/pkg/executor/adapter.go:559
github.com/pingcap/tidb/pkg/session.runStmt
	/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/pkg/session/session.go:2285
github.com/pingcap/tidb/pkg/session.(*session).ExecuteStmt
	/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/pkg/session/session.go:2146
github.com/pingcap/tidb/pkg/server.(*TiDBContext).ExecuteStmt
	/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/pkg/server/driver_tidb.go:294
github.com/pingcap/tidb/pkg/server.(*clientConn).handleStmt
	/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/pkg/server/conn.go:2021
github.com/pingcap/tidb/pkg/server.(*clientConn).handleQuery
	/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/pkg/server/conn.go:1774
github.com/pingcap/tidb/pkg/server.(*clientConn).dispatch
	/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/pkg/server/conn.go:1348
github.com/pingcap/tidb/pkg/server.(*clientConn).Run
	/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/pkg/server/conn.go:1114
github.com/pingcap/tidb/pkg/server.(*Server).onConn
	/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/pkg/server/server.go:739"]`

4. What is your TiDB version? (Required)

./tidb-server -V
Release Version: v8.2.0-alpha
Edition: Community
Git Commit Hash: 3485857
Git Branch: heads/refs/tags/v8.2.0-alpha
UTC Build Time: 2024-05-23 11:45:56
GoVersion: go1.21.10
Race Enabled: false

@Lily2025 Lily2025 added the type/bug The issue is confirmed as a bug. label May 24, 2024
@Lily2025
Copy link
Author

/assign MyonKeminta

@MyonKeminta
Copy link
Contributor

The panic follows an oom kill, which causes such a panic:

[2024/05/24 10:49:13.007 +08:00] [INFO] [conn.go:1151] ["command dispatched failed"] [conn=2594177614] [session_alias=] [connInfo="id:2594177614, addr:10.200.73.228:60090 status:11, collation:utf8mb4_general_ci, user:root"] [command=Execute] [status="inTxn:1, autocommit:1"] [sql="SELECT no_o_id FROM new_order WHERE no_w_id = ? AND no_d_id = ? ORDER BY no_o_id ASC LIMIT 1 FOR UPDATE [arguments: (6, 1)]"] [txn_mode=PESSIMISTIC] [timestamp=449975144392425537] [err="[executor:8176]Your query has been cancelled due to exceeding the allowed memory limit for the tidb-server instance and this query is currently using the most memory. Please try narrowing your query scope or increase the tidb_server_memory_limit and try again.[conn=2594177614]
github.com/pingcap/errors.AddStack
	/go/pkg/mod/github.com/pingcap/[email protected]/errors.go:178
github.com/pingcap/errors.(*Error).GenWithStackByArgs
	/go/pkg/mod/github.com/pingcap/[email protected]/normalize.go:175
github.com/pingcap/tidb/pkg/util/sqlkiller.(*SQLKiller).HandleSignal
	/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/pkg/util/sqlkiller/sqlkiller.go:74
github.com/pingcap/tidb/pkg/util/memory.(*Tracker).Consume
	/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/pkg/util/memory/tracker.go:457
github.com/pingcap/tidb/pkg/util/memory.(*Tracker).ReplaceBytesUsed
	/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/pkg/util/memory/tracker.go:745
github.com/pingcap/tidb/pkg/session.(*session).SetMemoryFootprintChangeHook.func1
	/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/pkg/session/session.go:4179
github.com/tikv/client-go/v2/internal/unionstore.(*MemDB).SetMemoryFootprintChangeHook.func1
	/go/pkg/mod/github.com/tikv/client-go/[email protected]/internal/unionstore/memdb.go:881
github.com/tikv/client-go/v2/internal/unionstore.(*memdbArena).onMemChange
	/go/pkg/mod/github.com/tikv/client-go/[email protected]/internal/unionstore/memdb_arena.go:141
github.com/tikv/client-go/v2/internal/unionstore.(*nodeAllocator).allocNode
	/go/pkg/mod/github.com/tikv/client-go/[email protected]/internal/unionstore/memdb_arena.go:263
github.com/tikv/client-go/v2/internal/unionstore.(*MemDB).allocNode
	/go/pkg/mod/github.com/tikv/client-go/[email protected]/internal/unionstore/memdb.go:799
github.com/tikv/client-go/v2/internal/unionstore.(*MemDB).traverse
	/go/pkg/mod/github.com/tikv/client-go/[email protected]/internal/unionstore/memdb.go:420
github.com/tikv/client-go/v2/internal/unionstore.(*MemDB).set
	/go/pkg/mod/github.com/tikv/client-go/[email protected]/internal/unionstore/memdb.go:345
github.com/tikv/client-go/v2/internal/unionstore.(*MemDB).UpdateFlags
	/go/pkg/mod/github.com/tikv/client-go/[email protected]/internal/unionstore/memdb.go:258
github.com/tikv/client-go/v2/txnkv/transaction.(*KVTxn).DoneAggressiveLocking
	/go/pkg/mod/github.com/tikv/client-go/[email protected]/txnkv/transaction/txn.go:1000
github.com/pingcap/tidb/pkg/store/driver/txn.(*tikvTxn).DoneFairLocking
	/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/pkg/store/driver/txn/txn_driver.go:440
github.com/pingcap/tidb/pkg/session.(*LazyTxn).DoneFairLocking
	/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/pkg/session/txn.go:538
github.com/pingcap/tidb/pkg/sessiontxn/isolation.(*basePessimisticTxnContextProvider).OnPessimisticStmtEnd
	/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/pkg/sessiontxn/isolation/base.go:698
github.com/pingcap/tidb/pkg/session.(*txnManager).OnPessimisticStmtEnd
	/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/pkg/session/txnmanager.go:237
github.com/pingcap/tidb/pkg/executor.(*ExecStmt).handlePessimisticSelectForUpdate.func1
	/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/pkg/executor/adapter.go:879
github.com/pingcap/tidb/pkg/executor.(*ExecStmt).handlePessimisticSelectForUpdate
	/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/pkg/executor/adapter.go:903
github.com/pingcap/tidb/pkg/executor.(*ExecStmt).Exec
	/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/pkg/executor/adapter.go:555
github.com/pingcap/tidb/pkg/session.runStmt
	/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/pkg/session/session.go:2285
github.com/pingcap/tidb/pkg/session.(*session).ExecuteStmt
	/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/pkg/session/session.go:2146
github.com/pingcap/tidb/pkg/server.(*TiDBContext).ExecuteStmt
	/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/pkg/server/driver_tidb.go:294
github.com/pingcap/tidb/pkg/server.(*clientConn).executePreparedStmtAndWriteResult
	/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/pkg/server/conn_stmt.go:306
github.com/pingcap/tidb/pkg/server.(*clientConn).executePlanCacheStmt
	/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/pkg/server/conn_stmt.go:234
github.com/pingcap/tidb/pkg/server.(*clientConn).handleStmtExecute
	/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/pkg/server/conn_stmt.go:225
github.com/pingcap/tidb/pkg/server.(*clientConn).dispatch
	/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/pkg/server/conn.go:1375
github.com/pingcap/tidb/pkg/server.(*clientConn).Run
	/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/pkg/server/conn.go:1114
github.com/pingcap/tidb/pkg/server.(*Server).onConn
	/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/pkg/server/server.go:739
runtime.goexit
	/usr/local/go/src/runtime/asm_amd64.s:1650
SELECT no_o_id FROM new_order WHERE no_w_id = ? AND no_d_id = ? ORDER BY no_o_id ASC LIMIT 1 FOR UPDATE [arguments: (6, 1)]"]

It breaks a DoneFairLocking procedure which is called in a defer block, caused fair locking state not succesfully exited after the statement. Then for some reason the transaction is still valid and the client tries to start the next statement (which I'm not sure if it's expected). So it caused the panic in this issue when the next statement tries to enter fair locking mode.

@MyonKeminta
Copy link
Contributor

The root cause is that:

  • Panicking in membuffer's internal allocation operation is not expected by some code of the transaciton layer and then lead to unexpected inconsistency state of the KVTxn object in memory
  • After killing a statement for memory exceeding limit, the transaction state is still left valid without rolling back so that user can still operate on the inconsistent state of the transaction.

My thought is:

  • We are using golang's panic like exeptions in some other languages by expecting the program state being still valid after a panic in internal operation.
  • The issue reveals the potential problem that the transaction state can still be used after some kinds of panicks, not only for DoneAggressiveLocking, but perhaps mroe.
  • It's hard to find a clean enough solution to avoid all kinds of the problem without changing this design.

Being unable to make decision about how to solve the problem completely after some discussion, I decide to first give a quick fix to this issue itself. Considering that in client-go:

  • Aggressive Locking (old name of Fair Locking) state is determined by whether the aggressiveLockingContext is nil
  • Any incomplete content written in this statement to the membuffer should be reverted after StmtRollback, which happens later than the statement calling DoneAggressiveLocking.

So the quick fix is simply unsetting aggressiveLockingContext field in a defer block in DoneAggressiveLocking, instead of at the last of the function (tikv/client-go#1355).

If this is approved and merged, maybe this issue can be closed and another issue should be filed to track further improvements.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants