Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

transaction: add hook for async commit to track the life cycle of the async-commit goroutine and secondary lock cleanup goroutine #1432

Merged
merged 2 commits into from
Sep 4, 2024

Conversation

YangKeao
Copy link
Member

This PR adds two hooks to execute before the start of async-commit goroutine and after the finish of async-commit goroutines.

It can be used to add some external logic to track the lifecycle of the async-commit goroutines. For example, in TiDB we'll need to wait for the background goroutines to finish before shutdown.

@ti-chi-bot ti-chi-bot bot added dco-signoff: yes Indicates the PR's author has signed the dco. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Aug 23, 2024
@YangKeao YangKeao changed the title transaction: add hook for async commit to track the life cycle of an async-commit goroutine transaction: add hook for async commit to track the life cycle of an async-commit goroutine (WIP) Aug 23, 2024
@you06
Copy link
Contributor

you06 commented Aug 23, 2024

What about the non-async-commit 2PC, maybe such secondaries commit goroutines also need to be track in TiDB.

@YangKeao
Copy link
Member Author

YangKeao commented Aug 23, 2024

What about the non-async-commit 2PC, maybe such secondaries commit goroutines also need to be track in TiDB.

@you06

TiDB has a mechanism to wait for all connections to finish the current executing statement/transactions (ref (*Server).DrainClients, and (*Server).inShutdownMode). I think it can guarantee that the non-async-commit 2PC has committed (or rollbacked) before killing connections.

(Or did I have some misunderstandings about the transactions?)

@you06
Copy link
Contributor

you06 commented Aug 23, 2024

I took a look at (*Server).DrainClients, seems it will wait all the connections quit.

In common 2pc transactions, the txn commit success is responded to user once the primary key is committed, and the secondaries are committed in background(note the backoffer is created from store's context, it's not related to user connection's context).

// Already spawned a goroutine for async commit transaction.
if actionIsCommit && !actionCommit.retry && !c.isAsyncCommit() {
secondaryBo := retry.NewBackofferWithVars(c.store.Ctx(), CommitSecondaryMaxBackoff, c.txn.vars)
if c.store.IsClose() {
logutil.Logger(bo.GetCtx()).Warn("the store is closed",
zap.Uint64("startTS", c.startTS), zap.Uint64("commitTS", c.commitTS),
zap.Uint64("sessionID", c.sessionID))
return nil
}
c.store.WaitGroup().Add(1)
err = c.store.Go(func() {

If the store is killed before the secondaries are commited, the prewrite locks of secondaries will be leave, need to be cleaned by other read transactions or GC.

@YangKeao
Copy link
Member Author

YangKeao commented Aug 23, 2024

I took a look at (*Server).DrainClients, seems it will wait all the connections quit.

In common 2pc transactions, the txn commit success is responded to user once the primary key is committed, and the secondaries are committed in background(note the backoffer is created from store's context, it's not related to user connection's context).

// Already spawned a goroutine for async commit transaction.
if actionIsCommit && !actionCommit.retry && !c.isAsyncCommit() {
secondaryBo := retry.NewBackofferWithVars(c.store.Ctx(), CommitSecondaryMaxBackoff, c.txn.vars)
if c.store.IsClose() {
logutil.Logger(bo.GetCtx()).Warn("the store is closed",
zap.Uint64("startTS", c.startTS), zap.Uint64("commitTS", c.commitTS),
zap.Uint64("sessionID", c.sessionID))
return nil
}
c.store.WaitGroup().Add(1)
err = c.store.Go(func() {

If the store is killed before the secondaries are commited, the prewrite locks of secondaries will be leave, need to be cleaned by other read transactions or GC.

@you06 Oh! I got it. They'll also need to be tracked. I'll try to update this PR.

@YangKeao YangKeao force-pushed the add-hook-to-wait-for-async-commit branch from dd4a539 to 623f6a4 Compare August 23, 2024 08:13
@ti-chi-bot ti-chi-bot bot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Aug 23, 2024
@YangKeao YangKeao changed the title transaction: add hook for async commit to track the life cycle of an async-commit goroutine (WIP) transaction: add hook for async commit to track the life cycle of the async-commit goroutine and secondary key cleanup goroutine (WIP) Aug 23, 2024
@YangKeao YangKeao changed the title transaction: add hook for async commit to track the life cycle of the async-commit goroutine and secondary key cleanup goroutine (WIP) transaction: add hook for async commit to track the life cycle of the async-commit goroutine and secondary lock cleanup goroutine (WIP) Aug 23, 2024
@YangKeao YangKeao force-pushed the add-hook-to-wait-for-async-commit branch from 623f6a4 to d8e0b85 Compare August 26, 2024 07:02
@YangKeao YangKeao force-pushed the add-hook-to-wait-for-async-commit branch from d8e0b85 to 208243d Compare August 26, 2024 07:26
@YangKeao YangKeao changed the title transaction: add hook for async commit to track the life cycle of the async-commit goroutine and secondary lock cleanup goroutine (WIP) transaction: add hook for async commit to track the life cycle of the async-commit goroutine and secondary lock cleanup goroutine Aug 26, 2024
@ti-chi-bot ti-chi-bot bot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Aug 26, 2024
@YangKeao
Copy link
Member Author

/cc @you06

PTAL

@ti-chi-bot ti-chi-bot bot requested a review from you06 August 26, 2024 07:32
@YangKeao
Copy link
Member Author

TiDB side PR: pingcap/tidb#55608

@YangKeao
Copy link
Member Author

/retest

@YangKeao YangKeao force-pushed the add-hook-to-wait-for-async-commit branch from 208243d to 391936c Compare August 26, 2024 09:28
Copy link
Contributor

@you06 you06 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One more thing is the max backoff duration is 40s, I think the default graceful shutdown wait time should larger than it.

CommitMaxBackoff = uint64(40000)

txnkv/transaction/txn.go Show resolved Hide resolved
@YangKeao
Copy link
Member Author

One more thing is the max backoff duration is 40s, I think the default graceful shutdown wait time should larger than it.

@you06 Now, TiDB waits for at most 15s and it's hard coded and not configurable.

It's a trade-off between the rolling-restart speed and the regression caused by leaking locks. Someone thinks it's better to rolling restart faster and they can accept a little regression, but someone may think it doesn't matter if the rolling restart takes longer time, but it shouldn't affect any metrics.

At least, it's still not a big issue if some locks leak when these keys take too much time to commit 🤔.

@YangKeao YangKeao requested a review from you06 August 28, 2024 08:26
@you06
Copy link
Contributor

you06 commented Aug 28, 2024

At least, it's still not a big issue if some locks leak when these keys take too much time to commit 🤔.

Agree. My point is that the default configuration is more understandable(and make sense) if the max graceful wait time is larger than commit backoff time. But it's not a big issue because if commit takes more than 15s, there must be a more serious issue rather than the unresolved locks.

Copy link
Contributor

@you06 you06 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This implementation LGTM
@cfzjywxk what's your opinion of the new added hooks.

@ti-chi-bot ti-chi-bot bot added the needs-1-more-lgtm Indicates a PR needs 1 more LGTM. label Sep 4, 2024
@ti-chi-bot ti-chi-bot bot added the approved label Sep 4, 2024
@YangKeao
Copy link
Member Author

YangKeao commented Sep 4, 2024

/cc @cfzjywxk

@ti-chi-bot ti-chi-bot bot requested a review from cfzjywxk September 4, 2024 05:15
err = c.store.Go(func() {
if c.txn.secondaryLockCleanupLifecycleHooks.Post != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To avoid hook at all the goroutines, another approach is to hook at the beginning of execution function in 2pc.go and the defer block of it, using c.store.Waitgroup().wait() to ensure the background goroutines are finished in the defer hook.

In the current implementation, the cleanup goroutine needs to be hooked too.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. I've added a new spawn method for the txn as a helper function to track all goroutines. It'll maintain the lifecycle hooks and the inc/dec of txn.store.WaitGroup().

Also provide a spawnWithStorePool to call txn.store.Go internally. I don't modify the existing logic about whether to use the store pool.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(It's called spawn but not go, because I want it to be a private function, but go is a keyword so I cannot use it as a function name.)

@YangKeao YangKeao force-pushed the add-hook-to-wait-for-async-commit branch from 391936c to ee7968b Compare September 4, 2024 08:04
@YangKeao
Copy link
Member Author

YangKeao commented Sep 4, 2024

/hold

Let me do some refactor 😃

@ti-chi-bot ti-chi-bot bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Sep 4, 2024
@YangKeao
Copy link
Member Author

YangKeao commented Sep 4, 2024

/unhold

@ti-chi-bot ti-chi-bot bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Sep 4, 2024
Copy link
Contributor

@cfzjywxk cfzjywxk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@ti-chi-bot ti-chi-bot bot added the lgtm label Sep 4, 2024
Copy link

ti-chi-bot bot commented Sep 4, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: cfzjywxk, you06

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ti-chi-bot ti-chi-bot bot removed the needs-1-more-lgtm Indicates a PR needs 1 more LGTM. label Sep 4, 2024
Copy link

ti-chi-bot bot commented Sep 4, 2024

[LGTM Timeline notifier]

Timeline:

  • 2024-09-04 04:29:15.243630492 +0000 UTC m=+418679.761683413: ☑️ agreed by you06.
  • 2024-09-04 09:18:03.253766875 +0000 UTC m=+436007.771819798: ☑️ agreed by cfzjywxk.

@ti-chi-bot ti-chi-bot bot merged commit f0ea917 into tikv:master Sep 4, 2024
12 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved dco-signoff: yes Indicates the PR's author has signed the dco. lgtm size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants