Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

storage/concurrency: release reqs from same txn as discovered lock #45601

Merged

Conversation

nvanbenschoten
Copy link
Member

This commit updates the AddDiscoveredLock state transition in lockTableImpl to release all waiting writers that are part of the same transaction as a discovered lock.

This caused issues in #45482, where we saw txn deadlocks in the pkg/sql/tests.Bank benchmark. This issue was triggered because we were forgetting to inform the lockTable of a lock acquisition in a subtle case. That is now fixed and it's not clear that we can actually hit the bug here anymore given the current policy on how wait-queues form in the lock-table. Regardless, this is worth handling correctly in case things ever change.

@cockroach-teamcity
Copy link
Member

This change is Reviewable

Copy link
Collaborator

@sumeerbhola sumeerbhola left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 3 of 3 files at r1.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @nvanbenschoten)


pkg/storage/concurrency/lock_table.go, line 1211 at r1 (raw file):

	// If there are waiting requests from the same txn, they no longer need to wait.
	if l.releaseWritersFromTxn(txn) {
		informWaiters = true

I think we can tighten up the semantics of informWaiters and remove the return value from releaseWritersFromTxn.
The only correctness need for informWaiters=false is when the lock is already held. Currently the code also sets it to false when l.reservation == nil but that is an premature performance optimization that I think we should remove.

Copy link
Member Author

@nvanbenschoten nvanbenschoten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TFTR!

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @sumeerbhola)


pkg/storage/concurrency/lock_table.go, line 1211 at r1 (raw file):

Previously, sumeerbhola wrote…

I think we can tighten up the semantics of informWaiters and remove the return value from releaseWritersFromTxn.
The only correctness need for informWaiters=false is when the lock is already held. Currently the code also sets it to false when l.reservation == nil but that is an premature performance optimization that I think we should remove.

I'm not sure I understand why it's ever incorrect to call informActiveWaiters. You said that there's a correctness need for informWaiters=false when the lock is already held. Could you expand on that bit more?

Also, in place of the return value on releaseWritersFromTxn, are you suggesting that it call tryMakeNewDistinguished directly when it detects that doing so it necessary.

@nvanbenschoten nvanbenschoten force-pushed the nvanbenschoten/lockTableDiscovered branch from 2c7a4be to ad50d9c Compare March 3, 2020 05:55
Copy link
Collaborator

@sumeerbhola sumeerbhola left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @nvanbenschoten and @sumeerbhola)


pkg/storage/concurrency/lock_table.go, line 1211 at r1 (raw file):

I'm not sure I understand why it's ever incorrect to call informActiveWaiters. You said that there's a correctness need for informWaiters=false when the lock is already held. Could you expand on that bit more?

I agree that from a correctness perspective it is ok to call informActiveWaiters even if nothing has changed. The use of informWaiters is trying to be a performance optimization. When l.reservation==nil and the lock was not held this performance optimization is limited to not calling informActiveWaiters since that function won't have any work to do -- there can't be any waiters. So we can remove it. Regarding setting informWaiters to false when the lock was already held -- now that there are more fields in waitingState, I don't think that optimization is correct since the waiters want to know whether they are waiting on a replicated or unreplicated lock. So that would mean removing informWaiters altogether.

Also, in place of the return value on releaseWritersFromTxn, are you suggesting that it call tryMakeNewDistinguished directly when it detects that doing so it necessary.

No, we'd still call informActiveWaiters here. I should have said that the return value from releaseWritersFromTxn should not be needed to decide whether to call informActiveWaiters.

@nvanbenschoten nvanbenschoten force-pushed the nvanbenschoten/lockTableDiscovered branch from ad50d9c to e8faf7c Compare March 3, 2020 23:08
Copy link
Member Author

@nvanbenschoten nvanbenschoten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @sumeerbhola)


pkg/storage/concurrency/lock_table.go, line 1211 at r1 (raw file):

Previously, sumeerbhola wrote…

I'm not sure I understand why it's ever incorrect to call informActiveWaiters. You said that there's a correctness need for informWaiters=false when the lock is already held. Could you expand on that bit more?

I agree that from a correctness perspective it is ok to call informActiveWaiters even if nothing has changed. The use of informWaiters is trying to be a performance optimization. When l.reservation==nil and the lock was not held this performance optimization is limited to not calling informActiveWaiters since that function won't have any work to do -- there can't be any waiters. So we can remove it. Regarding setting informWaiters to false when the lock was already held -- now that there are more fields in waitingState, I don't think that optimization is correct since the waiters want to know whether they are waiting on a replicated or unreplicated lock. So that would mean removing informWaiters altogether.

Also, in place of the return value on releaseWritersFromTxn, are you suggesting that it call tryMakeNewDistinguished directly when it detects that doing so it necessary.

No, we'd still call informActiveWaiters here. I should have said that the return value from releaseWritersFromTxn should not be needed to decide whether to call informActiveWaiters.

Done, I think. Mind double checking that that's the change you were thinking of?

Copy link
Collaborator

@sumeerbhola sumeerbhola left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 2 of 2 files at r2.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @sumeerbhola)

This commit updates the AddDiscoveredLock state transition in lockTableImpl
to release all waiting writers that are part of the same transaction as a
discovered lock.

This caused issues in cockroachdb#45482, where we saw txn deadlocks in the
`pkg/sql/tests.Bank` benchmark. This issue was triggered because we were
forgetting to inform the lockTable of a lock acquisition in a subtle case.
That is now fixed and it's not clear that we can actually hit the bug here
anymore given the current policy on how wait-queues form in the lock-table.
Regardless, this is worth handling correctly in case things ever change.
@nvanbenschoten nvanbenschoten force-pushed the nvanbenschoten/lockTableDiscovered branch from e8faf7c to 7381a72 Compare March 4, 2020 19:04
@nvanbenschoten
Copy link
Member Author

bors r+

@craig
Copy link
Contributor

craig bot commented Mar 4, 2020

Build failed (retrying...)

@nvanbenschoten
Copy link
Member Author

bors r+

@craig
Copy link
Contributor

craig bot commented Mar 4, 2020

Build failed (retrying...)

@craig
Copy link
Contributor

craig bot commented Mar 4, 2020

Build succeeded

@craig craig bot merged commit cdc5681 into cockroachdb:master Mar 4, 2020
@nvanbenschoten nvanbenschoten deleted the nvanbenschoten/lockTableDiscovered branch March 5, 2020 18:15
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this pull request Mar 26, 2020
This testdata file was added in cockroachdb#45601. That skewed with the package
rename.

Release justification: testing only
craig bot pushed a commit that referenced this pull request Mar 27, 2020
46631: kv/kvserver: mv testdata/lock_table/add_discovered out of pkg/storage r=nvanbenschoten a=nvanbenschoten

This testdata file was added in #45601. That skewed with the mass package rename.

Release justification: testing only

Co-authored-by: Nathan VanBenschoten <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants