-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
storage: pull Store-level concurrency retry loop under Replica, clean up req validation #43138
storage: pull Store-level concurrency retry loop under Replica, clean up req validation #43138
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 3 of 3 files at r1, 10 of 10 files at r2, 2 of 2 files at r3, 3 of 3 files at r4, 1 of 1 files at r5, 1 of 1 files at r6.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner, @nvanbenschoten, @sumeerbhola, and @tbg)
pkg/storage/replica.go, line 1022 at r2 (raw file):
// checkExecutionCanProceed returns an error if a batch request cannot be // executed by the Replica. An error indicates that the Replica is not live and // able to serve traffic or that the request is not compatable with the state of
compatible
pkg/storage/replica.go, line 1028 at r2 (raw file):
// used to indicate whether the caller has acquired latches and checked the // Range lease. The method will only check for a pending merge if both of these // conditions are true.
Add that lg
and st
can be nil and under which conditions the caller would do that.
pkg/storage/replica.go, line 1054 at r2 (raw file):
} // checkExecutionCanProceedForRangeFeed .returns an error if a rangefeed request
extra dot
pkg/storage/replica.go, line 1074 at r2 (raw file):
} // checkSpanInRange returns an error if a request (identified by its
nit: name is off
pkg/storage/replica_raft.go, line 68 at r4 (raw file):
func (r *Replica) evalAndPropose( ctx context.Context, lease *roachpb.Lease,
It's fine to leak the lease pointer out of r.mu.state
out from under the lock because we only ever replace that pointer, but never mutate into it, right? Might be worth a comment on ReplicaState
.
pkg/storage/replica_write.go, line 122 at r1 (raw file):
// Verify that the batch can be executed. if ec.lg != nil { if err := r.checkForPendingMerge(ctx, ba, ec.lg, &status); err != nil {
Isn't is potentially problematic to order this after redirectOnOrAcquireLease
? That method may try to get the lease and in doing so, may stall (waiting for the intent on the range desc to go away). I don't know if there's anything wrong with the code per se, but it does seem that there's some thinking to be had about deadlocks. It might be safer to put this ahead of the lease check.
pkg/storage/replica_write.go, line 158 at r2 (raw file):
// Checking the context just before proposing can help avoid ambiguous errors. if err := ctx.Err(); err != nil { log.Warningf(ctx, "%s before proposing: %s", err, ba.Summary())
not new code, but this shouldn't really be a warning, should it?
This commit shuffles around the read-only Replica execution path to acquire latches before checking the Replica's lease. This makes the read-only path more like the write path and will allow for future simplification. In order to permit this, the commit needed to pull out the logic to check for a pending merge from beginCmds. This allowed us to clean it up a bit and make its interaction with other checks more clear. Release note: None
bf7c725
to
8e4e1fb
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for giving this a pass!
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner, @sumeerbhola, and @tbg)
pkg/storage/replica.go, line 1022 at r2 (raw file):
Previously, tbg (Tobias Grieger) wrote…
compatible
Done.
pkg/storage/replica.go, line 1028 at r2 (raw file):
Previously, tbg (Tobias Grieger) wrote…
Add that
lg
andst
can be nil and under which conditions the caller would do that.
Done.
pkg/storage/replica.go, line 1054 at r2 (raw file):
Previously, tbg (Tobias Grieger) wrote…
extra dot
Done.
pkg/storage/replica.go, line 1074 at r2 (raw file):
Previously, tbg (Tobias Grieger) wrote…
nit: name is off
Done.
pkg/storage/replica_raft.go, line 68 at r4 (raw file):
Previously, tbg (Tobias Grieger) wrote…
It's fine to leak the lease pointer out of
r.mu.state
out from under the lock because we only ever replace that pointer, but never mutate into it, right? Might be worth a comment onReplicaState
.
This isn't coming straight from r.mu.state
. We copy it into LeaseStatus
in redirectOnOrAcquireLease
.
pkg/storage/replica_write.go, line 122 at r1 (raw file):
Previously, tbg (Tobias Grieger) wrote…
Isn't is potentially problematic to order this after
redirectOnOrAcquireLease
? That method may try to get the lease and in doing so, may stall (waiting for the intent on the range desc to go away). I don't know if there's anything wrong with the code per se, but it does seem that there's some thinking to be had about deadlocks. It might be safer to put this ahead of the lease check.
The merge check actually needs to be after the lease check on the read-only path in order to get TestStoreRangeMergeRHSLeaseExpiration
working. Without that, it's possible that a new leaseholder accepts a read while the range is meant to be frozen. This is because the mergeCompleteCh
is set in leasePostApply
when a frozen range transfers its lease, so if the mergeCompleteCh
is checked before the lease then the read can miss the mergeCompleteCh
but observe the new lease, allowing it to accidentally proceed.
No tests fail when I perform the re-order on the read-write path, but doing so in different places depending on the access method seems pretty bad to me. I'm also not convinced we wouldn't hit the same kind of issue with writes as we do with reads if we performed the re-order. I'll try extending TestStoreRangeMergeRHSLeaseExpiration
to also perform writes and see what I find.
pkg/storage/replica_write.go, line 158 at r2 (raw file):
Previously, tbg (Tobias Grieger) wrote…
not new code, but this shouldn't really be a warning, should it?
Nope, changed to log.VEventf(2, ...)
This reveals that writes to the RHS of a merge used to be allowed during a merge's critical phase if the RHS's lease expired at just the right time and raced with a write to the new leaseholder. The previous commit fixed this bug by checking the mergeCompleteCh after validating the replica's lease, instead of checking it before validating the replica's lease. Release note (bug fix): Fixed a bug where a well-timed write could slip in on the right-hand side of a Range merge. This would allow it to improperly synchronize with reads on the post-merged Range.
This commit cleans up the validation paths for BatchRequest and RaftFeed requests. It unifies the code that determining whether requests are compatable with the current state of the Replica it intends to execute on. This has a nice side effect of addressing a performance concern that I've had for a while -- on both the Replica read and write path, we were repeatedly read-locking the Replica Mutex to perform each of the validation steps. This commit addresses this by fusing all verification steps into a single critical section. Release note: None
It belongs here. Release note: None
Pass by reference, not value. Lease is fairly large and we just want to pull out its sequence later on in the method. Release note: None
We always set this and weren't prepared to handle the WriteIntentError even if we didn't. Release note: None
…l Txn *TransactionPushError already implements the transactionRestartError interface, so this wasn't needed. Release note: None
This commit pulls the retry loop in `Store.Send` down under `Replica.Send`, which provides a path to introduce a centralized concurrency manager as part of addressing cockroachdb#41720. The new `Replica.executeBatchWithConcurrencyRetries` method will call into the concurrency package and delegate some of its retry handling logic to it once the package is introduced. Release note: None
8e4e1fb
to
7103bb9
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner, @sumeerbhola, and @tbg)
pkg/storage/replica_write.go, line 122 at r1 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
The merge check actually needs to be after the lease check on the read-only path in order to get
TestStoreRangeMergeRHSLeaseExpiration
working. Without that, it's possible that a new leaseholder accepts a read while the range is meant to be frozen. This is because themergeCompleteCh
is set inleasePostApply
when a frozen range transfers its lease, so if themergeCompleteCh
is checked before the lease then the read can miss themergeCompleteCh
but observe the new lease, allowing it to accidentally proceed.No tests fail when I perform the re-order on the read-write path, but doing so in different places depending on the access method seems pretty bad to me. I'm also not convinced we wouldn't hit the same kind of issue with writes as we do with reads if we performed the re-order. I'll try extending
TestStoreRangeMergeRHSLeaseExpiration
to also perform writes and see what I find.
I think this was actually broken before this change. I added some writes to TestStoreRangeMergeRHSLeaseExpiration
(see new commit) and that revealed that writes could race with a lease transfer during the critical section of a merge to allow them to be permitted. It doesn't look like the writes were lost, but I think the hazard is that they were improperly synchronized with reads on the post-merged range.
This issue goes away with the change here to check for merges after checking the lease.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 10 files at r9, 1 of 11 files at r10, 4 of 10 files at r11, 1 of 3 files at r13, 8 of 8 files at r16.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @sumeerbhola and @tbg)
TFTR! For reference, here's the
generated sheet: https://docs.google.com/spreadsheets/d/1EoSnsRab5kglCIEqoTyvumguJ3Rym-_zKeoGxaMEFzs/edit |
bors r+ |
43138: storage: pull Store-level concurrency retry loop under Replica, clean up req validation r=nvanbenschoten a=nvanbenschoten This PR contains a number of cleanup steps necessary to pave the way for the unified `pkg/storage/concurrency` package that we plan to introduce for #41720 (see [prototype](https://github.com/nvanbenschoten/cockroach/commits/nvanbenschoten/lockTable)). Primarily, the PR cleans up Replica-level request validation, unifies some of the request synchronization logic between the read-only and read-write execution paths, and pulls the Store-level concurrency retry loop under the Replica. Future commits will pull parts of this retry loop down into the `concurrency` package's "concurrency manager". Co-authored-by: Nathan VanBenschoten <[email protected]>
Build succeeded |
This PR contains a number of cleanup steps necessary to pave the way for the unified
pkg/storage/concurrency
package that we plan to introduce for #41720 (see prototype). Primarily, the PR cleans up Replica-level request validation, unifies some of the request synchronization logic between the read-only and read-write execution paths, and pulls the Store-level concurrency retry loop under the Replica. Future commits will pull parts of this retry loop down into theconcurrency
package's "concurrency manager".