Skip to content

Commit

Permalink
kv: circuit-break requests to unavailable replicas
Browse files Browse the repository at this point in the history
Fixes #33007.
Closes #61311.

This PR uses the circuit breaker package introduced in #73641 to address
issue #33007: When a replica becomes unavailable, it should eagerly
refuse traffic that it believes would simply hang.

Concretely, every request to the Replica gets a cancelable context that
is sensitive to the circuit breaker tripping (relying on context
cancellation makes sure we don't end up with a second channel that needs
to be plumbed to everyone and their dog; Go makes it really difficult to
join two cancellation chains); if the breaker is tripped when the
request arrives, it is refused right away. In either case, the outgoing
error is augmented to carry information about the tripped breaker. This
isn't 100% trivial, since retreating from in-flight replication
typically causes an `AmbiguousResultError`, and while we could avoid it
in some cases we can't in all. A similar problem occurs when the
canceled request was waiting on a lease, in which case the result is a
NotLeaseholderError.

For any request that made it in, if it enters replication but does not
manage to succeed within the timeout specified by the
`kv.replica_circuit_breaker.slow_replication_threshold` cluster setting,
the breaker is tripped, cancelling all inflight and future requests
until the breaker heals. Perhaps surprisingly, the existing "slowness"
detection (the informational "have been waiting ... for proposal"
warning in `executeWriteBatch`) was moved deeper into replication
(`refreshProposalsLocked`), where it now trips the breaker.  This has
the advantage of providing a unified path for lease requests (which
don't go through `executeWriteBatch`) and pipelined writes (which return
before waiting on the inflight replication process). To make this work,
we need to pick a little fight with how leases are (not) refreshed
(#74711) and we need to remember the ticks at which a proposal was first
inserted (despite potential reproposals).

Perhaps surprisingly, when the breaker is tripped, *all* traffic to
the Replica gets the fail-fast behavior, not just mutations. This is
because even though a read may look good to serve based on the lease,
we would also need to check for latches, and in particular we would
need to fail-fast if there is a transitive dependency on any write
(since that write is presumably stuck). This is not trivial and so
we don't do it in this first iteration (see #74799).

A tripped breaker deploys a background probe which sends a
`ProbeRequest` (introduced in #72972) to determine when to heal; this is
roughly the case whenever replicating a command through Raft is possible
for the Replica, either by appending to the log as the Raft leader, or
by forwarding to the Raft leader. A tiny bit of special casing allows
requests made by the probe to bypass the breaker.

As of this PR, the cluster setting defaults to zero (disabling the
entire mechanism) until some necessary follow-up items have been
addressed (see #74705). For example, the breaker-sensitive context
cancelation is a toy implementation that comes with too much of a
performance overhead (one extra goroutine per request); #74707 will
address that.  Other things not done here that we certainly want in the
22.1 release are UI work (#74713) and metrics (#74505).

The release note is deferred to #74705 (where breakers are turned on).

Release note: None

touchie
  • Loading branch information
tbg committed Jan 13, 2022
1 parent ad59351 commit 6664d0c
Show file tree
Hide file tree
Showing 22 changed files with 1,062 additions and 129 deletions.
1 change: 1 addition & 0 deletions docs/generated/settings/settings.html
Original file line number Diff line number Diff line change
Expand Up @@ -40,6 +40,7 @@
<tr><td><code>kv.range_split.by_load_enabled</code></td><td>boolean</td><td><code>true</code></td><td>allow automatic splits of ranges based on where load is concentrated</td></tr>
<tr><td><code>kv.range_split.load_qps_threshold</code></td><td>integer</td><td><code>2500</code></td><td>the QPS over which, the range becomes a candidate for load based splitting</td></tr>
<tr><td><code>kv.rangefeed.enabled</code></td><td>boolean</td><td><code>false</code></td><td>if set, rangefeed registration is enabled</td></tr>
<tr><td><code>kv.replica_circuit_breaker.slow_replication_threshold</code></td><td>duration</td><td><code>0s</code></td><td>duration after which slow proposals trip the per-Replica circuit breaker (zero duration disables breakers)</td></tr>
<tr><td><code>kv.replication_reports.interval</code></td><td>duration</td><td><code>1m0s</code></td><td>the frequency for generating the replication_constraint_stats, replication_stats_report and replication_critical_localities reports (set to 0 to disable)</td></tr>
<tr><td><code>kv.snapshot_rebalance.max_rate</code></td><td>byte size</td><td><code>32 MiB</code></td><td>the rate limit (bytes/sec) to use for rebalance and upreplication snapshots</td></tr>
<tr><td><code>kv.snapshot_recovery.max_rate</code></td><td>byte size</td><td><code>32 MiB</code></td><td>the rate limit (bytes/sec) to use for recovery snapshots</td></tr>
Expand Down
6 changes: 6 additions & 0 deletions pkg/kv/kvserver/BUILD.bazel
Original file line number Diff line number Diff line change
Expand Up @@ -34,6 +34,7 @@ go_library(
"replica_application_state_machine.go",
"replica_backpressure.go",
"replica_batch_updates.go",
"replica_circuit_breaker.go",
"replica_closedts.go",
"replica_command.go",
"replica_consistency.go",
Expand Down Expand Up @@ -162,6 +163,7 @@ go_library(
"//pkg/util",
"//pkg/util/admission",
"//pkg/util/bufalloc",
"//pkg/util/circuit",
"//pkg/util/contextutil",
"//pkg/util/ctxgroup",
"//pkg/util/encoding",
Expand Down Expand Up @@ -221,6 +223,7 @@ go_test(
"client_rangefeed_test.go",
"client_relocate_range_test.go",
"client_replica_backpressure_test.go",
"client_replica_circuit_breaker_test.go",
"client_replica_gc_test.go",
"client_replica_test.go",
"client_spanconfigs_test.go",
Expand Down Expand Up @@ -252,6 +255,7 @@ go_test(
"replica_application_cmd_buf_test.go",
"replica_application_state_machine_test.go",
"replica_batch_updates_test.go",
"replica_circuit_breaker_test.go",
"replica_closedts_internal_test.go",
"replica_closedts_test.go",
"replica_command_test.go",
Expand Down Expand Up @@ -364,6 +368,7 @@ go_test(
"//pkg/storage/enginepb",
"//pkg/storage/fs",
"//pkg/testutils",
"//pkg/testutils/echotest",
"//pkg/testutils/gossiputil",
"//pkg/testutils/kvclientutils",
"//pkg/testutils/serverutils",
Expand All @@ -374,6 +379,7 @@ go_test(
"//pkg/ts/tspb",
"//pkg/util",
"//pkg/util/caller",
"//pkg/util/circuit",
"//pkg/util/contextutil",
"//pkg/util/ctxgroup",
"//pkg/util/encoding",
Expand Down
Loading

0 comments on commit 6664d0c

Please sign in to comment.