storage: don't truncate when probing active followers are present #32439
Conversation
Reviewed 2 of 2 files at r1.
Reviewable status: complete! 1 of 0 LGTMs obtained
pkg/storage/raft_log_queue.go, line 160 at r2 (raw file):
	// (including only resetting when the follower resumes replicating).
	pr.PendingSnapshot = pendingSnapshotIndex
}
I think it is worthwhile to unit test this code as well. Perhaps it should be pulled out into a separate routine to ease the testing (i.e. something that takes a Raft status, descriptor and lastUpdateTimes map and returns the updated status).
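Such a routine might look roughly like the following sketch (all names, parameter types, and the activity threshold are illustrative, not the PR's actual code; the point is just that it operates on plain maps and is therefore easy to unit test):

```go
package storage

import (
	"time"

	"github.com/coreos/etcd/raft"
)

// updateProgressFromActivity is a sketch of the suggested standalone helper:
// it takes the Raft progress map, the range's replica IDs, and the map of
// last-heard-from timestamps, and recomputes RecentActive for each peer.
func updateProgressFromActivity(
	prs map[uint64]raft.Progress,
	replicaIDs []uint64,
	lastUpdateTimes map[uint64]time.Time,
	now time.Time,
	threshold time.Duration,
) {
	for _, id := range replicaIDs {
		pr, ok := prs[id]
		if !ok {
			continue
		}
		last, ok := lastUpdateTimes[id]
		pr.RecentActive = ok && now.Sub(last) <= threshold
		// raft.Progress is stored by value, so the modified copy must be
		// written back into the map.
		prs[id] = pr
	}
}
```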
pkg/storage/raft_log_queue.go, line 179 at r2 (raw file):
	truncatableIndexChosenViaQuorumIndex     = "quorum"
	truncatableIndexChosenViaFollowers       = "followers"
	truncatableIndexChosenViaProbingFollower = "probing-follower"
s/probing-follower/probing follower/g
for consistency with the other two-word reasons (e.g. `pending snapshot`, `first index`, etc).
Force-pushed from a542a0d to 4e6f396
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale)
pkg/storage/raft_log_queue.go, line 160 at r2 (raw file):
Previously, petermattis (Peter Mattis) wrote…
I think it is worthwhile to unit test this code as well. Perhaps it should be pulled out into a separate routine to ease the testing (i.e. something that takes a Raft status, descriptor and lastUpdateTimes map and returns the updated status).
Thank you for making me write that unit test. I wasn't writing `pr` back into `prs` above, so in effect all that fancy code wasn't doing anything (`RecentActive` would've always been true).
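For context, the pitfall described here is the standard Go value-type map behavior; a minimal standalone illustration (not the PR's code):

```go
package main

import "fmt"

type progress struct {
	RecentActive bool
}

func main() {
	prs := map[uint64]progress{1: {RecentActive: true}}

	// Buggy: pr is a copy of the map entry, so mutating it is a no-op on the map.
	pr := prs[1]
	pr.RecentActive = false
	fmt.Println(prs[1].RecentActive) // true

	// Fixed: write the modified copy back into the map.
	prs[1] = pr
	fmt.Println(prs[1].RecentActive) // false
}
```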
pkg/storage/raft_log_queue.go, line 179 at r2 (raw file):
Previously, petermattis (Peter Mattis) wrote…
s/probing-follower/probing follower/g
for consistency with the other two-word reasons (e.g. `pending snapshot`, `first index`, etc).
Done.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale)
pkg/storage/raft_log_queue.go, line 160 at r2 (raw file):
Previously, tbg (Tobias Schottdorf) wrote…
Thank you for making me write that unit test. I wasn't writing `pr` back into `prs` above, so in effect all that fancy code wasn't doing anything (`RecentActive` would've always been true).
Glad to be of service.
pkg/storage/raft_log_queue.go, line 146 at r4 (raw file):
	// true since we don't use CheckQuorum) with our own activity check.
	r.mu.RLock()
	updateRaftProgressFromActivity(ctx, raftStatus.Progress, r.descRLocked().Replicas, r.mu.lastUpdateTimes, now)
Nit: wrap this line at 100 columns.
This is a more natural place to put them a) because there's no reason to have them sit on Replica and b) because this will allow me to write better code when I query this map in the log truncation code.

Release note: None
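As a rough illustration of the kind of query the truncation code can then make against this map (type and method names here are assumptions, not the PR's actual API):

```go
package storage

import "time"

// lastUpdateTimesMap sketches the per-replica activity map being moved in
// this commit; the real type and methods may differ.
type lastUpdateTimesMap map[uint64]time.Time

// update records that we heard from the given replica at time now.
func (m lastUpdateTimesMap) update(replicaID uint64, now time.Time) {
	m[replicaID] = now
}

// isFollowerActive reports whether we heard from the follower within the
// given threshold; the log truncation code can use this to decide whether a
// follower's progress should be respected.
func (m lastUpdateTimesMap) isFollowerActive(
	replicaID uint64, now time.Time, threshold time.Duration,
) bool {
	last, ok := m[replicaID]
	return ok && now.Sub(last) <= threshold
}
```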
Force-pushed from 4e6f396 to 297a14c
PTAL. I had to make some more minor adjustments, including one that prompted an upstream PR (with a workaround for the time being).
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale)
pkg/storage/raft_log_queue.go, line 146 at r4 (raw file):
Previously, petermattis (Peter Mattis) wrote…
Nit: wrap this line at 100 columns.
Done (tests are failing left and right now anyway), but generally, I'd prefer that we either put this into `crlfmt` or leave these alone in reviews. Do you know if we've tried and failed?
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale)
pkg/storage/raft_log_queue.go, line 146 at r4 (raw file):
Previously, tbg (Tobias Schottdorf) wrote…
Done (tests are failing left and right now anyway), but generally, I'd prefer that we either put this into `crlfmt` or leave these alone in reviews. Do you know if we've tried and failed?
I don't know if we've tried. The primary reason I care about the line length limit is that it makes reviewing in Reviewable easier.
pkg/storage/replica.go, line 456 at r6 (raw file):
	// TODO(tschottdorf): keeping a map on each replica seems to be
	// overdoing it. We should map the replicaID to a NodeID and then use
	// node liveness (or any sensible measure of the peer being around).
The motivation for tracking activity per-range is that a node could be active while a replica on that node is stuck for some reason. Not sure if this is sufficient motivation, but there it is. Would be good to expand this TODO with this additional detail so that we don't forget about it whenever the TODO is addressed.
pkg/storage/replica.go, line 5260 at r6 (raw file):
	fromReplica, fromErr := r.getReplicaDescriptorByIDRLocked(roachpb.ReplicaID(msg.From), r.mu.lastToReplica)
	toReplica, toErr := r.getReplicaDescriptorByIDRLocked(roachpb.ReplicaID(msg.To), r.mu.lastFromReplica)
	if isHB := msg.Type == raftpb.MsgHeartbeat; isHB {
The `isHB` variable is inhibiting readability. I'm guessing it is detritus left over from earlier work.
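For what it's worth, the inlined form might read roughly like this (a sketch only; the surrounding function is invented for illustration):

```go
package storage

import (
	"time"

	"github.com/coreos/etcd/raft/raftpb"
)

// recordActivityOnHeartbeat is an illustrative stand-in for the send path:
// the one-shot isHB variable is gone and the message type is tested directly.
func recordActivityOnHeartbeat(
	msg raftpb.Message, lastUpdateTimes map[uint64]time.Time, now time.Time,
) {
	if msg.Type == raftpb.MsgHeartbeat {
		// The leader doesn't step its own heartbeats through the local Raft
		// group, so record its activity whenever it sends one (see the next
		// comment thread).
		lastUpdateTimes[msg.From] = now
	}
}
```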
pkg/storage/replica.go, line 5267 at r6 (raw file):
	// them into the local Raft group. The leader won't hit that path, so we
	// update it whenever it sends a heartbeat. In effect, this makes sure
	// it always sees itself as alive.
I wonder if it would be better to special-case the leader when using `lastUpdateTimes`. I don't have the usage paged in, so feel free to ignore this idea if it is misguided.
TFTR! I'm going to whack this with a few testraces on my worker to make up for the fact that TC won't really testrace many potentially affected tests in CI.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale)
pkg/storage/replica.go, line 456 at r6 (raw file):
Previously, petermattis (Peter Mattis) wrote…
The motivation for tracking activity per-range is that a node could be active while a replica on that node is stuck for some reason. Not sure if this is sufficient motivation, but there it is. Would be good to expand this TODO with this additional detail so that we don't forget about it whenever the TODO is addressed.
Right, but then we're already screwed because node liveness doesn't capture any of that either. Updating the comment regardless.
pkg/storage/replica.go, line 5260 at r6 (raw file):
Previously, petermattis (Peter Mattis) wrote…
The `isHB` variable is inhibiting readability. I'm guessing it is detritus left over from earlier work.
Done.
pkg/storage/replica.go, line 5267 at r6 (raw file):
Previously, petermattis (Peter Mattis) wrote…
I wonder if it would be better to special-case the leader when using `lastUpdateTimes`. I don't have the usage paged in, so feel free to ignore this idea if it is misguided.
I considered that, but I think it's ultimately a bad idea to special case because every caller needs to do the right thing with it. For example, the quota pool is completely oblivious to these changes and would omit the leader in its computations. That is fine, but was probably not intentional. There are various other places in the code which compute ad-hoc quorum indexes. I want them to avoid this same kind of pitfall as they get made smarter in the future.
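To illustrate the concern: a caller that filters by recent activity works unchanged only because the leader keeps its own entry fresh; with a special case, every such caller would have to remember to add the leader back in. A sketch with assumed types:

```go
package storage

import "time"

// activeFollowers returns the replicas we heard from within the threshold.
// Because the leader records its own activity when sending heartbeats, it is
// included here naturally; a special-cased leader would have to be re-added
// by every caller computing quorum-like quantities over this set.
func activeFollowers(
	replicaIDs []uint64, lastUpdateTimes map[uint64]time.Time, now time.Time, threshold time.Duration,
) []uint64 {
	var active []uint64
	for _, id := range replicaIDs {
		if last, ok := lastUpdateTimes[id]; ok && now.Sub(last) <= threshold {
			active = append(active, id)
		}
	}
	return active
}
```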
This is yet another root cause of Raft snapshots during import/restore. We create a number of new ranges which may experience Raft leadership changes early on and ingest SSTs at the same time. This leads to a situation in which we have followers whose acknowledged log index isn't known, which in turn leads to bad truncation decisions. Respect such followers (which are usually only in this state for very brief amounts of time).

Release note: None
Force-pushed from 297a14c to 7a7e1cb
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale)
Wow, turns out running the logictests under …

bors r=petermattis
32439: storage: don't truncate when probing active followers are present r=petermattis a=tbg

I need to clean up the first commit, putting the methods on `replica.mu` instead.

----

This is yet another root cause of Raft snapshots during import/restore. We create a number of new ranges which may experience Raft leadership changes early on and ingest SSTs at the same time. This leads to a situation in which we have followers whose acknowledged log index isn't known, which in turn leads to bad truncation decisions. Respect such followers (which are usually only in this state for very brief amounts of time).

Release note: None

Co-authored-by: Tobias Schottdorf <[email protected]>
Build succeeded
I need to clean up the first commit, putting the methods on `replica.mu` instead.

This is yet another root cause of Raft snapshots during import/restore. We
create a number of new ranges which may experience Raft leadership changes
early on and ingest SSTs at the same time. This leads to a situation in
which we have followers whose acknowledged log index isn't known, which in
turn leads to bad truncation decisions. Respect such followers (which are
usually only in this state for very brief amounts of time).
Release note: None
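To make the truncation decision concrete, here is a rough sketch of the check described above (illustrative only, not the PR's actual code; the reason strings mirror the constants quoted earlier in the review, and the etcd/raft progress fields are those in use at the time):

```go
package storage

import "github.com/coreos/etcd/raft"

// chooseTruncationIndex sketches the decision: start from the quorum index,
// lower it for lagging but active followers, and refuse to truncate past the
// first index while an active follower is still being probed (i.e. its
// acknowledged log index isn't known yet).
func chooseTruncationIndex(
	status raft.Status, quorumIndex, firstIndex uint64,
) (truncIndex uint64, reason string) {
	truncIndex, reason = quorumIndex, "quorum"
	for _, pr := range status.Progress {
		if !pr.RecentActive {
			// Followers we haven't heard from recently don't hold up
			// truncation; they'll need a snapshot anyway if they come back.
			continue
		}
		if pr.State == raft.ProgressStateProbe {
			// The follower is active but we don't know which index it has
			// acknowledged; truncating now would likely force a Raft snapshot.
			if truncIndex > firstIndex {
				truncIndex, reason = firstIndex, "probing follower"
			}
			continue
		}
		if pr.Match < truncIndex {
			truncIndex, reason = pr.Match, "followers"
		}
	}
	return truncIndex, reason
}
```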