storage: don't truncate when probing active followers are present #32439

Merged (2 commits) on Nov 20, 2018

Conversation

@tbg tbg (Member) commented Nov 17, 2018

I need to clean up the first commit, putting the methods on replica.mu instead.


This is yet another root cause of Raft snapshots during import/restore. We
create a number of new ranges which may experience Raft leadership changes
early on and ingest SSTs at the same time. This leads to a situation in
which we have followers whose acknowledged log index isn't known, which in
turn leads to bad truncation decisions. Respect such followers (which are
usually only in this state for very brief amounts of time).

Release note: None
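
To make the failure mode concrete, here is a hedged sketch (simplified stand-in types and a hypothetical helper, not the queue's actual code) of the kind of truncation decision this fix is about: a recently active follower that is still probing, i.e. whose acknowledged log index is unknown, caps the truncation index instead of being ignored. The reason strings mirror the constants discussed in the review below.

```go
package raftlogexample

// followerState is a simplified stand-in for the per-follower Raft progress
// consulted by the truncation queue.
type followerState struct {
	Match        uint64 // highest log index acknowledged by the follower
	Probing      bool   // leader doesn't yet know the follower's Match
	RecentActive bool   // we heard from the follower recently
}

// chooseTruncateIndex picks the log index up to which we may truncate, along
// with a reason string describing which condition constrained the choice.
func chooseTruncateIndex(quorumIndex uint64, followers []followerState) (uint64, string) {
	index, reason := quorumIndex, "quorum"
	for _, f := range followers {
		if !f.RecentActive {
			// Inactive followers don't hold up truncation; they'll be caught
			// up via snapshot if they come back much later.
			continue
		}
		if f.Probing {
			// The follower's acknowledged index is unknown, so be
			// conservative and (effectively) don't truncate yet.
			if f.Match < index {
				index, reason = f.Match, "probing follower"
			}
			continue
		}
		if f.Match < index {
			index, reason = f.Match, "followers"
		}
	}
	return index, reason
}
```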

@tbg tbg requested a review from a team November 17, 2018 00:04
@cockroach-teamcity (Member): This change is Reviewable

@petermattis petermattis (Collaborator) left a comment

:lgtm:

Reviewed 2 of 2 files at r1.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained


pkg/storage/raft_log_queue.go, line 160 at r2 (raw file):

		// (including only resetting when the follower resumes replicating).
		pr.PendingSnapshot = pendingSnapshotIndex
	}

I think it is worthwhile to unit test this code as well. Perhaps it should be pulled out into a separate routine to ease the testing (i.e. something that takes a Raft status, descriptor and lastUpdateTimes map and returns the updated status).
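
Such a routine could look roughly like the sketch below (simplified stand-in types and a hypothetical name; the actual helper in this PR is the `updateRaftProgressFromActivity` call site quoted later in this review at r4):

```go
package raftlogexample

import "time"

// progress mirrors the raft.Progress fields relevant here.
type progress struct {
	RecentActive bool
}

// replicaDesc stands in for the replica descriptors in the range descriptor.
type replicaDesc struct {
	ReplicaID uint64
}

// updateProgressFromActivity overrides RecentActive with our own activity
// check: a follower counts as recently active only if we heard from it within
// activityTimeout. Map values are copies, so the updated struct must be
// written back into the map (the very bug discussed further down).
func updateProgressFromActivity(
	prs map[uint64]progress,
	replicas []replicaDesc,
	lastUpdate map[uint64]time.Time,
	now time.Time,
	activityTimeout time.Duration,
) {
	for _, desc := range replicas {
		pr, ok := prs[desc.ReplicaID]
		if !ok {
			continue
		}
		t, hasUpdate := lastUpdate[desc.ReplicaID]
		pr.RecentActive = hasUpdate && now.Sub(t) <= activityTimeout
		prs[desc.ReplicaID] = pr // write the copy back
	}
}
```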


pkg/storage/raft_log_queue.go, line 179 at r2 (raw file):

	truncatableIndexChosenViaQuorumIndex     = "quorum"
	truncatableIndexChosenViaFollowers       = "followers"
	truncatableIndexChosenViaProbingFollower = "probing-follower"

s/probing-follower/probing follower/g for consistency with the other two-word reasons (e.g. pending snapshot, first index, etc.).

@tbg tbg (Member, Author) left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale)


pkg/storage/raft_log_queue.go, line 160 at r2 (raw file):

Previously, petermattis (Peter Mattis) wrote…

I think it is worthwhile to unit test this code as well. Perhaps it should be pulled out into a separate routine to ease the testing (i.e. something that takes a Raft status, descriptor and lastUpdateTimes map and returns the updated status).

Thank you for making me write that unit test. I wasn't writing `pr` back into `prs` above, so in effect all that fancy code wasn't doing anything (`RecentActive` would've always been true).
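
For readers following along, a minimal standalone illustration of that gotcha (not the PR's actual code): Go stores map values by copy, so mutating the loop variable changes nothing unless the value is written back.

```go
package main

import "fmt"

type progress struct{ RecentActive bool }

func main() {
	prs := map[uint64]progress{1: {RecentActive: true}}

	// Buggy pattern: pr is a copy of the map value, so this mutation is lost.
	for _, pr := range prs {
		pr.RecentActive = false
	}
	fmt.Println(prs[1].RecentActive) // still true: the update was lost

	// Fixed pattern: write the modified copy back into the map.
	for id, pr := range prs {
		pr.RecentActive = false
		prs[id] = pr
	}
	fmt.Println(prs[1].RecentActive) // now false
}
```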


pkg/storage/raft_log_queue.go, line 179 at r2 (raw file):

Previously, petermattis (Peter Mattis) wrote…

s/probing-follower/probing follower/g for consistency with the other two-word reasons (e.g. pending snapshot, first index, etc.).

Done.

@petermattis petermattis (Collaborator) left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale)


pkg/storage/raft_log_queue.go, line 160 at r2 (raw file):

Previously, tbg (Tobias Schottdorf) wrote…

Thank you for making me write that unit test. I wasn't writing `pr` back into `prs` above, so in effect all that fancy code wasn't doing anything (`RecentActive` would've always been true).

Glad to be of service.


pkg/storage/raft_log_queue.go, line 146 at r4 (raw file):

	// true since we don't use CheckQuorum) with our own activity check.
	r.mu.RLock()
	updateRaftProgressFromActivity(ctx, raftStatus.Progress, r.descRLocked().Replicas, r.mu.lastUpdateTimes, now)

Nit: wrap this line at 100 columns.

Commit message for the first commit (moving the `lastUpdateTimes` helpers onto `replica.mu`):

This is a more natural place to put them a) because there's no reason to
have them sit on Replica and b) because this will allow me to write
better code when I query this map in the log truncation code.

Release note: None
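
As a rough sketch (hypothetical names and shapes, not the PR's exact code) of what "putting the methods on `replica.mu`" means in practice, the map and its helpers live next to the other mutex-protected Raft state, so the truncation code can consult it under the same lock:

```go
package replicaexample

import (
	"sync"
	"time"
)

// lastUpdateTimesMap tracks, per replica ID, when we last heard from that
// follower.
type lastUpdateTimesMap map[uint64]time.Time

// update records activity from the given follower.
func (m lastUpdateTimesMap) update(replicaID uint64, now time.Time) {
	if m == nil {
		return
	}
	m[replicaID] = now
}

// isFollowerActive reports whether we heard from the follower recently enough.
func (m lastUpdateTimesMap) isFollowerActive(
	replicaID uint64, now time.Time, timeout time.Duration,
) bool {
	t, ok := m[replicaID]
	return ok && now.Sub(t) <= timeout
}

// replicaMu sketches the mutex-protected state: keeping the map here means
// the log truncation code reads it under the same lock as the rest of the
// Raft state it needs.
type replicaMu struct {
	sync.RWMutex
	lastUpdateTimes lastUpdateTimesMap
}
```
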
@tbg tbg (Member, Author) left a comment

PTAL. I had to make some more minor adjustments, including one that prompted an upstream PR (with a workaround for the time being).

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale)


pkg/storage/raft_log_queue.go, line 146 at r4 (raw file):

Previously, petermattis (Peter Mattis) wrote…

Nit: wrap this line at 100 columns.

Done (tests are failing left and right now anyway), but generally, I'd prefer that we either put this into crlfmt or leave these alone in reviews. Do you know if we've tried and failed?

@petermattis petermattis (Collaborator) left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale)


pkg/storage/raft_log_queue.go, line 146 at r4 (raw file):

Previously, tbg (Tobias Schottdorf) wrote…

Done (tests are failing left and right now anyway), but generally, I'd prefer that we either put this into crlfmt or leave these alone in reviews. Do you know if we've tried and failed?

I don't know if we've tried. The primary reason I care about the line length limit is that it makes reviewing in Reviewable easier.


pkg/storage/replica.go, line 456 at r6 (raw file):

		// TODO(tschottdorf): keeping a map on each replica seems to be
		// overdoing it. We should map the replicaID to a NodeID and then use
		// node liveness (or any sensible measure of the peer being around).

The motivation for tracking activity per-range is that a node could be active while a replica on that node is stuck for some reason. Not sure if this is sufficient motivation, but there it is. Would be good to expand this TODO with this additional detail so that we don't forget about it whenever the TODO is addressed.


pkg/storage/replica.go, line 5260 at r6 (raw file):

	fromReplica, fromErr := r.getReplicaDescriptorByIDRLocked(roachpb.ReplicaID(msg.From), r.mu.lastToReplica)
	toReplica, toErr := r.getReplicaDescriptorByIDRLocked(roachpb.ReplicaID(msg.To), r.mu.lastFromReplica)
	if isHB := msg.Type == raftpb.MsgHeartbeat; isHB {

The isHB variable is inhibiting readability. I'm guessing it is detritus left over from earlier work.


pkg/storage/replica.go, line 5267 at r6 (raw file):

		// them into the local Raft group. The leader won't hit that path, so we
		// update it whenever it sends a heartbeat. In effect, this makes sure
		// it always sees itself as alive.

I wonder if it would be better to special-case the leader when using lastUpdateTimes. I don't have the usage paged-in, so feel free to ignore this idea if it is misguided.

@tbg tbg (Member, Author) left a comment

TFTR! I'm going to whack this with a few testraces on my worker to make up for the fact that TC won't really testrace many potentially affected tests in CI.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale)


pkg/storage/replica.go, line 456 at r6 (raw file):

Previously, petermattis (Peter Mattis) wrote…

The motivation for tracking activity per-range is that a node could be active while a replica on that node is stuck for some reason. Not sure if this is sufficient motivation, but there it is. Would be good to expand this TODO with this additional detail so that we don't forget about it whenever the TODO is addressed.

Right, but then we're already screwed because node liveness doesn't capture any of that either. Updating the comment regardless.


pkg/storage/replica.go, line 5260 at r6 (raw file):

Previously, petermattis (Peter Mattis) wrote…

The isHB variable is inhibiting readability. I'm guessing it is detritus left over from earlier work.

Done.


pkg/storage/replica.go, line 5267 at r6 (raw file):

Previously, petermattis (Peter Mattis) wrote…

I wonder if it would be better to special-case the leader when using lastUpdateTimes. I don't have the usage paged-in, so feel free to ignore this idea if it is misguided.

I considered that, but I think it's ultimately a bad idea to special-case the leader, because every caller would need to do the right thing with it. For example, the quota pool is completely oblivious to these changes and would omit the leader in its computations. That is fine, but was probably not intentional. There are various other places in the code which compute ad-hoc quorum indexes. I want them to avoid this same kind of pitfall as they get made smarter in the future.
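
To illustrate the pitfall (a hedged sketch, not the quota pool's or truncation queue's actual code): any caller that filters replicas through `lastUpdateTimes` would silently skip the leader if the leader never appeared in that map. Keeping the leader's entry fresh via its own heartbeats lets such callers treat every replica uniformly.

```go
package raftlogexample

import "time"

// activeMatchIndexes gathers the acknowledged log indexes of replicas we
// consider recently active, e.g. as input to an ad-hoc quorum computation.
// If the leader were special-cased out of lastUpdate, it would silently drop
// out of computations like this one even though it is trivially "active".
func activeMatchIndexes(
	matchIndex map[uint64]uint64, // per-replica acknowledged log index
	lastUpdate map[uint64]time.Time,
	now time.Time,
	timeout time.Duration,
) []uint64 {
	var idxs []uint64
	for id, idx := range matchIndex {
		t, ok := lastUpdate[id]
		if !ok || now.Sub(t) > timeout {
			continue // treated as inactive, including a missing leader entry
		}
		idxs = append(idxs, idx)
	}
	return idxs
}
```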

Commit message (identical to the PR description above).

@petermattis petermattis (Collaborator) left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale)

@tbg tbg (Member, Author) commented Nov 20, 2018

Wow, turns out running the logictests under testrace is a suicidal idea.

bors r=petermattis

craig bot pushed a commit that referenced this pull request Nov 20, 2018
32439: storage: don't truncate when probing active followers are present r=petermattis a=tbg

I need to clean up the first commit, putting the methods on `replica.mu` instead.

----

This is yet another root cause of Raft snapshots during import/restore. We
create a number of new ranges which may experience Raft leadership changes
early on and ingest SSTs at the same time. This leads to a situation in
which we have followers whose acknowledged log index isn't known, which in
turn leads to bad truncation decisions. Respect such followers (which are
usually only in this state for very brief amounts of time).

Release note: None

Co-authored-by: Tobias Schottdorf <[email protected]>
@craig craig bot (Contributor) commented Nov 20, 2018

Build succeeded
