-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
storage: compute minority in failed consistency check #41893
Merged
Merged
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
tbg
force-pushed
the
deathrattle
branch
2 times, most recently
from
October 24, 2019 15:13
704c1bc
to
87b3b4a
Compare
irfansharif
approved these changes
Oct 24, 2019
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 2 of 2 files at r1, 1 of 1 files at r2.
Reviewable status: complete! 1 of 0 LGTMs obtained
Previously, a consistency failure was always framed as a deviation from the leaseholder. However, in practice a mismatch is always symmetric: the leaseholder could be right and the follower wrong; the follower might be right; and in fact the (usually 2+) followers may agree that the leaseholder is wrong. We've historically found that the majority was usually right, that is, with very few exceptions (I remember a single one), if there is one outlier replica, that replica is usually the problematic one, or more generally, the SHA with the lowest multiplicity is usually the one worth suspecting (we call this a "minority" in this context). The consistency checker currently fatals the leaseholder indiscriminately. This is an especially poor choice that was made out of convenience. Killing the leaseholder, which may very well have the correct data, forces the lease to move to a follower, often resulting in the corrupt node taking over the lease and spreading the corruption further. In a follow-up change, we'll want to kill the replicas representing the SHA with the lowest multiplicity instead of defaulting to the leaseholder. Release note (general change): Improve the consistency checker's log output.
tbg
added a commit
to tbg/cockroach
that referenced
this pull request
Oct 25, 2019
Following up on cockroachdb#41893, the consistency checker now instructs the replicas corresponding to the SHA with the lowest multiplicity to terminate. Note that this change is a candidate for backporting into 19.2. An "old" leaseholder will never populate the newly introduced field, and will always terminate itself. A new leaseholder will populate the field, but only new nodes will interpret it. In the worst case, the result will be a consistency failure that does not lead to a fatal error (but collects RocksDB checkpoints just the same). Since no significant time is usually spent in a mixed-patch-releases, this is not a concern that outweighs the benefit of terminating (heuristically) the "right" nodes. Release note (general improvement): when the replicas within a range have found to been corrupted, the outliers will be terminated. Previously, the leaseholder replica would terminate, regardless of which replicas disagreed with each other. This is expected to curb the spreading of corrupted data better than the previous approach.
tbg
added a commit
to tbg/cockroach
that referenced
this pull request
Oct 25, 2019
Following up on cockroachdb#41893, the consistency checker now instructs the replicas corresponding to the SHA with the lowest multiplicity to terminate. Note that this change is a candidate for backporting into 19.2. An "old" leaseholder will never populate the newly introduced field, and will always terminate itself. A new leaseholder will populate the field, but only new nodes will interpret it. In the worst case, the result will be a consistency failure that does not lead to a fatal error (but collects RocksDB checkpoints just the same). Since no significant time is usually spent in a mixed-patch-releases, this is not a concern that outweighs the benefit of terminating (heuristically) the "right" nodes. Release note (general improvement): when the replicas within a range have found to been corrupted, the outliers will be terminated. Previously, the leaseholder replica would terminate, regardless of which replicas disagreed with each other. This is expected to curb the spreading of corrupted data better than the previous approach.
tbg
added a commit
to tbg/cockroach
that referenced
this pull request
Oct 28, 2019
Following up on cockroachdb#41893, the consistency checker now instructs the replicas corresponding to the SHA with the lowest multiplicity to terminate. Note that this change is a candidate for backporting into 19.2. An "old" leaseholder will never populate the newly introduced field, and will always terminate itself. A new leaseholder will populate the field, but only new nodes will interpret it. In the worst case, the result will be a consistency failure that does not lead to a fatal error (but collects RocksDB checkpoints just the same). Since no significant time is usually spent in a mixed-patch-releases, this is not a concern that outweighs the benefit of terminating (heuristically) the "right" nodes. Release note (general improvement): when the replicas within a range have found to been corrupted, the outliers will be terminated. Previously, the leaseholder replica would terminate, regardless of which replicas disagreed with each other. This is expected to curb the spreading of corrupted data better than the previous approach.
tbg
added a commit
to tbg/cockroach
that referenced
this pull request
Nov 4, 2019
Following up on cockroachdb#41893, the consistency checker now instructs the replicas corresponding to the SHA with the lowest multiplicity to terminate. Note that this change is a candidate for backporting into 19.2. An "old" leaseholder will never populate the newly introduced field, and will always terminate itself. A new leaseholder will populate the field, but only new nodes will interpret it. In the worst case, the result will be a consistency failure that does not lead to a fatal error (but collects RocksDB checkpoints just the same). Since no significant time is usually spent in a mixed-patch-releases, this is not a concern that outweighs the benefit of terminating (heuristically) the "right" nodes. Release note (general improvement): when the replicas within a range have found to been corrupted, the outliers will be terminated. Previously, the leaseholder replica would terminate, regardless of which replicas disagreed with each other. This is expected to curb the spreading of corrupted data better than the previous approach.
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Previously, a consistency failure was always framed as a deviation from
the leaseholder. However, in practice a mismatch is always symmetric:
the leaseholder could be right and the follower wrong; the follower
might be right; and in fact the (usually 2+) followers may agree that
the leaseholder is wrong.
We've historically found that the majority was usually right, that is,
with very few exceptions (I remember a single one), if there is one
outlier replica, that replica is usually the problematic one, or more
generally, the SHA with the lowest multiplicity is usually the one
worth suspecting (we call this a "minority" in this context).
The consistency checker currently fatals the leaseholder
indiscriminately. This is an especially poor choice that was made out of
convenience. Killing the leaseholder, which may very well have the
correct data, forces the lease to move to a follower, often resulting in
the corrupt node taking over the lease and spreading the corruption
further. In a follow-up change, we'll want to kill the replicas
representing the SHA with the lowest multiplicity instead of defaulting
to the leaseholder.
Release note (general change): Improve the consistency checker's log output.