Delete completed local blocks when replaying wal #939
Merged
What this PR does:
This PR fixes #937 by deleting locally completed blocks during wal replay, and it also fixes another race condition on ingester startup. Here is a description of the situation and why this should fix it.
a. The wal complete is in progress, writing over top of `data` when it is flushed to the backend. The flush will reach EOF and assume all went well. This creates a partially written block in the backend.
b. The wal complete can entirely finish before the flush. It appends another entry into `completedBlocks`. When the flush occurs it reads the first entry from `completedBlocks` and saves it as the meta data. This is the actual condition for `magic number` errors and incorrect meta. It uses the meta as rediscovered in step 2, which may be a different encoding, etc.
c. The wal complete can start on a new block before rediscoverLocalBlocks begins. rediscoverLocalBlocks sees a bad block (missing meta) and deletes it. I think this ends up in a situation similar to (a) but with a different cause. This is handled by not rediscovering the local block if there is still a wal for it.
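The startup rule implied by the cases above can be sketched roughly as follows. This is an illustration only, not the code in this PR: `replayPlan` and `planReplay` are hypothetical names, and block IDs are modeled as plain strings. The assumption is that any completed local block whose ID still has a wal is deleted (the wal replay will complete it again), and only local blocks with no remaining wal are rediscovered.

```go
package main

import "fmt"

// replayPlan is a hypothetical summary of what startup does with each
// completed local block. It is not a type from the actual codebase.
type replayPlan struct {
	DeleteLocal []string // local blocks whose wal still exists; replay will recreate them
	Rediscover  []string // local blocks with no wal left; safe to rediscover and flush
}

// planReplay classifies completed local blocks by whether a wal block with
// the same ID is still present. Assumption: IDs are comparable strings.
func planReplay(walIDs, localIDs []string) replayPlan {
	inWAL := make(map[string]bool, len(walIDs))
	for _, id := range walIDs {
		inWAL[id] = true
	}

	var plan replayPlan
	for _, id := range localIDs {
		if inWAL[id] {
			// A wal exists, so the local copy may be partial or stale.
			// Delete it instead of rediscovering it.
			plan.DeleteLocal = append(plan.DeleteLocal, id)
			continue
		}
		plan.Rediscover = append(plan.Rediscover, id)
	}
	return plan
}

func main() {
	plan := planReplay([]string{"b1", "b2"}, []string{"b2", "b3"})
	fmt.Println("delete:", plan.DeleteLocal)
	fmt.Println("rediscover:", plan.Rediscover)
}
```

Under this rule a block is never both replayed from the wal and rediscovered from local disk, which is the overlap that cases (a) through (c) depend on.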
A couple more notes:
Which issue(s) this PR fixes:
Fixes #937
Checklist
`CHANGELOG.md` updated - the order of entries should be `[CHANGE]`, `[FEATURE]`, `[ENHANCEMENT]`, `[BUGFIX]`