Tool to remove corrupted parts of corrupt shards #31389

Closed
DaveCTurner opened this issue Jun 18, 2018 · 14 comments
Labels
:Distributed/Store Issues around managing unopened Lucene indices. If it touches Store.java, this is a likely label. >enhancement help wanted adoptme v6.5.0 v7.0.0-beta1

Comments

@DaveCTurner
Contributor

DaveCTurner commented Jun 18, 2018

Today, if we detect shard corruption then we mark the store as corrupt and refuse to open it again. If there are no replicas then you might be able to use Lucene’s CheckIndex to remove the corrupted segments, although this does not remove the corruption marker, requires knowledge of our filesystem layout, and might be tricky to do in a containerised or heavily automated environment. The only way forward via the API is to force the allocation of an empty primary which drops all the data in the shard. We have an index.shard.check_on_startup: fix setting but this is suboptimal for a couple of reasons:

  • it’s index-wide and requires closing and verifying the whole index.
  • it has no effect on shards that have a corruption marker, because the corruption marker is checked before this option takes effect.

(it also does nothing in versions 6.0 and above, but that's another story)

The Right Way™ to recover a corrupted shard is certainly to fail it and recover another copy from one of its replicas, assuming such a thing exists, but we’ve seen a couple of cases recently where a user was running without replicas, e.g. to do a bulk load of data (which we sorta suggest might be a good idea sometimes) and hit some corruption that they'd have preferred to recover from with a bit of data loss rather than by restarting the load or allocating an empty primary.

I propose removing the fix option of the index.shard.check_on_startup setting and instead adding another dangerous forced allocation command that can attempt to allocate a primary on top of a corrupt store by fixing the store and removing its corruption marker.

/cc @tsouza @ywelsch re. this forum thread


Current points and open questions:

  • Tool name: elasticsearch-shard with subcommand remove-corrupted-segments
    • the main goal is to make a corrupted index usable again, but the action is destructive, so the name avoids "fix" and "repair"; it also avoids "truncate", which is far from Lucene terminology
  • Available options for remove-corrupted-segments:
    • --index-name index_name and --shard-id shard_id (mandatory)
      • alternative: -d path_to_index_folder or --dir path_to_index_folder
    • --dry-run: run a fast check without actually dropping corrupted segments
    • no options means exorcise (drop the corrupted segments); interactive keyboard confirmation is required
  • merge elasticsearch-translog into elasticsearch-shard
    • elasticsearch-translog becomes elasticsearch-shard truncate-translog
    • elasticsearch-translog has only a -d option to specify the folder; it would be nice to also have --index-name index_name and --shard-id shard_id
  • Exit immediately if there is no corruption marker file
    • for both cases
  • missing segments are actually an unrecoverable case for CheckIndex
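
The option rules in the list above can be sketched as a small argument check. This is only an illustration of the proposal, not the tool's real parser: the class ShardToolArgs and its fields are hypothetical, and only the option names (--index-name, --shard-id, -d/--dir, --dry-run) come from the list.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical argument check for the proposed remove-corrupted-segments subcommand. */
final class ShardToolArgs {
    final String indexName; // null when --dir is used
    final Integer shardId;  // null when --dir is used
    final String dir;       // null when --index-name/--shard-id are used
    final boolean dryRun;

    private ShardToolArgs(String indexName, Integer shardId, String dir, boolean dryRun) {
        this.indexName = indexName;
        this.shardId = shardId;
        this.dir = dir;
        this.dryRun = dryRun;
    }

    static ShardToolArgs parse(String... args) {
        Map<String, String> values = new HashMap<>();
        boolean dryRun = false;
        for (int i = 0; i < args.length; i++) {
            String arg = args[i];
            if (arg.equals("--dry-run")) {
                dryRun = true;
            } else if (arg.equals("--index-name") || arg.equals("--shard-id")
                    || arg.equals("--dir") || arg.equals("-d")) {
                if (i + 1 >= args.length) {
                    throw new IllegalArgumentException("missing value for " + arg);
                }
                // -d is treated as a short form of --dir
                values.put(arg.equals("-d") ? "--dir" : arg, args[++i]);
            } else {
                throw new IllegalArgumentException("unknown option " + arg);
            }
        }
        String index = values.get("--index-name");
        String shard = values.get("--shard-id");
        String dir = values.get("--dir");
        // The shard is addressed either by folder or by index name plus shard id, never both.
        if (dir != null && (index != null || shard != null)) {
            throw new IllegalArgumentException("use either --dir or --index-name with --shard-id");
        }
        if (dir == null && (index == null || shard == null)) {
            throw new IllegalArgumentException("--index-name and --shard-id are both required unless --dir is given");
        }
        // Integer.valueOf throws NumberFormatException (an IllegalArgumentException) on bad input.
        Integer shardId = shard == null ? null : Integer.valueOf(shard);
        return new ShardToolArgs(index, shardId, dir, dryRun);
    }

    public static void main(String[] args) {
        ShardToolArgs parsed = parse("--index-name", "my_index", "--shard-id", "0", "--dry-run");
        System.out.println(parsed.indexName + "[" + parsed.shardId + "] dryRun=" + parsed.dryRun);
    }
}
```

Rejecting a mix of --dir with --index-name/--shard-id keeps the two ways of addressing a shard from silently disagreeing with each other.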
@DaveCTurner DaveCTurner added :Distributed/Store Issues around managing unopened Lucene indices. If it touches Store.java, this is a likely label. team-discuss labels Jun 18, 2018
@elasticmachine
Collaborator

Pinging @elastic/es-distributed

@bleskes
Contributor

bleskes commented Jun 18, 2018

+1. That setting is dangerous :(

@DaveCTurner
Contributor Author

DaveCTurner commented Jun 26, 2018

We (@elastic/es-distributed) discussed this today and decided:

  • removing index.shard.check_on_startup: fix is the right thing to do.

  • fixing a corrupted shard should not be done online, via the API, but we should have an offline tool, similar to the translog tool, that can fix it without requiring the user to descend into the filesystem by hand.

@DaveCTurner DaveCTurner changed the title Online recovery of corrupt shards Tool for recovery of corrupt shards Jun 26, 2018
@vladimirdolzhenko vladimirdolzhenko self-assigned this Jul 17, 2018
@vladimirdolzhenko
Contributor

It has been observed that index.shard.check_on_startup has been broken since 6.0.0. In fact, no value other than true/false is recognised:

if (Booleans.isTrue(checkIndexOnStartup)) {
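
To illustrate why such a condition would ignore the non-boolean values, here is a minimal sketch. It assumes the check is a strict boolean test in which only the literal string "true" counts as true. The isTrue helper below is a stand-in written for this example, not the Elasticsearch source, and the class name is hypothetical.

```java
/** Hypothetical illustration of gating the startup check on a strict boolean test. */
final class CheckOnStartupSketch {

    /** Stand-in for a strict boolean parse (assumption): only "true" is true. */
    static boolean isTrue(String value) {
        return "true".equals(value);
    }

    /** Mirrors the shape of the quoted condition: the check runs only when isTrue passes. */
    static boolean runsCheck(String checkOnStartup) {
        return isTrue(checkOnStartup);
    }

    public static void main(String[] args) {
        // Under this assumption "checksum" and "fix" behave exactly like "false".
        for (String value : new String[] {"false", "true", "checksum", "fix"}) {
            System.out.println(value + " -> check runs: " + runsCheck(value));
        }
    }
}
```

Under that assumption, setting the option to checksum or fix has the same effect as false, which matches the observation that only true/false make a difference.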

@tsouza

tsouza commented Jul 17, 2018

I don't think this is correct

private void doCheckIndex() throws IOException {

UPDATE: It seems that method is dead code.

@vladimirdolzhenko
Contributor

vladimirdolzhenko commented Jul 18, 2018

  • Decided to fix “checksum” but not fix “fix”

vladimirdolzhenko pushed a commit to vladimirdolzhenko/elasticsearch that referenced this issue Jul 21, 2018
@vladimirdolzhenko
Contributor

vladimirdolzhenko commented Jul 23, 2018

description is updated

vladimirdolzhenko pushed a commit to vladimirdolzhenko/elasticsearch that referenced this issue Jul 23, 2018
@bleskes
Contributor

bleskes commented Jul 23, 2018

-d path_to_index_folder

As discussed, one of the upsides of a tool vs running Lucene directly is the translation between index names and index folders. I think we should allow people to specify an index name and a shard id as parameters.

@jpountz
Contributor

jpountz commented Jul 23, 2018

+1 to remove-corrupted-segments, I like that it is explicit about the data loss and the fact that it works at the segment level.

+1 to pass in an index name and a shard rather than a folder.

elasticsearch-index

Or maybe elasticsearch-shard?

-fast (default)
-slow

I'd probably not expose these options at all, and always run with fast=true and crossCheckTermVectors=false.

-exorcise

Since the name of the command already implies data loss I'm not sure we need this one. Maybe turn it around and make it a dry-run option that only prints what is going to happen when enabled?

@vladimirdolzhenko
Contributor

@jpountz I like the idea of dry-run being the default. In that case the question is what to call the exorcise option: force-remove or something else?

@jpountz
Contributor

jpountz commented Jul 24, 2018

I'm open to ideas here as long as the fact that this command will cause data loss is clear. My thinking was that since the command is already called remove-corrupted-segments then we don't need additional warnings and could just go with the removal of corrupted segments by default unless --dry-run is passed. But I also understand why someone would like to add a second level of protection, I am fine either way. I know Lucene uses exorcise but I think we could find a better name / workflow. For instance I think some other tools are interactive whenever changes need to be applied and ask to confirm, maybe this is something we can get inspiration from. In any case I'm open to how we want to handle that part. The main things that I care about are having a name that makes the data loss obvious (remove-corrupted-segments sounds great) and having as few options as possible (i.e. skip options that don't help much in the context of Elasticsearch like cross-checking term vectors).

vladimirdolzhenko pushed a commit to vladimirdolzhenko/elasticsearch that referenced this issue Jul 27, 2018
@bleskes
Contributor

bleskes commented Aug 9, 2018

We've had a good discussion around this tool and have concluded the following:

  1. We should have one tool for dealing with corruptions, both in the translog and in the Lucene index
  2. The tool will refuse to run if there are no existing corruption markers (i.e., it will only work on known corrupted shards)
  3. The tool will first run a dry run and show an analysis of what it's going to do to the user, get confirmation and then perform required operations.
  4. The tool should fail when check index fails to drop corrupted segments in Lucene. In the future we can offer users the option to recover only the translog, if needed. We don't feel we need the complexity right now.
  5. We should document the implications of the tool for join relationships, as they may be unexpected to users.
  6. The tool should generate a new history uuid to prevent ops based recoveries and CCR.
  7. The tool should generate a new allocation id and tell the user what command they need to run in order for the cluster to use this shard (allocate stale primary).
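
The flow in the numbered list above can be sketched as follows. This is a hedged illustration rather than the eventual implementation: the class, the Repairer interface and the Outcome enum are hypothetical, the "corrupted_" marker prefix is an assumption about the on-disk layout, random UUIDs merely stand in for the new history uuid and allocation id, and the analysis/removal step is a placeholder where something built on Lucene's CheckIndex would go.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.UUID;
import java.util.function.Predicate;

/** Hypothetical sketch of the agreed workflow; not the real tool. */
final class RemoveCorruptedDataSketch {
    /** Assumed corruption marker file prefix. */
    static final String MARKER_PREFIX = "corrupted_";

    enum Outcome { NO_MARKER, ABORTED, CLEANED }

    /** Placeholder for the real analysis and removal steps. */
    interface Repairer {
        String analyse(Path indexDir) throws IOException;
        void removeCorruptedData(Path indexDir) throws IOException;
    }

    static List<Path> findMarkers(Path indexDir) throws IOException {
        List<Path> markers = new ArrayList<>();
        try (DirectoryStream<Path> files = Files.newDirectoryStream(indexDir, MARKER_PREFIX + "*")) {
            for (Path file : files) {
                markers.add(file);
            }
        }
        return markers;
    }

    static Outcome run(Path indexDir, String index, int shard, String node,
                       Repairer repairer, Predicate<String> confirm) throws IOException {
        List<Path> markers = findMarkers(indexDir);
        if (markers.isEmpty()) {
            // Point 2: only work on shards already known to be corrupted.
            System.out.println("No corruption marker found; refusing to run.");
            return Outcome.NO_MARKER;
        }
        // Point 3: analyse first, show the result, and ask before changing anything.
        String report = repairer.analyse(indexDir);
        System.out.println(report);
        if (!confirm.test(report)) {
            return Outcome.ABORTED;
        }
        // Point 4: an exception from the removal step fails the whole run.
        repairer.removeCorruptedData(indexDir);
        for (Path marker : markers) {
            Files.delete(marker);
        }
        // Points 6 and 7: random UUIDs stand in for the new identifiers.
        System.out.println("new history uuid: " + UUID.randomUUID());
        System.out.println("new allocation id: " + UUID.randomUUID());
        System.out.println(rerouteCommand(index, shard, node));
        return Outcome.CLEANED;
    }

    /** Builds the cluster reroute request the user would run afterwards. */
    static String rerouteCommand(String index, int shard, String node) {
        return "POST /_cluster/reroute\n"
            + "{ \"commands\": [ { \"allocate_stale_primary\": { \"index\": \"" + index
            + "\", \"shard\": " + shard + ", \"node\": \"" + node
            + "\", \"accept_data_loss\": true } } ] }";
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("shard-sketch");
        Files.createFile(dir.resolve(MARKER_PREFIX + "example"));
        Repairer noop = new Repairer() {
            public String analyse(Path d) { return "would drop 1 corrupted segment (made-up report)"; }
            public void removeCorruptedData(Path d) { }
        };
        System.out.println(run(dir, "my_index", 0, "node-1", noop, report -> true));
    }
}
```

Passing the confirmation step in as a predicate keeps the sketch testable; an interactive tool would read the answer from the keyboard instead.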

We have run out of time and didn't discuss the parameters and tool naming. @vladimirdolzhenko can you post a suggestion here based on the above and we can discuss it further?

@vladimirdolzhenko
Contributor

  • the tool is elasticsearch-shard with a subcommand remove-corrupted-data
  • there are actually no corruption markers for a translog (a PR will follow), so the tool analyses the translog first; a clean (not corrupted) translog is treated the same as having no corruption marker. For index files, the tool checks for a corruption marker
  • the tool takes --index-name and --shard-id parameters, or --dir for cases with multiple nodes per data dir / environment
  • as the tool performs analysis before any destructive actions (which have to be confirmed), it was decided to drop the --dry-run option

@DaveCTurner DaveCTurner changed the title Tool for recovery of corrupt shards Tool to remove corrupted parts of corrupt shards Aug 31, 2018
vladimirdolzhenko added a commit that referenced this issue Aug 31, 2018
drop `index.shard.check_on_startup: fix`

Relates #31389
vladimirdolzhenko added a commit that referenced this issue Aug 31, 2018
Relates #31389

(cherry picked from commit 3d82a30)
vladimirdolzhenko added a commit that referenced this issue Sep 19, 2018
vladimirdolzhenko added a commit to vladimirdolzhenko/elasticsearch that referenced this issue Sep 19, 2018
vladimirdolzhenko added a commit that referenced this issue Sep 22, 2018
Relates #31389

(cherry picked from commit a3e8b83)
vladimirdolzhenko added a commit that referenced this issue Oct 1, 2018
#32281 adds elasticsearch-shard to provide bwc version of elasticsearch-translog for 6.x; have to remove elasticsearch-translog for 7.0

Relates to #31389
kcm pushed a commit that referenced this issue Oct 30, 2018
#32281 adds elasticsearch-shard to provide bwc version of elasticsearch-translog for 6.x; have to remove elasticsearch-translog for 7.0

Relates to #31389
@DaveCTurner
Contributor Author

Closed by #32281.


7 participants