
ES 5.4.3 unexpectedly removes indexes #26669

Closed

Daniel314 opened this issue Sep 15, 2017 · 3 comments

@Daniel314
ES: Version: 5.4.3, Build: eed30a8/2017-06-22T00:34:03.743Z, JVM: 1.8.0_144
Plugins: none
Java: 1.8.0_144
OS: Linux es-archival1 3.13.0-129-generic #178-Ubuntu SMP Fri Aug 11 12:48:20 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

I have experienced an issue with two versions of Elasticsearch (5.4.3 and 2.4.1) where Elasticsearch deletes indices that it should not delete. My environment is a multi-node cluster with Logstash indices being routed to three different groups: ingestion, warm, and cold (this last one is named 'crypt' in my environment). The Logstash indices (logstash-YYYY.MM.DD) are created on the ingestion nodes, routed/moved to the 'warm' node a day later, and then routed/moved to the cold nodes about three days later.
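For this kind of routing to work, each node presumably carries a custom 'category' attribute (for example, node.attr.category: warm in elasticsearch.yml on 5.x); the attribute name here is just taken from the allocation settings shown further down. The assignment can be spot-checked with the cat nodeattrs API:

curl -XGET 'http://localhost:9200/_cat/nodeattrs?v'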

The 'warm' node (there is only one in my environment) developed a filesystem/SAN issue where some blocks became inaccessible (ES would time out trying to read those blocks), and this was causing Elasticsearch on this node to get hung up when trying to route/move shards to the cold nodes. When this happened, after a ~12 hour attempt/timeout/failure cycle, Elasticsearch would remove ALL Logstash indices at or older than the index that it was trying to move to the cold nodes (i.e. all Logstash indices that were in good shape on the cold nodes [about 30 indices in my case], in addition to the Logstash index that it couldn't move because of the filesystem errors).

Fixing the filesystem error resolved the issue where indices were being unexpectedly deleted.
None of the Elasticsearch log files (on any of the systems that I spot-checked) showed any indication of why all of the extra indices were removed. Interestingly enough, other indices on the cold nodes that didn't have the logstash-YYYY.MM.DD name format were not affected.

My issues are resolved now (the filesystem/device issues are fixed), but the unexpected deletion of the other indices (particularly since none of their shards were stored on this server any more) is why I'm reporting this issue.

This is how I create my indices:
curl -XPUT 'http://localhost:9200/logstash-2017.09.16' -H 'Content-Type: application/json' -d '{ "settings" : { "index.routing.allocation.include.category": "ingestion", "index.mapping.total_fields.limit": 3000, "number_of_shards": 6, "number_of_replicas": 0, "index.unassigned.node_left.delayed_timeout": "5m" } }'
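To confirm the settings took effect and to see where the new shards landed, something along these lines should work (the index name is just the example from above):

curl -XGET 'http://localhost:9200/logstash-2017.09.16/_settings?pretty'
curl -XGET 'http://localhost:9200/_cat/shards/logstash-2017.09.16?v'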

This is the command that would (eventually) result in the loss of all Logstash indices prior to Sep 10:
curl -XPUT 'http://localhost:9200/logstash-2017.09.10/_settings' -H 'Content-Type: application/json' -d '{ "index.routing.allocation.include.category": "crypt", "number_of_replicas": 1 }'
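While the shards are being moved to the 'crypt' nodes, the relocation can be followed with the cluster health and cat recovery APIs (a rough sketch of how I monitor the move, not part of the automation itself):

curl -XGET 'http://localhost:9200/_cluster/health?pretty'
curl -XGET 'http://localhost:9200/_cat/recovery/logstash-2017.09.10?v&active_only=true'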

Thanks,

- Daniel
@dakrone
Member

dakrone commented Sep 15, 2017

When this happened, after a ~12 hour attempt/timeout/failure,

Do you have any logs from this attempt/timeout/failure? (Stack traces would be great, to see where it timed out.)

ElasticSearch would remove ALL Logstash indexes at or older than the index that it was trying to move to the cold nodes (i.e. all Logstash indexes that were in good shape on the cold nodes [about 30 indexes in my case], in addition to the Logstash index that it couldn't move because of the filesystem errors).

This sounds suspiciously like a tool such as Curator running in a cron job that might be deleting all indices older than a certain date. Can you confirm whether you have a process like that in your environment?
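For example, a quick check along these lines (paths and users are only guesses for a typical setup) would show whether any scheduled Curator or housekeeping job exists:

crontab -l | grep -i curator
grep -ri curator /etc/cron.d /etc/cron.daily 2>/dev/null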

Any other relevant logs would be much appreciated; there should at least be log entries showing which indices were deleted and when.
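If it helps, the elected master logs a line for each index deletion, so something like the following on the master node (assuming the default Debian/Ubuntu log location) should show what was deleted and when:

grep -i "deleting index" /var/log/elasticsearch/*.log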

@Daniel314
Author

Daniel314 commented Sep 15, 2017 via email

@Daniel314
Author

Daniel314 commented Sep 18, 2017 via email
