
Shard Allocation Race Condition #34878

Closed
danielkasen opened this issue Oct 25, 2018 · 4 comments
Labels
:Distributed Coordination/Allocation All issues relating to the decision making around placing a shard (both master logic & on the nodes) feedback_needed

Comments

@danielkasen

Elasticsearch version: 6.3.2

Plugins installed: []

JVM version (java -version): 1.8.72

OS version (uname -a if on a Unix-like system): Ubuntu 14.04

Description of the problem including expected versus actual behavior:
A new index gets allocated into a yellow state instead of allocating a shard to each available node, when using a mixture of rack awareness and shard_allocation_per_node = 1.

Steps to reproduce:

  1. Create a 15-node cluster that spans 3 different racks
  2. Create an index with 7 shards and 1 replica (14 copies total) that can allocate only 1 shard per node
  3. Randomly hit a state where the index can't allocate one of the replicas, because its primary is in the same rack as every remaining free node.
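The setup above could be expressed with settings roughly like the following (a sketch against a hypothetical cluster; the index name `my_index`, the node attribute `rack_id`, and the rack values are assumptions, not taken from the report):

```sh
# Spread copies of a shard across racks; each node declares its rack in
# elasticsearch.yml, e.g.  node.attr.rack_id: rack_a
curl -XPUT 'localhost:9200/_cluster/settings' -H 'Content-Type: application/json' -d'
{
  "persistent": {
    "cluster.routing.allocation.awareness.attributes": "rack_id"
  }
}'

# 7 primaries + 7 replicas = 14 copies on 15 nodes, at most 1 copy per node
curl -XPUT 'localhost:9200/my_index' -H 'Content-Type: application/json' -d'
{
  "settings": {
    "number_of_shards": 7,
    "number_of_replicas": 1,
    "index.routing.allocation.total_shards_per_node": 1
  }
}'
```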

So basically you can get into a condition where, even though you still have 2 nodes without a shard, the index can't allocate the replica to either of them, because the primary and replica would then be in the same rack. To fix this you have to free up a node in a different rack by moving its primary or replica to one of the unused nodes, and then assign the original replica that couldn't be assigned to that newly freed node (moving 2 shards at once).
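The manual two-step fix described above can be expressed with the cluster reroute API; a sketch with hypothetical index, shard, and node names (not taken from the report):

```sh
# Step 1: free up a node in a different rack by moving one of its copies
# onto an unused node. The stranded replica (step 2) can then allocate to
# the freed node on its own, or be retried explicitly.
curl -XPOST 'localhost:9200/_cluster/reroute' -H 'Content-Type: application/json' -d'
{
  "commands": [
    { "move": { "index": "my_index", "shard": 3,
                "from_node": "node_in_rack_b", "to_node": "unused_node" } }
  ]
}'
```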

@dnhatn dnhatn added the :Distributed Coordination/Allocation All issues relating to the decision making around placing a shard (both master logic & on the nodes) label Oct 26, 2018
@elasticmachine
Collaborator

Pinging @elastic/es-distributed

@dnhatn
Member

dnhatn commented Oct 26, 2018

@danielkasen Thanks for reporting this. Could you provide the shard allocation filter that you used? Thanks.

@DaveCTurner
Contributor

I think this duplicates #12273. The shard allocator does not consider moving shards around to make more of them fit, and backs itself into a corner, especially if there's a limit per node.

(It's not a race condition, this all happens on a single thread.)
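The corner described in this comment can be reproduced in a toy model (plain Python, not Elasticsearch code; the node counts and rack layout mirror the reproduction steps above, everything else is an illustrative assumption): a first-fit allocator that never relocates placed copies strands replicas in exactly this 15-node / 3-rack / 7-shard configuration, even though a complete assignment exists.

```python
# Toy model of the allocation corner: 15 nodes in 3 racks of 5, 7 shards
# with 1 replica each, at most 1 copy per node, and primary/replica of a
# shard forced into different racks.
NODES = {f"n{i}": "ABC"[i // 5] for i in range(15)}  # node -> rack

def greedy_allocate(num_shards=7):
    """First-fit placement that never moves an already-placed copy."""
    used = {}          # node -> (shard, "p" | "r")
    unassigned = []
    for s in range(num_shards):
        for copy in ("p", "r"):
            for node, rack in NODES.items():
                if node in used:
                    continue
                if copy == "r":
                    # replica must avoid its primary's rack
                    prack = next(NODES[n] for n, v in used.items()
                                 if v == (s, "p"))
                    if rack == prack:
                        continue
                used[node] = (s, copy)
                break
            else:
                unassigned.append((s, copy))
    return used, unassigned

used, unassigned = greedy_allocate()
free = [n for n in NODES if n not in used]
print("unassigned copies:", unassigned)   # replicas of shards 5 and 6
print("free nodes:", free, "racks:", {NODES[n] for n in free})  # all rack C

# Yet a complete assignment exists: pick rack pairs per shard so that no
# rack holds more than its 5 nodes' worth of copies.
valid = {0: ("A", "B"), 1: ("A", "B"), 2: ("A", "B"),
         3: ("A", "C"), 4: ("A", "C"), 5: ("B", "C"), 6: ("B", "C")}
loads = {"A": 0, "B": 0, "C": 0}
for pr, rr in valid.values():
    assert pr != rr                        # awareness constraint holds
    loads[pr] += 1
    loads[rr] += 1
assert all(v <= 5 for v in loads.values())  # fits within 5 nodes per rack
print("feasible rack loads:", loads)
```

Reaching that feasible assignment from the stuck state requires moving an already-placed copy first, which is exactly what the allocator doesn't do.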

@danielkasen
Author

Ahh yes, I didn't see that other thread. This is basically what is happening, just with a different shard-to-node ratio.
