total_shards_per_node may lead to unassigned shards #9248
Hi @imriz I tried this on 1.4.1, and it seems to be working correctly.
Could you provide a recreation of the problem?
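For reference, a reproduction attempt along these lines might look like the following sketch (the index name `test` is a placeholder; the settings are the ones from the issue description):

```sh
# Create a 3-shard, 1-replica index capped at 2 shard copies per node
curl -XPUT 'localhost:9200/test' -d '{
  "settings": {
    "index.number_of_shards": 3,
    "index.number_of_replicas": 1,
    "index.routing.allocation.total_shards_per_node": 2
  }
}'

# Show which node each shard copy landed on
curl -s 'localhost:9200/_cat/shards?v' | grep test
```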
So far, I wasn't able to reproduce it at will, but I can gather whatever information is needed the next time it happens.
Hi @clintongormley, the only relevant logs are:
Hi @imriz So these are definitely per-index settings that you are setting? Nothing in the per-node config files or cluster settings? Any other allocation-related settings that we should know about?
The only thing remotely related is the disk-based allocation thresholds. But we are not close to these thresholds, nor do I see how they could create the allocation topology I got here. I should emphasize that it looks completely random: creating an index a few minutes later can result in a perfectly sensible topology.
Hi @imriz, a colleague has encountered this before and explained to me what is happening: with 3 shards and 1 replica there are 6 shard copies to place on 3 nodes, so with total_shards_per_node set to 2 every node must end up with exactly 2 copies. If the allocator happens to place both copies of shards 0 and 1 on the first two nodes, the only node left for 2r is the one already holding 2p, and a replica is never allowed on the same node as its primary, so 2r stays unassigned.
The only way to solve this setup is to move one of the other shards to a different node, but the allocator isn't going to do that until something else triggers the move, e.g. disk full, node failure, etc. While it goes to some trouble to randomize allocation so that this should happen seldom, it is possible (as you've seen) to get stuck with hard limits like total_shards_per_node.
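As a rough sketch of that manual fix, the cluster reroute API can move one of the colliding copies by hand. The index and node names below are hypothetical, and assume Node3 holds 2p while Node1 holds 0p and 1r:

```sh
# Move the replica of shard 1 onto the node holding 2p; this frees a slot
# on Node1 so the unassigned replica 2r can then be allocated there
curl -XPOST 'localhost:9200/_cluster/reroute' -d '{
  "commands": [
    {
      "move": {
        "index": "myindex",
        "shard": 1,
        "from_node": "Node1",
        "to_node": "Node3"
      }
    }
  ]
}'
```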
Yep, that is what I described in my initial post :) Just one small note: if total_shards_per_node is set to a higher value, the nodes will happily allocate the same shard to the same node. I think that the best approach is to have a configuration flag to ensure this will never happen (same shard on the same node), since the whole reason I've had to set total_shards_per_node to 2 is to prevent a node from recovering redundant shards when another node fails (which is very problematic when you don't have a lot of free disk space).
This will never be the case. I think what happens when you increase total_shards_per_node is that you end up with 3 (different) shards on a single node, and then the allocator rebalances by moving one of the shards to a different node. But having two copies of the same shard on a single node will never be allowed.
As above, this never happens.
You are, of course, correct.
Well, not the same shards, but indeed redundant shards.
@imriz did you try looking at the disk-based allocation as a means to avoid assigning shards that will cause the node to run out of disk space? Where possible, this will give you the flexibility you need while avoiding filling up the disk. More info is here: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/index-modules-allocation.html#disk. Note that you will have to be using 1.4.1 or higher for it to include the size of relocating shards (and prevent them): see #7785, which was backported in 4e5264c.
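For context, the disk thresholds in question are dynamic cluster settings; a minimal sketch (the watermark values shown are just the documented defaults):

```sh
# Enable disk-based shard allocation and set the low/high watermarks
curl -XPUT 'localhost:9200/_cluster/settings' -d '{
  "transient": {
    "cluster.routing.allocation.disk.threshold_enabled": true,
    "cluster.routing.allocation.disk.watermark.low": "85%",
    "cluster.routing.allocation.disk.watermark.high": "90%"
  }
}'
```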
@bleskes I am aware of the disk-based allocation thresholds, but that doesn't give me the functionality I need. I want to utilize the disk space as much as I can; I just don't want the nodes to recover any shards from a failed node (that is, I only need to be able to sustain a single node failure).
@imriz sorry for the late response. I hear you, and you are right. The disk threshold allocator will give you some more flexibility here, if you have the space for it. Since we can't try all the shard allocation combinations (think thousands of shards, hundreds of nodes), we have to use an iterative algorithm, and that might not find a global optimum. If you have any suggestions on how to improve this, I'll be very happy to discuss them.
@bleskes Maybe something like CRUSH maps could be of use?
@imriz thx for the tip. Looks interesting, though it's not clear at first glance how removing the centralized nature (which is very good for other reasons) will allow it to deal with local minima.
@bleskes CRUSH allows you to set allocation rules, which can prevent the issue.
It's been a long time since I tried this approach (I was the author of the linked user group post), but when talking to someone at ElasticON, they said it sounds like a bug and to create an issue. I was about to do that and came across this issue. Glad I'm not the only one. I haven't gone through trying to recreate this on 1.4.4; we are currently on 1.4.2.
We are running a 3-node cluster, 3 primaries, 1 replica, with daily rolling indexes. There are probably about 12 indexes per day. Of those, 2 are very large/hot. We really care that those are distributed evenly (we are on spinning disk and pretty constrained with memory, so anything we can do to balance it across the cluster, the better). In fact, we'd just prefer everything get distributed evenly. I've tried to play with the allocation settings. Seems the issue is clear now? Or is there anything I can do to try to help recreate?
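For daily rolling indexes like these, one way to apply the per-node cap to every new index automatically is an index template; a sketch, with made-up template and index-pattern names:

```sh
# Any new index matching "logs-*" is created with the shard cap already set
curl -XPUT 'localhost:9200/_template/daily_logs' -d '{
  "template": "logs-*",
  "settings": {
    "index.routing.allocation.total_shards_per_node": 2
  }
}'
```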
Closing this as a duplicate of #12273 |
Hi,
I have 3 nodes (version 1.4.1), and my index settings is as follows:
"index.routing.allocation.total_shards_per_node": 2
"index.number_of_shards": 3
"index.number_of_replicas": "1"
Sometimes, the shards get allocated as follows (p denotes primary, r denotes replica):
Node1: 0p,1r
Node2: 0r,1p
Node3: 2p
This leaves 2r unassigned.
If I raise total_shards_per_node to 3, the cluster will start recovering the unassigned shard (2r).
If I lower total_shards_per_node to 2 after the recovery has finished, it will reallocate the shards correctly.
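That workaround amounts to two dynamic settings updates; a sketch, with `myindex` as a placeholder:

```sh
# Temporarily relax the cap so the unassigned replica can recover
curl -XPUT 'localhost:9200/myindex/_settings' -d '{
  "index.routing.allocation.total_shards_per_node": 3
}'

# Once recovery finishes, restore the original cap
curl -XPUT 'localhost:9200/myindex/_settings' -d '{
  "index.routing.allocation.total_shards_per_node": 2
}'
```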
I see that I am not the only one seeing this:
https://groups.google.com/forum/#!topic/elasticsearch/GZnamxaaj0g