Frozen tier autoscaling decider based on shards #71042
Conversation
The frozen tier only holds shared cache searchable snapshots. This commit adds an autoscaling decider that scales the total memory of the tier so that it is sufficient to hold the shards. A frozen shard is assigned a memory size of 64GB/2000, i.e., each 64GB node can hold 2000 shards before scaling further.
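To make the sizing rule concrete, here is a small standalone sketch of the arithmetic only (illustrative, not the decider's actual code); the class and method names are made up for the example:

// Illustrative sketch of the sizing rule described above, not the decider's implementation.
// Each frozen shard is charged 64GB / 2000 of memory; required tier memory is shards * that constant.
public class FrozenTierSizingSketch {
    static final long NODE_MEMORY_BYTES = 64L * 1024 * 1024 * 1024; // one 64GB node
    static final long SHARDS_PER_64GB_NODE = 2000;
    static final long MEMORY_PER_SHARD_BYTES = NODE_MEMORY_BYTES / SHARDS_PER_64GB_NODE;

    static long requiredTierMemoryBytes(long frozenShards) {
        return frozenShards * MEMORY_PER_SHARD_BYTES;
    }

    public static void main(String[] args) {
        // 5000 frozen shards -> roughly 160GB of required tier memory, i.e. a third 64GB node.
        System.out.println(requiredTierMemoryBytes(5000));
    }
}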
Pinging @elastic/es-distributed (Team:Distributed)
String tierPreference = DataTierAllocationDecider.INDEX_ROUTING_PREFER_SETTING.get(indexSettings);
String[] preferredTiers = DataTierAllocationDecider.parseTierList(tierPreference);
if (preferredTiers.length >= 1 && preferredTiers[0].equals(DataTier.DATA_FROZEN)) {
    // todo: add this line once frozen is only mounted using DATA_FROZEN
I think this can be updated to be == 1 now that #71014 has been merged.
Thanks, I added the assertion in f96070c
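For readers following along, a minimal sketch of what the tightened check could look like; the assertion message and the body of the if block are assumptions for illustration, not copied from f96070c:

// Sketch only: the assertion wording and block body are assumed.
String tierPreference = DataTierAllocationDecider.INDEX_ROUTING_PREFER_SETTING.get(indexSettings);
String[] preferredTiers = DataTierAllocationDecider.parseTierList(tierPreference);
if (preferredTiers.length >= 1 && preferredTiers[0].equals(DataTier.DATA_FROZEN)) {
    // after #71014, frozen indices are mounted with data_frozen as their sole tier preference
    assert preferredTiers.length == 1 : "frozen indices must prefer only the frozen tier";
    // ... count this index towards the frozen tier ...
}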
Frozen indices (partial searchable snapshots) require less heap per shard and the limit can therefore be raised for those. We pick 3000 frozen shards per frozen data node, since we think 2000 is reasonable to use in production. Relates elastic#71042
The reactive decider no longer applies to the frozen tier, since it could grossly over-estimate the amount of storage for unassigned frozen shards. Relates elastic#71042
LGTM, I left a few really minor comments. I wouldn't say I'm an expert in the autoscaling stuff (far from it!), but this looks good to me.
import org.elasticsearch.xpack.autoscaling.LocalStateAutoscaling;
import org.elasticsearch.xpack.searchablesnapshots.SearchableSnapshots;

public class LocalStateAutoscalingAndSearchableSnapshots extends LocalStateAutoscaling {
For my own edification and/or someone else reading this code, can you add a comment about why this plugin wrapper is required?
I added a comment in 1c4c4ae.
I'm no expert on this or the conventions here. I think adding a LocalStateSearchableSnapshots here and using both LocalState plugins from the test could also work, but I found this nicer.
...ternalClusterTest/java/org/elasticsearch/xpack/autoscaling/shards/FrozenShardsDeciderIT.java
public static final Setting<ByteSizeValue> MEMORY_PER_SHARD = Setting.byteSizeSetting(
    "memory_per_shard",
    (dummy) -> DEFAULT_MEMORY_PER_SHARD.getStringRep()
);
Should this validate that memory_per_shard is not negative?
++, done in 8eabd20
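One possible shape of that validation, sketched under the assumption that a byteSizeSetting overload taking explicit min/max bounds is available; the actual change in 8eabd20 may look different:

// Sketch: bound the setting so negative values are rejected at parse time (assumed overload).
public static final Setting<ByteSizeValue> MEMORY_PER_SHARD = Setting.byteSizeSetting(
    "memory_per_shard",
    (ignored) -> DEFAULT_MEMORY_PER_SHARD.getStringRep(),
    ByteSizeValue.ZERO,                      // lower bound: disallow negative sizes
    ByteSizeValue.ofBytes(Long.MAX_VALUE)    // effectively unbounded above
);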
private final long shards;

public FrozenShardsReason(long shards) {
    this.shards = shards;
Should we validate this is non-negative, and then use (read|write)VLong() in serialization? It's not a huge deal, just curious whether we expect this to ever be negative.
++, done in 8f975a0
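Roughly what that amounts to (a sketch; the details of 8f975a0 are assumed, not quoted):

// Sketch: reject negative counts and use vlong (variable-length, non-negative) wire format.
public FrozenShardsReason(long shards) {
    assert shards >= 0 : "negative shard count";
    this.shards = shards;
}

public FrozenShardsReason(StreamInput in) throws IOException {
    this.shards = in.readVLong();
}

@Override
public void writeTo(StreamOutput out) throws IOException {
    out.writeVLong(shards);
}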
I left a couple of comments.
static final ByteSizeValue DEFAULT_MEMORY_PER_SHARD = ByteSizeValue.ofBytes(MAX_MEMORY.getBytes() / 2000);
public static final Setting<ByteSizeValue> MEMORY_PER_SHARD = Setting.byteSizeSetting(
    "memory_per_shard",
    (dummy) -> DEFAULT_MEMORY_PER_SHARD.getStringRep(),
Some style guides would recommend avoiding the use of terms such as this. How about ignored or unused?
++, fixed in 1596b6d
static long countFrozenShards(Metadata metadata) {
    return StreamSupport.stream(metadata.spliterator(), false)
        .filter(imd -> isFrozenIndex(imd.getSettings()))
        .mapToLong(IndexMetadata::getTotalNumberOfShards)
I’m curious why mapToLong instead of mapToInt here, and IntStream#sum returning an int then?
Future proofing 🙂: avoiding having to think about (int * long), and figuring that since we only support 64-bit it does not matter anyway. I changed it to an int in 6c5ecde since this seems to be the "convention" for such a count.
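For reference, the int-based variant is the same pipeline with mapToInt and IntStream#sum (sketch derived from the snippet quoted above):

static int countFrozenShards(Metadata metadata) {
    return StreamSupport.stream(metadata.spliterator(), false)
        .filter(imd -> isFrozenIndex(imd.getSettings()))
        .mapToInt(IndexMetadata::getTotalNumberOfShards)
        .sum();
}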
LGTM.
Thanks Lee and Jason!
Added documentation for the frozen shards decider. Relates #71042
The reactive decider no longer applies to the frozen tier, since it could grossly over-estimate the amount of storage for unassigned frozen shards. Relates #71042
Frozen indices (partial searchable snapshots) require less heap per shard and the limit can therefore be raised for those. We pick 3000 frozen shards per frozen data node, since we think 2000 is reasonable to use in production. Relates elastic#71042 and elastic#34021
The frozen tier only holds shared cache searchable snapshots. This commit adds an autoscaling decider that scales the total memory of the tier so that it is sufficient to hold the shards. A frozen shard is assigned a memory size of 64GB/2000, i.e., each 64GB node can hold 2000 shards before scaling further. The max shards validation limit will be relaxed in a separate PR.