
Smarter pod placement strategy for statefulsets #1654

Open
eugenea opened this issue Sep 11, 2024 · 1 comment
Labels
kind/feature Categorizes issue or PR as related to a new feature. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one.

Comments


eugenea commented Sep 11, 2024

Description

What problem are you trying to solve?

Initial condition:

  • An operator-managed StatefulSet with `updateStrategy: {type: OnDelete}`; in our case, a SolrCloud cluster.
  • An EBS volume attached to each pod of the StatefulSet, meaning that once a pod has launched in a particular AWS AZ, it cannot move to another AZ without deleting the volume (i.e. data loss).
  • A hard topology spread constraint between the pods of the StatefulSet (a sketch of such a spec is shown after this list).
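
For reference, here is a rough sketch of the kind of StatefulSet spec involved. The names, image, storage class, and sizes are illustrative placeholders, not what the operator actually generates:

```yaml
# Hypothetical minimal StatefulSet illustrating the setup above.
# All names, the image, and the storage class are placeholders.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: solrcloud-example
spec:
  replicas: 9
  serviceName: solrcloud-example-headless
  updateStrategy:
    type: OnDelete                 # the operator deletes pods itself to roll out changes
  selector:
    matchLabels:
      app: solrcloud-example
  template:
    metadata:
      labels:
        app: solrcloud-example
    spec:
      topologySpreadConstraints:
        - maxSkew: 1
          topologyKey: topology.kubernetes.io/zone
          whenUnsatisfiable: DoNotSchedule   # hard constraint
          labelSelector:
            matchLabels:
              app: solrcloud-example
      containers:
        - name: solr
          image: solr:9            # placeholder image
          volumeMounts:
            - name: data
              mountPath: /var/solr
  volumeClaimTemplates:            # one EBS-backed PV per pod, pinned to the AZ it was created in
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        storageClassName: gp3      # placeholder EBS-backed storage class
        resources:
          requests:
            storage: 100Gi
```

Because `whenUnsatisfiable: DoNotSchedule` makes the spread constraint hard, any pod whose EBS volume pins it to a zone that would exceed `maxSkew` simply stays Pending.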

Failure scenario:
When the operator launches the StatefulSet, all of its pods are created simultaneously and get assigned to AZs essentially at random. Let's say we launch 9 replicas; in an unlucky but quite probable outcome, the first 3 pods (ordinals 0–2) land in AZ A, the next 3 (ordinals 3–5) in AZ B, and the last 3 (ordinals 6–8) in AZ C.

Now let's imagine the customer decides to scale the StatefulSet down by 3 pods. Scale-down happens in a fixed order, so the 3 pods with the highest ordinals get removed (remember, this is a StatefulSet). What you are left with is 3 pods in AZ A, 3 pods in AZ B, and 0 pods in AZ C. Everything still works at this point, except that the cluster is now unbalanced and not AZ-redundant. Then the customer changes the pod spec, which triggers pod restarts. At this point 4 out of 6 pods will violate the topology spread constraint: with no pods in AZ C, only one pod per remaining zone keeps the skew within `maxSkew: 1`, and every other pod is pinned to AZ A or B by its volume. Those pods get stuck in Pending until their corresponding EBS volume is deleted (which can create some scary situations).

Possible solution to the problem:

Karpenter needs to be StatefulSet-aware: it should evaluate pod constraints (or schedule pods) in order of increasing StatefulSet pod ordinal rather than at random, so that the constraints remain satisfiable after a StatefulSet scale-down.

How important is this feature to you?

This prevents us from scaling down Solr clusters, which is a pretty big deal.

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment
@eugenea eugenea added the kind/feature Categorizes issue or PR as related to a new feature. label Sep 11, 2024
@k8s-ci-robot k8s-ci-robot added the needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. label Sep 11, 2024
@k8s-ci-robot
Contributor

This issue is currently awaiting triage.

If Karpenter contributors determine this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
