Which component are you using?:
Cluster autoscaler
What version of the component are you using?:
v1.30.0
Component version:
v1.30.0
What k8s version are you using (kubectl version)?:
kubectl version Output
$ kubectl version
Client Version: v1.30.0
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.30.3-eks-2f46c53
What environment is this in?:
EKS - AWS
What did you expect to happen?:
Once the configured scale-down-delay-after-add time expires, the autoscaler should mark the node and proceed with a scale-down when it is no longer used, unless it contains pods with the “safe-to-evict: false” annotation.
What happened instead?:
Once this delay expires, if new pods have been scheduled onto that node, the autoscaler still drains the node through the cluster API, evicting those pods.
Note that all of these pods carry the “safe-to-evict: false” annotation, yet they are drained anyway.
How to reproduce it (as minimally and precisely as possible):
Reproducing it reliably is my main problem; I can't determine the exact cause. What I can say is that in our environments, when the autoscaler is configured with a shorter scale-down-delay-after-add, the number of evicted pods increases, while increasing the delay reduces evictions very markedly.
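For reference, the flags involved look roughly like this (on EKS they are passed as container args to the cluster-autoscaler Deployment; the flag names are the standard scale-down flags, but the values below are placeholders, not our production settings):

# Flag names are the standard cluster-autoscaler scale-down flags;
# the values are placeholders only.
$ cluster-autoscaler \
    --cloud-provider=aws \
    --scale-down-delay-after-add=10m \
    --scale-down-unneeded-time=10m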
Anything else we need to know?:
I'm attaching an analysis I did, because this error appears only when the autoscaler is enabled. If I instead leave enough fixed nodes in the cluster for my pods' workloads, no eviction problems occur at any point; hence my concern and why I opened this ticket.
Exemplified case:
At 15:56 (UTC-3) the node has no pods with the “safe-to-evict: false” annotation, because the pod “5bc..” has finished its work (no eviction occurs there). Four minutes later the cluster schedules 2 pods carrying the annotation (“safe-to-evict: false”) onto that node, but by then the node's “scale-down-delay-after-add” has already expired.
The strange thing is that after that point the node is drained anyway.
Drained pods:
Autoscaler Logs:
I also attach the autoscaler logs:
autoscaler-logs.txt
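If it helps while reviewing the attachment, the relevant lines can be pulled out with something like the following (the exact log wording differs between autoscaler versions, so the pattern is only approximate):

$ grep -iE 'unneeded|scale.?down|drain|evict' autoscaler-logs.txt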
Pod with the “safe-to-evict: false” annotation:
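For clarity, the full annotation key on these pods is the standard one; the command below is only to show it spelled out (the pod name is illustrative, and in our case the annotation is set in the workload's pod template rather than applied by hand):

$ kubectl annotate pod <pod-name> cluster-autoscaler.kubernetes.io/safe-to-evict=false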
Hypothesis
The autoscaler does not take the “safe-to-evict: false” annotation into account once “scale-down-delay-after-add” has expired.
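One way to double-check that the annotation is really present on an affected pod (pod name illustrative):

$ kubectl get pod <pod-name> -o jsonpath="{.metadata.annotations['cluster-autoscaler\.kubernetes\.io/safe-to-evict']}"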
I will also link a ticket I found describing similar behavior caused by a problem with this annotation; they may be related: #7244
Thanks.