Scaling is flapping when idleReplicaCount != 0 #2314

Closed
mkuf opened this issue Nov 23, 2021 · 14 comments · Fixed by kedacore/keda-docs#1186
Labels
bug, help wanted, known-issue, stale

Comments

@mkuf

mkuf commented Nov 23, 2021

Report

When a ScaledObject is created with idleReplicaCount > 0, the target is scaled up and down every time the trigger is polled.

Expected Behavior

According to the docs, the Deployment should be scaled to idleReplicaCount when there is no activity on the triggers, and should only be scaled to minReplicaCount when there is activity.
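The expected behaviour can be summarised as a small decision function. This is a minimal Go sketch under simplified assumptions, not KEDA's implementation: desiredReplicas, its parameters, and the idea of a single metricDesired value computed by the HPA are hypothetical names introduced only for illustration.

```go
package main

import "fmt"

// desiredReplicas is a hypothetical model of the documented intent of
// idleReplicaCount; it is not KEDA's actual code.
//   active        - whether any trigger reports activity
//   metricDesired - replica count the HPA would compute from the metric
func desiredReplicas(active bool, metricDesired, idleReplicas, minReplicas, maxReplicas int32) int32 {
	if !active {
		// No activity on any trigger: park the target at idleReplicaCount.
		return idleReplicas
	}
	// Activity: let the metric drive the count, clamped to [min, max].
	if metricDesired < minReplicas {
		return minReplicas
	}
	if metricDesired > maxReplicas {
		return maxReplicas
	}
	return metricDesired
}

func main() {
	// Values from the ScaledObject in the reproduction steps: idle=1, min=2, max=30.
	fmt.Println(desiredReplicas(false, 0, 1, 2, 30)) // empty queue -> 1
	fmt.Println(desiredReplicas(true, 7, 1, 2, 30))  // busy queue -> 7
}
```

The point of the sketch is that idleReplicaCount only applies on the inactive branch; on the active branch minReplicaCount acts as the floor.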

Actual Behavior

The Deployment is scaled to minReplicaCount even though the referenced RabbitMQ queue has length 0.
Any Pods above idleReplicaCount are terminated immediately after creation, so they never start properly.

Steps to Reproduce the Problem

  1. Create the nginx-deployment Deployment
kubectl apply -f https://raw.githubusercontent.com/kubernetes/website/main/content/en/examples/application/deployment.yaml
  2. Create a ScaledObject
cat <<EOF | kubectl apply -f -
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: nginx
spec:
  cooldownPeriod: 300
  fallback:
    failureThreshold: 3
    replicas: 4
  idleReplicaCount: 1
  maxReplicaCount: 30
  minReplicaCount: 2
  pollingInterval: 5
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: nginx-deployment
  triggers:
  - type: rabbitmq
    metadata:
      host: amqp://user:pass@rabbitmq:5672/
      mode: QueueLength
      queueName: alwaysempty
      value: "5"
EOF

Logs from KEDA operator

2021-11-23T07:13:37.964Z	INFO	scaleexecutor	Successfully set ScaleTarget replicas count to ScaledObject idleReplicaCount	{"scaledobject.Name": "nginx", "scaledObject.Namespace": "default", "scaleTarget.Name": "nginx-deployment", "Original Replicas Count": 2, "New Replicas Count": 1}
2021-11-23T07:13:52.139Z	INFO	controllers.ScaledObject	Reconciling ScaledObject	{"ScaledObject.Namespace": "default", "ScaledObject.Name": "nginx"}
2021-11-23T07:13:53.089Z	INFO	scaleexecutor	Successfully set ScaleTarget replicas count to ScaledObject idleReplicaCount	{"scaledobject.Name": "nginx", "scaledObject.Namespace": "default", "scaleTarget.Name": "nginx-deployment", "Original Replicas Count": 2, "New Replicas Count": 1}
2021-11-23T07:14:07.165Z	INFO	controllers.ScaledObject	Reconciling ScaledObject	{"ScaledObject.Namespace": "default", "ScaledObject.Name": "nginx"}
2021-11-23T07:14:08.157Z	INFO	scaleexecutor	Successfully set ScaleTarget replicas count to ScaledObject idleReplicaCount	{"scaledobject.Name": "nginx", "scaledObject.Namespace": "default", "scaleTarget.Name": "nginx-deployment", "Original Replicas Count": 2, "New Replicas Count": 1}
2021-11-23T07:14:22.185Z	INFO	controllers.ScaledObject	Reconciling ScaledObject	{"ScaledObject.Namespace": "default", "ScaledObject.Name": "nginx"}
2021-11-23T07:14:23.213Z	INFO	scaleexecutor	Successfully set ScaleTarget replicas count to ScaledObject idleReplicaCount	{"scaledobject.Name": "nginx", "scaledObject.Namespace": "default", "scaleTarget.Name": "nginx-deployment", "Original Replicas Count": 2, "New Replicas Count": 1}
2021-11-23T07:14:37.207Z	INFO	controllers.ScaledObject	Reconciling ScaledObject	{"ScaledObject.Namespace": "default", "ScaledObject.Name": "nginx"}
2021-11-23T07:14:38.270Z	INFO	scaleexecutor	Successfully set ScaleTarget replicas count to ScaledObject idleReplicaCount	{"scaledobject.Name": "nginx", "scaledObject.Namespace": "default", "scaleTarget.Name": "nginx-deployment", "Original Replicas Count": 2, "New Replicas Count": 1}
2021-11-23T07:14:52.229Z	INFO	controllers.ScaledObject	Reconciling ScaledObject	{"ScaledObject.Namespace": "default", "ScaledObject.Name": "nginx"}
2021-11-23T07:14:53.415Z	INFO	scaleexecutor	Successfully set ScaleTarget replicas count to ScaledObject idleReplicaCount	{"scaledobject.Name": "nginx", "scaledObject.Namespace": "default", "scaleTarget.Name": "nginx-deployment", "Original Replicas Count": 2, "New Replicas Count": 1}

KEDA Version

2.4.0

Kubernetes Version

1.21

Platform

Other

Scaler Details

rabbitmq

Anything else?

No response

mkuf added the bug label Nov 23, 2021
zroubalik added this to the v2.5.0 milestone Nov 24, 2021
@zroubalik
Member

@mkuf thanks for reporting; I can reproduce the problem. I'll try to fix this for the upcoming release.

zroubalik self-assigned this Nov 24, 2021
zroubalik modified the milestones: v2.5.0, v2.6.0 Nov 24, 2021
@zroubalik
Member

zroubalik commented Nov 24, 2021

This problem is caused by the HPA controller: when it finds that the current replica count on the scale target (idleReplicaCount) is below the minimum replica count (minReplicaCount), it scales the target back up:
https://github.com/kubernetes/kubernetes/blob/c8c81cbfbb381129c904ed1ab387744946cf807a/pkg/controller/podautoscaler/horizontal.go#L640-L642

If idleReplicaCount is set to 0, it works as expected, because in that case the HPA controller ignores the resource:
https://github.com/kubernetes/kubernetes/blob/c8c81cbfbb381129c904ed1ab387744946cf807a/pkg/controller/podautoscaler/horizontal.go#L632-L636
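The tug-of-war between KEDA and the HPA can be reproduced with a toy model. This Go sketch is a simplified, hypothetical rendering of the bounds check in the linked horizontal.go lines; the real controller also runs the metric-based calculation and stabilization, which are omitted here. hpaNormalize and simulate are illustrative names, not Kubernetes or KEDA functions, and KEDA's side is reduced to "trigger inactive, so set replicas to idleReplicaCount" on every poll.

```go
package main

import "fmt"

// hpaNormalize is a simplified, hypothetical model of the bounds check the
// HPA controller applies before its metric-based calculation. It returns the
// replica count the HPA wants and whether it will rescale the target.
func hpaNormalize(current, minReplicas, maxReplicas int32) (int32, bool) {
	switch {
	case current == 0 && minReplicas != 0:
		// Target is at zero replicas: autoscaling is treated as disabled.
		return 0, false
	case current > maxReplicas:
		return maxReplicas, true
	case current < minReplicas:
		// Below minReplicaCount: the HPA scales back up to the minimum.
		return minReplicas, true
	default:
		// Metric-based calculation omitted from this sketch.
		return current, false
	}
}

// simulate alternates KEDA's idle scale-down with one HPA sync per polling
// cycle and records the replica count after each step.
func simulate(idle, minReplicas, maxReplicas int32, cycles int) []int32 {
	current := minReplicas
	var history []int32
	for i := 0; i < cycles; i++ {
		// KEDA: trigger inactive, so set replicas to idleReplicaCount.
		current = idle
		history = append(history, current)
		// HPA: next sync applies the bounds check.
		if desired, rescale := hpaNormalize(current, minReplicas, maxReplicas); rescale {
			current = desired
		}
		history = append(history, current)
	}
	return history
}

func main() {
	fmt.Println(simulate(1, 2, 30, 3)) // [1 2 1 2 1 2]
	fmt.Println(simulate(0, 2, 30, 3)) // [0 0 0 0 0 0]
}
```

With idleReplicaCount: 1 the count alternates 1, 2, 1, 2, which matches the "Original Replicas Count": 2, "New Replicas Count": 1 pattern in the operator logs; with 0, the HPA's zero-replica check leaves the target alone.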

Fixing this would require substantial refactoring of KEDA's current behavior, so I am marking this as a known issue and postponing it to the next release.

@brunodasilvalenga

Hey @zroubalik, is there any workaround for this? For example, when idleReplicaCount > 1, could we treat the number of messages as always > 1? That way we would always have at least 1 pod running.

KEDA is an awesome tool, but our project requires at least 1 pod running at all times.

tomkerkhove added the help wanted label Jan 4, 2022
tomkerkhove removed this from the v2.6.0 milestone Jan 4, 2022
@or-shachar
Contributor

or-shachar commented Jan 9, 2022

Suppose that, like @brunodasilvalenga, we always need at least 1 pod running, and we use the CloudWatch scaler. Is the following a valid workaround?

  1. Use a scaler on AWS/SQS/ApproximateNumberOfMessagesVisible and set minMetricValue to some negative number.
  2. Is it correct that [isActive will always return True](https://github.com/kedacore/keda/blob/main/pkg/scalers/aws_cloudwatch_scaler.go#L306) in this case, because an empty queue returns 0, which is always greater than a negative number?
  3. Is it correct that while isActive returns True, I will always have at least minReplicaCount replicas and never drop to idleReplicaCount?
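The reasoning in these questions boils down to a single comparison. This Go sketch assumes, as the question does, that the CloudWatch scaler reports the trigger as active when the metric value is strictly greater than minMetricValue; isActive here is a hypothetical stand-in, not the scaler's actual method.

```go
package main

import "fmt"

// isActive is a hypothetical stand-in for the scaler's activity check,
// assuming a strict "metric > minMetricValue" comparison.
func isActive(metricValue, minMetricValue float64) bool {
	return metricValue > minMetricValue
}

func main() {
	// An empty SQS queue reports ApproximateNumberOfMessagesVisible = 0.
	fmt.Println(isActive(0, 0))  // false: target may drop to idleReplicaCount
	fmt.Println(isActive(0, -1)) // true: under this assumption, trigger always active
}
```

If that assumption holds, KEDA never considers the trigger inactive, so idleReplicaCount is never applied and the setup behaves as if the field were not set at all.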

@zroubalik
Member

@brunodasilvalenga @or-shachar at the moment, all you can do is set minReplicaCount = 1. Supporting idleReplicaCount > 0 requires a substantial change (we would need to modify the HPA on the fly).

@brunodasilvalenga

Thanks for the feedback, @zroubalik. Do you still plan to include it in the next release, as planned for the v2.6.0 milestone?

@zroubalik
Member

@brunodasilvalenga I don't think we are going to make it for 2.6.0; nobody is committed to implementing this at the moment. Would you be willing to contribute it?

@brunodasilvalenga

I would, but I don't have much experience with Go :(

@stale

stale bot commented Mar 14, 2022

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 7 days if no further activity occurs. Thank you for your contributions.

stale bot added the stale label Mar 14, 2022
@slobo

slobo commented Mar 16, 2022

For anyone else who, like me, couldn't grok the documentation: omitting idleReplicaCount gives the "keep at least minReplicaCount pods running" behaviour.

That is, to get the behaviour you might hope idleReplicaCount: 1, minReplicaCount: 1 would give, just leave idleReplicaCount out entirely.
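This advice can be expressed as a small Go sketch. It assumes idleReplicaCount is modeled as an optional pointer that is nil when the field is omitted; scaledObjectSpec and replicasWhenInactive are hypothetical names, and the stability rule encodes only what this thread establishes: the HPA leaves a zero-replica target alone and overrides a non-zero idle count.

```go
package main

import "fmt"

// scaledObjectSpec is a hypothetical, trimmed-down stand-in for the fields
// that matter here; a nil IdleReplicaCount means the field was omitted.
type scaledObjectSpec struct {
	IdleReplicaCount *int32
	MinReplicaCount  int32
}

// replicasWhenInactive returns where the target settles while no trigger is
// active, and whether that count is stable (not overridden by the HPA).
func replicasWhenInactive(s scaledObjectSpec) (int32, bool) {
	if s.IdleReplicaCount == nil {
		// Field omitted: the target settles at MinReplicaCount.
		return s.MinReplicaCount, true
	}
	idle := *s.IdleReplicaCount
	// Per this thread, only an idle count of 0 is left alone by the HPA;
	// any other value gets overridden, which causes the flapping.
	return idle, idle == 0
}

func main() {
	one := int32(1)
	// idleReplicaCount omitted, minReplicaCount: 1 -> prints "1 true"
	fmt.Println(replicasWhenInactive(scaledObjectSpec{MinReplicaCount: 1}))
	// idleReplicaCount: 1, minReplicaCount: 2 -> prints "1 false"
	fmt.Println(replicasWhenInactive(scaledObjectSpec{IdleReplicaCount: &one, MinReplicaCount: 2}))
}
```

For the ScaledObject in the original report, this means deleting the idleReplicaCount: 1 line and setting minReplicaCount: 1.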

stale bot removed the stale label Mar 16, 2022
@zroubalik
Member

@slobo could you please open a PR against the docs repo with the suggested fix, to make the documentation clearer? Thanks!

@stale

stale bot commented May 15, 2022

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 7 days if no further activity occurs. Thank you for your contributions.

stale bot added the stale label May 15, 2022
@stale

stale bot commented May 22, 2022

This issue has been automatically closed due to inactivity.

@ovooxo

ovooxo commented Oct 24, 2023

May I ask if this issue has been fixed?
