
Allow min_node_count to be zero #408

Open
gtirloni wants to merge 1 commit into main

Conversation

gtirloni

The current behavior sets min_node_count to 1 even when the user explicitly specifies zero. Zero is supported by both Magnum and the Cluster Autoscaler.

This change applies the default value of 1 (the same default Magnum uses when min_node_count is not specified) only when the value was not specified at all, so an explicit zero is preserved (see the sketch below).

Related:

* Magnum: https://review.opendev.org/c/openstack/magnum/+/737580
* Autoscaler: kubernetes/autoscaler#3995
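
To make the distinction concrete, here is a minimal Python sketch of the bug class this PR addresses. It is illustrative only, not the actual MCAPI code; the function names and the spec dict are assumptions.

```python
# Hypothetical sketch of min_node_count handling; names are
# illustrative and not taken from the MCAPI codebase.

def buggy_min_node_count(spec: dict) -> int:
    # `or` treats an explicit 0 the same as "not specified",
    # silently coercing a user-supplied 0 back to 1.
    return spec.get("min_node_count") or 1

def fixed_min_node_count(spec: dict) -> int:
    # Default to 1 only when the key is genuinely absent,
    # so an explicit 0 is preserved.
    value = spec.get("min_node_count")
    return 1 if value is None else value

assert buggy_min_node_count({"min_node_count": 0}) == 1  # explicit 0 is lost
assert fixed_min_node_count({"min_node_count": 0}) == 0  # explicit 0 survives
assert fixed_min_node_count({}) == 1                     # default still applies
```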
@gtirloni
Author

This unblocks scaling down, but scaling up later doesn't work because the autoscaler can't find a node group to scale. That seems to happen because it iterates over all the existing nodes to find one that belongs to a node group it can scale up; since the node group's node_count is currently zero, there are no such nodes (and it correctly ignores the control plane node):

I0715 18:03:59.737912       1 filter_out_schedulable.go:63] Filtering out schedulables
I0715 18:03:59.738021       1 klogx.go:87] failed to find place for kube-system/csi-cinder-controllerplugin-cd7ffbdf9-ml59v: cannot put pod csi-cinder-controllerplugin-cd7ffbdf9-ml59v on any node
I0715 18:03:59.738092       1 klogx.go:87] failed to find place for kube-system/openstack-manila-csi-controllerplugin-0: cannot put pod openstack-manila-csi-controllerplugin-0 on any node
I0715 18:03:59.738149       1 klogx.go:87] failed to find place for test/force-autoscaler: cannot put pod force-autoscaler on any node
I0715 18:03:59.738168       1 filter_out_schedulable.go:120] 0 pods marked as unschedulable can be scheduled.
I0715 18:03:59.738191       1 filter_out_schedulable.go:83] No schedulable pods
I0715 18:03:59.738204       1 filter_out_daemon_sets.go:40] Filtering out daemon set pods
I0715 18:03:59.738218       1 filter_out_daemon_sets.go:49] Filtered out 0 daemon set pods, 3 unschedulable pods left

And later (correctly ignoring control plane nodes):

I0715 18:03:59.739693       1 pre_filtering_processor.go:57] Node kube-e9yct-t45jx-7g8rp should not be processed by cluster autoscaler (no node group config)
I0715 18:03:59.739796       1 static_autoscaler.go:623] Scale down status: lastScaleUpTime=2024-07-15 16:21:17.926417071 +0000 UTC m=-3596.573195991 lastScaleDownDeleteTime=2024-07-15 16:21:17.926417071 +0000 UTC m=-3596.573195991 lastScaleDownFailTime=2024-07-15 16:21:17.926417071 +0000 UTC m=-3596.573195991 scaleDownForbidden=false scaleDownInCooldown=false
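
As a toy illustration of the failure mode above, the sketch below mimics in Python how scale-up candidate discovery can come up empty when a node group has zero nodes. This is not the cluster-autoscaler's actual implementation (that is Go, and scale-from-zero relies on node templates instead); the function and variable names here are assumptions.

```python
# Toy model of node-group discovery that walks existing nodes; not the
# cluster-autoscaler's actual code.

def find_scalable_group(nodes, node_group_of):
    """Return the first node group reachable through an existing node."""
    for node in nodes:
        group = node_group_of(node)  # None for unmanaged nodes
        if group is not None:
            return group
    # With node_count == 0 there are no managed worker nodes to walk,
    # so no node group is ever found and scale-up has nothing to act on.
    return None

# Only a control plane node exists, and it maps to no node group.
nodes = ["kube-e9yct-t45jx-7g8rp"]
print(find_scalable_group(nodes, lambda n: None))  # -> None
```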

Related:

@mnaser
Member

mnaser commented Jul 16, 2024

I don't think this is technically a valid fix, since scale out from zero is not possible, so that kind of puts the user in a bad position, right?

@gtirloni
Author

gtirloni commented Jul 18, 2024

> I don't think this is technically a valid fix, since scale out from zero is not possible, so that kind of puts the user in a bad position, right?

It's supposed to work, but there's an upstream bug about it. The cluster-autoscaler Magnum driver was updated to allow for this situation.

In any case, Magnum accepts min_node_count=0 and MCAPI is overriding that.
