GKE clusters get two core nodes without being CPU / Memory constrained #2199
Aha, maybe it was always only calico-typha behind this. Looking at events after draining a core node, I saw that it's a hostPort conflict:

ports:
- containerPort: 5473
  hostPort: 5473
  name: calico-typha
  protocol: TCP

Hmmm... one can't force two calico-typha pods to run on the same node then, I guess, since each claims hostPort 5473. The replica count is also controlled by the calico-typha horizontal and vertical autoscalers, where the horizontal autoscaler decides on pod count and the vertical on the resources granted.
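For reference, a hostPort scheduling conflict like this surfaces as FailedScheduling events complaining about free ports. Something like the following should show it; the typha label selector is an assumption about how the pods are labeled:

# Recent scheduling failures in kube-system; a hostPort conflict shows up as
# "... node(s) didn't have free ports for the requested pod ports"
kubectl get events -n kube-system --field-selector reason=FailedScheduling

# Where did the calico-typha replicas land? (label selector assumed)
kubectl get pods -n kube-system -l k8s-app=calico-typha -o wide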
Editing the horizontal autoscaler's configmap to, for example, the following made us get only one pod, which we will keep until we have 10 nodes in total. At that point, it may not be so problematic if another core node is added.

data:
  ladder: |-
    {
      "coresToReplicas": [],
      "nodesToReplicas":
      [
        [1, 1],
-       [2, 1],
+       [10, 2],
        [100, 3],
        [250, 4],
        [500, 5],
        [1000, 6],
        [1500, 7],
        [2000, 8]
      ]
    }
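Applying the change amounts to something like this; the configmap name matches the one mentioned later in this thread, and the kube-system namespace is an assumption:

# Edit the ladder config in place; note that GKE could in principle
# reconcile this away, though later comments suggest it survives upgrades.
kubectl edit configmap calico-typha-horizontal-autoscaler -n kube-system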
Action points: figure out what makes sense to do with this.
I'm not sure, but I don't like any option. I lean towards adjusting the configmap with a patch like the one above.
To resolve this, I think we should go for two core nodes with a 2:16 CPU:RAM ratio. I don't want this to be tracked in this issue though; I've opened #2212 for it.
I've applied the patch in #2199 (comment) to reduce costs for the callysto cluster, which can run with a single n2-highmem-2 node and only requires two core nodes because of calico-typha's horizontal autoscaler. So, I've made the horizontal autoscaler only add another replica once we reach 10 nodes, for now.
@consideRatio was this done manually? I am curious whether it'll just come back on a cluster upgrade.
@yuvipanda this was a manually applied change to the calico-typha-horizontal-autoscaler's configuration in a configmap, and it was not overridden by either a node pool upgrade or a k8s control plane upgrade! So, it seems a change like this is quite robust!
It seems that at least some GKE clusters, like linked-earth, get two core nodes instead of one, which I think is a waste of cloud resources. I think it is because the cluster-autoscaler scales up for konnectivity-agent, so that it can have three pods in a 2+1 configuration instead of having them all on the same node. This isn't suitable for us, and I'm not sure how we ought to avoid it. I think we can do kubectl edit on resources like the Deployment of konnectivity-agent, or influence the konnectivity-agent-autoscaler that runs gke.gcr.io/cluster-proportional-autoscaler. But how do we make such a change so that it's not reverted by GKE at a later point in time or similar?

The problem stems from the use of a cluster-proportional-autoscaler that adds one pod per node, but the konnectivity-agent pods don't tolerate the user nodes' taints and stack up on the core nodes. In a situation with two core nodes and one user node, we currently have 3+1 konnectivity-agents running on the core nodes.
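For context, the upstream cluster-proportional-autoscaler is configured through a configmap in either "linear" or "ladder" mode. A minimal sketch of the linear mode follows; the values are illustrative, and the configmap name GKE actually uses for konnectivity-agent-autoscaler is an assumption that would need checking:

# Sketch of an upstream cluster-proportional-autoscaler "linear" config.
# The configmap name is a guess; find the real one with e.g.
#   kubectl get configmaps -n kube-system | grep autoscaler
apiVersion: v1
kind: ConfigMap
metadata:
  name: konnectivity-agent-autoscaler-config  # assumed name
  namespace: kube-system
data:
  linear: |-
    {
      "coresPerReplica": 256,
      "nodesPerReplica": 16,
      "preventSinglePointFailure": true
    }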
Related: topologySpreadConstraints
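For reference, a topologySpreadConstraint is the pod-spec mechanism for spreading replicas across nodes; a minimal sketch (illustrative only, not taken from GKE's actual manifests) could look like:

# Illustrative pod-spec snippet; the label selector is an assumption
# about how the konnectivity-agent pods are labeled.
topologySpreadConstraints:
- maxSkew: 1
  topologyKey: kubernetes.io/hostname
  whenUnsatisfiable: DoNotSchedule
  labelSelector:
    matchLabels:
      k8s-app: konnectivity-agent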