
[BUG] calico-typha cannot be scheduled on Spot VMs #3539

Closed
dv0gt opened this issue Mar 15, 2023 · 27 comments
Labels
action-required bug Needs Attention 👋 Issues needs attention/assignee/owner

Comments

@dv0gt

dv0gt commented Mar 15, 2023

Describe the bug
We have an AKS cluster with two node pools, one of which contains Spot VMs. We also have Calico enabled as the NetworkPolicy engine for the cluster.

AKS tries to roll out the calico-typha deployment with 3 replicas in the calico-system namespace, which fails on the Spot VMs due to missing tolerations: 1 node(s) had untolerated taint {kubernetes.azure.com/scalesetpriority: spot}

To Reproduce
Steps to reproduce the behavior:

  1. Deploy an AKS cluster (1.24.9) with Calico enabled
  2. Create a node pool with the Azure Spot VM feature enabled
  3. See error

Expected behavior
Spot nodes should be tolerated.
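In practice, tolerating the Spot nodes would mean adding something like the following to the calico-typha pod template. This is only a sketch matching the taint shown in the error message, not the actual change AKS or Calico shipped:

```yaml
# Hypothetical toleration for the calico-typha pod spec, matching the
# AKS Spot taint kubernetes.azure.com/scalesetpriority=spot:NoSchedule
# reported in the scheduling error above.
tolerations:
  - key: "kubernetes.azure.com/scalesetpriority"
    operator: "Equal"
    value: "spot"
    effect: "NoSchedule"
```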

Screenshots
(two screenshots attached in the original issue)

Environment (please complete the following information):

  • AKS 1.24.9
@dv0gt dv0gt added the bug label Mar 15, 2023
@aresabalo

Bug is present in AKS v1.25.5 :-(

calico-system calico-typha-69998dd955-pv9tc 1/1 Running 0 9d
calico-system calico-typha-69998dd955-r6rgs 1/1 Running 0 9d
calico-system calico-typha-69998dd955-vf2n8 0/1 Pending 0 7d8h

Type     Reason            Age                   From               Message
----     ------            ----                  ----               -------
Warning  FailedScheduling  24m (x709 over 2d4h)  default-scheduler  0/5 nodes are available: 2 node(s) didn't have free ports for the requested pod ports, 3 node(s) had untolerated taint {kubernetes.azure.com/scalesetpriority: spot}. preemption: 0/5 nodes are available: 2 node(s) didn't have free ports for the requested pod ports, 3 Preemption is not helpful for scheduling.

@ghost ghost added the action-required label Apr 10, 2023
@ghost

ghost commented Apr 15, 2023

Action required from @Azure/aks-pm

@ghost ghost added the Needs Attention 👋 Issues needs attention/assignee/owner label Apr 15, 2023
@ghost

ghost commented Apr 30, 2023

Issue needing attention of @Azure/aks-leads

(the same automated reminder was repeated many more times over the following months)
@agateaux

agateaux commented Mar 6, 2024

I'm having the same issue on standard AKS instances.
When you have only one system node, and the rest of the nodes are tainted, calico-typha can't schedule more than one pod:
Warning FailedScheduling 67s default-scheduler 0/8 nodes are available: 1 node(s) didn't have free ports for the requested pod ports, 7 node(s) had untolerated taint {plotly-toleration: no-system-pod}. preemption: 0/8 nodes are available: 1 No preemption victims found for incoming pod, 7 Preemption is not helpful for scheduling

@agateaux

agateaux commented Mar 6, 2024

Is there a way to modify the Calico manifest to add custom tolerations, like on vanilla Calico?
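With vanilla Calico you can edit the typha Deployment directly; one way is a JSON patch that appends a toleration. The following is only a sketch (the patch file name is made up, and it assumes a tolerations list already exists in the pod spec); on AKS the managed add-on may reconcile the Deployment and revert the change, so treat it as a temporary workaround at best:

```yaml
# spot-toleration-patch.yaml (hypothetical file name): a JSON patch, in
# YAML form, that appends the Spot toleration to the calico-typha pod
# template. Apply with:
#   kubectl -n calico-system patch deployment calico-typha \
#     --type=json --patch-file spot-toleration-patch.yaml
# The "/-" path appends to an existing tolerations array; it fails if
# the pod spec has no tolerations list at all. Note that a managed
# installation (AKS add-on / Tigera operator) may reconcile the
# Deployment and undo this patch.
- op: add
  path: /spec/template/spec/tolerations/-
  value:
    key: "kubernetes.azure.com/scalesetpriority"
    operator: "Equal"
    value: "spot"
    effect: "NoSchedule"
```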

(further automated reminders omitted)

@AllenWen-at-Azure
Contributor

Closing this issue as it was fixed in projectcalico/calico#7979.

@Sissi44

Sissi44 commented Sep 4, 2024

Hello,
The issue can be fixed by updating the Helm release version of Calico on the Azure side. Has any action already been taken on the Azure side? If so, in which version of AKS? If nothing has been done, please reopen this issue.
On our side, no new toleration has been added to the calico-typha deployment yet, so it CAN be fixed by Azure, but is NOT fixed yet.

Closing this issue as it was fixed in projectcalico/calico#7979.
