
AzureCluster failureDomains do not always match specific control plane AzureMachine SKU #5033

Open
nojnhuh opened this issue Jul 29, 2024 · 5 comments
Labels
kind/bug Categorizes issue or PR as related to a bug. priority/backlog Higher priority than priority/awaiting-more-evidence.

Comments

@nojnhuh (Contributor) commented Jul 29, 2024

/kind bug

What steps did you take and what happened:

CAPZ's AzureCluster controller contains logic to set the status.failureDomains to the available availability zones for the region of the cluster resources:

func (c *Cache) GetZones(ctx context.Context, location string) ([]string, error) {

This function loops through all VM SKUs and returns any availability zone that is unrestricted for at least one SKU; those zones are then set in the AzureCluster's failureDomains. When a specific VM SKU has a Zone restriction preventing VMs from being created in certain zones of the region, CAPZ may still list those zones in status.failureDomains as long as some other SKU in the same region is unrestricted there. VMs using the restricted SKU can then be placed in those zones and fail to be created. The sketch below illustrates the behavior.
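
The following is a minimal, self-contained Go sketch of the aggregation behavior described above; it is not the actual CAPZ code, and the `sku` type and sample SKU names are hypothetical. It shows how a zone ends up being reported as long as any SKU in the region can use it:

```go
package main

import "fmt"

// sku is a hypothetical, simplified stand-in for an Azure VM SKU's zone data.
type sku struct {
	name            string
	zones           []string // zones the SKU is offered in for the region
	restrictedZones []string // zones blocked for this SKU by a Zone restriction
}

// aggregateZones mirrors the behavior described above: a zone is returned if
// it is unrestricted for at least one SKU, regardless of which SKU the
// cluster's control plane actually uses.
func aggregateZones(skus []sku) []string {
	seen := map[string]bool{}
	var zones []string
	for _, s := range skus {
		restricted := map[string]bool{}
		for _, z := range s.restrictedZones {
			restricted[z] = true
		}
		for _, z := range s.zones {
			if !restricted[z] && !seen[z] {
				seen[z] = true
				zones = append(zones, z)
			}
		}
	}
	return zones
}

func main() {
	skus := []sku{
		{name: "Standard_D2s_v3", zones: []string{"1", "2", "3"}},
		{name: "Standard_D4s_v5", zones: []string{"1", "2", "3"}, restrictedZones: []string{"3"}},
	}
	// Prints [1 2 3]: zone 3 is reported as a failure domain even though
	// Standard_D4s_v5, which the control plane might use, cannot deploy there.
	fmt.Println(aggregateZones(skus))
}
```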

What did you expect to happen:

The AzureCluster's failureDomains should match the zone availability of the specific VM SKU used for the control plane AzureMachines.
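
For comparison, here is a minimal sketch of the expected behavior, where only zones deployable for the specific control plane SKU are reported. It again uses a hypothetical `sku` type and SKU names, not CAPZ's or the Azure SDK's actual API:

```go
package main

import "fmt"

// sku is the same hypothetical stand-in used in the previous sketch.
type sku struct {
	name            string
	zones           []string
	restrictedZones []string
}

// zonesForSKU returns only the zones in which the named SKU is actually
// deployable, which is roughly what the control plane's failureDomains
// would need to reflect.
func zonesForSKU(skus []sku, skuName string) []string {
	for _, s := range skus {
		if s.name != skuName {
			continue
		}
		restricted := map[string]bool{}
		for _, z := range s.restrictedZones {
			restricted[z] = true
		}
		var zones []string
		for _, z := range s.zones {
			if !restricted[z] {
				zones = append(zones, z)
			}
		}
		return zones
	}
	return nil
}

func main() {
	skus := []sku{
		{name: "Standard_D2s_v3", zones: []string{"1", "2", "3"}},
		{name: "Standard_D4s_v5", zones: []string{"1", "2", "3"}, restrictedZones: []string{"3"}},
	}
	// Prints [1 2]: zone 3 is excluded because it is restricted for the
	// SKU the control plane uses.
	fmt.Println(zonesForSKU(skus, "Standard_D4s_v5"))
}
```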

Anything else you would like to add:
[Miscellaneous information that will assist in solving the issue.]

Environment:

  • cluster-api-provider-azure version:
  • Kubernetes version: (use kubectl version):
  • OS (e.g. from /etc/os-release):
@mboersma (Contributor) commented Aug 8, 2024

@nojnhuh should this be in the next milestone?

@nojnhuh (Contributor, Author) commented Aug 8, 2024

I don't think this is something we need to do soon, especially since we've ironed out which regions we use in CI to avoid this.

@willie-yao (Contributor) commented

/priority backlog

@k8s-ci-robot k8s-ci-robot added the priority/backlog Higher priority than priority/awaiting-more-evidence. label Aug 15, 2024
@k8s-triage-robot commented

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Nov 13, 2024
@nojnhuh (Contributor, Author) commented Nov 13, 2024

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Nov 13, 2024