AKS agent_pool_profile block leads to cluster recreation #4987

Closed
secustor opened this issue Nov 26, 2019 · 10 comments

Comments

@secustor (Contributor) commented Nov 26, 2019

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Terraform (and AzureRM Provider) Version

Terraform v0.12.16
provider.azurerm v1.37.0

Affected Resource(s)

  • azurerm_kubernetes_cluster

Terraform Configuration Files

  agent_pool_profile {
    name           = "${var.aks_agent_pool_profile_name_prefix}${var.aks_agent_pool_profile_name_suffix[var.environment]}"
    vm_size        = var.aks_agent_pool_vm_size
    count          = var.aks_agent_pool_vm_count[var.environment]
    max_pods       = var.aks_node_max_pods
    os_type        = "Linux"
    type           = "AvailabilitySet" # default
    vnet_subnet_id = azurerm_subnet.vnet01_aks01.id
  }

Debug Output

      ~ agent_pool_profile {
          - availability_zones    = [] -> null
            count                 = 4
          + dns_prefix            = (known after apply)
          - enable_auto_scaling   = false -> null
          - enable_node_public_ip = false -> null
          ~ fqdn                  = "<retracted>.hcp.westeurope.azmk8s.io" -> (known after apply)
          - max_count             = 0 -> null
            max_pods              = 100
          - min_count             = 0 -> null
            name                  = "<retracted>"
          - node_taints           = [] -> null
          ~ os_disk_size_gb       = 100 -> (known after apply)
            os_type               = "Linux"
            type                  = "AvailabilitySet"
            vm_size               = "Standard_D8_v3"
            vnet_subnet_id        = "/subscriptions/<retracted>/resourceGroups/<retracted>/providers/Microsoft.Network/virtualNetworks/<retracted>/subnets/<retracted>-aks01-s-subnet"
        }

      - default_node_pool {
          - availability_zones    = [] -> null
          - enable_auto_scaling   = false -> null
          - enable_node_public_ip = false -> null
          - max_count             = 0 -> null
          - max_pods              = 100 -> null
          - min_count             = 0 -> null
          - name                  = "<retracted>" -> null # forces replacement
          - node_count            = 4 -> null
          - node_taints           = [] -> null
          - os_disk_size_gb       = 100 -> null
          - type                  = "AvailabilitySet" -> null # forces replacement
          - vm_size               = "Standard_D8_v3" -> null # forces replacement
          - vnet_subnet_id        = "/subscriptions/<retracted>/resourceGroups/<retracted>/providers/Microsoft.Network/virtualNetworks/<retracted>-vnet01/subnets/<retracted>-aks01-s-subnet" -> null # forces replacement
        }

Panic Output

Expected Behavior

Only a deprecation warning is written during plan and apply operations.

Actual Behavior

The resource gets deleted and recreated on each apply.

Steps to Reproduce

  1. Deploy an AKS cluster using provider version 1.36.1 with an agent_pool_profile block
  2. Plan with 1.37.0 --> a recreation is suggested
  3. Apply --> the cluster is recreated again on every subsequent apply

Important Factoids

References

@md2k commented Nov 27, 2019

You can try moving your agent_pool_profile block to default_node_pool (renaming the agent pool to the default pool in the state accordingly).
If you have multiple agent_pool_profile blocks, the others should be moved to separate azurerm_kubernetes_cluster_node_pool resources; see the sketch below.
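
A minimal sketch of such a separate node pool resource, reusing the variable names and subnet from the configuration above. The pool name "extra" and the cluster reference azurerm_kubernetes_cluster.aks01 are placeholders that don't appear in the original configuration, and the attribute names should be checked against the v1.37 provider documentation:

resource "azurerm_kubernetes_cluster_node_pool" "extra" {
  # "extra" and the cluster reference below are hypothetical placeholders.
  name                  = "extra"
  kubernetes_cluster_id = azurerm_kubernetes_cluster.aks01.id
  vm_size               = var.aks_agent_pool_vm_size
  node_count            = var.aks_agent_pool_vm_count[var.environment]
  max_pods              = var.aks_node_max_pods
  vnet_subnet_id        = azurerm_subnet.vnet01_aks01.id
}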

@secustor
Copy link
Contributor Author

@md2k agent_pool_profile and default_node_pool do not have the same content. Furthermore, as I understand the changelog, agent_pool_profile should only be deprecated, not completely removed.

@tombuildsstuff (Contributor)

hey @secustor @md2k

Thanks for opening this issue.

As you've noticed in the changelog the azurerm_kubernetes_cluster resource went through a bunch of changes in the v1.37 release to account for the way the API now works. Due to this the agent_pool_profile block has been deprecated and replaced by the default_node_pool block and the separate azurerm_kubernetes_cluster_node_pool resource.

Whilst ultimately you'll need to switch from using the agent_pool_profile block over to using the default_node_pool block & the new separate resource - in the interim you should be able to continue using the agent_pool_profile block by using Terraform's ignore_changes functionality, for example:

resource "azurerm_kubernetes_cluster" "test" {
   # ...
   
   agent_pool_profile { ... }

   lifecycle {
    ignore_changes = [
      "default_node_pool"
    ]
  }
}

This should show no changes and continue to use the existing block.

Since the agent_pool_profile block will be removed in 2.0, we'd suggest switching over to the default_node_pool block, which you can do by adding it in and removing the ignore_changes element:

resource "azurerm_kubernetes_cluster" "test" {
   # ...
   
   default_node_pool { ... }
}

At which point you should be migrated across - would you be able to take a look and see if that works for you?

Thanks!
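
For the configuration shown in the issue description, the migrated default_node_pool block could look roughly like this; a sketch only, mapping count to node_count and dropping os_type (which does not appear in the default_node_pool plan output above):

default_node_pool {
  name           = "${var.aks_agent_pool_profile_name_prefix}${var.aks_agent_pool_profile_name_suffix[var.environment]}"
  vm_size        = var.aks_agent_pool_vm_size
  node_count     = var.aks_agent_pool_vm_count[var.environment] # "count" in the old block
  max_pods       = var.aks_node_max_pods
  type           = "AvailabilitySet"
  vnet_subnet_id = azurerm_subnet.vnet01_aks01.id
}

Keeping the existing name, type, vm_size and vnet_subnet_id values matters here, since the plan output above marks those attributes as forcing replacement.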

@t0klian (Contributor) commented Nov 28, 2019

@tombuildsstuff

Hi!

The point here is that, although default_node_pool is an optional block, on the second apply run Terraform decides it should be created, effectively treats it as mandatory, and as a result triggers the AKS recreation.

Such behaviour is very destructive, for example when you need to re-run Terraform code with the updated provider against a production cluster.

IMHO that is a bug which should be fixed.

@t0klian (Contributor) commented Nov 28, 2019

@secustor @md2k as a workaround, you can add both agent_pool_profile and default_node_pool to ignore_changes to prevent the AKS recreation; see the sketch below.
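
A minimal sketch of that workaround, reusing the "test" resource label from the earlier examples:

resource "azurerm_kubernetes_cluster" "test" {
  # ... existing configuration, still using the deprecated agent_pool_profile block ...

  lifecycle {
    # Ignore both the deprecated block and its replacement so that neither
    # produces a diff that forces the cluster to be recreated.
    ignore_changes = [
      agent_pool_profile,
      default_node_pool,
    ]
  }
}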

@tombuildsstuff (Contributor) commented Nov 28, 2019

@t0klian

The point here is that, although default_node_pool is an optional block, on the second apply run Terraform decides it should be created, effectively treats it as mandatory, and as a result triggers the AKS recreation.

Whilst we appreciate this isn't ideal - unfortunately this change is required due to a breaking change in the Azure API, and is documented in the changelog for the 1.37 release.

As mentioned above, there are two approaches to working around this:

  1. Using ignore_changes on the agent_pool_profile block to continue using this block
  2. Switching over to using the default_node_pool block

Since the behaviour of the underlying API has changed, we've deprecated the existing agent_pool_profile block in favour of the default_node_pool block and will remove that in the upcoming 2.0 release - as such we'd strongly recommend switching to this block where possible.

As mentioned above, it's unfortunate that we've had to ship a breaking change in a minor version - it's something we try to avoid - however, since the API behaviour has changed sufficiently, we ultimately decided that introducing a replacement block was the better approach in the long term to give users a migration path.

Thanks!

@t0klian (Contributor) commented Nov 28, 2019

@tombuildsstuff

We'll try to migrate AKS to default_node_pool.

Thanks for the reply!

@secustor (Contributor, Author) commented Dec 2, 2019

We will pin the provider to version 1.36.1 and try to migrate further down the road.
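
A minimal sketch of such a pin for Terraform 0.12, using a version constraint in the provider block:

provider "azurerm" {
  # Pin to the last release before the default_node_pool changes.
  version = "= 1.36.1"
}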

@tombuildsstuff (Contributor)

👋

Since this should be solved by updating your Terraform configuration, I'm going to close this issue for the moment, but please let us know if that doesn't work and we'll take another look.

Thanks!
