feat!: add create_before_destroy=true to node pools #357
Conversation
@@ -592,6 +592,8 @@ resource "azurerm_kubernetes_cluster_node_pool" "node_pool" {
   depends_on = [azapi_update_resource.aks_cluster_post_create]

+  lifecycle {
+    create_before_destroy = true
At least README notes are required if we "hardcode" this. Node pool names need to be unique, and create_before_destroy = true will by default create another node pool with the same name.
Yep, definitely. Since the change is hard-coded (and can only be hard-coded), I think we should also hard-code a random suffix for the node pool name?
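(A minimal sketch of that idea; the variable and resource names here are made up, and this naive version only regenerates the suffix when the configured name changes, which is part of what the rest of the thread refines.)

resource "random_string" "pool_suffix" {
  length  = 4
  lower   = true
  upper   = false
  numeric = true
  special = false

  keepers = {
    # A new suffix is generated whenever the configured name changes.
    pool_name = var.pool_name # hypothetical module variable
  }
}

resource "azurerm_kubernetes_cluster_node_pool" "node_pool" {
  # AKS Linux node pool names are limited to 12 lowercase alphanumeric characters,
  # so the suffix eats into the user-facing name budget (see the 8-character note later in this thread).
  name = "${var.pool_name}${random_string.pool_suffix.result}"
  # ...

  lifecycle {
    create_before_destroy = true
  }
}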
Yeah, I like this random suffix idea.
Btw, it's possible to make this create_before_destroy = true dynamic: we can declare two similar resources with a toggle switch, one containing create_before_destroy = true and the other not. Like:

resource "azurerm_kubernetes_cluster_node_pool" "node_pool" {
  count = var.create_before_destroy ? 0 : 1
  ...
}

resource "azurerm_kubernetes_cluster_node_pool" "node_pool_create_before_destroy" {
  count = var.create_before_destroy ? 1 : 0
  ...

  lifecycle {
    create_before_destroy = true
  }
}

But in this case we won't need this trick; I agree with you, the node pool should contain create_before_destroy.
I have another question @the-technat: even with create_before_destroy = true, how can we achieve seamless node pool updates? The old pool would be destroyed immediately once the new pool is ready; would that cause downtime?
Another option is to maintain the node pools in a blue-green pattern: the upgrade would create a blue pool, then drain the green pool, then destroy the green pool. I would consider it plan B if re-creation of a node pool with create_before_destroy still causes downtime.
So the random suffix is implemented, WDYT?
Regarding the dynamic approach, I think it's not DRY if we have a switch between two node pool resources and every feature has to be implemented on both. In addition, it adds many more edge cases, and I'm not sure we want to test them all; that's why I'm in favor of hard-coding it.
Regarding your other question, that depends entirely on how the AzureRM API implements deletion of the old node pool. I assume it starts by cordoning all the nodes and then draining them, which moves the workloads to the new nodes. That should give you a seamless update. And to ensure 100% availability of the apps, they have to use PDBs (which they should use anyway) to prevent K8s from draining nodes too fast (draining respects PDBs and waits until the pods are started on the new nodes).
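(For illustration only, not part of this module: a PodDisruptionBudget could even be managed from Terraform via the hashicorp/kubernetes provider. The workload name and labels below are made up.)

resource "kubernetes_pod_disruption_budget_v1" "app" {
  metadata {
    name      = "my-app"   # hypothetical workload
    namespace = "default"
  }

  spec {
    # Keep at least 2 replicas available; node drains wait until evicted
    # pods are rescheduled and ready on the remaining/new nodes.
    min_available = 2

    selector {
      match_labels = {
        app = "my-app"
      }
    }
  }
}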
I'm wondering whether, with the current implementation, the random_string result would be refreshed on the node pool's recreation?
Yeah, we do need sufficient tests for this Terraform circus show 😃
Hi @the-technat, I've done some tests on my side and the results are good. I've tried updating the name, force-new attributes, and update-in-place attributes, and all scenarios work as we expected. I would encourage you to do the same tests on your side so we can be on the same page.
@lonegunmanb I did some tests with your suggestion and it worked fine. Now I also see why my approach wouldn't work (it recreates the node pool for every non-breaking attribute change too...).
I updated the PR with your approach and fixed one test (node pool names can now only be up to 8 characters, since the module itself adds 4). Still not very happy with the null_resource, but I guess that's our only option. Maybe it's worth creating a feature request for the AKS API to implement a new field use_name_prefix=true on the node pool that would automatically use the given name as a prefix and inject a random suffix if the node pool gets recreated.
Sorry that it took so long to understand why your approach was needed. I definitely underestimated the effect / effort of implementing this "feature".
@the-technat Terraform 1.4 has added a new built-in resource, terraform_data, to replace null_resource, so I'm wondering whether we should use this new resource or not. Using it would free us from the null provider, but it also requires Terraform Core >= 1.4.
This null_resource + md5 + replace_triggered_by + create_before_destroy pattern can be used in multiple scenarios, and I think Pulumi has faced the very same problem, so they've introduced auto-naming.
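(For readers who haven't seen the pattern: a self-contained sketch of how these pieces can fit together. Variable names and attribute values are illustrative, not necessarily what this module ends up doing.)

variable "pool_name" {
  type    = string
  default = "np"
}

variable "cluster_id" {
  type = string
}

# Keeper resource: it is replaced whenever the configured pool name changes.
resource "null_resource" "pool_name_keeper" {
  triggers = {
    pool_name = var.pool_name
  }
}

resource "azurerm_kubernetes_cluster_node_pool" "node_pool" {
  # Append a short pseudo-random suffix so the replacement pool created by
  # create_before_destroy never clashes with the name of the pool it replaces.
  name                  = "${var.pool_name}${substr(md5(uuid()), 0, 4)}"
  kubernetes_cluster_id = var.cluster_id
  vm_size               = "Standard_D2s_v3"
  node_count            = 1

  lifecycle {
    create_before_destroy = true
    # uuid() yields a new value on every run, so ignore the resulting name
    # drift; a fresh suffix is only picked up when the pool is replaced.
    ignore_changes = [name]
    # Because name changes are ignored, renaming the pool is instead driven
    # by replacing the keeper above.
    replace_triggered_by = [null_resource.pool_name_keeper]
  }
}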
Interesting, terraform_data definitely looks like a better approach.
Because this is a breaking change anyway, I guess we could do this. Otherwise I'd be afraid of bumping the Terraform version to >= 1.4, as that affects all parent / sibling modules.
@mkilchhofer WDYT about using terraform_data?
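(Sketch of what that swap could look like, assuming Terraform >= 1.4; only the keeper changes, the rest of the pattern in the sketch above stays the same.)

# terraform_data is built into Terraform >= 1.4, so the null provider is no longer needed.
resource "terraform_data" "pool_name_keeper" {
  # The resource is replaced whenever this value changes, mirroring
  # null_resource's triggers behaviour.
  triggers_replace = var.pool_name
}

resource "azurerm_kubernetes_cluster_node_pool" "node_pool" {
  # ... same arguments as in the sketch above ...

  lifecycle {
    create_before_destroy = true
    ignore_changes        = [name]
    replace_triggered_by  = [terraform_data.pool_name_keeper]
  }
}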
Thanks @the-technat, LGTM! I'm merging this PR now; as for the terraform_data proposal, we can open a separate PR for it.
Describe your changes
While migrating our node pools into the module, we noticed that the lifecycle attribute create_before_destroy is not set. This attribute should be set to true by default to allow rolling updates of the node pools whenever there are changes to them.
Please note that Terraform forbids referencing a variable in the lifecycle block (or making the lifecycle block dynamic), which is why this value is hard-coded. It shouldn't harm anyone, and terraform-aws-eks also hard-codes it.
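(For context, this is the kind of configuration Terraform rejects at parse/validate time, which is why the flag cannot be exposed as a module variable. Sketch, not taken from the module.)

resource "azurerm_kubernetes_cluster_node_pool" "node_pool" {
  # ...

  lifecycle {
    # Invalid: lifecycle meta-arguments must be literal values, so Terraform
    # errors out when a variable is used here.
    create_before_destroy = var.create_before_destroy
  }
}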
Issue number
Closes #358
Checklist before requesting a review
- CHANGELOG.md file

Thanks for your cooperation!