
desired capacity update does not work for node groups #835

Closed · 3 tasks
amitsehgal opened this issue Apr 16, 2020 · 46 comments

@amitsehgal

desired capacity update does not work for node groups

I'm submitting an issue where I have tried to update the min, max, and desired variables for node groups. Terraform does show min and max being changed; however, desired is not updated.

  • [x] bug report
  • feature request
  • support request - read the FAQ first!
  • kudos, thank you, warm fuzzy

What is the current behavior?

Terraform code:

node_groups = {
  eks_nodegroup = {
    desired_capacity = 2
    max_capacity     = 4
    min_capacity     = 2

    instance_type = var.instance_type
    k8s_labels = {
      Environment = "sbx"
    }
    additional_tags = {
      "k8s.io/cluster-autoscaler/${local.cluster_name}" = "owned"
      "k8s.io/cluster-autoscaler/enabled"                = "true"
    }
  }
}

Output from plan and apply:

  ~ resource "aws_eks_node_group" "workers" {
        # ...

      ~ scaling_config {
            desired_size = 1
          ~ max_size     = 3 -> 4
          ~ min_size     = 1 -> 2
        }
    }

Error: error updating EKS Node Group (ce-eks-sbx:ce-eks-sbx-eks_nodegroup-lenient-blowfish) config: InvalidParameterException: Minimum capacity 2 can't be greater than desired size 1
{
ClusterName: "test-eks-sbx",
Message_: "Minimum capacity 2 can't be greater than desired size 1",
NodegroupName: "ce-eks-sbx-eks_nodegroup-lenient-blowfish"
}
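
(To make the failure concrete: from the plan output above, the update keeps desired_size at 1 while raising min_size to 2, and the EKS API requires min_size <= desired_size <= max_size, so the call is rejected. A rough sketch of the scaling_config effectively submitted, reconstructed from the plan rather than taken from the module:)

scaling_config {
  desired_size = 1 # unchanged in the plan
  max_size     = 4 # raised from 3
  min_size     = 2 # raised from 1; rejected because min_size > desired_size
}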

I have also tried updating desired capacity through node_groups_defaults.

If this is a bug, how to reproduce? Please include a code sample if relevant.

Change the min, max, and desired capacity.

What's the expected behavior?

The new scaling configuration should take effect.

Are you able to fix this problem and submit a PR? Link here if you have already.

No

Environment details

  • Affected module version:
  • OS:
  • Terraform version:

Any other relevant info

@karlderkaefer

It was caused by #691. It works in 8.0.0.

@meysammeisam

It was caused by #691. It works in 8.0.0.

@karlderkaefer
What do you mean by 8.0.0?


@mandeburka

desired_capacity doesn't work for me either. The installed terraform-aws-eks version is v11.1.0.

@kuritonasu

kuritonasu commented May 5, 2020

I'm encountering this issue in v11.1.0 as well. I see the significance of #691; however, the present issue prevents node group resizing, as the author points out.

@max-rocket-internet
Contributor

#681
#678
#510 (comment)

🙂

@max-rocket-internet
Contributor

TL;DR: you should be using the cluster-autoscaler. If not, you need to make the change manually.

@kuritonasu

Consider the case where autoscaling is not desired but we still want to resize the node group, and we do not wish to resize manually through the console, for the usual reasons. I don't believe this scenario is uncommon.

Also, to be clear, I don't believe desired should be modified by this module by default, as this could cause confusion and undesirable consequences. I am not arguing against #691; however, there should be a way to override this behaviour.

@max-rocket-internet
Contributor

Sure, but the problem is that there is no way to have an optional lifecycle on resources, therefore we chose to support the most common option.

@dmanchikalapudi

dmanchikalapudi commented Jul 27, 2020

TL;DR: you should be using the cluster-autoscaler. If not, you need to make the change manually.

@max-rocket-internet - Did you mean configure cluster-autoscaler within this module? If so, how? I don't see it in the examples.

Also, I provisioned my node group with the following values:

      desired_capacity = 4
      max_capacity     = 10
      min_capacity     = 4

At what point does the capacity go beyond 4? I stood up a t2.micro instance and tried to scale pods beyond the t2.micro's capacity, but my cluster does not scale up to add more nodes.

PS: I am using v12.2.0 of this module.

@dmanchikalapudi

Did anyone here get the nodes to scale properly? I cannot get it to work no matter what I do.

@kuritonasu

kuritonasu commented Jul 29, 2020

@max-rocket-internet - Did you mean configure cluster-autoscaler within this module? If so, how? I don't see it in the examples.

@dmanchikalapudi cluster-autoscaler is not connected to, nor can it be configured through, this module.

At what point does the capacity go beyond 4? I stood up a t2.micro instance and tried to scale pods beyond the t2.micro's capacity, but my cluster does not scale up to add more nodes.

The desired_capacity value is ignored by the module. You have to modify it by hand through the console.

@dmanchikalapudi

Thanks for the response @kuritonasu. Doing it by hand pretty much negates the idea behind "managed" node groups. There is no point in defining the min/max node counts either. It is just an illusion of autoscaling.

My need is simple. When my ReplicaSets scale out to more pods than the existing nodes can run, I need the nodes to scale up to accommodate them (assuming there is hardware capacity underneath and it is within the max node count). How do I go about making that happen via Terraform?

@max-rocket-internet
Contributor

cluster-autoscaler is not connected to, nor can it be configured through, this module.

Correct ✅

Doing it by hand pretty much negates the idea behind "managed" node groups.

Perhaps this doc might help you to see what is "managed" and what is not, specifically this image:

[image: managed node group ("mng") diagram]

There is no point in defining the min/max node counts either. It is just an illusion of autoscaling.

I wouldn't say it's an illusion; it's just not a "turn-key" thing. ASGs have been around for years and work very well when configured correctly 🙂

My need is simple. When my ReplicaSets scale out to more pods than the existing nodes can run, I need the nodes to scale up to accommodate them (assuming there is hardware capacity underneath and it is within the max node count). How do I go about making that happen via Terraform?

This is how typical autoscaling works in k8s, but this module is only for the AWS resources. The cluster-autoscaler runs in your cluster and is not supported by us or this module in any way; it's a completely separate thing. But there is some documentation here that might help you: https://github.com/terraform-aws-modules/terraform-aws-eks/blob/master/docs/autoscaling.md
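
(Not something this module sets up for you, but for anyone following the autoscaling doc linked above, here is a minimal sketch of the IAM side. It assumes the module instance is named module.eks and exposes a worker_iam_role_name output; check your module version. The actions are the usual cluster-autoscaler set.)

# Sketch only: grant the worker role the permissions cluster-autoscaler needs.
resource "aws_iam_policy" "cluster_autoscaler" {
  name_prefix = "cluster-autoscaler"
  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect = "Allow"
      Action = [
        "autoscaling:DescribeAutoScalingGroups",
        "autoscaling:DescribeAutoScalingInstances",
        "autoscaling:DescribeLaunchConfigurations",
        "autoscaling:DescribeTags",
        "autoscaling:SetDesiredCapacity",
        "autoscaling:TerminateInstanceInAutoScalingGroup"
      ]
      Resource = "*"
    }]
  })
}

resource "aws_iam_role_policy_attachment" "cluster_autoscaler" {
  # Assumes the module exposes the worker IAM role name as an output.
  role       = module.eks.worker_iam_role_name
  policy_arn = aws_iam_policy.cluster_autoscaler.arn
}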

@dploeger

dploeger commented Sep 1, 2020

@max-rocket-internet I'd say that "managed" in the broader sense, together with Terraform, also means scaling the worker nodes by setting the desired size, since I'm also managing the VPC configuration with Terraform (and everything in between, actually :) )

So IMHO, if setting the desired size is possible through the API, it SHOULD be supported by this resource.

@elebertus

elebertus commented Sep 23, 2020

The practical use case that I have for this is that if I set a managed node group to desired 3, max 6, min 3, cluster-autoscaler will respect this. There isn't a technical reason why we can't change the min_size, nor should it be dismissed as "not a feature".

Some concrete examples, since this has been a bit of a noisy thread.

Here's an example initial definition of a scaling config as passed through node_groups in the eks cluster module:

compute_1 = {
  desired_capacity = 1
  max_capacity     = 6
  min_capacity     = 1
}

Then update this nodegroup's minimum to:

compute_1 = {
  desired_capacity = 1
  max_capacity     = 6
  min_capacity     = 3
}

You'll get an error like:

Error: error updating EKS Node Group (eks_cluster_1:compute_1) config: InvalidParameterException: Minimum capacity 3 can't be greater than desired size 1
{
  RespMetadata: {
    StatusCode: 400,
    RequestID: "<requestID>"
  },
  ClusterName: "eks_cluster_1",
  Message_: "Minimum capacity 3 can't be greater than desired size 1",
  NodegroupName: "compute_1"
}

Then running a state show shows the obvious:

terraform state show module.eks_cluster_1.module.node_groups.aws_eks_node_group.workers[\"compute_1\"] | grep -A4 scaling_config
    scaling_config {
        desired_size = 1
        max_size     = 6
        min_size     = 1
    }

So this means that, from a Terraform perspective, desired_capacity (which translates into scaling_config.desired_size) is immutable. Which also means that desired_capacity can never be greater than the initial desired_capacity, and min_capacity is effectively limited by this, while you can still happily raise the max_capacity.

There are ways to work around this, such as getting the ASG id from the module and modifying it in Terraform as part of the workflow, but that's a hack at best.
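
(For completeness, a rough sketch of the ASG workaround mentioned above. The variable and output names are assumptions for illustration; in particular it assumes the module exposes the managed node group objects via a node_groups output and that the AWS CLI is available where Terraform runs.)

variable "compute_1_desired_capacity" {
  type    = number
  default = 3
}

locals {
  # The node group resource exposes its backing ASG name under resources[].autoscaling_groups[].name.
  compute_1_asg_name = module.eks_cluster_1.node_groups["compute_1"].resources[0].autoscaling_groups[0].name
}

resource "null_resource" "compute_1_desired_capacity" {
  # Re-run whenever the desired value changes.
  triggers = {
    desired = tostring(var.compute_1_desired_capacity)
  }

  # Push the desired size straight to the underlying ASG, since the
  # aws_eks_node_group resource ignores changes to desired_size.
  provisioner "local-exec" {
    command = "aws autoscaling set-desired-capacity --auto-scaling-group-name ${local.compute_1_asg_name} --desired-capacity ${var.compute_1_desired_capacity}"
  }
}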

@rsmets

rsmets commented Nov 30, 2020

One hacky workaround that I have found works: you can specify a different instance size, which forces a totally new node group to be created, which will then respect your (new, "initial") desired_capacity setting. I'm sure any other change that forces a new node group to be created would work as well.

I agree with many of the other comments in this thread; it really feels odd that desired_capacity is not actually mutable by Terraform. That said, I do not have a clear picture of what the AWS interface is like, so I'm sure it's easier said than done!

@tolajuwon

I hacked around it for now by using the value for the desired capacity in place of the minimum capacity. If that's not a problem for your design, it works.

worker_groups = [
  {
    name                 = "worker-group-1"
    key_name             = var.worker_ssh_key_name
    instance_type        = var.worker_instance_type
    asg_desired_capacity = var.worker_asg_desired_capacity
    asg_max_size         = var.worker_asg_max_size
    asg_min_size         = var.worker_asg_desired_capacity
  },
]

@stale

stale bot commented Mar 11, 2021

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Mar 11, 2021
@shoekstra

shoekstra commented Mar 22, 2021

This issue is still relevant and needs fixing/review. We shouldn't need to create a new node group to change the desired size.

@stale

stale bot commented Apr 22, 2021

This issue has been automatically closed because it has not had recent activity since being marked as stale.

@stale stale bot closed this as completed Apr 22, 2021
@ericlake

ericlake commented May 4, 2021

This is still an issue for our team. Terraform should be able to handle this.

@bit-herder

Please reopen this. This is a major issue.

@barryib barryib reopened this May 6, 2021
@stale stale bot removed the stale label May 6, 2021
@barryib
Member

barryib commented May 6, 2021

Re-opening this to let us track this issue. But so far we don't have an ideal fix.

Maybe hashicorp/terraform#24188 is worth something.

@khatritariq

Looking forward to a resolution.

@loliveira-itp

Same, this issue affects my use case as well.

@amazingguni

It is also a problem for me.

@ayk33

ayk33 commented May 27, 2021

Ditto. We need some kind of solution for this.

@letusfly85

I fixed it by adding the following field... but is it right?

Before

  node_groups = {
    main = {
      desired_capacity = 1
      max_capacity     = 1
      min_capacity     = 1
      instance_type    = "t2.small"
      subnets          = module.vpc.private_subnets
    }
  }

After

  node_groups = {
    main = {
      // desired_capacity = 1
      desired_size     = 1
      max_capacity     = 1
      min_capacity     = 1
      instance_type    = "t2.small"
      subnets          = module.vpc.private_subnets
    }
  }

@ayk33

ayk33 commented May 28, 2021

I fixed it by adding the following field... but is it right?

Before

  node_groups = {
    main = {
      desired_capacity = 1
      max_capacity     = 1
      min_capacity     = 1
      instance_type    = "t2.small"
      subnets          = module.vpc.private_subnets
    }
  }

After

  node_groups = {
    main = {
      // desired_capacity = 1
      desired_size     = 1
      max_capacity     = 1
      min_capacity     = 1
      instance_type    = "t2.small"
      subnets          = module.vpc.private_subnets
    }
  }

I'm not sure how that would work. The lifecycle for the node_group ignores changes to desired_size:
https://github.com/terraform-aws-modules/terraform-aws-eks/blob/master/modules/node_groups/node_groups.tf#L75
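
(For readers skimming: the linked line is, roughly, a lifecycle rule like the sketch below. This is a paraphrase, not an exact excerpt; check the version of the module you have installed.)

resource "aws_eks_node_group" "workers" {
  # ... other arguments omitted

  lifecycle {
    # Changes to desired_size are never applied after creation, so values set
    # by cluster-autoscaler are not reverted by a later terraform apply.
    ignore_changes = [scaling_config[0].desired_size]
  }
}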

@letusfly85

letusfly85 commented May 29, 2021

@ayk33

Thank you for replying. Hmm, you're right; this was my misunderstanding. Thank you.
@ayk33

ayk33 commented May 29, 2021

Re-opening this to let us track this issue. But so far we don't have an ideal fix.

Maybe hashicorp/terraform#24188 is worth something.

Would it be possible to only have the desired capacity in the lifecycle rule if autoscaling is disabled?

@daern91

daern91 commented Aug 5, 2021

We're also running into problems with this one.

@fitchtech

The problem with desired_size = each.value["desired_capacity"] is that if your node group's autoscaling has scaled out, then on a subsequent run of terraform apply it will set the desired size back to whatever is in your code. The other problem is that desired_capacity is required on creation of the node group. You can then comment out desired_capacity or change the value to what's currently in the group, but that's a pain, since if you need to recreate the node group you have to put desired_capacity back in.

Additionally, most updates to existing node groups with the EKS module fail in general. They do not trigger an in-place update as they should; instead they trigger a replacement, which then fails because it says the node group name already exists. The only way to fix that is to create a new node group with a different name.

@taragurung

For me it was

lifecycle {
  ignore_changes = [scaling_config[0].desired_size]
}

which ignores changes to desired_size and causes the issue. I commented it out and it worked; uncomment it if you want the original behaviour back.

https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/eks_node_group

@stale

stale bot commented Sep 16, 2021

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Sep 16, 2021
@stale

stale bot commented Sep 23, 2021

This issue has been automatically closed because it has not had recent activity since being marked as stale.

@stale stale bot closed this as completed Sep 23, 2021
@vladimir259

Still an issue.

@vladimir259

Please reopen.

@daroga0002
Contributor

This is duplicated in #1568.

@tech4242

This is really tiresome, and while the docs have slightly improved with the FAQ change, I still feel like this module is useless for manual changes to EKS. I ended up doing it with the console:

[image: AWS console screenshot]

Because going from:

node_groups = {
    first = {
      desired_capacity = 1
      max_capacity     = 5
      min_capacity     = 1

      instance_type = "m5.large"
    }
  }

to

node_groups = {
    first = {
      desired_capacity = 2
      max_capacity     = 5
      min_capacity     = 2

      instance_type = "m5.large"
    }
  }

produces Error: error updating EKS Node Group (xxx) config: InvalidParameterException: Minimum capacity 2 can't be greater than desired size 1.

If you make the change manually and run terraform plan, you will see that the change has been made. Funnily enough, if you then look at the actual tfstate, only the min_size has changed:

"scaling_config": [
  {
    "desired_size": 1,
    "max_size": 5,
    "min_size": 2
  }
],

Proposal: can't we add something like autoscaling = false to let the module know that autoscaling is off, so desired_size is not as toothless?

In the meantime I did scale to 2 nodes and my tfstate is more or less correct (I have to swallow the wrong desired_size), but this could not be any less fun to work with.

@psyhomb

psyhomb commented Jan 30, 2022

Still an issue, anything new on this?

@daroga0002
Contributor

Still an issue, anything new on this?

This is not treated as an issue, as it is working as expected. It is a compromise we implemented in this module, and there are no plans to change it (as changing it would impact, for example, autoscaling).

@psyhomb

psyhomb commented Mar 25, 2022

We could maybe create two separate resources, resource "aws_eks_node_group" "this" {} and resource "aws_eks_node_group" "this_autoscaling" {}, and introduce a new input variable of type bool, e.g. use_autoscaling. We would then set the count for the first resource to something like count = var.create && !var.use_autoscaling ? 1 : 0 and to count = var.create && var.use_autoscaling ? 1 : 0 for the second resource. The first resource would have a simple lifecycle meta-argument without ignore_changes, and the other one (this_autoscaling) would keep ignore_changes.

Example:

variable "use_autoscaling" {
  description = "Determines whether autoscaling will be used or not"
  type        = bool
  default     = true
}
resource "aws_eks_node_group" "this" {
  count = var.create && !var.use_autoscaling ? 1 : 0

  ...

  lifecycle {
    create_before_destroy = true
  }

  ...
}
resource "aws_eks_node_group" "this_autoscaling" {
  count = var.create && var.use_autoscaling ? 1 : 0

  ...

  lifecycle {
    create_before_destroy = true
    ignore_changes = [
      scaling_config[0].desired_size,
    ]
  }

  ...
}

We should also update the outputs.tf file accordingly.

I agree it is not the most elegant solution (redundant code and everything), but it's the only one I can think of, given that conditional expressions are still not supported within the lifecycle meta-argument block.
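
(A usage sketch of how the proposal could look from the caller's side. The use_autoscaling flag does not exist in the module today, so treat this as illustrative only.)

module "eks" {
  source = "terraform-aws-modules/eks/aws"

  # ... other required arguments (cluster_name, subnets, vpc_id, etc.) omitted

  # Hypothetical flag from the proposal above: opt out of ignore_changes so
  # Terraform manages desired_size directly.
  use_autoscaling = false

  node_groups = {
    compute_1 = {
      desired_capacity = 3
      max_capacity     = 6
      min_capacity     = 3
    }
  }
}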

@Speculor

This is still a major issue over two years after it was raised

@bryantbiggs
Member

This is still a major issue over two years after it was raised

It is not a major issue; it is a design decision the module has taken. The majority of Kubernetes/EKS users utilize some form of autoscaling, and without variable support for ignore_changes in Terraform core, that's what we currently have.

@terraform-aws-modules terraform-aws-modules locked as resolved and limited conversation to collaborators Apr 28, 2022