
providers/aws: aws_autoscaling_group should depend on its aws_launch_configuration #1109

Closed
pmoust opened this issue Mar 3, 2015 · 38 comments


@pmoust
Contributor

pmoust commented Mar 3, 2015

providers/aws: aws_autoscaling_group should depend on its aws_launch_configuration

When destroying an aws_autoscaling_group along with its assigned aws_launch_configuration, I expect Terraform to first issue a destroy on the ASG, wait until it's done, and then delete the launch configuration.

What happens instead is:

* Cannot delete launch configuration trolololol because it is attached to AutoScalingGroup trolololol

To reproduce, create and then remove the following:

# base CoreOS launch configuration
/*
resource "aws_launch_configuration" "trolololol" {
    name            = "trolololol"
    instance_type   = "t2.micro"
    image_id        = "ami-8297d4ea"
    security_groups = [ "${aws_security_group.pph_coreos.id}",
                        "${aws_security_group.pph_allow_elb.id}",
                        "${aws_security_group.pph_allow_tower.id}",
                        "${aws_security_group.pph_allow_vpn.id}",
                        "${aws_security_group.pph_admins.id}" ]
    user_data       = "${file("./coreos.yml")}"
    associate_public_ip_address = true
}

# the autoscaling group determines health based on ELB stats
resource "aws_autoscaling_group" "trolololol" {
    name                      = "trolololol"
    availability_zones        = [ "${aws_subnet.pph.*.availability_zone}" ]
    vpc_zone_identifier       = [ "${aws_subnet.pph.*.id}" ] 
    load_balancers            = [ "${aws_elb.trolololol.id}" ]
    min_size                  = 3
    max_size                  = 5
    desired_capacity          = 3
    health_check_type         = "EC2"
    health_check_grace_period = 300
    force_delete              = false
    launch_configuration      = "${aws_launch_configuration.trolololol.name}"
}

resource "aws_elb" "trolololol" {
    name            = "trolololol"
    subnets         = [ "${aws_subnet.pph.*.id}" ]
    security_groups = [ "${aws_security_group.pph_elb.id}",
                        "${aws_security_group.pph_admins.id}" ]

    listener {
        instance_port = 6001
        instance_protocol = "http"
        lb_port = 80
        lb_protocol = "http"
    }

    listener {
        instance_port = 6001
        instance_protocol = "http"
        lb_port = 443
        lb_protocol = "https"
        ssl_certificate_id  = "${var.pph_ssl_certificate}"
    }

    listener {
        instance_port = 7001
        instance_protocol = "tcp"
        lb_port = 7001
        lb_protocol = "tcp"
    }

    listener {
        instance_port = 4001
        instance_protocol = "tcp"
        lb_port = 4001
        lb_protocol = "tcp"
    }

    health_check {
        healthy_threshold = 2
        unhealthy_threshold = 5
        timeout = 4
        target = "HTTP:4001/version"
        interval = 5
    }
}
*/

I think this was working before.

Terraform v0.4.0-dev (23d90c0c02c10596eed79986e356b20bc6abb441)

@mitchellh
Contributor

This is a separate issue. I'm not sure if we're tracking it, but this is due to the "eventually consistent" nature of AWS. I'm pretty sure there is a separate issue for this where we just have to do a stupid loop on the ASG to make it happen.

@jessem

jessem commented Mar 9, 2015

I don't think terraform is trying to delete the ASG.

I'm noticing the same issue when I update the AMI for a launch configuration. The plan implies that only the launch configuration will be changed, so, as expected, when I try to apply the plan the error @pmoust posted is returned.

-/+ aws_launch_configuration.as_conf
    image_id:          "ami-old_id" => "ami-new_id" (forces new resource)
    instance_type:     "t2.micro" => "t2.micro"
    key_name:          "" => "<computed>"
    name:              "my_service" => "my_service"
    security_groups.#: "1" => "1"
    security_groups.0: "sg-81ee64e5" => "sg-81ee64e5"

@jessem

jessem commented Mar 10, 2015

There's also an issue when changing both the autoscaling group and the launch configuration at the same time: terraform deletes the launch configuration and then tries to modify it, which throws a "launch configuration not found" error.

To recreate, change the AMI of a launch configuration and change the launch configuration of an autoscaling group.

Terraform shows the following plan:

-/+ aws_autoscaling_group.foo
    availability_zones.#:      "1" => "1"
    availability_zones.0:      "us-east-1b" => "us-east-1b"
    default_cooldown:          "300" => "<computed>"
    desired_capacity:          "2" => "2"
    force_delete:              "true" => "1"
    health_check_grace_period: "300" => "300"
    health_check_type:         "ELB" => "ELB"
    launch_configuration:      "baz" => "bar" (forces new resource)
    load_balancers.#:          "1" => "1"
    load_balancers.0:          "foo-elb" => "foo-elb"
    max_size:                  "5" => "5"
    min_size:                  "2" => "2"
    name:                      "foo" => "foo"
    vpc_zone_identifier.#:     "1" => "1"
    vpc_zone_identifier.0:     "subnet-1058fb3b" => "subnet-1058fb3b"

-/+ aws_launch_configuration.bar
    image_id:          "ami-12b1957a" => "ami-ac3e19c4" (forces new resource)
    instance_type:     "t2.micro" => "t2.micro"
    key_name:          "" => "<computed>"
    name:              "bar" => "bar"
    security_groups.#: "1" => "1"
    security_groups.0: "sg-bf881ddb" => "sg-bf881ddb"

On apply, the error below occurs because Terraform deletes the launch configuration before it tries to modify it:

aws_autoscaling_group.foo: Destroying...
aws_autoscaling_group.foo: Destruction complete
aws_launch_configuration.bar: Destroying...
aws_launch_configuration.bar: Destruction complete
aws_launch_configuration.bar: Modifying...
  image_id:          "ami-12b1957a" => "ami-ac3e19c4"
  instance_type:     "t2.micro" => "t2.micro"
  key_name:          "" => "<computed>"
  name:              "bar" => "bar"
  security_groups.#: "1" => "1"
  security_groups.0: "sg-bf881ddb" => "sg-bf881ddb"
aws_launch_configuration.bar: Error: ValidationError: Launch configuration name not found - Launch configuration bar not found
Error applying plan:

1 error(s) occurred:

* ValidationError: Launch configuration name not found - Launch configuration bar not found

One additional thing to note: changing a launch configuration should not require destroying the ASG, since updating an ASG's launch configuration is supported by the AutoScaling API:
http://docs.aws.amazon.com/AutoScaling/latest/APIReference/API_UpdateAutoScalingGroup.html
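
For instance, switching an existing group over to a different launch configuration is a single in-place call; a sketch via the AWS CLI, with hypothetical names:

# Point an existing ASG at a different launch configuration without
# destroying the group ("my-asg" and "my-new-lc" are placeholders).
aws autoscaling update-auto-scaling-group \
    --auto-scaling-group-name my-asg \
    --launch-configuration-name my-new-lc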

@catsby
Contributor

catsby commented Mar 30, 2015

I believe this is a duplicate of #532.
Regardless, a way of doing a targeted update to the ASG (not a full destroy/create) to update its launch_configuration would be needed here.

@willejs

willejs commented Apr 1, 2015

A targeted update to the launch_configuration 👍
I'm just trying to add an iam_instance_profile...

@catsby
Contributor

catsby commented Apr 1, 2015

I believe the original problem stated here is resolved in #1353.

@pmoust
Contributor Author

pmoust commented Apr 7, 2015

@catsby nope, this issue still persists

@jwaldrip
Contributor

👍 issue still present. Also, the SDK doesn't allow for LC updates, so the strategy would have to be either to create a new LC and associate it with the autoscaling groups before destroying the old one, or to destroy the autoscaling groups and recreate them. I would prefer creating a new LC and then re-associating it.

@CpuID
Contributor

CpuID commented Apr 25, 2015

Any updates on this one? Hit it yesterday myself...

ASGs definitely take some time to delete (mainly when they're terminating instances linked to them), but I found I can delete the LCs almost straight away via the console as soon as the ASG delete has been initiated. Not sure how that translates to the API though...

@rlister

rlister commented Apr 30, 2015

The underlying problem here is that launch configurations are effectively immutable, so an update generally requires creating a new launch config and updating (not replacing) the autoscaling group to use it. Attempting to delete the existing launch config so it can be rebuilt with the same name will produce an error, as it is in use by the ASG.

In my own scripts I generally create new LCs with timestamps in the name, and update the ASG with the new name. Old LCs are kept around for a short time (in case of rollback) and then deleted once unused.

I'm not sure if there is an idiomatic way for terraform to handle this situation, which doesn't quite fit into the usual update OR delete/recreate choice.
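
In Terraform terms, the naming half of that pattern might look like the sketch below ("web", var.release_id and var.ami are all hypothetical); deleting the old LC still has to wait until no ASG references it:

# Hypothetical sketch of the unique-name-per-deploy approach described above.
# var.release_id is assumed to be supplied per deploy (e.g. a timestamp).
resource "aws_launch_configuration" "web" {
    name          = "web-${var.release_id}"
    image_id      = "${var.ami}"
    instance_type = "t2.micro"
}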

@jessem

jessem commented Apr 30, 2015

Attempting to delete the existing launch config so it can be rebuilt with the same name will produce an error, as it is in use by the ASG.

At a Terraform talk by @justincampbell, I recently discovered an (as far as I can tell) undocumented feature that helps with this problem. By adding the create_before_destroy lifecycle option as seen below, Terraform will create the new launch configuration and attach it to your ASG before destroying the old one.

resource "aws_launch_configuration" "foo" {
  name = "foo"
  image_id = "${var.ami}"
  instance_type = "t2.micro"

  lifecycle { create_before_destroy = true }
}

@phinze
Contributor

phinze commented Apr 30, 2015

@jessem note that to avoid LC name collisions when using create_before_destroy, you need to either omit the LC name so Terraform generates a unique one for you (a new feature in master), or interpolate something that changes into the name (like sticking the AMI at the end)
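
The second variant might look like this sketch (var.ami is an assumption):

resource "aws_launch_configuration" "foo" {
  # Omitting "name" entirely also works (Terraform generates a unique one);
  # here the AMI is interpolated so replacement names never collide.
  name          = "foo-${var.ami}"
  image_id      = "${var.ami}"
  instance_type = "t2.micro"

  lifecycle { create_before_destroy = true }
}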

@ketzacoatl
Contributor

Is this issue still valid for 0.5.x? Should we be using create_before_destroy until otherwise noted? Is there an intention to address this issue outside of the create_before_destroy trick?

@rbowlby

rbowlby commented Jun 12, 2015

Just ran into this as well. This is likely affecting any stack that uses the common ELB + autoscale pattern.

@sheldonh

You don't have to explicitly delete an aws_autoscaling_group to see this bug. If you, for example, change the image_id of the aws_launch_configuration, that's enough to trigger the bug.

This is an attempt to roll out a new AMI for Vault. The AMI is an atlas_artifact. I'm using Terraform v0.6.0-dev (ce8baea):

$ terraform plan
~ aws_autoscaling_group.hetzner-development-vault
    launch_configuration: "hetzner-development-vault" => "${aws_launch_configuration.hetzner-development-vault.id}"

-/+ aws_launch_configuration.hetzner-development-vault
    associate_public_ip_address: "false" => "0"
    ebs_block_device.#:          "0" => "<computed>"
    ebs_optimized:               "false" => "<computed>"
    image_id:                    "ami-31c2b946" => "ami-52367225" (forces new resource)
    instance_type:               "t2.micro" => "t2.micro"
    key_name:                    "heisenberg" => "heisenberg"
    name:                        "hetzner-development-vault" => "hetzner-development-vault"
    root_block_device.#:         "0" => "<computed>"
    security_groups.#:           "3" => "3"
    security_groups.1980580733:  "sg-5b18233e" => "sg-5b18233e"
    security_groups.3077299940:  "sg-27182342" => "sg-27182342"
    security_groups.54734731:    "sg-bac3f4df" => "sg-bac3f4df"
    user_data:                   "8e154addf6c9fc4833b86db7b8192c4cf328514a" => "8e154addf6c9fc4833b86db7b8192c4cf328514a"

Hmmm, okay, there's a lot of unexpected noise, but whatever, let's take that for a spin:

$ terraform apply
...
aws_launch_configuration.hetzner-development-vault: Destroying...
aws_launch_configuration.hetzner-development-vault: Error: 1 error(s) occurred:

* ResourceInUse: Cannot delete launch configuration hetzner-development-vault because it is attached to AutoScalingGroup hetzner-development-vault
    status code: 400, request id: [dbbc99c7-1995-11e5-a2f4-33bade2894bd]
Error applying plan:

2 error(s) occurred:

* ResourceInUse: Cannot delete launch configuration hetzner-development-vault because it is attached to AutoScalingGroup hetzner-development-vault
    status code: 400, request id: [dbbc99c7-1995-11e5-a2f4-33bade2894bd]
* aws_autoscaling_group.hetzner-development-vault: diffs didn't match during apply. This is a bug with Terraform and should be reported.

Terraform does not automatically rollback in the face of errors.
Instead, your Terraform state file has been partially updated with
any resources that successfully completed. Please address the error
above and apply again to incrementally change your infrastructure.

@joekhoobyar
Contributor

The create_before_destroy trick doesn't work for us - it causes terraform to declare a cycle between the ASG and the LC.

@ajmath
Contributor

ajmath commented Jul 1, 2015

Not being able to update a launch configuration's user-data or AMI is a major issue for my team. Something like the way suggested in #1552 would be ideal for us.

@ketzacoatl
Contributor

@mitchellh, I still run into this issue with clean deploys and minor changes. One can carefully taint the right resource to work around the bug, but this is becoming a more important/painful snag for us. While I don't want to add pressure, I would appreciate your feedback on where this and/or #1552 sit on the roadmap. Thanks!
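
For reference, the taint workaround amounts to something like the following; the resource address is hypothetical, and which resource to taint depends on your graph:

$ terraform taint aws_launch_configuration.foo
$ terraform apply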

@jszwedko
Contributor

jszwedko commented Jul 1, 2015

@ketzacoatl for now you can tag launch configurations as create_before_destroy. Note that you also need to tag any resources the launch configuration depends on as create_before_destroy to avoid cycles. You can use terraform graph to easily determine the LC's dependencies.

@ajlanghorn this might help you too.

Also @joekhoobyar, you may not be tagging the things the LC depends on (and its dependencies and so on) as create_before_destroy.
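
Concretely, the LC and everything it depends on get the same lifecycle block; a sketch with hypothetical names:

# Both the security group the LC references and the LC itself are tagged;
# otherwise the replacement ordering produces a cycle.
resource "aws_security_group" "web" {
  description = "instances behind the ELB"

  lifecycle { create_before_destroy = true }
}

resource "aws_launch_configuration" "web" {
  image_id        = "${var.ami}"
  instance_type   = "t2.micro"
  security_groups = ["${aws_security_group.web.id}"]

  lifecycle { create_before_destroy = true }
}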

@pikeas
Contributor

pikeas commented Jul 10, 2015

Bump, just bit by this as well.

@blewa

blewa commented Jul 24, 2015

+1.

Currently we're manually deleting the LC and ASG before running apply. I added create_before_destroy to all the places that I could see as interlinked at this point in the graph, but it still returns the error: Error creating launch configuration: AlreadyExists: Launch Configuration by this name already exists - A launch configuration already exists with the name blergblergblerg.

@pikeas
Contributor

pikeas commented Jul 24, 2015

@blewa Did you try removing the name field on the ASG resource? Terraform will generate a new name for you. I'd prefer to have control over the ASG name, but this hack has been working for me.

@blewa

blewa commented Jul 24, 2015

@pikeas It looks like that's a required field...am I missing something?

@jszwedko
Contributor

@blewa the docs are out of date, terraform will generate that field for you if left out.

@jessem

jessem commented Jul 25, 2015

I concatenate the AMI to the end of the name so that they never collide and I still get something readable.


@blewa

blewa commented Jul 27, 2015

@pikeas - Here's what I get when I remove the name field:

$ terraform plan
There are warnings and/or errors related to your configuration. Please
fix these before continuing.

Errors:

  * aws_autoscaling_group.main_asg: "name": required field is not set

$ terraform version
Terraform v0.6.1

@pikeas
Contributor

pikeas commented Jul 27, 2015

You're right, I had my wires crossed a bit. Remove the name on the launch configuration resource, not the ASG resource.

@stack72
Contributor

stack72 commented Sep 17, 2015

@catsby / @phinze (not sure who else to tag here) - I believe this is not an issue anymore. The bug from the original issue was due to the name being set on the ASG - thoughts?

@phinze
Contributor

phinze commented Sep 17, 2015

Good call, @stack72 - thanks! "This issue can be closed" pings are my favorite pings. 😀

@phinze phinze closed this as completed Sep 17, 2015
@ketzacoatl
Contributor

Am I correct in understanding: We should allow Terraform to manage the name of the ASG, and all will be well when we create a new LC? Or what is the proper flow here?

@stack72
Contributor

stack72 commented Sep 17, 2015

@ketzacoatl there was an update to the LaunchConfig docs last month:

In order to effectively use a Launch Configuration resource with an AutoScaling Group resource, it's recommended to omit the Launch Configuration name attribute and specify create_before_destroy in a lifecycle block

Make sense?
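
In other words, roughly this shape (a sketch; var.ami and the resource names are placeholders):

resource "aws_launch_configuration" "as_conf" {
  # name omitted on purpose so Terraform generates a unique one
  image_id      = "${var.ami}"
  instance_type = "t2.micro"

  lifecycle {
    create_before_destroy = true
  }
}

resource "aws_autoscaling_group" "bar" {
  name                 = "bar-asg"
  launch_configuration = "${aws_launch_configuration.as_conf.name}"
  availability_zones   = ["us-east-1a"]
  min_size             = 1
  max_size             = 2
}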

@ketzacoatl
Contributor

@stack72: Perfect! I missed that update, thanks for sharing.

@stack72
Contributor

stack72 commented Sep 17, 2015

@ketzacoatl no worries, give me a shout if there are any issues with it

@pmoust
Contributor Author

pmoust commented Nov 10, 2015

Terraform v0.6.6

Case: a change in the cloud-init template of an AS Launch Configuration.
Expected result: terminate the ASG, retry until the ASG is terminated, remove the AS Launch Configuration, create a new AS LC, create a new ASG based on the new configuration.
Actual result:

* aws_launch_configuration.elk_staging: ResourceInUse: Cannot delete launch configuration terraform-ekccojv3qvaz5aifqmbkjslth4 because it is attached to AutoScalingGroup elk_staging
    status code: 400, request id: f48144bc-8794-11e5-a435-770bbad5b2ec

I proceeded to add a lifecycle policy block (per discussion above).
Expected result: create a new AS Launch Configuration, create a new ASG with that configuration, terminate the previous ASG, retry until the ASG is terminated, destroy the old AS LC.
Actual result:

Error running plan: 1 error(s) occurred:

* Cycle: template_file.elk-cloud-init_staging, aws_launch_configuration.elk_staging, aws_autoscaling_group.elk_staging, aws_launch_configuration.elk_staging (destroy), template_file.elk-cloud-init_staging (destroy)

Here's what the tf config looks like:

resource "template_file" "elk-cloud-init_staging" {
    filename = "./templates/elk-staging.yml"
    vars {
        channel         = "stable"
        reboot-strategy = "off"
        role            = "elk"
    }
}
# base CoreOS launch configuration
resource "aws_launch_configuration" "elk_staging" {
    instance_type   = "t2.large"
    image_id        = "ami-37bdc15d" # 766.5.0
    security_groups = [ "${aws_security_group.pph_coreos_staging.id}",
                        "${aws_security_group.elk_staging.id}",
                        "${aws_security_group.pph_allow_vpn.id}",
                        "${aws_security_group.pph_admins.id}" ]

    root_block_device {
        volume_size = 180
        volume_type = "gp2"
        delete_on_termination = true
    }

    user_data                   = "${template_file.elk-cloud-init_staging.rendered}"
    iam_instance_profile        = "${aws_iam_instance_profile.coreos.name}"
    associate_public_ip_address = true

    lifecycle {
        create_before_destroy = true
    }
}

resource "aws_autoscaling_group" "elk_staging" {
    name                      = "elk_staging"
    availability_zones        = [ "${aws_subnet.pph.*.availability_zone}" ]
    vpc_zone_identifier       = [ "${aws_subnet.pph.*.id}" ] 
    load_balancers            = [ "${aws_elb.elk_staging.id}" ]
    min_size                  = 3
    max_size                  = 3
    desired_capacity          = 3
    health_check_type         = "EC2"
    health_check_grace_period = 120
    force_delete              = true
    launch_configuration      = "${aws_launch_configuration.elk_staging.name}"

    tag {
        key = "Name"
        value = "ELK Staging"
        propagate_at_launch = true
    }
    tag {
        key = "Environment"
        value = "Staging"
        propagate_at_launch = true
    }
    tag {
        key = "Role"
        value = "ELK"
        propagate_at_launch = true
    }
}

@pmoust
Contributor Author

pmoust commented Nov 10, 2015

Disregard my last comment.

The cyclic dependency issue was due to the template_file resource used to generate the user_data attribute for the aws_launch_configuration resource.

As soon as a lifecycle policy was added to the template_file, everything went smoothly.
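
Concretely, the template_file resource from the config above just gains its own lifecycle block:

resource "template_file" "elk-cloud-init_staging" {
    filename = "./templates/elk-staging.yml"
    vars {
        channel         = "stable"
        reboot-strategy = "off"
        role            = "elk"
    }

    lifecycle {
        create_before_destroy = true
    }
}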

Sorry for necromancing the thread.

@ascendantlogic

I am still running into this issue even though I have removed the names from the ASGs. Here I forgot to add the proper SSH key name to the LC, so I add it:

~ aws_autoscaling_group.qa_api
    launch_configuration: "terraform-msr3hgjipjhcla3mrmkkke74za" => "${aws_launch_configuration.qa_api.name}"

~ aws_autoscaling_group.qa_worker
    launch_configuration: "terraform-zlk75c4jgbaifctkieybkqxif4" => "${aws_launch_configuration.qa_worker.name}"

-/+ aws_launch_configuration.qa_api
    associate_public_ip_address: "false" => "0"
    ebs_block_device.#:          "0" => "<computed>"
    ebs_optimized:               "false" => "<computed>"
    enable_monitoring:           "true" => "1"
    iam_instance_profile:        "qa_api" => "qa_api"
    image_id:                    "ami-5189a661" => "ami-5189a661"
    instance_type:               "m3.medium" => "m3.medium"
    key_name:                    "" => "development-us-west-2" (forces new resource)
    name:                        "terraform-msr3hgjipjhcla3mrmkkke74za" => "<computed>"
    root_block_device.#:         "0" => "<computed>"
    security_groups.#:           "2" => "2"
    security_groups.1827055751:  "sg-1fff8d7b" => "sg-1fff8d7b"
    security_groups.3047234712:  "sg-59461e3d" => "sg-59461e3d"

-/+ aws_launch_configuration.qa_worker
    associate_public_ip_address: "false" => "0"
    ebs_block_device.#:          "0" => "<computed>"
    ebs_optimized:               "false" => "<computed>"
    enable_monitoring:           "true" => "1"
    iam_instance_profile:        "qa_worker" => "qa_worker"
    image_id:                    "ami-5189a661" => "ami-5189a661"
    instance_type:               "m3.medium" => "m3.medium"
    key_name:                    "" => "development-us-west-2" (forces new resource)
    name:                        "terraform-zlk75c4jgbaifctkieybkqxif4" => "<computed>"
    root_block_device.#:         "0" => "<computed>"
    security_groups.#:           "2" => "2"
    security_groups.1122078793:  "sg-e4591b80" => "sg-e4591b80"
    security_groups.1827055751:  "sg-1fff8d7b" => "sg-1fff8d7b"

Then I attempt to do the apply:

Error applying plan:

2 error(s) occurred:

* aws_launch_configuration.qa_api: ResourceInUse: Cannot delete launch configuration terraform-msr3hgjipjhcla3mrmkkke74za because it is attached to AutoScalingGroup tf-asg-t2n4biof55aorau4sogfs3imwe
    status code: 400, request id: a8cefa06-9225-11e5-a10e-b5d9763bbe83
* aws_launch_configuration.qa_worker: ResourceInUse: Cannot delete launch configuration terraform-zlk75c4jgbaifctkieybkqxif4 because it is attached to AutoScalingGroup tf-asg-ukxofekq4jc2pc3h2r2du4uhyy
    status code: 400, request id: a8cfbd7b-9225-11e5-9309-e12e3c78ac45

This is with a version compiled from master at this commit.

Here is one of the LC/ASG configs:

resource "aws_launch_configuration" "qa_api" {
  image_id = "${var.qa_api_ami}"
  instance_type = "${var.qa_api_instance_type}"
  iam_instance_profile = "${aws_iam_instance_profile.qa_api.name}"
  associate_public_ip_address = false

  key_name = "${aws_key_pair.development.key_name}"

  security_groups = [
    "${aws_security_group.dev_default.id}",
    "${aws_security_group.dev_api_instance.id}"
  ]
}

resource "aws_autoscaling_group" "qa_api" {
  max_size = 1
  min_size = 1
  desired_capacity = 1

  health_check_grace_period = 300
  health_check_type = "ELB"

  launch_configuration = "${aws_launch_configuration.qa_api.name}"

  vpc_zone_identifier = ["${aws_subnet.dev_private.*.id}"]

  load_balancers = ["${aws_elb.qa_api.id}", "${aws_elb.qa_app.id}"]

  lifecycle {
    create_before_destroy = true
  }

  wait_for_capacity_timeout = "0"

  tag {
    key = "Name"
    value = "qa-api-asg"
    propagate_at_launch = true
  }

  tag {
    key = "Environment"
    value = "qa"
    propagate_at_launch = true
  }

  tag {
    key = "Role"
    value = "api"
    propagate_at_launch = true
  }
}

@ascendantlogic

OK, so it seems that my lifecycle block was in the wrong place. It belongs on the launch config, not the ASG.
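
That is, the corrected placement (a sketch based on the config above, attribute bodies elided):

resource "aws_launch_configuration" "qa_api" {
  # ...attributes as above...

  lifecycle {
    create_before_destroy = true
  }
}

resource "aws_autoscaling_group" "qa_api" {
  # ...attributes as above, with the lifecycle block removed...
}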

@ghost

ghost commented Apr 29, 2020

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.

If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@ghost ghost locked and limited conversation to collaborators Apr 29, 2020