Upgrading from 0.6.13 to 0.6.16 forced destroy/add #6798

joshpurvis · 2016-05-20T16:07:32Z

After I upgraded from 0.6.13 to 0.6.16, the first terraform plan appears to have corrupted my tfstate file. Running terraform plan now reports:

Plan: 42 to add, 68 to change, 42 to destroy.

42 is the number of aws_instance resources I have.

At first glance, it looks like the security groups have changed on every single one of them (but they haven't).

Here's one of my aws_instances (they all look pretty much like this, except with their respective security groups differing):

-/+ aws_instance.postgres
    ami:                        "ami-901e59f8" => "ami-901e59f8"
    availability_zone:          "us-east-1a" => "us-east-1a"
    ebs_block_device.#:         "0" => "<computed>"
    ephemeral_block_device.#:   "0" => "<computed>"
    iam_instance_profile:       "PrivateInstance" => "PrivateInstance"
    instance_state:             "running" => "<computed>"
    instance_type:              "m3.medium" => "m3.medium"
    key_name:                   "REDACTED" => "REDACTED"
    placement_group:            "" => "<computed>"
    private_dns:                "ip-10-0-2-10.ec2.internal" => "<computed>"
    private_ip:                 "10.0.2.10" => "<computed>"
    public_dns:                 "" => "<computed>"
    public_ip:                  "" => "<computed>"
    root_block_device.#:        "1" => "<computed>"
    security_groups.#:          "0" => "2" (forces new resource)
    security_groups.2810372704: "" => "sg-e939f48d" (forces new resource)
    security_groups.3361875115: "" => "sg-6d578c09" (forces new resource)
    source_dest_check:          "true" => "1"
    subnet_id:                  "subnet-8703b4de" => "subnet-8703b4de"
    tags.#:                     "1" => "1"
    tags.Name:                  "postgres" => "postgres"
    tenancy:                    "default" => "<computed>"
    user_data:                  "ac198541a10c3eb930d534aa99288a05172bcb2f" => "ac198541a10c3eb930d534aa99288a05172bcb2f"
    vpc_security_group_ids.#:   "2" => "<computed>"

And the resource itself looks like this, followed by the two security groups it references:

resource "aws_instance" "postgres" {
    ami = "ami-901e59f8"
    instance_type = "m3.medium"
    subnet_id = "${aws_subnet.middle.id}"
    key_name = "REDACTED"
    availability_zone = "us-east-1a"
    iam_instance_profile = "PrivateInstance"
    tags {
        Name = "postgres"
    }
    security_groups = [
        "${aws_security_group.private.id}",
        "${aws_security_group.postgres.id}",
    ]
    user_data = "${file("cloud-config/postgres")}"
}

resource "aws_security_group" "postgres" {
    name = "postgres"
    description = "Default ports for postgres"
    vpc_id = "${aws_vpc.main.id}"

    egress {
        from_port = 0
        to_port = 0
        protocol = "-1"
        cidr_blocks = ["0.0.0.0/0"]
    }

    ingress {
        from_port = 5432
        to_port = 5432
        protocol = "tcp"
        security_groups = [
            "${aws_security_group.web.id}",
            "${aws_security_group.web-backend.id}",
            "${aws_security_group.private.id}"
        ]
    }
    ingress {
        from_port = 6432
        to_port = 6432
        protocol = "tcp"
        security_groups = [
            "${aws_security_group.web.id}",
            "${aws_security_group.web-backend.id}",
            "${aws_security_group.private.id}"
        ]
    }
}

resource "aws_security_group" "private" {
    name = "private"
    description = "Security group for private instances, only accessible via NAT"
    vpc_id = "${aws_vpc.main.id}"
    ingress {
        from_port = 22
        to_port = 22
        protocol = "tcp"
        security_groups = ["${aws_security_group.nat.id}"]
    }
}

Then I noticed this change on some of my aws_route_tables:

~ aws_route_table.r_private
    route.1392463699.cidr_block:                 "0.0.0.0/0" => ""
    route.1392463699.gateway_id:                 "" => ""
    route.1392463699.instance_id:                "i-9d604f71" => ""
    route.1392463699.nat_gateway_id:             "" => ""
    route.1392463699.network_interface_id:       "eni-0a0d2252" => ""
    route.1392463699.vpc_peering_connection_id:  "" => ""
    route.~2469360411.cidr_block:                "" => "0.0.0.0/0"
    route.~2469360411.gateway_id:                "" => ""
    route.~2469360411.instance_id:               "" => "${aws_instance.nat.id}"
    route.~2469360411.nat_gateway_id:            "" => ""
    route.~2469360411.network_interface_id:      "" => ""
    route.~2469360411.vpc_peering_connection_id: "" => ""

Here's the resource for that:

resource "aws_route_table" "r_private" {
    vpc_id = "${aws_vpc.main.id}"
    route {
        cidr_block = "0.0.0.0/0"
        # instance_id = "i-9d604f71"
        instance_id = "${aws_instance.nat.id}"
    }
}

It doesn't appear to be interpolating the dynamic ${aws_instance.nat.id} variable. As a test, I hard-coded that instance ID (as shown in the commented line above) in the aws_route_table.r_private resource, and that removed this change from my plan.

Given this, I assume something similar is happening with the security groups not being interpolated, and thus forcing new resources for all my instances.

Another thing I noticed, and fixed similarly, was this one:

~ aws_vpc.main
    instance_tenancy: "default" => ""

I didn't have instance_tenancy in my aws_vpc resource. I added it, set it to default, and that removed this from the plan as well.

Every aws_route53_record also shows as needing a change; they all look like this in the plan:
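
For anyone trying the same workaround, here is a minimal sketch of it in the 0.6-era syntax used elsewhere in this report. The original aws_vpc resource isn't shown in this issue, so the cidr_block value is a hypothetical placeholder; only the instance_tenancy line reflects the change described above.

```hcl
resource "aws_vpc" "main" {
    # Hypothetical placeholder: the real CIDR block is not shown in this issue.
    cidr_block = "10.0.0.0/16"

    # Set explicitly so the plan no longer shows "default" => "".
    instance_tenancy = "default"
}
```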

~ aws_route53_record.postgres
    records.#: "" => "<computed>"

Possibly affected resource types:

aws_instance
aws_route53_record
aws_route_table
aws_security_group
aws_vpc

Things I've tried:

Using the terraform.tfstate.backup file
Tried versions 0.6.13 through 0.6.16, and the same behavior occurs.
~~I also have older state files in source control, but haven't attempted those yet, as I'm not exactly sure which one to use.~~ (see comment below)

Any help would be appreciated. Let me know if there's any other info that would help diagnose the situation.

Thanks!

joshpurvis · 2016-05-20T17:48:24Z

Update: I reverted terraform.tfstate from source control and ran a plan on 0.6.13, and the problem goes away. If I revert terraform.tfstate again and update to 0.6.16 (or any version in between), the problem described above occurs again.

Here's the git diff of the terraform.tfstate file after running a plan on 0.6.16: https://gist.github.com/joshpurvis/cbe93fbb1850d5d158b16e024c3b165a

As you can see, on every single instance, it's basically just setting disable_api_termination to false, and then removing my security groups:

-                            "security_groups.#": "2",
-                            "security_groups.2810372704": "sg-e939f48d",
-                            "security_groups.654669085": "sg-efb6b696",
+                            "security_groups.#": "0",

Also, at the bottom you can see it attempting to recreate the VPC due to a missing instance_tenancy attribute, as I described above.

johnnyshields · 2016-05-20T17:49:03Z

Reading this sort of issue report scares me immensely.

catsby · 2016-05-20T21:42:22Z

Hey @joshpurvis, there was a regression in v0.6.16 regarding AWS instances and security groups. If your instances are in a VPC, you need to use vpc_security_group_ids instead of security_groups to declare the security groups.

The regression has been reverted in master, but not yet released. If you want to upgrade to v0.6.16, please use vpc_security_group_ids instead.

I explain in more detail here:

Security group --> vpc_security_group_ids change bug #6416

Please try vpc_security_group_ids and let me know!
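
For future readers, a sketch of what that change might look like when applied to the aws_instance.postgres resource from the original report. Everything is copied from the report except the attribute name, which is swapped from security_groups to vpc_security_group_ids. This is an illustration of the suggestion above, not a configuration verified against v0.6.16.

```hcl
resource "aws_instance" "postgres" {
    ami = "ami-901e59f8"
    instance_type = "m3.medium"
    subnet_id = "${aws_subnet.middle.id}"
    key_name = "REDACTED"
    availability_zone = "us-east-1a"
    iam_instance_profile = "PrivateInstance"
    tags {
        Name = "postgres"
    }
    # The instance is in a VPC, so the groups are declared by ID through
    # vpc_security_group_ids rather than security_groups.
    vpc_security_group_ids = [
        "${aws_security_group.private.id}",
        "${aws_security_group.postgres.id}",
    ]
    user_data = "${file("cloud-config/postgres")}"
}
```

After making the change, run terraform plan again and check that the instances no longer show security_groups entries marked "(forces new resource)" before applying anything.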

joshpurvis · 2016-05-31T03:54:09Z

Got caught up in work, but for future searchers: this was in fact the issue.

Thanks @catsby!

catsby · 2016-06-03T18:39:08Z

Thanks for following up @joshpurvis !

ghost · 2020-04-25T02:13:54Z

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.

If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

joshpurvis changed the title ~~Upgrading from 0.6.14 to 0.6.16 forced destroy/add~~ Upgrading from 0.6.13 to 0.6.16 forced destroy/add May 20, 2016

catsby added bug provider/aws labels May 20, 2016

catsby closed this as completed May 20, 2016

ghost locked and limited conversation to collaborators Apr 25, 2020
