Changing count of instances destroys all of them #3885

Closed
Fodoj opened this issue Nov 12, 2015 · 28 comments

Comments

@Fodoj commented Nov 12, 2015

Title says it all: I increased the count and all of the previously created instances were destroyed. That doesn't seem like expected behavior to me.

@Fodoj (Author) commented Nov 12, 2015

Using the prevent_destroy flag doesn't help either:

* openstack_compute_instance_v2.my_little_server: the plan would destroy this resource, but it currently has lifecycle.prevent_destroy set to true. To avoid this error and continue with the plan, either disable lifecycle.prevent_destroy or adjust the scope of the plan using the -target flag.
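
For reference, a minimal sketch of how the lifecycle flag mentioned above is attached to the resource; the count variable, name, image, and flavor values here are placeholders, not the actual configuration from this issue:

variable "node_count" {}

resource "openstack_compute_instance_v2" "my_little_server" {
  count       = "${var.node_count}"
  name        = "my-little-server-${count.index}"
  image_name  = "ubuntu-14.04"   # placeholder
  flavor_name = "m1.small"       # placeholder

  lifecycle {
    prevent_destroy = true
  }
}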

@jtopjian (Contributor)

Ouch. That wouldn't be good, and it's definitely not what should happen.

What does your TF configuration look like? I created a basic instance with a count attribute set to 1, then increased it to 3. The original stayed online and accessible the entire time.

@Fodoj (Author) commented Nov 12, 2015

I am using user_data with the template_file renderer. Is that related to this issue? https://groups.google.com/forum/#!topic/terraform-tool/kvaaUlT6cg4

@Fodoj (Author) commented Nov 12, 2015

@jtopjian #2627

@jtopjian (Contributor)

Ah, that's quite possible. I've only had a chance to skim those two -- I'll look deeper later today.

@jtopjian (Contributor)

Do you mind providing the config you're using to cause this?

Any change to user_data will cause the instance to recreate. My guess is that your user_data isn't actually changing for existing resources, but for some reason Terraform thinks it is. I think this may be a bug outside of the OpenStack provider, but if you can provide a config, I can take a look and see if it's fixable or if there's a workaround.
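
For illustration, the general shape of the configuration under discussion (resource names, the template file name, and the image/flavor values are hypothetical; the actual config was not posted in this thread):

variable "node_count" {}

resource "template_file" "user_data" {
  filename = "cloud-init.yml"
  count    = "${var.node_count}"
}

resource "openstack_compute_instance_v2" "my_little_server" {
  count       = "${var.node_count}"
  name        = "my-little-server-${count.index}"
  image_name  = "ubuntu-14.04"   # placeholder
  flavor_name = "m1.small"       # placeholder
  user_data   = "${element(template_file.user_data.*.rendered, count.index)}"
}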

@Fodoj (Author) commented Nov 13, 2015

It behaves exactly the same way as described in this comment #2627 (comment)

@Fodoj (Author) commented Nov 13, 2015

It is indeed a bug with Terraform. I am trying to figure out what it is exactly, but basically the problem is that after increasing count, the hash is calculated not from the result of the rendered template but from a string like "${element(..., count.index)}" (I repeat: not the result of this interpolated function call, but the string itself before interpolation). The hash of that string is then the same for all elements, and the change forces recreation.
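
To make the described behavior concrete: the plan output then takes roughly this shape, with the old value replaced by the raw, uninterpolated string (resource names here are hypothetical; compare the real aws_volume_attachment output posted later in this thread):

-/+ openstack_compute_instance_v2.my_little_server.0
    user_data: "<previously rendered value or its hash>" => "${element(template_file.user_data.*.rendered, count.index)}" (forces new resource)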

@Fodoj (Author) commented Nov 13, 2015

Most likely related to #3864

@Fodoj (Author) commented Nov 13, 2015

After debugging and trying to find the source of the issue for 5 hours, I can confirm that the destruction of all instances happens for exactly the same reasons as here #3662 and here #2627 (comment).

Seems like there is a fatal bug in the implementation of the element() function that leads to it not being interpolated at all.

It happens not only for user_data together with template_file, but also for block volumes, IP addresses, etc. (just like in #3864), so the only common pieces are the element() function and the count variable.

I tried to fix it myself, but so far I have just spent lots of time learning the internals of Terraform. Even though I feel like I know it much (much much) better now, it's still not enough to track down the source of the bug.

I would highly appreciate any tips on how to fix it, or at least which part of the source code I should stare at and debug longer, because this bug basically destroys not only instances but also a whole range of use cases for Terraform (the most important one being the generation of user data with template_file).

My current suspects are config/interpolate_walk.go and config/raw_config.go, but I might be absolutely wrong about that.

@jtopjian (Contributor)

@Fodoj Thanks so much for digging into this. It sounds like this is a bug with core, so I'm going to label it as such.

@Fodoj (Author) commented Nov 14, 2015

@jtopjian so any tips on where to search for the bug? I have some hours this weekend to work on this.

@Fodoj (Author) commented Nov 14, 2015

Fastest way to reproduce the bug:

variable "node_count" {}

resource "template_file" "test_file" {
  filename = "test_1.yml"
  count = "${var.node_count}"
}

resource "template_file" "another_test_file" {
  filename = "test_2.yml"
  count = "${var.node_count}"
  vars {
    var1 = "${element(template_file.test_file.*.rendered, count.index)}"
  }
}

Increasing count by 1 leads to template_file.another_test_file.0 destruction.
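
One way to exercise this repro, assuming the config above is saved next to test_1.yml and test_2.yml (standard terraform commands; the counts are just an example):

terraform apply -var 'node_count=2'
terraform plan -var 'node_count=3'   # the plan marks existing template_file.another_test_file instances as -/+ (destroy and recreate)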

@jtopjian (Contributor)

@Fodoj Off of the top of my head, I have no idea how to approach this one. If I have some time, I'll venture outside of the OpenStack provider in Terraform. At the moment, it's uncharted territory for me 😄

@Fodoj (Author) commented Nov 14, 2015

Seems to be the same issue as #3449

@Fodoj (Author) commented Nov 14, 2015

Rolling #2788 back solves the issue: no more total destruction of previous resources. @mitchellh any ideas for a better fix that won't lead to this bug?

@jgross206

+1 I am also running into this issue. As @Fodoj suggested, rolling back #2788 and recompiling solves it (it's a 3-line change plus tests), but I don't know enough about Terraform internals to know if that's the solution or just a workaround.

@Fodoj (Author) commented Nov 25, 2015

@jgross206 it is not a sustainable solution, because rolling #2788 back re-introduces the other bug that was fixed by that PR. For now I don't have time to go even deeper into TF internals to fix both of these bugs, so I am still using my own compiled version.

I wish @mitchellh or someone else from the core contributors would pay more attention to this bug because, as I mentioned already, it leads to a lot of different bugs across many important use cases of Terraform.

@Fodoj (Author) commented Dec 3, 2015

Any update on this bug?

@justinclayton (Contributor)

+1. This severely limits our ability to use Terraform to manage and mutate an existing stack in production.

@justinclayton (Contributor)

I think I've narrowed this down some more. It's not element() specifically, but the fact that anything referencing attributes that are marked as <computed> during the plan phase will also be recomputed, even if the resulting value ends up not changing at all. This is because Terraform does not re-evaluate whether something needs to be destroyed/rebuilt during the apply phase, only during plan.

Also, we should remove the provider/openstack label from this issue, given that @Fodoj's template_file example reproduces it without the OpenStack provider.
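
A minimal sketch of the pattern described here, borrowing the aws_volume_attachment case that comes up later in this thread (resource names and placeholder values are illustrative):

variable "node_count" {}

resource "aws_instance" "node" {
  count         = "${var.node_count}"
  ami           = "ami-00000000"   # placeholder
  instance_type = "t2.micro"       # placeholder
}

resource "aws_ebs_volume" "data" {
  count             = "${var.node_count}"
  availability_zone = "us-east-1a"   # placeholder
  size              = 10
}

resource "aws_volume_attachment" "data" {
  count       = "${var.node_count}"
  device_name = "/dev/xvdf"
  volume_id   = "${element(aws_ebs_volume.data.*.id, count.index)}"
  instance_id = "${element(aws_instance.node.*.id, count.index)}"   # references an attribute that is <computed> at plan time, so a count change forces new resources for existing indexes
}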

@jtopjian (Contributor)

@phinze Thoughts?

@Fodoj changed the title from "provider/openstack: changing count of instances destroys all of them" to "Changing count of instances destroys all of them" on Jan 4, 2016
@hingstarne

If it matters, I could reproduce it as well.

@rgabo commented Jan 6, 2016

The same issue triggers recreation of aws_volume_attachment and aws_route53_record resources when using count and element() in a cluster of instances where only a single instance is supposed to be recreated.

Aggressive -target-ing of resources with explicit indexes often helps in similar cases, but not this time. An example in our case would be:

script/plan/mongodb -target='aws_volume_attachment.mongodb-data[1]' -target='aws_instance.mongodb[1]'

Results in:

-/+ aws_volume_attachment.mongodb-data.0
    device_name:  "/dev/xvdf" => "/dev/xvdf"
    force_detach: "" => "<computed>"
    instance_id:  "i-deadbeef" => "${element(aws_instance.mongodb.*.id, count.index)}" (forces new resource)
    volume_id:    "vol-shelloil" => "vol-shelloil"

@calvn commented Jan 8, 2016

I am also encountering this issue, more specifically with template_file and count.

@phinze (Contributor) commented Jan 13, 2016

Hey folks - sorry for the trouble here, and many apologies for the delay in responding. This is definitely an important core bug that I'd like to get fixed soon. There is a great summarizing write-up in #4528 that walks through the shape of the problem and shows why a simple revert of #2788 won't work as a solution. This is indeed an expression of #3449, so I'm going to close this thread to consolidate the discussion over there. I'll follow up with a plan of action on that thread.

@jzampieron

I'm still seeing this problem with v0.11.1.

@ghost commented Apr 5, 2020

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.

If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@ghost locked and limited conversation to collaborators on Apr 5, 2020