Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

The evaluating of data source working with count is wrong (probably due to early evaluating) #19349

Closed
ckyoog opened this issue Nov 12, 2018 · 6 comments
Labels
bug config v0.11 Issues (primarily bugs) reported against v0.11 releases

Comments

@ckyoog
Copy link

ckyoog commented Nov 12, 2018

Terraform Version

0.11.10 (but I guess the bug may have been across multiple previous versions)

Description of the issue

The code I am going to show you is a demo code, not a real code used in practice. I simplify the code to make the bug easy to see. And It is a complete code, you can run it on you local machine.

First, let's see an ordinary code.

There is no code more ordinary than this. We have two resources (to simplify it, I use null_resource here), and one refers to (depends on) another. This code is the base of my whole talking.

locals {
  cfg = [
    {
      k = "1",
    },
    {
      k = "2",
    },
  ]
}

resource "null_resource" "a" {
  count = "${length(local.cfg)}"
  triggers = {
    k = "${lookup(local.cfg[count.index], "k")}"
  }
}

resource "null_resource" "b" {
  count = "${length(local.cfg)}"
  triggers = {
    k = "${null_resource.a.*.id[count.index]}"
  }
}

After running terraform apply, we create two resources which follow our cfgs defined in cfg.

Then, we add a new cfg { k = "0" } at the top of cfg, like

locals {
  cfg = [
    {
      k = "0",
    },
    {
      k = "1",
    },
    {
      k = "2",
    },
  ]
}

There is no any problems to run terraform plan regarding the new cfg. Everything works just fine.

Introduce intermediate object. Introduce error.

As the code grows, somehow, we are gonna need an intermediate object between resource a and b. That is b -> intermediate -> a. Like below,

locals {
  cfg = [
    {
      k = "1",
    },
    {
      k = "2",
    },
  ]
}

resource "null_resource" "a" {
  count = "${length(local.cfg)}"
  triggers = {
    k = "${lookup(local.cfg[count.index], "k")}"
  }
}

data "template_file" "intermediate" {
  count = "${length(local.cfg)}"
  template = "${null_resource.a.*.id[count.index]}"
}

resource "null_resource" "b" {
  count = "${length(local.cfg)}"
  triggers = {
    k = "${data.template_file.intermediate.*.rendered[count.index]}"
  }
}

(Here I use data source of template_file as intermediate object. It is an usual option for intermediate object before data source of null_data_source is available. However, the both data source have the same bug I am talking now, you can try null_data_source yourself. Cuz I've tried both of them myself)

Add a new cfg, { k = "0" }, like we did above, then run terraform plan, this time we will get error.

Error: Error refreshing state: 1 error(s) occurred:

* null_resource.b: 1 error(s) occurred:

* null_resource.b[2]: index 2 out of range for list data.template_file.intermediate.*.rendered (max 2) in:

${data.template_file.intermediate.*.rendered[count.index]}

An interesting workaround

In my practice, I found a workaround using local variable.

locals {
  cfg = [
    {
      k = "1",
    },
    {
      k = "2",
    },
  ]
}

resource "null_resource" "a" {
  count = "${length(local.cfg)}"
  triggers = {
    k = "${lookup(local.cfg[count.index], "k")}"
  }
}

locals {
  a_id = ["${null_resource.a.*.id}"]
}

data "template_file" "intermediate" {
  count = "${length(local.cfg)}"
  template = "${local.a_id[count.index]}"
}

resource "null_resource" "b" {
  count = "${length(local.cfg)}"
  triggers = {
    k = "${data.template_file.intermediate.*.rendered[count.index]}"
  }
}

Add a new cfg, { k = "0" }, like we did above, then run terraform plan, this time everything goes well again.

By adding a looks-meaningless local variable, the out-of-bound error disappears. Interesting. (Maybe the local variable defers the evaluating of template_file.)

Here is the plan output after new cfg { k = "0"} is added. I will compare to this output later.

Terraform will perform the following actions:

 <= data.template_file.intermediate[0]
      id:         <computed>
      rendered:   <computed>
      template:   "${local.a_id[count.index]}"

 <= data.template_file.intermediate[1]
      id:         <computed>
      rendered:   <computed>
      template:   "${local.a_id[count.index]}"

 <= data.template_file.intermediate[2]
      id:         <computed>
      rendered:   <computed>
      template:   "${local.a_id[count.index]}"

-/+ null_resource.a[0] (new resource required)
      id:         "316847756546036806" => <computed> (forces new resource)
      triggers.%: "1" => "1"
      triggers.k: "1" => "0" (forces new resource)

-/+ null_resource.a[1] (new resource required)
      id:         "4422028877002502008" => <computed> (forces new resource)
      triggers.%: "1" => "1"
      triggers.k: "2" => "1" (forces new resource)

  + null_resource.a[2]
      id:         <computed>
      triggers.%: "1"
      triggers.k: "2"

-/+ null_resource.b[0] (new resource required)
      id:         "1387999628575504367" => <computed> (forces new resource)
      triggers.%: "1" => <computed> (forces new resource)
      triggers.k: "316847756546036806" => "" (forces new resource)

-/+ null_resource.b[1] (new resource required)
      id:         "134253071867008266" => <computed> (forces new resource)
      triggers.%: "1" => <computed> (forces new resource)
      triggers.k: "4422028877002502008" => "" (forces new resource)

  + null_resource.b[2]
      id:         <computed>
      triggers.%: <computed>

A little deeper thinking. No-need-computed attribute

I noticed that the template_file.intermediate refers to id of null_resource.a, which needs to be computed. It might be the reason why the out-of-bound error occurs. Because null_resource.a[2] has not existed yet when evaluating data.template_file.intermediate. The id of null_resource.a[2] can't be computed, so terraform would probably ignore data.template_file.intermediate[2]. According to these thinking, I made such changes as below,

locals {
  cfg = [
    {
      k = "1",
    },
    {
      k = "2",
    },
  ]
}

resource "null_resource" "a" {
  count = "${length(local.cfg)}"
  triggers = {
    k = "${lookup(local.cfg[count.index], "k")}"
  }
}

data "template_file" "intermediate" {
  count = "${length(local.cfg)}"
  template = "${lookup(null_resource.a.*.triggers[count.index], "k")}"
}

resource "null_resource" "b" {
  count = "${length(local.cfg)}"
  triggers = {
    k = "${data.template_file.intermediate.*.rendered[count.index]}"
  }
}

(This time, instead, data.template_file.intermediate refers to triggers.k of null_resource.a, of which the value doesn't need to be computed, because the value can be established as long as terraform finishes reading the configuration file, no need to do any real things, like creating a resource.)

Add a new cfg, { k = "0" }, like we did above, then run terraform plan, this time everything goes well as well.

No errors. Looks good. But wait! Please review the plan output carefully. Do you find anything wrong?

Terraform will perform the following actions:

-/+ null_resource.a[0] (new resource required)
      id:         "316847756546036806" => <computed> (forces new resource)
      triggers.%: "1" => "1"
      triggers.k: "1" => "0" (forces new resource)

-/+ null_resource.a[1] (new resource required)
      id:         "4422028877002502008" => <computed> (forces new resource)
      triggers.%: "1" => "1"
      triggers.k: "2" => "1" (forces new resource)

  + null_resource.a[2]
      id:         <computed>
      triggers.%: "1"
      triggers.k: "2"

-/+ null_resource.b[0] (new resource required)
      id:         "1387999628575504367" => <computed> (forces new resource)
      triggers.%: "1" => "1"
      triggers.k: "316847756546036806" => "1" (forces new resource)

-/+ null_resource.b[1] (new resource required)
      id:         "134253071867008266" => <computed> (forces new resource)
      triggers.%: "1" => "1"
      triggers.k: "4422028877002502008" => "2" (forces new resource)

  + null_resource.b[2]
      id:         <computed>
      triggers.%: "1"
      triggers.k: "2"

Yes. The output shows that the values of triggers.k of null_resource.b[0-2] are 1, 2, 2, which should be 0, 1, 2. The value of trigger.k of null_resource.b comes from data.template_file.intermediate, so obviously the data.template_file.intermediate still holds the old value, does not get updated.

This is easy to be understood, I think. Because compared to the last "workaround" output, this output misses the part below, which is regarding the data source intermediate.

 <= data.template_file.intermediate[0]
      id:         <computed>
      rendered:   <computed>
      template:   "${local.a_id[count.index]}"

 <= data.template_file.intermediate[1]
      id:         <computed>
      rendered:   <computed>
      template:   "${local.a_id[count.index]}"

 <= data.template_file.intermediate[2]
      id:         <computed>
      rendered:   <computed>
      template:   "${local.a_id[count.index]}"

Sum up

  • Error occurs if data source intermediate refers to resource a directly. (Out-of-bound error)
  • No error occurs if adding a local variable between them.
  • No error occurs but result is actually wrong if data source intermediate refers to no-need-computed attribute of resource a.
@ckyoog ckyoog changed the title The evaluating of data source working with count is early The evaluating of data source working with count is wrong (probably due to early evaluating) Nov 12, 2018
@apparentlymart
Copy link
Contributor

Hi @ckyoog! Sorry for this strange behavior and thank you for investigating it so deeply.

I think you are right that this bug is caused by the evaluation of data resources early during the "refresh" phase, rather than during the "plan" phase where resources are processed.

This seems like a specific example of the general problems I was describing in #17034, where evaluating the data resource separately from everything else means that Terraform is often evaluating them in an complete environment, where resources have not yet been processed. In most cases this just results in using the old values for resource attributes in expressions, but in the situation you've found here it causes a new instance to not be available yet at all, which leads to downstream errors.

The good news is that I think the ideas discussed in #17034 will address this problem too, and your thorough reproduction cases here will be very helpful in testing that new behavior once we are ready to implement it in a future release.

Thanks again for the awesome investigation work here!

@ckyoog
Copy link
Author

ckyoog commented Nov 12, 2018

Hi @apparentlymart Thank you for your comment. I'm glad this is a known issue, and you have already had proposal to fix it. Hope terraform gets better and better.

@aserrallerios
Copy link

aserrallerios commented Mar 5, 2019

Hi guys, just wondering if this is the same error described here:

I got an error while adding additional subject_alternative_names to a aws_acm_certificate and then trying to create the corresponding dns records. For example, going from 0 alternative names to 1.

resource "aws_acm_certificate" "cert" {
  provider                    = "aws.virginia"
  domain_name                 = "${var.portal_domain}"
  subject_alternative_names   = "${var.portal_alt_names}"
  validation_method           = "DNS"
}

resource "aws_route53_record" "cert_validation" {
  depends_on = ["aws_acm_certificate.cert"]
  count   = "${length(var.portal_alt_names) + 1}"
  name    = "${lookup(aws_acm_certificate.cert.domain_validation_options[count.index], "resource_record_name")}"
  type    = "${lookup(aws_acm_certificate.cert.domain_validation_options[count.index], "resource_record_type")}"
  zone_id = "${var.route53_zone_id}"
  records = ["${lookup(aws_acm_certificate.cert.domain_validation_options[count.index], "resource_record_value")}"]
  ttl     = 300
}

The certificate should be updated before the lookup of the route53 records are evaluated. Right now with v0.11.11 it just fails:

Error: Error refreshing state: 1 error(s) occurred:
* module.portal.aws_route53_record.cert_validation: 1 error(s) occurred:
* module.portal.aws_route53_record.cert_validation[1]: index 1 out of range for list aws_acm_certificate.cert.domain_validation_options (max 1) in:
${lookup(aws_acm_certificate.cert.domain_validation_options[count.index], "resource_record_value")}

I've tried with taint:

terraform taint -module=portal aws_acm_certificate.cert

But no luck, got the same error.

I've also tried creating a intermediate local variables:

resource "aws_acm_certificate" "cert" {
  provider                    = "aws.virginia"
  domain_name                 = "${var.portal_domain}"
  subject_alternative_names   = "${var.portal_alt_names}"
  validation_method           = "DNS"
}

locals {
  xxx = ["${aws_acm_certificate.cert.domain_validation_options.*.resource_record_name}"]
  yyy = ["${aws_acm_certificate.cert.domain_validation_options.*.resource_record_type}"]
  zzz = ["${aws_acm_certificate.cert.domain_validation_options.*.resource_record_value}"]
}


resource "aws_route53_record" "cert_validation" {
  depends_on = ["aws_acm_certificate.cert"]
  count   = "${length(var.portal_alt_names) + 1}"
  name    = "${local.xxx[count.index]}"
  type    = "${local.yyy[count.index]}"
  zone_id = "${var.route53_zone_id}"
  records = ["${local.zzz[count.index]}"]
  ttl     = 300
}

But still doesn't work.

Is this the same bug described here? Any advice on how to workaround this problem?

@ckyoog
Copy link
Author

ckyoog commented Aug 2, 2019

@aserrallerios , it is not same bug. It is not even bug in your problem. It doesn't work not because of bug, but because of your misuse. Read more document please.

@hashibot hashibot added the v0.11 Issues (primarily bugs) reported against v0.11 releases label Aug 22, 2019
@mildwonkey
Copy link
Contributor

This has since been fixed in more recent versions of terraform - thanks everyone!

If you think there is still a bug in v0.13 (that there isn't already an issue open for) or you would like to make a feature request, please open a new issue and fill out the template.
Thanks!

@ghost
Copy link

ghost commented Oct 13, 2020

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.

If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@ghost ghost locked as resolved and limited conversation to collaborators Oct 13, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
bug config v0.11 Issues (primarily bugs) reported against v0.11 releases
Projects
None yet
Development

No branches or pull requests

5 participants