Terraform stucks when instance_count is more than 2 while using remote-exec provisioner #22343

Closed
anand-swaroop-git opened this issue Aug 6, 2019 · 10 comments
Labels
bug config provisioner/remote-exec v0.12 Issues (primarily bugs) reported against v0.12 releases

Comments

anand-swaroop-git commented Aug 6, 2019

Terraform Version

$ terraform -v
Terraform v0.12.6
provider.aws v2.23.0
provider.null v2.1.2

Originally, I was working with three remote-exec provisioners (two of which rebooted the instance) without null_resource, and for a single instance everything worked fine.
I then needed to increase the count and, based on several links, ended up using null_resource.
I have now reduced the issue to the point where I cannot run even a single remote-exec provisioner against more than 2 Windows EC2 instances using null_resource.

Terraform Configuration File

//VARIABLES

variable "aws_access_key" {
  default = "AK"
}
variable "aws_secret_key" {
  default = "SAK"
}
variable "instance_count" {
  default = "3"
}
variable "username" {
  default = "Administrator"
}
variable "admin_password" {
  default = "Password"
}
variable "instance_name" {
  default = "Testing"
}
variable "vpc_id" {
  default = "vpc-id"
}

//PROVIDERS
provider "aws" {
  access_key = "${var.aws_access_key}"
  secret_key = "${var.aws_secret_key}"
  region     = "ap-southeast-2"
}

//RESOURCES
resource "aws_instance" "ec2instance" {
  count         = "${var.instance_count}"
  ami           = "Windows AMI"
  instance_type = "t2.xlarge"
  key_name      = "ec2_key"
  subnet_id     = "subnet-id"
  vpc_security_group_ids = ["${aws_security_group.ec2instance-sg.id}"]
  tags = {
    Name = "${var.instance_name}-${count.index}"
  }
}

resource "null_resource" "nullresource" {
  count = "${var.instance_count}"
  connection {
    type     = "winrm"
    host     = "${element(aws_instance.ec2instance.*.private_ip, count.index)}"
    user     = "${var.username}"
    password = "${var.admin_password}"
    timeout  = "10m"
  }
  provisioner "remote-exec" {
    inline = [
      "powershell.exe Write-Host Instance_No=${count.index}"
    ]
  }
//   provisioner "local-exec" {
//     command = "powershell.exe Write-Host Instance_No=${count.index}"
//   }
//   provisioner "file" {
//       source      = "testscript"
//       destination = "D:/testscript"
//   }
}
resource "aws_security_group" "ec2instance-sg" {
  name        = "${var.instance_name}-sg"
  vpc_id      = "${var.vpc_id}"


//   RDP
  ingress {
    from_port   = 3389
    to_port     = 3389
    protocol    = "tcp"
    cidr_blocks = ["CIDR"]
  }

//   WinRM access from the machine running TF to the instance
  ingress {
    from_port   = 5985
    to_port     = 5985
    protocol    = "tcp"
    cidr_blocks = ["CIDR"]
  }

  tags = {
    Name        = "${var.instance_name}-sg"
  }

}
//OUTPUTS
output "private_ip" {
  value = "${aws_instance.ec2instance.*.private_ip}"
}

Expected Behavior

If I set the instance count to 1, all of the above provisioning steps work fine.
However, if I set the instance count to anything other than 1 (e.g. 2), Terraform consistently runs all the provisioning steps on both instances but runs the LAST step (Write-Host THIRD) on ONLY ONE of them.

Actual Behavior

Observations:

With one remote-exec provisioner, it works fine if count is set to 1 or 2. With count 3, it is unpredictable whether the provisioner runs on every instance each time. What is certain is that Terraform never completes and never shows the output variables. It keeps showing "null_resource.nullresource[count.index]: Still creating..."
For the local-exec provisioner, everything works fine. Tested with count values of 1, 2 and 7.
For the file provisioner, it works fine with counts of 1, 2 and 3. With 7 it does not finish, even though the file was copied to all 7 instances. It keeps showing "null_resource.nullresource[count.index]: Still creating..."
Also, in every attempt, the remote-exec provisioner is able to connect to the instances irrespective of count's value; it just doesn't trigger the inline command, seemingly skipping it at random, and then starts showing the "Still creating..." message.
I have been stuck on this issue for quite some time now. I couldn't find anything significant in the debug logs either. I know Terraform is not recommended as a config management tool; however, everything works fine even with complex provisioning scripts if the instance count is just 1 (even without null_resource), which suggests that Terraform should be able to handle such a basic provisioning requirement.

TF_DEBUG logs

https://gist.github.com/anand-swaroop-git/cd84b62226f2a3a9e8a225f8c0039ab4
https://gist.github.com/anand-swaroop-git/92015c1c8fc82ef3731d48d6258e89d4
https://gist.github.com/anand-swaroop-git/0355d7a60a609dcaaf7fbb31bf096f6c

Link on SO:
https://stackoverflow.com/q/57368506/10846194
Link on Hashicorp community:
https://discuss.hashicorp.com/t/terraform-stucks-when-instance-count-is-more-than-2-while-using-remote-exec-provisioner/2254

Steps to Reproduce

Just run the above TF template with required variables and a Windows AMI.

@anand-swaroop-git
Author

@apparentlymart: Could you please take a look at this (based on other replies you have posted)? Thanks in advance!

@anand-swaroop-git anand-swaroop-git changed the title Terraform remote-exec provisioner behaving inconsistently when count is anything other than 1 Terraform stucks when instance_count is more than 2 while using remote-exec provisioner Aug 12, 2019
@anand-swaroop-git
Author

anand-swaroop-git commented Aug 14, 2019

I downgraded to v0.11.14 and that magically worked. Seems like a bug in v0.12.6.
More details can be found in the comments here

@hashibot hashibot added the v0.12 Issues (primarily bugs) reported against v0.12 releases label Aug 28, 2019
@mcascone

Sounds the same as #22722

@danielcbright

I can confirm I'm running into this exact same issue trying to provision 3 Windows 2016 boxes in AWS. Something is broken in 0.12.x 😿

@salatamartin

salatamartin commented Dec 23, 2019

I was able to reproduce this issue more easily. For me, it gets stuck when I have more than 2 WinRM remote-exec provisioners present, e.g.:

resource "null_resource" "resource" {
  connection {
    type     = "winrm"
    host     = "<address>"
    user     = "<username>"
    password = "<password>"
    timeout  = "10m"
  }

  provisioner "remote-exec" {
    inline = [
      "echo first",
    ]
  }

  provisioner "remote-exec" {
    inline = [
      "echo second",
    ]
  }

  provisioner "remote-exec" {
    inline = [
      "echo third",
    ]
  }
}

This gets stuck on the last one. From what I was able to figure out, it hangs in the cleanupContent function of the winrmcp package, more precisely while trying to copy the command's stderr (here).

It works fine on Terraform 0.11.14 and only becomes a problem with Terraform 0.12. Also, if I leave only 2 remote-execs, it finishes successfully even with Terraform 0.12. It also doesn't happen with remote-exec via SSH.

Here is the trace log from my unsuccessful run: https://gist.github.com/salatamartin/e28f7b0985741d8c85035885ef0a4020
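
Since the resource above finishes when only two remote-exec blocks are present, one thing that may be worth trying on 0.12 is folding the commands into a single provisioner. This is a sketch under that assumption only: it has not been verified in this thread, and it does not address the original report, where a single remote-exec hung once count reached 3. The placeholders are the same as in the reproduction above.

```hcl
# Sketch only: the three commands merged into one remote-exec block, so the
# resource stays below the "more than 2 provisioners" threshold reported above.
resource "null_resource" "resource" {
  connection {
    type     = "winrm"
    host     = "<address>"
    user     = "<username>"
    password = "<password>"
    timeout  = "10m"
  }

  provisioner "remote-exec" {
    # inline takes a list of commands that are run in order.
    inline = [
      "echo first",
      "echo second",
      "echo third",
    ]
  }
}
```

Note that this is not equivalent when a step reboots the machine, as in the original reporter's setup, because the remaining commands in the same block would not survive the reboot.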

@apparentlymart
Contributor

This seems related to #22006; perhaps the two have the same root cause.

@salatamartin

I just tested my case with Terraform v0.13.4 and it seems to be working fine.
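
For anyone who wants to guard a configuration against being applied with an affected 0.12.x binary, a version constraint is one option. A minimal sketch, assuming v0.13.4 is a safe lower bound; the thread only confirms that one version for the single-resource case above, so the exact bound is an assumption.

```hcl
# Sketch only: refuse to run with Terraform versions older than the one
# reported as working in this thread.
terraform {
  required_version = ">= 0.13.4"
}
```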

@dhekimian

Anyone able to reproduce this? Looks to be resolved around Q4 2020.

@crw
Collaborator

crw commented Jul 30, 2022

Thanks for the feedback. Going to close this issue. Please open any adjacent issues in a new ticket. Thank you!

@crw crw closed this as completed Jul 30, 2022
@github-actions

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Aug 29, 2022

9 participants