NAT Gateways failed during creation and I can't force them to recreate. #8687

Closed
erutherford opened this issue Sep 6, 2016 · 8 comments · Fixed by #8689

Comments

@erutherford
Contributor

Terraform Version

v0.7.2

Affected Resource(s)

aws_nat_gateway

Expected Behavior

If the NAT Gateways are in a failed state, terraform should discard them and recreate new NAT gateways without having to force this with the terraform taint command.

Per the API documentation, if the state is failed, deleting, or deleted, Terraform should force its re-creation.

Actual Behavior

Terraform isn't recognizing that the NAT Gateways are in a deleted state and continues to use them, which fails because they've been deleted.

Currently, when I run terraform plan, there's no output about the NAT Gateways needing to be recreated, even though they're in a deleted state in AWS. If I run terraform apply, Terraform fails because it tries to create routes to NAT Gateways that don't exist.

When I ran apply after forcing recreation with terraform taint, I got:

    Terraform Version: 0.7.2
    Resource ID: aws_nat_gateway.nat_gw.0
    Mismatch reason: extra attributes: public_ip, allocation_id, subnet_id, network_interface_id, private_ip
    Diff One (usually from plan): *terraform.InstanceDiff{mu:sync.Mutex{state:0, sema:0x0}, Attributes:map[string]*terraform.ResourceAttrDiff{}, Destroy:false, DestroyTainted:true}
    Diff Two (usually from apply): *terraform.InstanceDiff{mu:sync.Mutex{state:0, sema:0x0}, Attributes:map[string]*terraform.ResourceAttrDiff{"allocation_id":*terraform.ResourceAttrDiff{Old:"", New:"eipalloc-7727b148", NewComputed:false, NewRemoved:false, NewExtra:interface {}(nil), RequiresNew:true, Sensitive:false, Type:0x0}, "subnet_id":*terraform.ResourceAttrDiff{Old:"", New:"subnet-2bb66562", NewComputed:false, NewRemoved:false, NewExtra:interface {}(nil), RequiresNew:true, Sensitive:false, Type:0x0}, "network_interface_id":*terraform.ResourceAttrDiff{Old:"", New:"", NewComputed:true, NewRemoved:false, NewExtra:interface {}(nil), RequiresNew:false, Sensitive:false, Type:0x0}, "private_ip":*terraform.ResourceAttrDiff{Old:"", New:"", NewComputed:true, NewRemoved:false, NewExtra:interface {}(nil), RequiresNew:false, Sensitive:false, Type:0x0}, "public_ip":*terraform.ResourceAttrDiff{Old:"", New:"", NewComputed:true, NewRemoved:false, NewExtra:interface {}(nil), RequiresNew:false, Sensitive:false, Type:0x0}}, Destroy:false, DestroyTainted:true}

Steps to Reproduce

  1. terraform apply
  2. Have the Internet Gateway creation fail, which will cause the NAT Gateway creation to fail.
  3. terraform apply
    • this will fail because it tries to use NAT Gateways that don't currently exist.
  4. Taint the NAT Gateways
  5. terraform apply
    • you'll see the error I've listed above.

Important Factoids

Errors from the initial failed run:

    * aws_internet_gateway.inet_gw: InvalidInternetGatewayID.NotFound: The internetGateway ID 'igw-ca886fad' does not exist
            status code: 400, request id: 09332a23-26ac-49a4-a0c1-c152b0c957ff
    * aws_nat_gateway.nat_gw.1: Error waiting for NAT Gateway (nat-0834892b25ddf87f2) to become available: unexpected state 'failed', wanted target 'available'. last error: %!s(<nil>)
    * aws_nat_gateway.nat_gw.0: Error waiting for NAT Gateway (nat-0cedd9ca28abdf221) to become available: unexpected state 'failed', wanted target 'available'. last error: %!s(<nil>)

@erutherford
Contributor Author

I ran terraform apply again after receiving the error above and it did complete successfully.

Looking through the code, it looks like when the NAT Gateway fails to come back in time, there's only a check for the deleted state, when there should be a check for deleted, deleting, or failed.
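
As a rough illustration of the check described above (not the provider's actual code), the sketch below treats all three states as "gone". The function name `natGatewayGone` is hypothetical; the state strings are the ones named in this issue.

```go
package main

import "fmt"

// natGatewayGone reports whether a NAT Gateway in the given state should be
// treated as no longer existing, so it can be dropped from state and planned
// for re-creation. Hypothetical helper, not taken from the provider source.
func natGatewayGone(state string) bool {
	switch state {
	case "deleted", "deleting", "failed":
		return true
	default:
		return false
	}
}

func main() {
	// Checking only for "deleted" would miss the "failed" gateways in this report.
	for _, s := range []string{"pending", "available", "failed", "deleting", "deleted"} {
		fmt.Printf("%-9s gone=%v\n", s, natGatewayGone(s))
	}
}
```

With a check like this, a gateway stuck in failed would be handled the same way as one that has already been deleted, which is the behavior requested under Expected Behavior.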

@erutherford
Contributor Author

I've submitted a PR that I believe should resolve the issue.

@kwilczynski
Contributor

@erutherford thank you!

@abdelhegazi

Hey folks,
I was wondering whether this bug is still unresolved. I am running Terraform v0.8.3 and trying to create three NAT gateways across three availability zones, with an EIP allocated to the three of them. Only one of them is created successfully, and the others keep failing.

I followed the advice to re-apply, but no luck. Destroying the whole VPC made only a small difference: the one gateway that was created ended up in a different AZ, but I still get the same error message. Bear in mind that terraform plan shows everything is fine.

    aws_nat_gateway.hegz_natgw_aza: Still creating... (3m10s elapsed)
    aws_nat_gateway.hegz_natgw_azb: Still creating... (3m10s elapsed)
    aws_nat_gateway.hegz_natgw_azb: Still creating... (3m20s elapsed)
    aws_nat_gateway.hegz_natgw_aza: Still creating... (3m20s elapsed)
    aws_nat_gateway.hegz_natgw_aza: Still creating... (3m30s elapsed)
    aws_nat_gateway.hegz_natgw_aza: Still creating... (3m40s elapsed)
    Error applying plan:

    2 error(s) occurred:

    * aws_nat_gateway.hegz_natgw_azb: Error waiting for NAT Gateway (nat-0972a6db4a9aa1cbb) to become available: unexpected state 'failed', wanted target 'available'. last error: %!s(<nil>)
    * aws_nat_gateway.hegz_natgw_aza: Error waiting for NAT Gateway (nat-069a343249e35a935) to become available: unexpected state 'failed', wanted target 'available'. last error: %!s(<nil>)

    Terraform does not automatically rollback in the face of errors.
    Instead, your Terraform state file has been partially updated with
    any resources that successfully completed. Please address the error
    above and apply again to incrementally change your infrastructure.

@kwilczynski
Contributor

@ahegazy hi there! The fix in the Pull Request here was to allow deletion of a NAT Gateway that failed to create (previously, once one got stuck, Terraform would refuse to delete it).

In your case, they seem to be failing to create for some reason, so I would recommend starting with the debug output and then checking the AWS console (e.g. CloudTrail) to see what the root cause is. You might be running up against a limit, etc.

@abdelhegazi

@kwilczynski You are right, I guess the issue was mainly an AWS limit. I'm surprised, though, that Terraform doesn't have nicer error messages for this.
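
For context on the `last error: %!s(<nil>)` suffix in the output above: that is what Go's `fmt` package prints when a nil value is formatted with `%s`, so the message carries no failure reason at all. The sketch below reproduces it and shows one possible guard. The helper `describeLastError` is hypothetical and not part of Terraform.

```go
package main

import (
	"errors"
	"fmt"
)

// describeLastError is a hypothetical helper: it returns a readable
// placeholder instead of letting fmt render a nil error as "%!s(<nil>)".
func describeLastError(err error) string {
	if err == nil {
		return "none reported"
	}
	return err.Error()
}

func main() {
	var lastErr error // nil, as in the output quoted in this thread

	// A nil error formatted with %s produces the "%!s(<nil>)" text.
	fmt.Printf("last error: %s\n", lastErr)

	// Guarding against nil gives a message a reader can act on.
	fmt.Printf("last error: %s\n", describeLastError(lastErr))
	fmt.Printf("last error: %s\n", describeLastError(errors.New("example failure reason")))
}
```

Whether the real failure reason is available at that point in the provider is not something this thread confirms; the sketch only shows why the message reads the way it does.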

@kwilczynski
Contributor

@ahegazy hi there! Feel free to open a new issue and suggest better error messages.

@ghost

ghost commented Apr 18, 2020

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.

If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@ghost ghost locked and limited conversation to collaborators Apr 18, 2020