RouteTable association reports empty result - and apply fails, but routetable is properly associated in AWS #21683

Closed
ellisroll-b opened this issue Nov 8, 2021 · 20 comments · Fixed by #22927
Labels: bug, service/ec2

@ellisroll-b

ellisroll-b commented Nov 8, 2021

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or other comments that do not add relevant new information or questions, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Terraform CLI and Terraform AWS Provider Version

Terraform 1.0.1 linux amd64

  • Installed hashicorp/aws v3.64.2 (signed by HashiCorp)
  • Installed hashicorp/random v3.1.0 (signed by HashiCorp)
  • Installed hashicorp/time v0.7.2 (signed by HashiCorp)

Affected Resource(s)

  • aws_route_table_association

Terraform Configuration Files

Associated routing resources (sans VPC, subnets, internet-gateway, and nat)

//-----------------------------------------------------------
// public route table and internet gateway route
//-----------------------------------------------------------
resource "aws_route_table" "publicRouteTable" {
  vpc_id = aws_vpc.mainVPC.id
  tags = {
    Name    = "${var.environmentName} Public route table"
  }
}

//note that tags are not allowed on a route
resource "aws_route" "publicInternetGatewayRoute" {
  route_table_id = aws_route_table.publicRouteTable.id
  destination_cidr_block = "0.0.0.0/0"
  gateway_id = aws_internet_gateway.mainIG.id
  timeouts {
    create = "6m"
  }
}

//-----------------------------------------------------------
// private route table and nat gateway route
//-----------------------------------------------------------
resource "aws_route_table" "privateRouteTable" {
  vpc_id = aws_vpc.mainVPC.id
  tags = {
    Name    = "${var.environmentName} Private route table"
  }
}

//note that tags are not allowed on a route
resource "aws_route" "privateNatGatewayRoute" {
  route_table_id          = aws_route_table.privateRouteTable.id
  destination_cidr_block  = "0.0.0.0/0"
  nat_gateway_id          = aws_nat_gateway.natGateway.id
  timeouts {
    create = "6m"
  }
}


//-----------------------------------------------------------
// route table associations
//-----------------------------------------------------------
resource "time_sleep" "waitRouteTableElements" {
  depends_on = [
    aws_route_table.publicRouteTable,
    aws_route.publicInternetGatewayRoute,
    aws_route_table.privateRouteTable,
    aws_route.privateNatGatewayRoute,
    aws_subnet.publicSubNet1,
    aws_subnet.publicSubNet2,
    aws_subnet.privateSubNet1,
    aws_subnet.privateSubNet2
  ]
  create_duration = "6m"
}

resource "aws_route_table_association" "public1" {
  depends_on     = [time_sleep.waitRouteTableElements]
  subnet_id      = aws_subnet.publicSubNet1.id
  route_table_id = aws_route_table.publicRouteTable.id
}

resource "aws_route_table_association" "public2" {
  depends_on     = [time_sleep.waitRouteTableElements]
  subnet_id      = aws_subnet.publicSubNet2.id
  route_table_id = aws_route_table.publicRouteTable.id
}

resource "aws_route_table_association" "private1" {
  depends_on     = [time_sleep.waitRouteTableElements]
  subnet_id      = aws_subnet.privateSubNet1.id
  route_table_id = aws_route_table.privateRouteTable.id
}

resource "aws_route_table_association" "private2" {
  depends_on     = [time_sleep.waitRouteTableElements]
  subnet_id      = aws_subnet.privateSubNet2.id
  route_table_id = aws_route_table.privateRouteTable.id
}

Debug Output

Panic Output

Expected Behavior

The provider should have recognized that AWS reported the association as complete, and should not have failed the apply.

Actual Behavior

It looks like the provider might not be waiting long enough. It loops, waiting for DescribeRouteTables to report the association as associated:

<AssociateRouteTableResponse xmlns="http://ec2.amazonaws.com/doc/2016-11-15/">
    <requestId>aea99d68-b36e-4fe7-b267-d91e827d285a</requestId>
    <associationId>rtbassoc-07d32038d3161359d</associationId>
    <associationState>
        <state>associated</state>
    </associationState>
</AssociateRouteTableResponse>: timestamp=2021-11-08T11:30:07.613-0500
2021-11-08T11:30:07.613-0500 [INFO]  provider.terraform-provider-aws_v3.64.2_x5: 2021/11/08 11:30:07 [DEBUG] Waiting for Route Table Association (rtbassoc-07d32038d3161359d) creation: timestamp=2021-11-08T11:30:07.613-0500
2021-11-08T11:30:07.613-0500 [INFO]  provider.terraform-provider-aws_v3.64.2_x5: 2021/11/08 11:30:07 [DEBUG] Waiting for state to become: [associated]: timestamp=2021-11-08T11:30:07.613-0500
2021-11-08T11:30:07.614-0500 [INFO]  provider.terraform-provider-aws_v3.64.2_x5: 2021/11/08 11:30:07 [DEBUG] [aws-sdk-go] DEBUG: Request ec2/DescribeRouteTables Details:
-----------------------------------------------------: timestamp=2021-11-08T11:30:07.760-0500
2021-11-08T11:30:07.789-0500 [INFO]  provider.terraform-provider-aws_v3.64.2_x5: 2021/11/08 11:30:07 [DEBUG] [aws-sdk-go] DEBUG: Response ec2/DescribeRouteTables Details:
---[ RESPONSE ]--------------------------------------
HTTP/1.1 200 OK

It gets a response payload that indicates it's associated:

<DescribeRouteTablesResponse xmlns="http://ec2.amazonaws.com/doc/2016-11-15/">
    <requestId>d8bec8c3-4a8a-45ba-8420-cdf7d1354e31</requestId>
    <routeTableSet>
        <item>
            <routeTableId>rtb-0a8ba52dfbbdc7717</routeTableId>
            <vpcId>vpc-06f9c23d15c361252</vpcId>
            <ownerId>#######</ownerId>
            <routeSet>
                <item>
                    <destinationCidrBlock>10.100.0.0/16</destinationCidrBlock>
                    <gatewayId>local</gatewayId>
                    <state>active</state>
                    <origin>CreateRouteTable</origin>
                </item>
                <item>
                    <destinationCidrBlock>0.0.0.0/0</destinationCidrBlock>
                    <gatewayId>igw-0c8ab62f7a92f1c37</gatewayId>
                    <state>active</state>
                    <origin>CreateRoute</origin>
                </item>
            </routeSet>
            <associationSet>
                <item>
                    <routeTableAssociationId>rtbassoc-099a83c99f258c36f</routeTableAssociationId>
                    <routeTableId>rtb-0a8ba52dfbbdc7717</routeTableId>
                    <subnetId>subnet-0fc53a3b7ebe19d64</subnetId>
                    <main>false</main>
                    <associationState>
                        <state>associated</state>
                    </associationState>
                </item>
                <item>
                    <routeTableAssociationId>rtbassoc-0dcdc8c9a5e92d331</routeTableAssociationId>
                    <routeTableId>rtb-0a8ba52dfbbdc7717</routeTableId>
                    <subnetId>subnet-00621c15caee268dc</subnetId>
                    <main>false</main>
                    <associationState>
                        <state>associated</state>
                    </associationState>
                </item>
            </associationSet>
            <propagatingVgwSet/>
            <tagSet>
                <item>
                    <key>User</key>
                    <value>####</value>
                </item>
                <item>
                    <key>Name</key>
                    <value>#### Public route table</value>
                </item>
                <item>
                    <key>Domain</key>
                    <value>####</value>
                </item>
                <item>
                    <key>Billing</key>
                    <value>####</value>
                </item>
            </tagSet>
        </item>
    </routeTableSet>
</DescribeRouteTablesResponse>: timestamp=2021-11-08T11:30:07.789-0500
2021-11-08T11:30:07.790-0500 [INFO]  provider.terraform-provider-aws_v3.64.2_x5: 2021/11/08 11:30:07 [DEBUG] [aws-sdk-go] DEBUG: Request ec2/DescribeRouteTables Details:

but then it fails with: empty result

module.network.aws_route_table_association.private1: Creating...
module.network.aws_route_table_association.private2: Creation complete after 1s [id=rtbassoc-05ad72d7a0531ee28]
module.network.aws_route_table_association.public1: Creation complete after 1s [id=rtbassoc-0dcdc8c9a5e92d331]
module.network.aws_route_table_association.public2: Creation complete after 1s [id=rtbassoc-099a83c99f258c36f]

Error: error reading Route Table Association (rtbassoc-07d32038d3161359d): empty result

  with module.network.aws_route_table_association.private1,
  on network/baseNetwork.tf line 184, in resource "aws_route_table_association" "private1":
 184: resource "aws_route_table_association" "private1" {

Steps to Reproduce

This does not happen all the time. It happens only occasionally, and always on a fresh deploy (which is all we are doing right now).

  1. terraform apply

Important Factoids

There doesn't appear to be an adjustable timeout for aws_route_table_association

github-actions bot added the needs-triage and service/ec2 labels Nov 8, 2021
@ialidzhikov
Contributor

Seems to be similar to #21629

justinretzolk added the bug label and removed the needs-triage label Nov 9, 2021
@anGie44
Contributor

anGie44 commented Nov 11, 2021

Hi all 👋 the PR #21710 has been merged to hopefully address this nondeterministic issue. Any findings from those who upgrade to the new provider version (v3.65.0), which will be out later today, would be greatly appreciated!

@ellisroll-b
Author

Whoo-hoo! We deploy regularly and pull latest under 4.x, in a very busy account. Hard to prove a negative, but if we see anything worth reporting, will do!

Thanks!

@cdancy

cdancy commented Nov 12, 2021

@anGie44 still seeing issue after using latest released version 3.65.0:

22:27:47          	            	Error: error reading Route Table Association (rtbassoc-0ba95894cbf58a21f): empty result
22:27:47          	            	
22:27:47          	            	  with module.vpc.aws_route_table_association.public[0],
22:27:47          	            	  on .terraform/modules/vpc/main.tf line 1204, in resource "aws_route_table_association" "public":
22:27:47          	            	1204: resource "aws_route_table_association" "public" {

our resources look like so:

terraform {
  required_providers {
    aws = {
      source = "hashicorp/aws"
      version = ">= 3.65.0"
    }
    null = {
      source = "hashicorp/null"
    }
    template = {
      source = "hashicorp/template"
    }
  }
  required_version = ">= 0.13.1"
}

resource "aws_vpc_endpoint" "dynamodb" {
  vpc_endpoint_type = "Gateway"
  vpc_id            = module.vpc.vpc_id
  service_name      = "com.amazonaws.${var.Region}.dynamodb"
  tags              = local.tags
  timeouts {
    create = "30m"
    update = "30m"
    delete = "30m"
  }
  policy            = <<POLICY
    {
      "Statement": [
          {
          "Action": "*",
          "Effect": "Allow",
          "Resource": "*",
          "Principal": "*"
          }
      ]
    }
    POLICY
}

resource "aws_vpc_endpoint_route_table_association" "dynamodb" {
  count           = length(module.vpc.private_route_table_ids)
  route_table_id  = module.vpc.private_route_table_ids[count.index]
  vpc_endpoint_id = aws_vpc_endpoint.dynamodb.id
}

@anGie44
Contributor

anGie44 commented Nov 12, 2021

😞 thanks for reporting back @cdancy! Out of curiosity, after terraform returns that error, if you use the aws cli to query for the route table, filtering by the association ID, does that return a result?

@cdancy

cdancy commented Nov 12, 2021

@anGie44 no idea. We have a big automated pipeline where this seems to pop up fairly consistently at this point. When I've been doing, we'll say, "local development and testing", things seem to work fine, but in our busier and bigger pipeline this is just failing over and over again.

@ellisroll-b
Author

Sadly, I must also report it's still happening.
provider.terraform-provider-aws_v3.65.0_x5: 2021/11/15 17:28:06

Same output and path as I originally reported in this thread. I can provide the dumps again, but it's the same. Two route tables are created: one for the external internet gateway, the other for the internal NAT gateway. In TRACE-level logs, it loops waiting for the route table association, which comes back successfully. An AWS lookup on the association shows it as successful as well. But the provider says no deal, and reports:

2021-11-15T17:28:06.676-0500 [INFO] provider.terraform-provider-aws_v3.65.0_x5: 2021/11/15 17:28:06 [DEBUG] [aws-sdk-go] <?xml version="1.0" encoding="UTF-8"?>
<DescribeRouteTablesResponse xmlns="http://ec2.amazonaws.com/doc/2016-11-15/">
    <requestId>5490ec52-a264-411a-93a2-af095a00d9ea</requestId>
    <routeTableSet/>
</DescribeRouteTablesResponse>: timestamp=2021-11-15T17:28:06.675-0500
2021-11-15T17:28:06.677-0500 [TRACE] maybeTainted: module.network.aws_route_table_association.private2 encountered an error during creation, so it is now marked as tainted
2021-11-15T17:28:06.677-0500 [TRACE] NodeAbstractResouceInstance.writeResourceInstanceState to workingState for module.network.aws_route_table_association.private2
2021-11-15T17:28:06.677-0500 [TRACE] NodeAbstractResouceInstance.writeResourceInstanceState: writing state object for module.network.aws_route_table_association.private2
2021-11-15T17:28:06.677-0500 [TRACE] evalApplyProvisioners: module.network.aws_route_table_association.private2 is tainted, so skipping provisioning
2021-11-15T17:28:06.677-0500 [TRACE] maybeTainted: module.network.aws_route_table_association.private2 was already tainted, so nothing to do
2021-11-15T17:28:06.677-0500 [TRACE] NodeAbstractResouceInstance.writeResourceInstanceState to workingState for module.network.aws_route_table_association.private2
2021-11-15T17:28:06.677-0500 [TRACE] NodeAbstractResouceInstance.writeResourceInstanceState: writing state object for module.network.aws_route_table_association.private2
2021-11-15T17:28:06.678-0500 [TRACE] vertex "module.network.aws_route_table_association.private2": visit complete
2021-11-15T17:28:06.723-0500 [INFO] provider.terraform-provider-aws_v3.65.0_x5: 2021/11/15 17:28:06 [DEBUG] [aws-sdk-go] DEBUG: Response ec2/DescribeRouteTables Details:

and then:

Error: error reading Route Table Association (rtbassoc-0e5c9c911e6e8bec6): empty result

  with module.network.aws_route_table_association.private2,
  on network/baseNetwork.tf line 166, in resource "aws_route_table_association" "private2":
 166: resource "aws_route_table_association" "private2"

@cdancy

cdancy commented Nov 16, 2021

@anGie44 is there anything anyone, myself included, can do to move things forward here and/or try something else? This is a major blocker for us at the moment.

@jmcshane

I just wanted to say that we've also experienced this issue intermittently in our CI pipelines. For a job that takes 25 minutes to run, an intermittent failure like this is very challenging. Is there anything I could do to help test?

@rfink

rfink commented Nov 23, 2021

Currently having this issue

@cdancy

cdancy commented Nov 23, 2021 via email

@ellisroll-b
Author

FWIW - this timing/eventual consistency failure does happen a lot still.
Happy to help with the bribe noted above... :)

@Eli-Meitner

FYI - I added dependencies to the associations. Terraform then waits for the subnet and route table to be created before doing the association, and it stopped failing for me with "empty result".

@ellisroll-b
Author

//-----------------------------------------------------------
// waitRouteTableElements is a delay mechanism, as AWS is reporting
// a network element is complete before it's accessible (most often
// the private routeTable) - This waits for AWS to claim the
// elements in the list are "created", then waits 5 mins more
//
// It is currently referenced as a dependency by the route table
// association objects
resource "time_sleep" "waitRouteTableElements" {
  depends_on = [
    aws_route_table.publicRouteTable,
    aws_route.publicInternetGatewayRoute,
    aws_route_table.privateRouteTable,
    aws_route.privateNatGatewayRoute,
    aws_subnet.publicSubNet1,
    aws_subnet.publicSubNet2,
    aws_subnet.privateSubNet1,
    aws_subnet.privateSubNet2,
  ]
  create_duration = "5m"
}

resource "aws_route_table_association" "public1" {
  depends_on     = [time_sleep.waitRouteTableElements]
  subnet_id      = aws_subnet.publicSubNet1.id
  route_table_id = aws_route_table.publicRouteTable.id
}

Been doing that from the beginning, but it still fails - from a trace dump, the provider waits and sees the dependencies created, but still fails.

@Eli-Meitner

I did not add it to the time_sleep resource, but directly on the associations:

resource "aws_route_table_association" "private" {
  count = var.create_vpc && length(var.private_subnets) > 0 ? length(var.private_subnets) : 0

  subnet_id = element(aws_subnet.private[*].id, count.index)
  route_table_id = element(
    aws_route_table.private[*].id,
    var.single_nat_gateway ? 0 : count.index,
  )
  depends_on = [
    aws_subnet.private,
    aws_route_table.private
  ]
}

resource "aws_route_table_association" "outpost" {
  count = var.create_vpc && length(var.outpost_subnets) > 0 ? length(var.outpost_subnets) : 0

  subnet_id = element(aws_subnet.outpost[*].id, count.index)
  route_table_id = element(
    aws_route_table.private[*].id,
    var.single_nat_gateway ? 0 : count.index,
  )
  depends_on = [
    aws_subnet.outpost,
    aws_route_table.private
  ]
}

resource "aws_route_table_association" "database" {
  count = var.create_vpc && length(var.database_subnets) > 0 ? length(var.database_subnets) : 0

  subnet_id = element(aws_subnet.database[*].id, count.index)
  route_table_id = element(
    coalescelist(aws_route_table.database[*].id, aws_route_table.private[*].id),
    var.create_database_subnet_route_table ? var.single_nat_gateway || var.create_database_internet_gateway_route ? 0 : count.index : count.index,
  )
  depends_on = [
    aws_subnet.database,
    aws_route_table.database
  ]
}

resource "aws_route_table_association" "redshift" {
  count = var.create_vpc && length(var.redshift_subnets) > 0 && false == var.enable_public_redshift ? length(var.redshift_subnets) : 0

  subnet_id = element(aws_subnet.redshift[*].id, count.index)
  route_table_id = element(
    coalescelist(aws_route_table.redshift[*].id, aws_route_table.private[*].id),
    var.single_nat_gateway || var.create_redshift_subnet_route_table ? 0 : count.index,
  )
  depends_on = [
    aws_subnet.redshift,
    aws_route_table.redshift
  ]
}

resource "aws_route_table_association" "redshift_public" {
  count = var.create_vpc && length(var.redshift_subnets) > 0 && var.enable_public_redshift ? length(var.redshift_subnets) : 0

  subnet_id = element(aws_subnet.redshift[*].id, count.index)
  route_table_id = element(
    coalescelist(aws_route_table.redshift[*].id, aws_route_table.public[*].id),
    var.single_nat_gateway || var.create_redshift_subnet_route_table ? 0 : count.index,
  )
  depends_on = [
    aws_subnet.redshift,
    aws_route_table.redshift
  ]
}

resource "aws_route_table_association" "elasticache" {
  count = var.create_vpc && length(var.elasticache_subnets) > 0 ? length(var.elasticache_subnets) : 0

  subnet_id = element(aws_subnet.elasticache[*].id, count.index)
  route_table_id = element(
    coalescelist(
      aws_route_table.elasticache[*].id,
      aws_route_table.private[*].id,
    ),
    var.single_nat_gateway || var.create_elasticache_subnet_route_table ? 0 : count.index,
  )
  depends_on = [
    aws_subnet.elasticache,
    aws_route_table.private
  ]
}

resource "aws_route_table_association" "intra" {
  count = var.create_vpc && length(var.intra_subnets) > 0 ? length(var.intra_subnets) : 0

  subnet_id      = element(aws_subnet.intra[*].id, count.index)
  route_table_id = element(aws_route_table.intra[*].id, 0)
  depends_on = [
    aws_subnet.intra,
    aws_route_table.intra
  ]
}

resource "aws_route_table_association" "public" {
  count = var.create_vpc && length(var.public_subnets) > 0 ? length(var.public_subnets) : 0

  subnet_id      = element(aws_subnet.public[*].id, count.index)
  route_table_id = aws_route_table.public[0].id
  depends_on = [
    aws_subnet.public,
    aws_route_table.public
  ]
}

@ellisroll-b
Author

ellisroll-b commented Feb 3, 2022 via email

@github-actions

This functionality has been released in v4.0.0 of the Terraform AWS Provider. Please see the Terraform documentation on provider versioning or reach out if you need any assistance upgrading.

For further feature requests or bug reports with this functionality, please create a new GitHub issue following the template. Thank you!
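
For anyone following the upgrade path mentioned above, a minimal sketch of a provider version constraint that picks up the v4.0.0 release might look like the following. It mirrors the `required_providers` block posted earlier in this thread. The `>= 4.0.0` constraint is only an assumption about how you want to pin the provider, and a major-version upgrade may require other configuration changes, so check the provider's upgrade guide before applying it.

```hcl
terraform {
  required_providers {
    aws = {
      source = "hashicorp/aws"
      # Assumption: accept any release from v4.0.0 onward, the version
      # this thread reports as containing the fix.
      version = ">= 4.0.0"
    }
  }
}
```

After changing the constraint, `terraform init -upgrade` re-resolves the provider version against it.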

@ellisroll-b
Author

ellisroll-b commented Feb 10, 2022 via email

@cdancy

cdancy commented Feb 11, 2022

This seems to be working for us but now we're seeing the below which feels like a similar or related type of issue:

[2022-02-11T19:54:26.983Z]   Error: error reading EC2 Network ACL Rule (nacl-3954984924): couldn't find resource
[2022-02-11T19:54:26.983Z]
[2022-02-11T19:54:26.983Z]     with module.vpc.aws_network_acl_rule.public_outbound[2],
[2022-02-11T19:54:26.983Z]     on .terraform/modules/vpc/main.tf line 668, in resource "aws_network_acl_rule" "public_outbound":
[2022-02-11T19:54:26.983Z]    668: resource "aws_network_acl_rule" "public_outbound" {

@github-actions

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

github-actions bot locked as resolved and limited conversation to collaborators May 13, 2022
8 participants