provider/aws: elasticache cluster exports nodes before they exist #2051

Closed
juniorplenty opened this issue May 22, 2015 · 24 comments
Labels
provider/aws, waiting-response

Comments

@juniorplenty

Originally reported by @saulshanabrook here: #1965
Verified it still exists in 0.5.2

Error applying plan:

1 error(s) occurred:

* 1 error(s) occurred:

* Resource 'aws_elasticache_cluster.default' does not have attribute 'cache_nodes.0.address' for variable 'aws_elasticache_cluster.default.cache_nodes.0.address'

Terraform does not automatically rollback in the face of errors.
Instead, your Terraform state file has been partially updated with
any resources that successfully completed. Please address the error
above and apply again to incrementally change your infrastructure.

cache_nodes.0 doesn't exist because ElastiCache isn't actually done provisioning the cluster, and Terraform doesn't wait for it to finish.
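
For illustration only, here's a minimal sketch of the kind of wait the provider would need to perform before exporting cache_nodes: poll DescribeCacheClusters until the cluster reports "available" and its node endpoints are populated. This is written against aws-sdk-go; the cluster id ("test"), region, and timeout are assumptions for the example, and this is not the actual #2128 patch.

package main

import (
	"fmt"
	"log"
	"time"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/elasticache"
)

func main() {
	// Sketch only: block until the ElastiCache cluster is "available" and its
	// node endpoints exist, which is roughly what the provider has to do
	// before cache_nodes.0.address can be interpolated safely.
	sess := session.Must(session.NewSession(&aws.Config{Region: aws.String("us-east-1")}))
	svc := elasticache.New(sess)

	deadline := time.Now().Add(20 * time.Minute) // assumed timeout
	for {
		out, err := svc.DescribeCacheClusters(&elasticache.DescribeCacheClustersInput{
			CacheClusterId:    aws.String("test"), // cluster_id from the config below
			ShowCacheNodeInfo: aws.Bool(true),     // needed to get node endpoints back
		})
		if err != nil {
			log.Fatal(err)
		}
		if len(out.CacheClusters) == 0 {
			log.Fatal("cluster not found")
		}
		c := out.CacheClusters[0]

		// While the cluster is still "creating", CacheNodes is empty; only
		// trust the node list once the cluster itself reports "available".
		if aws.StringValue(c.CacheClusterStatus) == "available" && len(c.CacheNodes) > 0 {
			fmt.Println("first node address:", aws.StringValue(c.CacheNodes[0].Endpoint.Address))
			return
		}
		if time.Now().After(deadline) {
			log.Fatalf("timed out; last status was %q", aws.StringValue(c.CacheClusterStatus))
		}
		time.Sleep(15 * time.Second)
	}
}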

@juniorplenty
Author

Here are test configs; note the Route 53 record creation at the bottom, which attempts to use a non-existent cluster node index:

### VARS #################################
variable "aws_access_key" {}
variable "aws_secret_key" {}
variable "region" {
  default = "us-east-1"
}
variable "availability_zone" {
  default = "us-east-1e"
}

### VPC ##################################
provider "aws" {
  access_key = "${var.aws_access_key}"
  secret_key = "${var.aws_secret_key}"
  region     = "${var.region}"
}
resource "aws_vpc" "default" {
  cidr_block = "10.250.0.0/16"
}
resource "aws_internet_gateway" "default" {
  vpc_id = "${aws_vpc.default.id}"
}
resource "aws_subnet" "public" {
  vpc_id = "${aws_vpc.default.id}"
  cidr_block = "10.250.2.0/24"
  availability_zone = "${var.availability_zone}"
  map_public_ip_on_launch = true
}
resource "aws_subnet" "private" {
  vpc_id = "${aws_vpc.default.id}"
  cidr_block = "10.250.3.0/24"
  availability_zone = "${var.availability_zone}"
  map_public_ip_on_launch = true
}
resource "aws_route_table_association" "private" {
  subnet_id = "${aws_subnet.private.id}"
  route_table_id = "${aws_route_table.public.id}"
}
resource "aws_route_table" "public" {
  vpc_id = "${aws_vpc.default.id}"
  route {
    cidr_block = "0.0.0.0/0"
    gateway_id = "${aws_internet_gateway.default.id}"
  }
}
resource "aws_route_table_association" "public" {
  subnet_id = "${aws_subnet.public.id}"
  route_table_id = "${aws_route_table.public.id}"
}
resource "aws_route53_zone" "main" {
  name = "test.infra"
  vpc_id = "${aws_vpc.default.id}"
}

### ELASTICACHE ##########################
resource "aws_elasticache_subnet_group" "test" {
  name = "test"
  description = "Test Redis Subnet Group"
  subnet_ids = ["${aws_subnet.private.id}"]
}
resource "aws_elasticache_cluster" "test" {
  cluster_id = "test"
  subnet_group_name = "${aws_elasticache_subnet_group.test.name}"
  num_cache_nodes = 1
  node_type = "cache.t2.micro"
  engine = "redis"
  engine_version = "2.8.19"
  parameter_group_name = "default.redis2.8"
}

### THIS EXPOSES THE CONDITION
resource "aws_route53_record" "test-cache" {
  zone_id = "${aws_route53_zone.main.id}"
  name = "test-cache"
  type = "CNAME"
  ttl = "300"
  records = ["${ aws_elasticache_cluster.test.cache_nodes.0.address }"]
}

@ahamidi

ahamidi commented May 27, 2015

👍 Blocked on this as well.

@catsby
Contributor

catsby commented May 28, 2015

Hello friends –

I've tried on version 0.5.2 and current master (e794287), and cannot reproduce this issue.
I used the config shown above, and every attempt resulted in a clean plan-apply-create.
I watched in the console as well, and the Route 53 record doesn't come along until after the cluster is up, which itself does not report "available" until the node reports "available".

If you're still hitting this with master, or have another case in 0.5.2 that reproduces this, I can take another look.

Otherwise, I'm going to close this issue tomorrow. Thank you for the report; I'm hoping to hear back from someone with a new example, or a confirmed "closed" on this.

Thanks!

@catsby added the waiting-response label May 28, 2015
@catsby
Contributor

catsby commented May 28, 2015

I did open #2128 as an extra safety-net of sorts, take a look!

@juniorplenty
Author

It doesn't happen every time; I don't know what to say, I just hit this again today. #2128 probably handles it, but it's hard to reproduce and test unless there's some way to mock the AWS calls during testing.

@catsby
Contributor

catsby commented May 29, 2015

@juniorplenty I've worked with AWS API enough to both be unable to reproduce this, and totally believe you that it happens 😄

I asked for other examples just in case there was some variable or other external factor we weren't noticing that was the true cause, but it's probably just some AWS API weirdness that has since passed and won't resurface until it's least convenient. I imagine we'll merge #2128 and just roll with it.

@catsby
Contributor

catsby commented May 29, 2015

I just merged #2128 to help here, please check out master and let me know if this happens again.

Thanks for reporting!

@catsby closed this as completed May 29, 2015
@juniorplenty
Author

@catsby couple of updates:

  • 0.5.3 still exhibits this behavior - I'm assuming your patch above made it in there but I haven't checked
  • I've been able to observe Terraform continuing with provisioning (and triggering this race condition) while my ElastiCache cluster is still very much in the "creating" state. I've seen clusters hang for 10+ minutes in the "creating" state, with multiple Terraform runs (including the original create) happily proceeding with dependent resources (e.g. the Route 53 record above) anyway
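
For what it's worth, a one-off probe like the sketch below can confirm the race from outside Terraform while an apply is running: it just prints the status that DescribeCacheClusters reports at that moment. The cluster id ("test") and region are assumptions for the example.

package main

import (
	"fmt"
	"log"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/elasticache"
)

func main() {
	// Print the current cluster status (e.g. "creating" vs "available") so it
	// can be compared against what Terraform is doing at the same moment.
	sess := session.Must(session.NewSession(&aws.Config{Region: aws.String("us-east-1")}))
	out, err := elasticache.New(sess).DescribeCacheClusters(&elasticache.DescribeCacheClustersInput{
		CacheClusterId: aws.String("test"), // assumed cluster_id
	})
	if err != nil {
		log.Fatal(err)
	}
	for _, c := range out.CacheClusters {
		fmt.Printf("%s: %s\n", aws.StringValue(c.CacheClusterId), aws.StringValue(c.CacheClusterStatus))
	}
}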

@juniorplenty
Author

By the way, here's the full error I'm getting:

Error applying plan:

2 error(s) occurred:

* 1 error(s) occurred:

* aws_security_group.elasticache_audience_api: diffs didn't match during apply. This is a bug with Terraform and should be reported.
* 1 error(s) occurred:

* Resource 'aws_elasticache_cluster.audience_api_cache' does not have attribute 'cache_nodes.0.address' for variable 'aws_elasticache_cluster.audience_api_cache.cache_nodes.0.address'

@juniorplenty
Author

I've also noticed the same issue (continuing despite "creating" state) with RDS resources just FYI

@catsby
Contributor

catsby commented Jun 2, 2015

@juniorplenty are you still using the same configuration? Can you gist some output that shows this, using TF_LOG=1 when running it? Be sure to omit any secrets!

@juniorplenty
Author

@catsby Here's log output from a failed run using the plan generated by the exact configs above (with AWS keys added of course) https://gist.github.com/juniorplenty/b99a85bca4ecf1362a8d

@juniorplenty
Author

@catsby wouldn't just waiting for cache_nodes.0 to be defined be a better test for when the cluster is really available? [UPDATE: duh, just remembered that's what your patch does. Hoping something in that log points to what's going on; it looks like that patch made it into 0.5.3, so there's a serious ghost in the machine here...]

@juniorplenty
Author

(Also - should this issue really be "closed"? The above logs verify that it's still broken in 0.5.3...)

@jonhatalla

+1

@ahamidi

ahamidi commented Jun 5, 2015

FWIW, I'm still seeing the same issue as well.

@jhardin293

+1

@juniorplenty
Author

@catsby can we at least get this issue re-opened? I posted the logs you asked for; they show it still happening in 0.5.3.

@phinze
Contributor

phinze commented Jun 18, 2015

Reopening and taking a look!

@phinze reopened this Jun 18, 2015
@catsby
Contributor

catsby commented Jun 19, 2015

Sorry for the delay @juniorplenty – I'm taking another look here

@catsby
Contributor

catsby commented Jun 19, 2015

From master, I ran the included plan file and got the included results:

TL;DR I can't reproduce this on master. @juniorplenty are you capable of building from master and trying?

Does anyone else who's reported this issue have a minimal config that reproduces it?

Thanks!

@catsby
Contributor

catsby commented Jun 24, 2015

Hey all – with no further information, and with @phinze and I both unable to reproduce this, I'm going to re-close it. I assume it's fixed on master (~ 6fdbca8), but if anyone can provide a config and log showing otherwise, we'll gladly dig in again.

Thanks!

@catsby
Contributor

catsby commented Jul 24, 2015

@juniorplenty can you check out #2842?

@ghost

ghost commented May 1, 2020

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.

If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@ghost locked and limited conversation to collaborators May 1, 2020