provider/aws: elasticache cluster exports nodes before they exist #2051

Closed
juniorplenty opened this issue May 22, 2015 · 24 comments
Labels
provider/aws, waiting-response

Comments

@juniorplenty

Originally reported by @saulshanabrook here: #1965
Verified it still exists in 0.5.2

Error applying plan:

1 error(s) occurred:

* 1 error(s) occurred:

* Resource 'aws_elasticache_cluster.default' does not have attribute 'cache_nodes.0.address' for variable 'aws_elasticache_cluster.default.cache_nodes.0.address'

Terraform does not automatically rollback in the face of errors.
Instead, your Terraform state file has been partially updated with
any resources that successfully completed. Please address the error
above and apply again to incrementally change your infrastructure.

cache_nodes.0 doesn't exist because ElastiCache isn't actually done provisioning the cluster, and Terraform doesn't wait for it to finish.
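
For illustration only, here's a minimal sketch of the kind of wait the provider would need to perform before exporting cache_nodes: poll DescribeCacheClusters until the cluster reports "available" and its node endpoints are populated. This is written against aws-sdk-go; the cluster id ("test"), region, and timeout are assumptions for the example, and this is not the actual #2128 patch.

package main

import (
	"fmt"
	"log"
	"time"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/elasticache"
)

func main() {
	// Sketch only: block until the ElastiCache cluster is "available" and its
	// node endpoints exist, which is roughly what the provider has to do
	// before cache_nodes.0.address can be interpolated safely.
	sess := session.Must(session.NewSession(&aws.Config{Region: aws.String("us-east-1")}))
	svc := elasticache.New(sess)

	deadline := time.Now().Add(20 * time.Minute) // assumed timeout
	for {
		out, err := svc.DescribeCacheClusters(&elasticache.DescribeCacheClustersInput{
			CacheClusterId:    aws.String("test"), // cluster_id from the config below
			ShowCacheNodeInfo: aws.Bool(true),     // needed to get node endpoints back
		})
		if err != nil {
			log.Fatal(err)
		}
		if len(out.CacheClusters) == 0 {
			log.Fatal("cluster not found")
		}
		c := out.CacheClusters[0]

		// While the cluster is still "creating", CacheNodes is empty; only
		// trust the node list once the cluster itself reports "available".
		if aws.StringValue(c.CacheClusterStatus) == "available" && len(c.CacheNodes) > 0 {
			fmt.Println("first node address:", aws.StringValue(c.CacheNodes[0].Endpoint.Address))
			return
		}
		if time.Now().After(deadline) {
			log.Fatalf("timed out; last status was %q", aws.StringValue(c.CacheClusterStatus))
		}
		time.Sleep(15 * time.Second)
	}
}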

@juniorplenty
Author

Here are test configs; note the Route 53 record creation at the bottom, which attempts to use a non-existent cluster node index:

### VARS #################################
variable "aws_access_key" {}
variable "aws_secret_key" {}
variable "region" {
  default = "us-east-1"
}
variable "availability_zone" {
  default = "us-east-1e"
}

### VPC ##################################
provider "aws" {
  access_key = "${var.aws_access_key}"
  secret_key = "${var.aws_secret_key}"
  region     = "${var.region}"
}
resource "aws_vpc" "default" {
  cidr_block = "10.250.0.0/16"
}
resource "aws_internet_gateway" "default" {
  vpc_id = "${aws_vpc.default.id}"
}
resource "aws_subnet" "public" {
  vpc_id = "${aws_vpc.default.id}"
  cidr_block = "10.250.2.0/24"
  availability_zone = "${var.availability_zone}"
  map_public_ip_on_launch = true
}
resource "aws_subnet" "private" {
  vpc_id = "${aws_vpc.default.id}"
  cidr_block = "10.250.3.0/24"
  availability_zone = "${var.availability_zone}"
  map_public_ip_on_launch = true
}
resource "aws_route_table_association" "private" {
  subnet_id = "${aws_subnet.private.id}"
  route_table_id = "${aws_route_table.public.id}"
}
resource "aws_route_table" "public" {
  vpc_id = "${aws_vpc.default.id}"
  route {
    cidr_block = "0.0.0.0/0"
    gateway_id = "${aws_internet_gateway.default.id}"
  }
}
resource "aws_route_table_association" "public" {
  subnet_id = "${aws_subnet.public.id}"
  route_table_id = "${aws_route_table.public.id}"
}
resource "aws_route53_zone" "main" {
  name = "test.infra"
  vpc_id = "${aws_vpc.default.id}"
}

### ELASTICACHE ##########################
resource "aws_elasticache_subnet_group" "test" {
  name = "test"
  description = "Test Redis Subnet Group"
  subnet_ids = ["${aws_subnet.private.id}"]
}
resource "aws_elasticache_cluster" "test" {
  cluster_id = "test"
  subnet_group_name = "${aws_elasticache_subnet_group.test.name}"
  num_cache_nodes = 1
  node_type = "cache.t2.micro"
  engine = "redis"
  engine_version = "2.8.19"
  parameter_group_name = "default.redis2.8"
}

### THIS EXPOSES THE CONDITION
resource "aws_route53_record" "test-cache" {
  zone_id = "${aws_route53_zone.main.id}"
  name = "test-cache"
  type = "CNAME"
  ttl = "300"
  records = ["${ aws_elasticache_cluster.test.cache_nodes.0.address }"]
}

@ahamidi

ahamidi commented May 27, 2015

👍 Blocked on this as well.

@catsby
Contributor

catsby commented May 28, 2015

Hello friends –

I've tried on version 0.5.2 and current master (e794287), and cannot reproduce this issue.
I used the config shown above, and every attempt resulted in a clean plan-apply-create.
I watched in the console as well, and the Route 53 record doesn't come along until after the cluster is up, which itself does not report "available" until the node reports "available".

If you're still hitting this with master, or have another case in 0.5.2 that reproduces this, I can take another look.

Otherwise, I'm going to close this issue tomorrow. Thank you for the report; I'm hoping to hear back from someone with a new example, or a confirmed "closed" on this.

Thanks!

@catsby added the waiting-response label May 28, 2015
@catsby
Contributor

catsby commented May 28, 2015

I did open #2128 as an extra safety-net of sorts, take a look!

@juniorplenty
Author

It doesn't happen every time; I don't know what to say, I just hit this again today. #2128 probably handles it, but it's hard to reproduce and test unless there's some way to mock the AWS calls during testing.

@catsby
Contributor

catsby commented May 29, 2015

@juniorplenty I've worked with AWS API enough to both be unable to reproduce this, and totally believe you that it happens 😄

I asked for other examples just in case there was some variable or other external factor we weren't noticing that was the true cause, but it's probably just some AWS API weirdness that has since passed and won't resurface until it's least convenient. I imagine we'll merge #2128 and just roll with it.

@catsby
Contributor

catsby commented May 29, 2015

I just merged #2128 to help here, please check out master and let me know if this happens again.

Thanks for reporting!

@catsby closed this as completed May 29, 2015
@juniorplenty
Author

@catsby couple of updates:

  • 0.5.3 still exhibits this behavior - I'm assuming your patch above made it in there but I haven't checked
  • I've been able to observe Terraform continuing with provisioning (and triggering this race condition) while my ElastiCache cluster is still very much in the "creating" state. I've seen clusters hang for 10+ minutes in the "creating" state, with multiple Terraform runs (including the original create) happily proceeding with dependent resources (e.g. the Route 53 record above) anyway
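
For what it's worth, a one-off probe like the sketch below can confirm the race from outside Terraform while an apply is running: it just prints the status that DescribeCacheClusters reports at that moment. The cluster id ("test") and region are assumptions for the example.

package main

import (
	"fmt"
	"log"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/elasticache"
)

func main() {
	// Print the current cluster status (e.g. "creating" vs "available") so it
	// can be compared against what Terraform is doing at the same moment.
	sess := session.Must(session.NewSession(&aws.Config{Region: aws.String("us-east-1")}))
	out, err := elasticache.New(sess).DescribeCacheClusters(&elasticache.DescribeCacheClustersInput{
		CacheClusterId: aws.String("test"), // assumed cluster_id
	})
	if err != nil {
		log.Fatal(err)
	}
	for _, c := range out.CacheClusters {
		fmt.Printf("%s: %s\n", aws.StringValue(c.CacheClusterId), aws.StringValue(c.CacheClusterStatus))
	}
}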

@juniorplenty
Author

By the way, here's the full error I'm getting:

Error applying plan:

2 error(s) occurred:

* 1 error(s) occurred:

* aws_security_group.elasticache_audience_api: diffs didn't match during apply. This is a bug with Terraform and should be reported.
* 1 error(s) occurred:

* Resource 'aws_elasticache_cluster.audience_api_cache' does not have attribute 'cache_nodes.0.address' for variable 'aws_elasticache_cluster.audience_api_cache.cache_nodes.0.address'

@juniorplenty
Author

I've also noticed the same issue (continuing despite "creating" state) with RDS resources just FYI

@catsby
Contributor

catsby commented Jun 2, 2015

@juniorplenty are you still using the same configuration? Can you gist some output that shows this, using TF_LOG=1 when running it? Be sure to omit any secrets!

@juniorplenty
Author

@catsby Here's log output from a failed run using the plan generated by the exact configs above (with AWS keys added of course) https://gist.github.com/juniorplenty/b99a85bca4ecf1362a8d

@juniorplenty
Author

@catsby wouldn't just waiting for cache_nodes.0 to be defined be a better test for when the cluster is really available? [UPDATE: duh, just remembered that's what your patch does. Hoping something in that log points to what's going on; it looks like that patch made it into 0.5.3, so there's a serious ghost in the machine here...]

@juniorplenty
Author

(Also - should this issue really be "closed"? The above logs verify that it's still broken in 0.5.3...)

@jonhatalla

+1

@ahamidi

ahamidi commented Jun 5, 2015

FWIW, I'm still seeing the same issue as well.

@jhardin293

+1

@juniorplenty
Author

@catsby can we at least get this issue re-opened? I posted the logs you asked for; they show it still happening in 0.5.3.

@phinze
Contributor

phinze commented Jun 18, 2015

Reopening and taking a look!

@phinze reopened this Jun 18, 2015
@catsby
Contributor

catsby commented Jun 19, 2015

Sorry for the delay @juniorplenty – I'm taking another look here

@catsby
Contributor

catsby commented Jun 19, 2015

From master, I ran the included plan file and got the included results:

TL;DR I can't reproduce this on master. @juniorplenty are you capable of building from master and trying?

Does anyone else who's reported this issue have a minimal config that reproduces it?

Thanks!

@catsby
Contributor

catsby commented Jun 24, 2015

Hey all – with no further information, and with @phinze and I both unable to reproduce this, I'm going to re-close it. I assume it's fixed on master (~ 6fdbca8), but if anyone can provide a config and log showing otherwise, we'll gladly dig in again.

Thanks!

@catsby
Contributor

catsby commented Jul 24, 2015

@juniorplenty can you check out #2842?

@ghost

ghost commented May 1, 2020

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.

If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@ghost locked and limited conversation to collaborators May 1, 2020