aws: Add support for ECS (Container Service) #1803

radeksimko · 2015-05-05T10:22:17Z

This is a first working concept; it includes a basic set of acceptance tests and all docs.

Example

https://github.com/TimeIncOSS/tf_aws_ecs

Caveats

~~Tests will fail because of #1802 (aws: Refactor credentials according to latest aws-sdk-go refactoring); I tested everything w/ aws/aws-sdk-go@876a0d5~~
ECS Cluster destruction sometimes (race condition I guess) ends up with the following errors:
- ClientException: The Cluster cannot be deleted while Container Instances are active.
- ClientException: The Cluster cannot be deleted while Services are active.
- I'm not sure how/whether to define any relationship between the container instances (ASG) and services
ECS Task Definition
- There's currently no way to remove any task definitions that have been created, therefore running acceptance tests will leave a few definitions in your AWS account. There's no way around this.
- I'm using id as a place to save the Task Definition ARN ( = arn:aws:ecs:us-west-2:01234567890:task-definition/mongodb:3 where 3 is the revision, bumped each time you update it), which makes updates two-phased => you need to run apply twice, first to update the TD, second to update the reference in the Service. I'm not sure what's the best way to approach this problem. Update: using family instead + exporting arn to keep referencing easy.
ECS Service
- The API allows task_definition to be in two compatible formats, family:revision or full ARN, but the full ARN is always returned in the API response, which makes updates confusing. Therefore the family:revision format doesn't really work well (it causes update-all-the-time). Update: solved w/ an ARN builder similar to the RDS one.
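
The ARN-builder idea mentioned for task_definition can be sketched roughly as below. This is only an illustration, not the code from this PR: the function names `buildTaskDefinitionARN` and `normalizeTaskDefinition` are hypothetical, and the region and account ID are simply taken from the example ARN above. The point is that both accepted formats end up as the same full ARN, so comparing against what the API returns no longer produces a diff on every run.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// buildTaskDefinitionARN assembles a full task definition ARN from its parts.
// Hypothetical helper; the real provider code may look different.
func buildTaskDefinitionARN(region, accountID, family string, revision int) string {
	return fmt.Sprintf("arn:aws:ecs:%s:%s:task-definition/%s:%d",
		region, accountID, family, revision)
}

// normalizeTaskDefinition accepts either a full ARN or "family:revision"
// and always returns the full ARN form.
func normalizeTaskDefinition(value, region, accountID string) (string, error) {
	if strings.HasPrefix(value, "arn:") {
		// Already a full ARN, nothing to do.
		return value, nil
	}
	parts := strings.Split(value, ":")
	if len(parts) != 2 {
		return "", fmt.Errorf("expected family:revision or an ARN, got %q", value)
	}
	revision, err := strconv.Atoi(parts[1])
	if err != nil {
		return "", fmt.Errorf("invalid revision in %q: %v", value, err)
	}
	return buildTaskDefinitionARN(region, accountID, parts[0], revision), nil
}

func main() {
	inputs := []string{
		"mongodb:3",
		"arn:aws:ecs:us-west-2:01234567890:task-definition/mongodb:3",
	}
	for _, in := range inputs {
		arn, err := normalizeTaskDefinition(in, "us-west-2", "01234567890")
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		// Both inputs print the same full ARN.
		fmt.Println(arn)
	}
}
```

Normalizing on the way in, rather than trimming the ARN on the way out, keeps the stored value identical to what the API returns.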

Test plan

$ make testacc TEST=./builtin/providers/aws TESTARGS='-run=Ecs' 2>/dev/null

go generate ./...
TF_ACC=1 go test ./builtin/providers/aws -v -run=Ecs -timeout 45m
=== RUN TestAccAWSEcsCluster
--- PASS: TestAccAWSEcsCluster (4.01s)
=== RUN TestAccAWSEcsServiceWithARN
--- PASS: TestAccAWSEcsServiceWithARN (105.53s)
=== RUN TestAccAWSEcsServiceWithFamilyAndRevision
--- PASS: TestAccAWSEcsServiceWithFamilyAndRevision (23.29s)
=== RUN TestAccAWSEcsTaskDefinition
--- PASS: TestAccAWSEcsTaskDefinition (1.84s)
PASS
ok      github.com/hashicorp/terraform-1/builtin/providers/aws  134.685s

catsby · 2015-05-05T13:38:24Z

Excellent work!

Do you have an idea of what items remain before you feel this is something we should merge in, or do you think it's ready as is? It's unfortunate to read about the shortcomings of the API at present, but exciting to get ECS in. To be honest, I was supposed to start work on this today 😄

I'm going to familiarize myself with the service and API and I'll be ready to help out where needed, if you have items you'd like someone else to chip away at.

Thanks again!

radeksimko · 2015-05-05T16:41:12Z

> Do you have an idea of what items remain before you feel this is something we should merge in, or do you think it's ready as is?

Except for the caveats mentioned above, it should work well. Give it a try and deploy a sample cluster + service + TD using the simple demo module I provided above.

> if you have items you'd like someone else to chip away at.

It's really mainly about the caveats, as mentioned. I left some // TODO comments in the code where I believe these issues need to be handled, so feel free to pick any of those.

gjohnson · 2015-05-05T22:38:32Z

> I'm using id as a place to save the Task Definition ARN ( = arn:aws:ecs:us-west-2:01234567890:task-definition/mongodb:3 where 3 is revision bumped each time you update it), which makes updates two-phased => need to run apply twice, first to update the TD, second to update the reference in the Service. I'm not sure what's the best way to approach this problem.

Maybe just use the family?

radeksimko · 2015-05-06T06:08:44Z

@gjohnson I was thinking about that as well, I just wasn't sure how that would work in the future, when we may want to deploy TDs cross-region... but in that case we'd just have a region resource parameter w/ ForceNew: true, so it's probably ok.

The other issue with this is that referencing will become a bit more "verbose", e.g.

resource "aws_ecs_service" "main" {
  name = "tf-ecs-service"
  task_definition = "${aws_ecs_task_definition.main.family}:${aws_ecs_task_definition.main.revision}"
  desired_count = "${var.service_desired_count}"
}

radeksimko · 2015-05-10T23:03:47Z

I fixed most major issues (the ClientExceptions are not happening anymore, most likely because of explicit dependencies), but I discovered another one, which happens rarely and is actually a known ECS bug.

To quote an AWS support representative:

> If the service has an ELB and the ELB is deleted before the service, then the service won't be able to go to INACTIVE state.

This may cause some services to be dead-locked (remaining in the "DRAINING" state and not reacting to delete requests), which means you cannot create another ECS service with the same name.
The workaround is to create the ELB again, remove the service and then delete the ELB.

It's also worth mentioning that neither of the two AWS issues (inability to remove task definitions & dead-locked services) costs the customer money. All ECS usage is billed based on consumption of EC2 resources, so as long as the EC2 instances are terminated, it's just a mess for free.

@catsby Will you give it a full review?

radeksimko · 2015-05-12T20:33:53Z

I may actually be able to work around that AWS bug simply by running ecs.DeregisterContainerInstance with a list of instances coming from ecs.ListContainerInstances.

I will have a look.

catsby · 2015-05-18T14:36:21Z

I can review this today if you think it's ready. Any new caveats / limitations ?

radeksimko · 2015-05-18T15:02:23Z

There's one I discovered while testing this over the weekend - I was not able to reference ecs_cluster.name for an unknown reason...
https://circleci.com/gh/TimeIncOSS/tf_aws_ecs/15

I don't remember having this issue before... but I admit I have been rebasing all commits on master every couple of days ever since this PR was opened...
It must be something silly... If you help me debug that one, I think it's good to go.

I also just realised that the known AWS bug w/ ELBs/services cannot be fixed via either ecs.ListContainerInstances or ecs.DeregisterContainerInstance, as these API methods are for "general purpose" instances which may or may not be related to that service, so it's not wise to deregister these.

In other words, we would be deregistering the whole instance from a cluster, not a service from ELB.

catsby · 2015-05-19T16:37:23Z

There's a lot here, but overall it looks very good. I did not run into the ecs_cluster.name issue above, but I did run into * ClusterContainsServicesException: The Cluster cannot be deleted while Services are active. when trying to destroy 😦 I couldn't see any active services in the console after I ran this delete though. What's the recourse here?

catsby · 2015-05-19T16:38:16Z

Trying to recreate the cluster from your example module (which is great, btw), I'm getting this error:

* InvalidParameterException: Creation of service was not idempotent.

Have you run into that?

radeksimko · 2015-05-19T17:00:06Z

> ClusterContainsServicesException

Interesting... did you use the module provided or did you just build something custom?
Either way this one should be fairly easy to fix... just keep trying for a few minutes, like we do in other resources. Not a perfect solution, but it will do the job.

> InvalidParameterException: Creation of service was not idempotent.

I have not run into this one... but it may be just because I was blocked by the ecs_cluster.name...

catsby · 2015-05-19T18:36:31Z

> Interesting... did you use the module provided or did you just build something custom?

I used the module

> just keep trying for a few minutes

I did, and it seemed stuck. I went to lunch and now it destroyed fine.

catsby · 2015-05-27T13:41:07Z

Hey @radeksimko – sorry that this has been hanging out here so long. I did review and the code checks out. There were some bumps as I mentioned, but it's not clear what we can do about those from our side, would you agree? Do you think this is ready for merging?

radeksimko · 2015-05-27T14:20:48Z

@catsby I'd like to see it working again in my environment before it goes in 😃
I can try and do some further testing this evening (BST).

Last time I tried I ended up with the "ecs_cluster.name missing" issue, and I also remember having some trouble in the past building the IAM instance profile + role via IAM TF resources -> I ended up creating IAM via the AWS Console "ECS - getting started" and taking ARNs from there.

Otherwise if you feel it's safe to merge it right now, I won't prevent you from doing so. 😄

Either way, I'm ok with eventually ignoring the race conditions (ClientException) and the ELB deregistration bug + task definition removal inability - that's really something we just can't easily fix on our side. Bugfixes for these things will hopefully come at some point in the future.

radeksimko · 2015-05-27T20:18:30Z

So I quickly tested and I'm still getting the same error:

Error running plan: 1 error(s) occurred:

* 1 error(s) occurred:

* Resource 'aws_ecs_cluster.main' does not have attribute 'name' for variable 'aws_ecs_cluster.main.name'

and I don't understand why you're not seeing that same error.

catsby · 2015-05-29T14:04:37Z

Have you tried make updatedeps recently?

The service acceptance tests fail for me:

* ClusterNotFoundException: Cluster not found.
            status code: 400, request id: []

But cluster tests are fine

radeksimko · 2015-05-29T14:06:28Z

@catsby I plan to have a look at ECS during the upcoming weekend and ideally make it ready for final review & merge. It has been waiting for almost a month now, so I think it's time 😃

catsby · 2015-05-29T14:38:49Z

@radeksimko I apologize for the delays here. I really appreciate the hard work you have put in and continue to put in here.

radeksimko · 2015-05-31T16:15:57Z

I can confirm I'm now getting the following error as well:

* ClusterNotFoundException: The referenced cluster was inactive.
            status code: 400, request id: []

looking into it now.

radeksimko · 2015-05-31T16:57:45Z

Bug found + fixed - acceptance tests for ecs_service were just using the default ecs_cluster, which was expected to exist in the AWS account (and it usually does, unless you delete it, like I did and quite likely you did too).

Now it's creating its own cluster, which is much better for isolation of those tests anyway.

radeksimko · 2015-05-31T23:46:56Z

@catsby The demo module now contains IAM resources as well. The whole functionality is tested and works for creating & updating ECS resources, but deleting is kinda "broken" due to the mentioned annoying ELB bug.

I feel we need to figure out how to delete the ecs_service gracefully so it doesn't get stuck in the DRAINING state, then we're good to go.

I may or may not have time this week to work on it. I will see. I spent most of my time this weekend on #2157 so I had less time for this.

radeksimko · 2015-06-02T19:28:10Z

> I feel we need to figure out how to delete the ecs_service gracefully so it doesn't get stuck in the DRAINING state, then we're good to go.

I got a response from AWS that this bug is now fixed, so I went ahead and tested everything all over again. I can confirm that it no longer gets stuck in a dead-lock. 🎉

I've also been told that task-definition de-registration is on the way. 😃

In other words, I think this is finally ready for a full review & merge.

catsby · 2015-06-05T16:43:25Z

This is fantastic work @radeksimko!
Please take the honor of clicking the Big Green Button and merging this.

I'll update the CHANGELOG afterwards.

Again, thanks! 🎆


radeksimko · 2015-06-05T20:25:55Z

Done 😄 🎉

catsby · 2015-06-05T20:26:44Z

jbrook · 2015-06-16T15:25:31Z

I am playing with the ECS functionality and I ran into an issue with specifying volumes in a task definition. It should be possible to pass an empty host in which case the volume is not made persistent - the docker daemon manages its location and garbage collection:

http://docs.aws.amazon.com/AmazonECS/latest/developerguide/using_data_volumes.html

ghost · 2020-05-02T02:06:06Z

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.

If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

radeksimko force-pushed the ecs branch from 55e93ed to 1ab559b Compare May 5, 2015 10:34

catsby added enhancement provider/aws labels May 5, 2015

radeksimko force-pushed the ecs branch 6 times, most recently from 70b55b0 to 108a135 Compare May 10, 2015 20:58

radeksimko force-pushed the ecs branch 2 times, most recently from 7a2a01c to b73cb87 Compare May 16, 2015 11:00

catsby added the waiting-response An issue/pull request is waiting for a response from the community label May 18, 2015

radeksimko removed the waiting-response An issue/pull request is waiting for a response from the community label May 18, 2015

radeksimko force-pushed the ecs branch from b73cb87 to cd4c6a9 Compare May 25, 2015 22:57

radeksimko force-pushed the ecs branch from cd4c6a9 to 88a8b0f Compare May 27, 2015 20:03

radeksimko force-pushed the ecs branch 2 times, most recently from 1514588 to d3f35fb Compare May 31, 2015 16:56

radeksimko force-pushed the ecs branch from d3f35fb to 84583b7 Compare June 2, 2015 18:15

radeksimko force-pushed the ecs branch from 84583b7 to 864c6d1 Compare June 3, 2015 21:37

radeksimko added 4 commits June 3, 2015 22:47

provider/aws: Add new resource - aws_ecs_cluster

ae5502b

provider/aws: Add new resource - aws_ecs_task_definition

87faf85

provider/aws: Add new resource - aws_ecs_service

ebaa9bd

Add docs for new AWS ECS resources

c3fcdfc

radeksimko force-pushed the ecs branch from 864c6d1 to c3fcdfc Compare June 3, 2015 21:48

radeksimko added a commit that referenced this pull request Jun 5, 2015

Merge pull request #1803 from TimeIncOSS/ecs

1770713

aws: Add support for ECS (Container Service)

radeksimko merged commit 1770713 into hashicorp:master Jun 5, 2015

radeksimko deleted the ecs branch June 5, 2015 20:25

radeksimko mentioned this pull request Jun 16, 2015

provider/aws: Support ephemeral volumes in ecs_task_definition #2370

Closed

ghost locked and limited conversation to collaborators May 2, 2020
