Add wait_until_stable option for ECS services #14224

Closed

surminus
Contributor

I initially considered raising this as an issue, but then decided I'd be better placed to show an example through a PR.

Changes to quite a few attributes of an ECS service force the resource to be recreated. On creation, we do not wait for the service to become stable, which means that even with the create_before_destroy lifecycle rule, we will suffer downtime.

This commit introduces a basic check of whether a service is stable. The conditions for this check are:

  • if the desired count matches the running count
  • if the pending count is zero
  • if the service is described as "ACTIVE"
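The three conditions above can be expressed as a single predicate. This is a minimal Go sketch, not the code from this PR. `serviceSnapshot` and `isServiceStable` are hypothetical names, and the struct is a simplified stand-in for the `status`, `desiredCount`, `runningCount` and `pendingCount` fields that ECS `DescribeServices` reports for a service.

```go
package main

import "fmt"

// serviceSnapshot is a hypothetical, simplified view of the fields ECS
// DescribeServices reports for a service. Real provider code would read
// these from the AWS SDK response instead.
type serviceSnapshot struct {
	Status       string // e.g. "ACTIVE", "DRAINING", "INACTIVE"
	DesiredCount int64
	RunningCount int64
	PendingCount int64
}

// isServiceStable applies the three conditions listed above:
// running count equals desired count, nothing pending, status "ACTIVE".
func isServiceStable(s serviceSnapshot) bool {
	return s.Status == "ACTIVE" &&
		s.RunningCount == s.DesiredCount &&
		s.PendingCount == 0
}

func main() {
	starting := serviceSnapshot{Status: "ACTIVE", DesiredCount: 3, RunningCount: 1, PendingCount: 2}
	ready := serviceSnapshot{Status: "ACTIVE", DesiredCount: 3, RunningCount: 3, PendingCount: 0}

	fmt.Println(isServiceStable(starting)) // false
	fmt.Println(isServiceStable(ready))    // true
}
```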

If all three conditions are met, we can be reasonably sure that the service has at least started and provisioned its tasks. The check doesn't account for tasks that crash after starting, but if you're confident in your service it should be safe to use.

I haven't added any tests yet, because I'm honestly not sure of the best way to add them. I would appreciate some guidance, if possible! I'm able to run acceptance tests, so I'm more than happy to help find the right approach.


There are quite a few elements of an ECS service that cause the resource
to be recreated. Upon creation, we do not wait for the service to be
stable, which means even using lifecycle rules to
`create_before_destroy`, we will suffer downtime.

This commit seeks to introduce a basic assessment on whether a service
is stable. The attributes for this assessment are:

- if the desired count matches the running count
- if the pending count is zero
- if the service is described as "ACTIVE"

If all 3 of these conditions are met, we can be reasonably sure that the
service has at least started and provisioned some tasks. It won't
account for tasks that end up crashing, but if you're confident of your
service it should be safe to use.
@surminus surminus requested a review from a team July 17, 2020 14:04
@ghost ghost added size/S Managed by automation to categorize the size of a PR. service/ecs Issues and PRs that pertain to the ecs service. needs-triage Waiting for first response or review from a maintainer. labels Jul 17, 2020
@surminus
Contributor Author

This PR achieves the same thing, so I'm going to close this one: #3485

@surminus surminus closed this Nov 10, 2020
@ghost

ghost commented Dec 10, 2020

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.

If you feel this issue should be reopened, we encourage creating a new issue linking back to this one for added context. Thanks!

@ghost ghost locked as resolved and limited conversation to collaborators Dec 10, 2020
@breathingdust breathingdust removed the needs-triage Waiting for first response or review from a maintainer. label Sep 17, 2021